Content Engineering for the Agent Era
How you structure data for AI consumption determines how agents present it. WAAG over RAG — reasoning at write time, not query time. Freeform content over rigid schemas. And it all starts with the human providing quality data, not the engineer or the agent. Content engineering sits alongside prompt engineering and context engineering as a distinct discipline.
The Discipline
Content engineering is how you structure data so AI agents present it well. Not prompt engineering — that's how you instruct the agent. Not context engineering — that's how you manage state. Content engineering is the data itself: how it's structured, how it's curated, and how the agent consumes it.
These are three distinct disciplines. Prompt engineering tells the agent what to do. Context engineering determines what the agent knows and how it uses state to interact with the user. Content engineering determines the quality of what the agent has to work with. You can write perfect prompts and manage flawless context, but if the underlying data is thin or poorly structured, the agent's responses will be too.
And content engineering starts with the human. Not the engineer, not the agent — the person whose data it represents. Without quality input from the human, the engineer and the agent can only do so much.
Three distinct disciplines: prompt engineering tells the agent what to do, context engineering determines what it knows, content engineering determines the quality of what it has to work with.
WAAG: Reasoning at Write Time
RAG — Retrieval-Augmented Generation — reasons at query time. Unstructured data gets mechanically chunked, vectorised, and stored. When a query comes in, relevant chunks are retrieved and the LLM reasons over raw fragments. Every query re-reasons. Every read pays the full cost of comprehension.
WAAG — Write-time Agent-Augmented Generation — inverts this. An agent reasons over the content once, at write time. It curates, structures, and distils. The result is stored as pre-reasoned content. Every subsequent read benefits from that upfront investment.
The trade-off is straightforward: RAG is cheaper to write and expensive to read. WAAG is expensive to write and cheap to read. For content that's read far more often than it's written — persona profiles, case studies, documentation — WAAG wins. The upfront curation cost amortises across every interaction.
When does RAG still make sense? When you're dealing with massive, frequently changing document collections where the volume exceeds what can be curated: enterprise search across thousands of documents, real-time data feeds. WAAG is for bounded, curated content — and that describes most products people actually build.
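To make the trade-off concrete, here's a minimal sketch of the two WAAG paths, assuming a hypothetical `llm.complete` client and a simple key-value store; none of these names come from Mosaic.

```typescript
// Hypothetical interfaces; illustrative names, not Mosaic's API.
interface LLM {
  complete(prompt: string): Promise<string>;
}
interface Store {
  put(id: string, text: string): Promise<void>;
  get(id: string): Promise<string>;
}

// WAAG write path: the expensive reasoning step runs exactly once, at write time.
async function waagWrite(llm: LLM, store: Store, id: string, raw: string): Promise<void> {
  const curated = await llm.complete(
    "Curate this content for agent consumption: identify the trajectory and themes, " +
      "structure it with clear headings, and distil rather than reformat.\n\n" + raw,
  );
  await store.put(id, curated);
}

// WAAG read path: no retrieval pipeline, no re-reasoning; serve pre-reasoned content.
async function waagRead(store: Store, id: string): Promise<string> {
  return store.get(id); // every read amortises the one-off write cost
}
```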
Why Rigid Schemas Fail
The traditional approach to data modelling forces structure: profile.title, profile.summary, separate cv, separate projects. This seems reasonable until you realise the assumptions don't fit everyone. Students don't have CVs. Freelancers don't have a single title. Some people want a hobbies section. Others want certifications. Every edge case needs a new field.
The agent-native alternative: freeform content. In Mosaic, each persona has three fields — a name, a content blob (markdown), and preferences (also markdown). That's it. The agent reads the entire content for every query and reasons about what's relevant.
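The shape is small enough to write down. A sketch, with field names inferred from this description rather than taken from Mosaic's actual schema:

```typescript
// Inferred persona shape; field names are assumptions, not Mosaic's schema.
interface Persona {
  name: string;
  content: string;     // freeform markdown: everything the agent may draw on
  preferences: string; // freeform markdown: how to present, never what to present
}
```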
This works because agents don't need structure to understand structure. An agent reading well-written markdown with clear headings will find the right information as reliably as one reading from discrete database fields — more reliably, in fact, because it has the full context of surrounding content. Rigid schemas fragment context. Freeform content preserves it.
The preferences field is strictly separated: it describes how to present, never what to present. "Be casual and approachable" is a preference. "I have 10 years of experience" is content. When an owner tries to put facts in preferences, the agent redirects: "That's information about what you did, not how to present it. Let me add it to your content instead."
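That redirect could be enforced with a small classification step at write time. A sketch, with an assumed `complete` call standing in for whatever model client is available:

```typescript
type FieldTarget = "content" | "preferences";

// Route an owner's statement: facts (what to present) go to content,
// presentation instructions (how to present) go to preferences.
async function routeOwnerInput(
  llm: { complete(prompt: string): Promise<string> },
  statement: string,
): Promise<FieldTarget> {
  const verdict = await llm.complete(
    'Reply "content" if this statement is a fact about the person, ' +
      'or "preferences" if it is an instruction about presentation style.\n\n' + statement,
  );
  return verdict.trim().toLowerCase().startsWith("preferences") ? "preferences" : "content";
}
```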
Agents don't need structure to understand structure. Rigid schemas fragment context. Freeform content preserves it.
Curated Content vs Structured Text
Both produce markdown. The difference is whether intelligence was applied.
Structured text is mechanical conversion. Take a resume, convert it to markdown: headings for each role, bullet points for responsibilities. Same information, different format. No reasoning.
Curated content is what you get when an agent reasons over the full document. Instead of listing each role, it identifies the trajectory — "progression from frontend development into distributed systems, pattern of taking on infrastructure challenges alongside product work." It's better than the input because it's been synthesised, not just reformatted.
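The difference shows up directly in the instruction handed to the write-time agent. Both prompts below are illustrative sketches, not the product's actual wording:

```typescript
// Structured text: mechanical conversion. Same information, different format.
const structurePrompt = (raw: string): string =>
  "Convert this resume to markdown: one heading per role, " +
  "bullet points for responsibilities.\n\n" + raw;

// Curated content: synthesis. The agent is asked to reason, not reformat.
const curatePrompt = (raw: string): string =>
  "Read this resume as a whole. Identify the career trajectory, recurring themes, " +
  "and what distinguishes this person, then write a distilled profile under clear " +
  "headings rather than a role-by-role restatement.\n\n" + raw;
```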
The analogy: scanning a book into text vs writing a review of the book. One preserves the content mechanically. The other reasons about what matters and presents it with insight. Every future reader of the review gets the benefit of that reasoning.
This is the core of WAAG. The agent reasons at write time. The curated result serves every subsequent interaction — conversation, visual display, MCP consumption, web discovery. One curation, four surfaces.
The Human Starts It
Content engineering isn't purely technical. It requires the person whose data it represents to provide rich, valuable, intent-driven input. The agent can curate brilliantly, but it can only work with what it's given.
In Mosaic, owners provide content through multiple paths: uploading documents, pasting text, or simply talking to the agent. The agent extracts, infers structure, and proposes an organisation: "This looks like a resume. I've organised it — does this look right?" The owner approves or adjusts.
Then the owner QAs themselves as a visitor would experience them. "How would you describe my experience with React?" — see the answer — adjust if needed — ask again. This preview loop is how content quality improves: not through engineering, but through the owner iterating on their own representation.
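A sketch of that propose-and-review loop, with hypothetical names throughout:

```typescript
type Review = { approved: true } | { approved: false; feedback: string };

// Illustrative ingestion loop: the agent proposes an organisation,
// the owner approves or sends feedback, and the agent revises.
async function ingest(
  llm: { complete(prompt: string): Promise<string> },
  raw: string,
  review: (proposal: string) => Promise<Review>,
): Promise<string> {
  let proposal = await llm.complete(
    "Organise this input into well-structured markdown and say what you think it is " +
      "(a resume, a project list, and so on).\n\n" + raw,
  );
  let verdict = await review(proposal);
  while (!verdict.approved) {
    proposal = await llm.complete(
      "Revise this draft per the owner's feedback.\n\nFeedback: " +
        verdict.feedback + "\n\nDraft:\n" + proposal,
    );
    verdict = await review(proposal);
  }
  return proposal; // approved content, ready to store
}
```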
The owner is also writing implicit prompts. How they structure their content — bullet points vs narrative, formal vs casual, detailed vs sparse — influences how every future LLM presents it. Clear headings make sections citable. First-person language encourages first-person responses. Quantifiable details produce specific, credible answers. Vague language produces vague answers.
This extends to eliciting that data — drawing out the right information through conversation design. Most people don't naturally write in a way that's optimised for AI consumption. The editing agent helps bridge this gap, producing well-structured content from natural conversation.
Three Layers of IP
Content engineering isn't one thing. It's three layers that stack.
Instruction engineering — crafting the principles and framing that wrap data when delivered to an LLM. "You are this person. Follow these principles. Here is their content." This sounds simple, but getting it right across different models, clients, and delivery mechanisms is non-trivial. Instructions that work in a system prompt need different phrasing as tool output.
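For instance, the same curated content might be wrapped one way as a system prompt and another way as tool output. The wording below is a hedged sketch, not the actual instruction set:

```typescript
// Delivered as a system prompt: the model becomes the persona.
function asSystemPrompt(name: string, content: string): string {
  return (
    `You are ${name}. Speak in the first person and follow these principles: ` +
    "answer only from the content below, and say so when something is not covered.\n\n" +
    content
  );
}

// Delivered as tool output: the model stays itself and presents the persona.
function asToolOutput(name: string, content: string): string {
  return (
    `The following is curated profile content for ${name}. ` +
    "Present it faithfully and do not invent details beyond it.\n\n" + content
  );
}
```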
Content transformation — taking raw, unstructured data and reshaping it so agents present it optimally. Not just formatting — understanding which structure, tone, emphasis, and detail level produce the best agent-mediated interactions for a given intent. This is empirical knowledge: which heading structure makes Claude cite sections reliably? Does first-person or third-person produce more authentic responses?
Intent-aware interaction design — helping the consumer get maximum value. Instead of an empty chat box, purpose-built prompts for specific intents: "Evaluate this candidate's fit for a Senior Backend Engineer role." These aren't generic — they're designed to extract the most relevant information from the data for a specific purpose. The consumer's prompting skill shouldn't determine service quality.
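As data, purpose-built prompts might look like the catalogue below; the intents and wording are illustrative, built around the example above:

```typescript
// Illustrative intent catalogue: the consumer picks a purpose, not a blank chat box.
const intentPrompts: Record<string, (detail?: string) => string> = {
  evaluateFit: (role) =>
    `Evaluate this candidate's fit for a ${role ?? "given"} role: ` +
    "relevant experience, likely gaps, and supporting evidence.",
  quickSummary: () =>
    "Summarise this person's background in three sentences for a first screening.",
};

const prompt = intentPrompts.evaluateFit("Senior Backend Engineer");
```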
Each layer independently improves the experience. Together they compound. Data well-structured (Layer 2), delivered with clear instructions (Layer 1), consumed through expert-designed interactions (Layer 3).
Content engineering is empirical, not theoretical. What works comes from testing — which structure, tone, and detail level produce the best agent-mediated interactions. This knowledge compounds.
The Thread
Content engineering is what makes everything else in an AI-native product work. Trust at the data layer (Article 1) requires data worth trusting — that's content engineering. Designing for multiple AI consumers (Article 3) requires data so well-structured that any model can work with it — that's content engineering. The dual-voice architecture (Article 4) operates on curated content, not raw data. The visual surface (Article 5) renders pre-distilled content. The discovery layer (Article 6) serves it to agents searching the web.
Every surface benefits from upfront curation. The investment is at write time. The returns are at read time — across every interaction, every consumer, every surface.
This site is built on these principles. The content you're reading was engineered for agent consumption — curated at write time, structured for multiple surfaces, designed so the conversational agent can reason over it with depth. The medium is the message.