Multi-Surface Discoverability — SEO for the Agent Era
Making web content discoverable by AI agents is the next SEO. Four surfaces, three implementation layers, and a practical framework any website can follow today. The most immediately applicable article in this series: the framework is product-agnostic.
The Missing Surface
Most AI-native products serve three consumption paths. Conversation — humans chat with an agent via a web app. Visual — structured content displayed to humans. Connected — external AI tools access data through protocols like MCP. These three cover every consumer who already knows about you.
But there's a fourth: discovery. How does an agent that doesn't know about you find you?
Discovery is the web layer — making content findable by AI agents searching the open web. It complements connected access: discovery gets you found; connected access provides rich interaction once found. Together they form the discovery-to-engagement funnel.
This is the next SEO. The same principles apply — make your content findable, structured, meaningful — but the consumer has fundamentally changed. Search engines index and rank. AI agents read, reason, and act. The bar is higher because the intermediary is now a reasoning engine, not a matching algorithm.
Two States: Retrofitted vs Agent-First
Every website falls into one of two states when it comes to agent discoverability.
Retrofitted — the site was built for human browsers. Agent support is added after the fact: a manually maintained llms.txt, JSON-LD templated separately from content, a CDN feature that mechanically converts HTML to markdown. The content itself was never designed for agents — the signposts point to human-oriented content.
The core problem with the retrofitted state is drift. When metadata and content are maintained separately, they diverge over time. JSON-LD describes content that was written for humans. The conversion layer produces structured text, not curated content. Quality is limited by the source.
Agent-first — content was engineered for agent consumption at write time. The web layer serves it natively via content negotiation — markdown for agents, HTML for humans. Same URL, same curated data, appropriate format for the consumer. llms.txt is generated from the content architecture. JSON-LD is derived from the same data. Nothing can drift because everything draws from the same source.
The key difference isn't whether you have JSON-LD. It's what the JSON-LD describes. Structured metadata about content curated for agent consumption is fundamentally more useful than structured metadata about content written for human browsers.
Three Layers, Clear Hierarchy
The implementation has three independent layers. They don't overlap, and they're not equally important.
Layer 3: Markdown via Content Negotiation — the primary investment. Serve curated content as markdown directly when an agent requests it. No HTML parsing, no conversion layer. The agent gets exactly what it needs, generated from the same curated data that powers conversation, visual display, and connected access.
Two complementary mechanisms: HTTP content negotiation via the Accept: text/markdown header (standards-compliant for programmatic agents), and a .md URL suffix (discoverable, testable — paste a URL in your browser, add .md, see what agents see). Both serve the same curated content.
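A minimal sketch of that dual-trigger check, using Node's built-in http module. The renderMarkdown and renderHtml functions are placeholders standing in for whatever generates your curated content; everything else follows the mechanism described above.

```typescript
import { createServer } from "node:http";

// Placeholder renderers -- in an agent-first setup both draw from the same
// curated data source; only the output format differs.
const renderMarkdown = (path: string): string => `# Curated content for ${path}\n`;
const renderHtml = (path: string): string => `<h1>Curated content for ${path}</h1>`;

// An agent signals markdown either by explicitly accepting text/markdown or by
// appending .md to the URL. Browsers wildcard their Accept header to */*, so
// check for an explicit text/markdown token rather than a generic match.
function wantsMarkdown(accept: string | undefined, path: string): boolean {
  return path.endsWith(".md") || (accept ?? "").includes("text/markdown");
}

createServer((req, res) => {
  const path = req.url ?? "/";
  const markdown = wantsMarkdown(req.headers.accept, path);
  res.writeHead(200, {
    // Vary tells caches that the same URL serves different representations.
    "Content-Type": markdown ? "text/markdown; charset=utf-8" : "text/html; charset=utf-8",
    Vary: "Accept",
  });
  res.end(markdown ? renderMarkdown(path.replace(/\.md$/, "")) : renderHtml(path));
}).listen(3000);
```

Testing works in both directions: curl a page with the Accept: text/markdown header, or append .md in a browser, and compare what comes back to the HTML view.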
Layer 1: Signposting with llms.txt — the front door. A markdown file at your site root that acts as a curated sitemap for AI agents. "Here's what this site is and where to find the good stuff." High return on minimal investment. Include dual-format links pointing to both HTML and markdown versions of key content.
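A sketch of generating llms.txt from a content registry rather than maintaining it by hand. The registry shape, URLs, and section layout are illustrative, loosely following the llms.txt proposal's markdown conventions: a title, a short summary, then sections of annotated links.

```typescript
interface ContentEntry {
  title: string;
  summary: string;
  htmlUrl: string;     // human-facing page
  markdownUrl: string; // same curated content, agent-facing format
}

// Hypothetical registry -- in an agent-first setup this is the same curated
// data that powers conversation, visual display, and connected access.
const entries: ContentEntry[] = [
  {
    title: "Content Engineering",
    summary: "Why curating content for agent consumption is the core investment.",
    htmlUrl: "https://example.com/articles/content-engineering",
    markdownUrl: "https://example.com/articles/content-engineering.md",
  },
];

// Dual-format links point agents at markdown while keeping the HTML version
// one hop away.
function generateLlmsTxt(siteName: string, siteSummary: string): string {
  const links = entries
    .map((e) => `- [${e.title}](${e.markdownUrl}): ${e.summary} ([HTML](${e.htmlUrl}))`)
    .join("\n");
  return `# ${siteName}\n\n> ${siteSummary}\n\n## Articles\n\n${links}\n`;
}
```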
Layer 2: JSON-LD — backward compatibility. Structured metadata for traditional search engines that can't reason over content. In the transition period where both traditional search and agent-based search discover content, JSON-LD serves the former. For agent-first products, it's generated from curated data at request time — a low-cost view of data that already exists.
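A sketch of deriving that JSON-LD view at request time from the same curated entry used elsewhere; the schema.org Article type and field mapping are illustrative.

```typescript
// Derive the JSON-LD view from the curated entry at request time, so the
// structured metadata can never drift from the content it describes.
function toJsonLd(entry: { title: string; summary: string; htmlUrl: string }): string {
  return JSON.stringify({
    "@context": "https://schema.org",
    "@type": "Article",
    headline: entry.title,
    description: entry.summary,
    url: entry.htmlUrl,
  });
}

// Embedded in the HTML response:
// <script type="application/ld+json">...output of toJsonLd(entry)...</script>
```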
Same URL, same curated data, appropriate presentation for the consumer. HTML is the human experience. Markdown is the agent experience.
The SEO Parallel
SEO evolved through phases, and agent discoverability is following the same arc.
Keyword stuffing gave way to structural optimisation — proper heading hierarchy, semantic HTML, structured data. Structural optimisation gave way to intent matching — satisfying what the searcher actually wanted, not just matching their words. Intent matching gave way to trust and authority — where Google evaluates the credibility of the source, not just the relevance of the content.
Agent discoverability is on the same path. Phase 1 is content availability — making content accessible to agents at all, not hidden behind JavaScript shells. Phase 2 is content structure — headers, sections, metadata that agents can parse. Phase 3 is content engineering — reasoning about content at write time to produce curated material agents can work with effectively. Phase 4 is agent-to-agent integration — protocols enabling agents to query agents directly.
We're in the transition between phases 1 and 2. Most websites still serve JavaScript shells with minimal server-rendered content. AI crawlers account for roughly 4.2% of HTML requests globally, up around 41% in recent months. The trajectory suggests agent traffic will become a significant share of web traffic — making this framework increasingly relevant.
The Browser Surface: WebMCP
While the three layers above cover discovery — agents finding and reading your content — a parallel standard is emerging for agents acting within your site.
WebMCP (Web Model Context Protocol), co-authored by Google and Microsoft, shipped in Chrome 146 as an early preview in February 2026. It lets websites declare structured tool contracts via navigator.modelContext — a browser API where your site registers functions with defined schemas that browser-embedded AI agents can call directly. Instead of an agent scraping your DOM to figure out which button submits a form, your site declares "this function does X with parameters Y" and the agent calls it.
This is complementary to Anthropic's MCP, not competing. MCP is server-side — external agents connect directly to your backend. WebMCP is browser-side — agents operate within the user's browsing session, using their existing authentication. Together they cover the full spectrum: MCP for headless automation and connected access, WebMCP for in-browser agent interaction.
The relevance depends on what your site does. For content and portfolio sites, the three discovery layers are the investment — agents need to read your content, not click your buttons. For transactional applications — e-commerce, booking, SaaS dashboards — WebMCP is the bigger shift. When a travel site can declare searchFlights(origin, destination, date) as a structured tool instead of hoping the agent finds the right input field, the reliability gap between human and agent interaction closes.
The spec is moving through W3C with formal browser announcements expected by mid-2026. It's too early to implement, but not too early to design for. Clean separation between UI and underlying functions — the same internal APIs your forms call — is the preparation. When WebMCP stabilises, exposing those functions as tool contracts is a thin layer on top of architecture you should already have.
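To make the idea concrete, here is a hedged sketch of what that thin layer could look like for the flight-search example above. The API shape is an assumption drawn from the draft proposal (a registerTool-style call on navigator.modelContext with a JSON Schema input contract); the names will likely change before the spec stabilises, and the /api/flights endpoint is hypothetical.

```typescript
// Assumed API shape -- WebMCP is an early draft, so treat these names as
// provisional. The point is the architecture: the tool is a thin declaration
// over the same internal function the human-facing form already calls.

// Hypothetical internal function shared by the search form and the tool.
async function searchFlights(origin: string, destination: string, date: string) {
  const query = new URLSearchParams({ origin, destination, date });
  const res = await fetch(`/api/flights?${query}`);
  return res.json();
}

// navigator.modelContext is not yet in TypeScript's DOM typings.
const modelContext = (navigator as any).modelContext;

modelContext?.registerTool({
  name: "searchFlights",
  description: "Search available flights by origin, destination, and departure date.",
  inputSchema: {
    type: "object",
    properties: {
      origin: { type: "string" },
      destination: { type: "string" },
      date: { type: "string", description: "ISO 8601 departure date" },
    },
    required: ["origin", "destination", "date"],
  },
  async execute(args: { origin: string; destination: string; date: string }) {
    return searchFlights(args.origin, args.destination, args.date);
  },
});
```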
Security: Content as Attack Surface
When you serve content that AI agents process, that content becomes a potential vector for prompt injection — embedded instructions designed to manipulate the agent's behaviour.
The risk profile differs dramatically between the two states. Retrofitted sites convert existing HTML to markdown, exposing content that was never meant to reach a reader — HTML comments, hidden elements, form fields, CSS content properties. None of these render visually, but they all survive conversion to raw text. The gap between rendered output and raw text is the injection surface.
Agent-first sites assemble markdown from structured data fields. There's no hidden layer — the markdown is the content, built from known fields. The risk is lower but not zero, particularly when content includes user-generated text.
Risk also scales with the consuming agent's capability, not your system. A discovery agent doing web search is low risk — it's just reading. Your own agent in a web app is low risk — worst case is social engineering through the agent's voice. But a connected agent with tool access (filesystem, shell, browser) is medium-high risk — injected instructions could cause the visitor's agent to execute local actions.
Mitigations are architectural: assemble content from typed, validated fields rather than converting freeform HTML. Isolate content per user at distinct URLs. Sanitise aggregation points where multiple users' content appears in a single document. And monitor for content changes that look like instructions rather than information.
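A sketch of the first of those mitigations, assembling markdown from typed fields with a basic pass over user-generated text; the Profile shape and stripping rules are illustrative, not a complete defence.

```typescript
interface Profile {
  name: string;     // validated, site-controlled
  headline: string; // validated, site-controlled
  bio: string;      // user-generated: treat as untrusted
}

// Strip constructs that never render visually but survive conversion to raw
// text: HTML comments and script/style bodies. Illustrative only; a real
// deployment needs a fuller policy plus monitoring.
function sanitiseUserText(text: string): string {
  return text
    .replace(/<!--[\s\S]*?-->/g, "")
    .replace(/<(script|style)[\s\S]*?<\/\1>/gi, "")
    .trim();
}

// Assemble markdown from typed fields rather than converting freeform HTML,
// so there is no hidden layer for injected instructions to ride along in.
function profileToMarkdown(p: Profile): string {
  return [`# ${p.name}`, "", `**${p.headline}**`, "", sanitiseUserText(p.bio)].join("\n");
}
```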
Measuring What Matters
This is the equivalent of SEO analytics for agents. The tooling is emerging but already practical.
Cloudflare AI Crawl Control shows which AI services crawl your site, what content they access, and their crawl purpose — training, search, user action. Microsoft Clarity's bot activity dashboard (launched January 2026) identifies bot operators and categorises AI request share. Known Agents provides real-time agent monitoring with LLM referral tracking — measuring human conversions from AI platforms like ChatGPT and Perplexity.
The metrics that matter: are AI agents visiting? Which agents, and which pages? Are they using content negotiation to request markdown? Is llms.txt being accessed? Are people finding you via AI platforms?
But the ultimate test is qualitative: does an agent that finds your content represent you well? Feed your markdown output to an LLM and ask it to describe what it found. Compare that to what you'd want a discovery agent to say about you. Search for yourself on AI platforms. This is the agent equivalent of Googling yourself — and the fix, when the representation isn't right, is in the curated content, not the delivery layer.
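A quick way to run that check, assuming your pages serve markdown at a .md suffix (the URL below is illustrative): fetch what a discovery agent would see, then paste the output into any LLM with a question like "Describe this site and what it offers."

```typescript
// Fetch the agent-facing representation of a page and print it, ready to
// paste into an LLM for the qualitative representation check.
const response = await fetch("https://example.com/articles/content-engineering.md", {
  headers: { Accept: "text/markdown" },
});
console.log(await response.text());
```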
Practically Applicable Today
This is the most immediately practical framework in these articles. It applies to every website — not just AI-native products.
A traditional web product can start with Layer 1: create an llms.txt at your domain root pointing to your best content. Add Layer 2: template JSON-LD from existing metadata. Then invest in Layer 3: serve key pages as markdown via content negotiation. The progression is natural and each layer delivers independent value.
For products with content already engineered for agent consumption — the content engineering described in Article 2 — the discovery layer is simply delivery catching up. The hard work (curating content) is done. Serving it to discovery agents via content negotiation is the natural extension.
The connection between the articles: Article 2 establishes why curated content matters and how to structure it. This article shows how that content reaches agents searching the open web. Content engineering is the investment. Discoverability is the return.
This site implements the full framework. The content you're reading is served via content negotiation — markdown for agents, HTML for humans. Same data, same source, appropriate format for the consumer. The medium is the message.