How to Make Your Blog Agent-Ready: A Practical Implementation Guide

The web has always adapted to new readers. First it spoke to browsers. Then it learned to speak to search engines. Now it needs to speak to AI agents. The shift from a human-read web to a machine-read web is the biggest architectural change in decades, and the standards that govern it are being written right now.
I recently ran my own blog through Cloudflare’s isitagentready.com scanner and scored 43 out of 100, a failing grade for a site that publishes technical content AI tools should be able to reference. This is the story of how I took that score to 90+ using a combination of emerging standards, Cloudflare infrastructure, and a custom Pages Function that replicates a Pro-only feature for free.
If you run a content site, a documentation portal, or a technical blog, this guide walks you through exactly what to ship, what to skip, and how to do it without paying for a premium plan.
What to Remember
- Agent-readiness splits into two layers: the page itself (semantic HTML, accessibility) and the protocols around it (robots.txt, Link headers, Markdown negotiation, Content Signals).
- Markdown negotiation is the highest-impact change: it reduces token consumption by up to 80% for agents fetching your content.
- You do not need a Cloudflare Pro plan to support Markdown for Agents. A Pages Function middleware can replicate the feature for free.
- Content Signals in robots.txt let you declare whether your content can be used for AI training, search, and runtime input independently of the binary allow/disallow rules.
- DNS-AID records are an emerging IETF draft for DNS-based agent discovery. Adoption is near-zero today, but the spec is stabilizing.
- Not every check matters for every site. OAuth discovery, MCP Server Cards, and commerce protocols are irrelevant for a blog. Focus on discoverability and content accessibility.
The Two Surfaces of Agent Readiness
Before diving into implementation, you need the right mental model. Agents interact with your site at two distinct surfaces, and confusing them leads to wasted effort.
Layer 1: The Page Itself
What an agent gets when it fetches a URL. This includes semantic HTML, stable layout, ARIA roles, accessibility primitives, and the rendered DOM. Agents typically do one of three things with a page: take a screenshot for visual reasoning, read the raw HTML for text extraction, or walk the accessibility tree for structured navigation.
Most of Layer 1 is traditional SEO and accessibility done well. Use <button> for buttons, <a> for links, <nav> for navigation, <main> for main content. An agent reading the accessibility tree gets clear roles. A <div onclick> is invisible to an agent walking that tree. Reserve space for images and dynamic content to avoid layout shifts: agents that take screenshots between actions break when the page jumps.
This layer has been technical SEO ground for a decade. If your site is already accessible, it is already partially agent-ready.
Layer 2: The Protocols Around the Page
What is discoverable without rendering. This includes robots.txt, sitemap.xml, Link headers, llms.txt, the entire /.well-known/ directory, Markdown negotiation, OAuth metadata, MCP and A2A discovery, Content Signals, and DNS-AID records.
The page-level surface is solved work. The protocol-level surface is the new work. Most of this guide focuses on Layer 2 because that is where the score gains live.
What Shipping This Stack Actually Does
Before any implementation detail, an honest boundary on what to expect. Shipping llms.txt, Content Signals, or Markdown negotiation does not guarantee AI citations, AI search visibility, or referral traffic. Google’s own AI optimization guide is explicit: “You don’t need to create new machine readable files, AI text files, markup, or Markdown to appear in generative AI search.”
What the protocols demonstrably do, on current evidence, comes down to two things:
- They reduce fetch and parse cost for the agents already coming to read your site. Cloudflare’s published data shows roughly an 80% token reduction when HTML is converted to markdown before being served to an agent.
- They make you legible to the systems deciding whether to cite, train on, or surface you. Chrome’s Lighthouse Agentic Browsing audit now checks for llms.txt at the domain root, and server logs show ChatGPT-User, ClaudeBot, and OAI-SearchBot fetching these endpoints regularly.
Whether those systems then pick you over the next page in their stack is a content, authority, and entity-infrastructure question, not a protocol question.
Layer 2 Implementation: The Protocol Stack
Here is the practical implementation, ordered by impact per minute of effort.
1. AI Bot Rules in robots.txt
Every major AI company publishes a user agent token for its crawler. The simplest form of agent-readiness is adding explicit rules for those tokens. Well-behaved crawlers check robots.txt before they fetch anything else.
User-agent: *
Allow: /
User-agent: GPTBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Google-Extended
Allow: /
If you want to block a specific crawler, switch its Allow to Disallow. Be deliberate: blocking everything is the loudest way to disappear from AI search. Pair this with network-level enforcement at your CDN or WAF. Cloudflare, AWS, and Fastly all let you allow or block AI bots at the edge. Think of robots.txt as the polite signal and the CDN config as the enforcement layer. You want both.
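If you manage the rules in code rather than by hand, a small generator keeps the allow/block decision per crawler explicit. The following is a minimal sketch; the `BotPolicy` type and `renderRobotsTxt` function are hypothetical names used only for illustration, and the choice to block PerplexityBot is just an example.

```typescript
// Hypothetical policy shape: one entry per AI crawler token.
type BotPolicy = Record<string, "allow" | "disallow">;

// Render a robots.txt body: a catch-all group first, then one group per bot.
function renderRobotsTxt(policy: BotPolicy): string {
  const groups: string[] = [["User-agent: *", "Allow: /"].join("\n")];
  for (const bot of Object.keys(policy)) {
    const directive = policy[bot] === "allow" ? "Allow: /" : "Disallow: /";
    groups.push([`User-agent: ${bot}`, directive].join("\n"));
  }
  // Blank lines between groups are optional but make the file easier to read.
  return groups.join("\n\n") + "\n";
}

// Example: allow two crawlers, block one.
const robots = renderRobotsTxt({
  GPTBot: "allow",
  ClaudeBot: "allow",
  PerplexityBot: "disallow",
});
console.log(robots);
```

Keeping the policy as data makes the "be deliberate" advice easier to follow, because every blocked crawler is a visible line in a reviewable file.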
2. Content Signals in robots.txt
Classic robots.txt is binary: a crawler can either read your content or it cannot. Content Signals fixes that gap by letting you separately declare your preferences for how AI systems use your content after it has been accessed. The spec lives at contentsignals.org and is authored as an IETF draft.
The three signals:
- ai-train=yes|no: whether your content can be used to train AI models
- search=yes|no: whether your content can appear in AI search results
- ai-input=yes|no: whether your content can be used as runtime input to AI systems (RAG, grounding)
Add the directive under your catch-all User-agent block:
User-agent: *
Allow: /
Content-Signal: ai-train=yes, search=yes, ai-input=yes
Most publishers picking it up today set ai-train=no, search=yes, ai-input=yes: the “cite me but do not train on me” stance. I set everything to yes because the point of a technical blog is to be cited, absorbed, and integrated. Pick values that match your own stance.
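To sanity-check what you actually published, you can parse the directive back out of robots.txt. This is a rough sketch rather than a spec-complete parser: `parseContentSignals` is a hypothetical helper, it ignores User-agent grouping, and if several Content-Signal lines exist the last value for each key wins.

```typescript
// The three signals described above, mapped to booleans (yes = true).
type SignalName = "ai-train" | "search" | "ai-input";
type ContentSignals = Partial<Record<SignalName, boolean>>;

// Simplification: scans every line of the file and ignores User-agent groups.
function parseContentSignals(robotsTxt: string): ContentSignals {
  const signals: ContentSignals = {};
  for (const line of robotsTxt.split(/\r?\n/)) {
    const match = /^\s*content-signal\s*:\s*(.*)$/i.exec(line);
    if (!match) continue;
    for (const pair of match[1].split(",")) {
      const [key, value] = pair.split("=").map((part) => part.trim().toLowerCase());
      const knownKey = key === "ai-train" || key === "search" || key === "ai-input";
      if (knownKey && (value === "yes" || value === "no")) {
        signals[key as SignalName] = value === "yes";
      }
    }
  }
  return signals;
}

// Example: the "cite me but do not train on me" stance.
const stance = parseContentSignals(
  "User-agent: *\nAllow: /\nContent-Signal: ai-train=no, search=yes, ai-input=yes\n",
);
console.log(stance);
```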
3. Link Headers for Agent Discovery
HTTP Link headers (RFC 8288) let agents discover important resources without parsing your HTML. When an agent fetches your homepage, it sees pointers to your sitemap, your llms.txt, your RSS feed, and your API catalog directly in the response headers.
On Cloudflare Pages, add a _headers file in your public/ directory:
/*
  Link: </llms.txt>; rel="service-doc"; type="text/markdown"
  Link: </llms-full.txt>; rel="alternate"; type="text/plain"
  Link: </sitemap-index.xml>; rel="sitemap"; type="application/xml"
  Link: </rss.xml>; rel="alternate"; type="application/rss+xml"
The four agent-aware rel values that actually matter are all RFC-registered: service-doc (RFC 8631) for human-readable docs, service-desc (RFC 8631) for machine-readable specs, api-catalog (RFC 9727) for API discovery, and describedby (RFC 8288) for a description of the site. Avoid invented values like rel="llms" or rel="ai". Agent-aware scanners ignore them because they are not in the IANA Link Relations registry.
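If you emit the header from code instead of a static _headers file, the same values can be built and checked programmatically. The sketch below is illustrative only: `LinkTarget`, `formatLinkHeader`, and `unrecognisedRels` are hypothetical names, and the allow-list contains just the four rel values named above plus `alternate`, not the full IANA registry.

```typescript
interface LinkTarget {
  href: string;
  rel: string;
  type?: string;
}

// Assumption: a hand-picked allow-list, not a copy of the IANA registry.
const KNOWN_RELS = new Set(["service-doc", "service-desc", "api-catalog", "describedby", "alternate"]);

// Serialise targets into one Link header value (RFC 8288 allows comma-joined links).
function formatLinkHeader(links: LinkTarget[]): string {
  return links
    .map(({ href, rel, type }) => {
      const params = [`rel="${rel}"`];
      if (type) params.push(`type="${type}"`);
      return `<${href}>; ${params.join("; ")}`;
    })
    .join(", ");
}

// Return rel values that are not on the allow-list, so invented ones get flagged.
function unrecognisedRels(links: LinkTarget[]): string[] {
  return links.map((link) => link.rel).filter((rel) => !KNOWN_RELS.has(rel));
}

const links: LinkTarget[] = [
  { href: "/llms.txt", rel: "service-doc", type: "text/markdown" },
  { href: "/rss.xml", rel: "alternate", type: "application/rss+xml" },
  { href: "/ai.txt", rel: "ai" }, // invented value, flagged below
];
console.log(formatLinkHeader(links));
console.log(unrecognisedRels(links));
```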
4. llms.txt
A plain text file at the root of your domain that summarises what your site is, what is on it, and which pages matter most. The spec lives at llmstxt.org. Think of it as a sitemap written for an LLM to read rather than a crawler to index.
A minimal example:
# William OGOU Cybersecurity Blog
> Expert cybersecurity insights, tutorials, and analysis by William OGOU.
## Blog Posts
- [Critical LiteLLM Flaw Exposes AI Gateways](https://blog.ogwilliam.com/post/litellm-vulnerability-cve-2026-49468-auth-bypass/): How CVE-2026-49468 allows unauthenticated Host header injection in LiteLLM.
- [MCP Security Practitioner's Guide](https://blog.ogwilliam.com/post/mcp-security-practitioners-guide/): Defense-in-depth blueprint for securing AI agents against RCE and injection attacks.
## About
- [About William OGOU](https://blog.ogwilliam.com/author/William-OGOU/)
The realistic use case today is narrower than most agent-readiness advice claims. The observable use case is coding agents looking up technical documentation, not chatbots grounding answers about your site at query time. Claude Code, for example, has platform.claude.com/llms.txt hardcoded into its system prompt. When the agent needs to look up something about its own tooling, it fetches that file first.
That said, shipping llms.txt is cheap, the cost of being wrong is zero, and Chrome’s Lighthouse now includes an audit for it. Ship a clean file and calibrate your expectations to “this might pay off later” rather than “this lights up chatbot citations now.”
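Since the file is just structured Markdown, it is easy to generate at build time from the same data that drives your post index. Here is a minimal sketch under that assumption; `LlmsSection`, `LlmsEntry`, and `renderLlmsTxt` are hypothetical names, and the layout simply mirrors the example above (H1 title, blockquote summary, H2 sections with link lists).

```typescript
interface LlmsEntry {
  title: string;
  url: string;
  note?: string; // optional one-line description after the link
}

interface LlmsSection {
  heading: string;
  entries: LlmsEntry[];
}

// Render an llms.txt body: "# title", "> summary", then "## section" link lists.
function renderLlmsTxt(title: string, summary: string, sections: LlmsSection[]): string {
  const lines: string[] = [`# ${title}`, "", `> ${summary}`];
  for (const section of sections) {
    lines.push("", `## ${section.heading}`, "");
    for (const entry of section.entries) {
      const link = `- [${entry.title}](${entry.url})`;
      lines.push(entry.note ? `${link}: ${entry.note}` : link);
    }
  }
  return lines.join("\n") + "\n";
}

const llmsTxt = renderLlmsTxt(
  "William OGOU Cybersecurity Blog",
  "Expert cybersecurity insights, tutorials, and analysis by William OGOU.",
  [
    {
      heading: "About",
      entries: [{ title: "About William OGOU", url: "https://blog.ogwilliam.com/author/William-OGOU/" }],
    },
  ],
);
console.log(llmsTxt);
```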
5. Markdown Negotiation (The High-Impact Change)
When an agent sends Accept: text/markdown, return a markdown version of the page rather than the HTML. Agents parse markdown significantly more efficiently than HTML, both in tokens and in extraction quality. Cloudflare measured up to 80% token reduction in some cases, which makes responses faster, cheaper, and more likely to be consumed in their entirety within an agent’s context window.
The Pro plan problem. Cloudflare’s native “Markdown for Agents” feature does this automatically at the edge, but it requires a Pro, Business, or Enterprise plan. If you are on the free tier, the dashboard toggle is locked.
The free replication. You can reproduce the feature identically on a free Cloudflare Pages plan using Pages Functions. A single _middleware.ts file at the root of a functions/ directory intercepts every request:
- If Accept does not contain text/markdown, pass through unchanged. Zero impact on normal browser traffic.
- If Accept contains text/markdown, fetch the rendered HTML via context.next(), convert it to markdown, prepend YAML frontmatter extracted from <meta> tags, append JSON-LD blocks, and return with the correct headers.
The response matches Cloudflare’s spec exactly:
HTTP/2 200
content-type: text/markdown; charset=utf-8
content-signal: ai-train=yes, search=yes, ai-input=yes
x-markdown-tokens: 1294
vary: accept
The output format follows Cloudflare’s documented structure: YAML frontmatter with title, description, and image extracted from <meta> tags, followed by the body markdown with navigation and scripts stripped, and any JSON-LD structured data preserved as a fenced json code block at the end.
This approach has no external dependencies, runs on the free Pages Functions tier (100k requests/day), and can be upgraded to the native Pro feature later with zero code changes; Cloudflare’s toggle would simply bypass the middleware.
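The middleware described in this section could look roughly like the sketch below. It is not the exact file running on this blog, and several parts are assumptions: `MiddlewareContext` is a local stand-in for the Pages Functions context type, the HTML-to-markdown step is a deliberately naive regex pass (a production version would likely want a real parser), the `<meta>` lookup assumes `name`/`property` appears before `content`, and the token count is a crude characters-divided-by-four estimate rather than Cloudflare’s method. Header values are copied from the sample response above.

```typescript
// Local stand-in for the Pages Functions context (assumption for this sketch).
interface MiddlewareContext {
  request: Request;
  next: () => Promise<Response>;
}

// True when the Accept header lists text/markdown.
function wantsMarkdown(accept: string | null): boolean {
  return (accept ?? "").toLowerCase().includes("text/markdown");
}

// Read a <meta> value; assumes name/property comes before content.
function metaContent(html: string, name: string): string | undefined {
  const pattern = new RegExp(
    `<meta[^>]+(?:name|property)=["']${name}["'][^>]*content=["']([^"']*)["']`,
    "i",
  );
  return pattern.exec(html)?.[1];
}

// Naive conversion of the <main> (or <body>) content: headings, links, lists, paragraphs.
function htmlToMarkdown(html: string): string {
  const main = /<main\b[^>]*>([\s\S]*?)<\/main>/i.exec(html)?.[1]
    ?? /<body\b[^>]*>([\s\S]*?)<\/body>/i.exec(html)?.[1]
    ?? html;
  return main
    .replace(/<(script|style|nav)\b[\s\S]*?<\/\1>/gi, "")
    .replace(/<h([1-6])[^>]*>([\s\S]*?)<\/h\1>/gi,
      (_m: string, level: string, text: string) => `\n\n${"#".repeat(Number(level))} ${text}\n\n`)
    .replace(/<a\b[^>]*href=["']([^"']*)["'][^>]*>([\s\S]*?)<\/a>/gi, "[$2]($1)")
    .replace(/<li\b[^>]*>/gi, "\n- ")
    .replace(/<\/p>|<br\s*\/?>/gi, "\n\n")
    .replace(/<[^>]+>/g, "")
    .replace(/[ \t]+\n/g, "\n")
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}

// Frontmatter from <title>/<meta>, then the body, then JSON-LD as fenced json blocks.
function renderMarkdown(html: string): string {
  const fields: Array<[string, string | undefined]> = [
    ["title", /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html)?.[1]?.trim()],
    ["description", metaContent(html, "description")],
    ["image", metaContent(html, "og:image")],
  ];
  const frontmatter = fields
    .filter(([, value]) => value)
    .map(([key, value]) => `${key}: ${JSON.stringify(value)}`);
  const fence = "`".repeat(3);
  const jsonLd: string[] = [];
  const ldPattern = /<script[^>]+type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  for (let m = ldPattern.exec(html); m !== null; m = ldPattern.exec(html)) {
    jsonLd.push(`${fence}json\n${m[1].trim()}\n${fence}`);
  }
  const tail = jsonLd.length ? ["", ...jsonLd] : [];
  return ["---", ...frontmatter, "---", "", htmlToMarkdown(html), ...tail].join("\n") + "\n";
}

// functions/_middleware.ts entry point: only HTML responses to markdown requests are rewritten.
export const onRequest = async (context: MiddlewareContext): Promise<Response> => {
  if (!wantsMarkdown(context.request.headers.get("Accept"))) return context.next();
  const upstream = await context.next();
  if (!(upstream.headers.get("Content-Type") ?? "").includes("text/html")) return upstream;
  const markdown = renderMarkdown(await upstream.text());
  return new Response(markdown, {
    status: upstream.status,
    headers: {
      "Content-Type": "text/markdown; charset=utf-8",
      "Content-Signal": "ai-train=yes, search=yes, ai-input=yes",
      "X-Markdown-Tokens": String(Math.ceil(markdown.length / 4)), // rough estimate
      "Vary": "Accept",
    },
  });
};
```

Keeping the conversion in pure functions (`htmlToMarkdown`, `renderMarkdown`) means they can be unit-tested outside the Workers runtime, with only the thin `onRequest` wrapper depending on the platform.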
6. DNS for AI Discovery (DNS-AID)
An IETF draft that defines how agents can discover your agent endpoints through DNS. The spec uses ServiceMode SVCB or HTTPS records under a _agents namespace, for example _index._agents.yourdomain.com.
To publish a DNS-AID record in Cloudflare:
- Go to DNS → Records → Add record
- Type: SVCB
- Name: _index._agents.blog (for a subdomain)
- Priority: 1
- Target: blog.ogwilliam.com. (with trailing dot)
- Value (optional): alpn="h2"
Cloudflare’s SVCB editor does not support the endpoint custom parameter, but alpn="h2" alone is enough for the scanner to detect the record. Verify with:
curl -s -H 'accept: application/dns-json' \
'https://cloudflare-dns.com/dns-query?name=_index._agents.blog.ogwilliam.com&type=SVCB'
The honest calibration: DNS-AID has near-zero real-world agent adoption today. No major AI agent currently queries _agents DNS records. The standard is an IETF draft that is months or years from broad implementation. Ship it for the scanner score and future-proofing, not for immediate traffic.
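The curl check above can also be scripted, for example in a deploy step. The sketch below assumes the JSON shape returned by Cloudflare’s DNS-over-HTTPS endpoint (a `Status` code plus an `Answer` array whose entries carry a numeric `type`); the interface and function names are hypothetical, and 64 is the IANA record type number for SVCB.

```typescript
const SVCB_TYPE = 64; // IANA RR type number for SVCB

// Assumed subset of the DoH JSON response; only the fields used here.
interface DohAnswer {
  name: string;
  type: number;
  TTL: number;
  data: string;
}

interface DohResponse {
  Status: number; // 0 means NOERROR
  Answer?: DohAnswer[];
}

// Build the same query URL the curl command uses.
function dnsAidQueryUrl(host: string): string {
  const url = new URL("https://cloudflare-dns.com/dns-query");
  url.searchParams.set("name", `_index._agents.${host}`);
  url.searchParams.set("type", "SVCB");
  return url.toString();
}

// Keep only SVCB answers from a successful response.
function svcbRecords(response: DohResponse): DohAnswer[] {
  if (response.Status !== 0) return [];
  return (response.Answer ?? []).filter((answer) => answer.type === SVCB_TYPE);
}

// Network call; resolves to the SVCB answers, or an empty list if none exist.
async function checkDnsAid(host: string): Promise<DohAnswer[]> {
  const res = await fetch(dnsAidQueryUrl(host), {
    headers: { accept: "application/dns-json" },
  });
  return svcbRecords((await res.json()) as DohResponse);
}
```

Calling `checkDnsAid("blog.ogwilliam.com")` and treating an empty array as a failure gives a simple regression check that the record is still published.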
What to Skip for a Content Site
Not every check matters for every site. The scanner cannot tell the difference between “missing because I have not got around to it” and “intentionally not implemented because it does not apply to my site.” These are the checks you should deliberately skip on a blog:
- OAuth Protected Resource (RFC 9728): Only relevant if you have authenticated APIs that agents might call on behalf of users. Content sites do not.
- OAuth / OIDC discovery: Only relevant if you run an OAuth authorization server. Content sites do not.
- API Catalog (RFC 9727): Only relevant if you have public APIs to advertise. Content sites do not.
- MCP Server Card: Only relevant if your site exposes an MCP server. Content sites do not.
- A2A Agent Card: Only relevant if your site is itself an agent offering services to other agents. Content sites do not.
- WebMCP: Only relevant if your site exposes tools to MCP clients. Content sites do not.
- Commerce protocols (x402, MPP, UCP, ACP): Only relevant for ecommerce. Content sites are not ecommerce.
Auditing Your Site
Three options, in order of effort.
Manual audit. Curl your own robots.txt, sitemap.xml, llms.txt, and each /.well-known/* URL. Check the Link header on your homepage. Verify Markdown negotiation by sending Accept: text/markdown. Open the accessibility tree in Chrome DevTools and walk it.
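The curl steps of the manual audit can be scripted so they are repeatable. The following is a rough sketch, not a replacement for the scanner: `buildAuditChecks` and `runAudit` are hypothetical helpers, the path list covers only the files named in this guide, and the output is raw status and header values that you still have to interpret yourself.

```typescript
interface AuditCheck {
  label: string;
  url: string;
  headers?: Record<string, string>;
}

// One entry per manual step: fetch the file, or fetch the homepage with a given Accept header.
function buildAuditChecks(origin: string): AuditCheck[] {
  const at = (path: string): string => new URL(path, origin).toString();
  return [
    { label: "robots.txt", url: at("/robots.txt") },
    { label: "sitemap", url: at("/sitemap.xml") },
    { label: "llms.txt", url: at("/llms.txt") },
    { label: "link-headers", url: at("/") },
    { label: "markdown-negotiation", url: at("/"), headers: { Accept: "text/markdown" } },
  ];
}

// Network calls; prints status, content type, and Link header for each check.
async function runAudit(origin: string): Promise<void> {
  for (const check of buildAuditChecks(origin)) {
    const res = await fetch(check.url, { headers: check.headers });
    console.log(
      check.label,
      res.status,
      res.headers.get("content-type") ?? "",
      res.headers.get("link") ?? "",
    );
  }
}
```

For the markdown-negotiation row, a passing result is a `text/markdown` content type; for link-headers, a non-empty Link value.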
Cloudflare’s scanner. isitagentready.com covers the protocol layer: robots.txt, sitemap, Link headers, well-known files, Markdown negotiation, Content Signals, DNS-AID. Free, fast, good first pass. For each failing check, it provides a prompt you can give to a coding agent to implement the fix.
Agentchecker. agentchecker.ai covers task-completion and UX: whether an agent can navigate the flows on your site, fill forms, and complete actions. More relevant for SaaS and ecommerce than for a blog.
One caveat on scanner results: not every red mark is something to ship. Treat the scanner as a checklist to investigate, not a prescription. A 100 score is not the goal. The goal is to ship the protocols that matter for your site type and ignore the rest.
The Minimum Viable Agent-Ready Blog
Of the dozen components above, five give the most defensible return per minute of effort:
- AI bot rules in robots.txt: 5 minutes. Controls who can crawl you. Skipping this is the loudest way to be invisible.
- Content Signals in robots.txt: 2 minutes. Declares your AI usage preferences. The simplest signal with the clearest intent.
- llms.txt: 30 minutes. Cheap to ship. No harm, possible future payoff.
- Sitemap and Link headers: 15 minutes. Table stakes for discoverability.
- Markdown negotiation: 1 hour. The highest-impact change for token efficiency. Worth replicating with a Pages Function if you are on a free plan.
Skip the rest unless your site type demands it. The commerce protocols, Agent Skills, A2A cards, OAuth discovery, API Catalog, and WebMCP can all defensibly wait.
Conclusion
Making your blog agent-ready is not about chasing a perfect scanner score. It is about reducing friction for the AI systems that are already fetching your content. The protocols are emerging, the standards are still being written, and real-world adoption is early, but the cost of implementation is low and the cost of being wrong is zero.
Start with the minimum viable stack: AI bot rules, Content Signals, llms.txt, Link headers, and Markdown negotiation. Audit with isitagentready.com. Skip what does not apply to your site type. The rest is optional future-proofing.
To discuss agent-readiness implementation or AI security strategy for your organisation, contact me on LinkedIn or by email.
Frequently Asked Questions (FAQ)
What does it mean for a website to be agent-ready?
A website is agent-ready when it supports the emerging standards that let AI agents discover, access, and consume its content efficiently. This includes protocol-level signals like robots.txt AI bot rules, Content Signals, Link headers, llms.txt, and Markdown content negotiation.
Do I need a Cloudflare Pro plan to support Markdown for Agents?
No. Cloudflare's native Markdown for Agents feature requires a Pro plan, but you can replicate the behavior for free using a Pages Functions middleware that intercepts Accept: text/markdown requests and converts the HTML response to markdown before returning it.
What are Content Signals in robots.txt?
Content Signals are an IETF draft extension to robots.txt that lets you separately declare whether your content can be used for AI training (ai-train), search results (search), and runtime AI input (ai-input). This gives finer control than the binary Allow/Disallow rules.
Does shipping llms.txt guarantee AI citations?
No. Google's own AI optimization guide states you do not need llms.txt to appear in generative AI search. The realistic use case today is coding agents looking up technical documentation. Ship it because it is cheap and has future option value, not because it guarantees citations.
What is DNS-AID and should I implement it?
DNS for AI Discovery (DNS-AID) is an IETF draft that defines how agents can discover endpoints through DNS SVCB records under a _agents namespace. Real-world adoption is near-zero today, but shipping the record is low-cost and future-proofs your site for when the standard matures.
Which agent-readiness checks should a blog skip?
A content blog should skip OAuth Protected Resource, OAuth/OIDC discovery, API Catalog, MCP Server Card, A2A Agent Card, WebMCP, and commerce protocols (x402, MPP, UCP, ACP). These are designed for APIs, SaaS platforms, and ecommerce, not content sites.
Relevant Resources
- Cloudflare Blog: Introducing the Agent Readiness Score
- Cloudflare Docs: Markdown for Agents
- llmstxt.org: The llms.txt Specification
- contentsignals.org: Content Signals Framework
- IETF Draft: DNS for AI Discovery (DNS-AID)
- RFC 8288: Web Linking
- RFC 9727: API Catalog
- Scanner: isitagentready.com