There is a recurring pattern in search practice: a plausible artefact appears, a
plausible story attaches to it, and adoption races ahead of evidence by a year or
more. Meta keywords had it. Authorship markup had it. llms.txt is having it
now. The file is easy to write, the story — “help the models find your best
content” — is intuitively appealing, and by late 2025 the adoption counters read
in the hundreds of thousands.[1] What the field skipped, as it usually
does, is the boring middle step: does it work? Not “is it reasonable,” not “did
a vendor bless it,” but does adding the file measurably change what an answer
engine retrieves, prioritises, or cites about your site. This note is my attempt
to not skip that step — and, because the clean version of the test has a
dependency I have not yet met, to do the honest thing and preregister it rather
than pretend I have already run it.
§1 · The prior: what is already known
Four facts are established well enough to build on, each cited to its source rather than asserted.
Provenance. llms.txt was proposed in September 2024 by Jeremy Howard
(Answer.AI) as a Markdown file at a site’s root that lists and describes its
key pages, so a language model with limited context can find and prioritise the
important material without parsing full HTML.[2] Note the design
intent from the start: it is a discovery and prioritisation aid — a content
map — never an access-control mechanism. Much of the confusion in the field
comes from treating it as the latter.
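For readers who have not seen one, the content-map shape can be sketched in a few lines of Python. This is a minimal illustration, assuming the layout the proposal describes (an H1 title, a blockquote summary, then H2 sections of annotated links); the `Page` type, the `render_llms_txt` helper, and the example site are hypothetical names introduced here, not part of the specification.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """One listed page (hypothetical helper type for this sketch)."""
    title: str
    url: str
    note: str


def render_llms_txt(site: str, summary: str, sections: dict[str, list[Page]]) -> str:
    """Render a minimal llms.txt body: H1 title, blockquote summary, H2 link lists."""
    lines = [f"# {site}", "", f"> {summary}", ""]
    for heading, pages in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for page in pages:
            lines.append(f"- [{page.title}]({page.url}): {page.note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Illustrative values only; example.com is a placeholder domain.
example = render_llms_txt(
    "Example Site",
    "Hypothetical publisher used only for illustration.",
    {"Docs": [Page("Getting started", "https://example.com/start.md", "Setup guide")]},
)
print(example)
```

Nothing in the output resembles an allow or disallow rule, which is the point: the file describes pages, it does not govern access to them.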
Vendor stance. No major model provider has committed to consuming the file.
The most thorough server-log analysis to date — Ahrefs’ study of 137,000 sites —
found that only ~3% of llms.txt files receive any requests at all, and that
among the fetches that do occur, the readers are led by OpenAI’s GPTBot and
Anthropic’s Claude-Code, with the search-grounded retrieval agents that actually
shape citations (ChatGPT, Perplexity) barely registering.[3] So the file is
not quite unread — it is read rarely, and mostly by the wrong agents for a
publisher’s purpose. Google was explicit: in mid-2025 its representatives stated
that Google does not support llms.txt and has no plans to, with one publicly
comparing it to the long-discredited keywords meta tag.
Observed traffic. The same analysis reports that the overwhelming majority of
sites that ship llms.txt receive no requests for it at all — Ahrefs put the
share receiving zero requests at roughly 97% across 137,000 domains.[4] Whatever the exact figure, the direction is
consistent: the file is being written far more than it is being read by
machines.
The structural reason. Underneath the vendor silence is an incentive conflict, not an oversight. Publishers want AI systems to send users to their sites; the platforms increasingly want to answer directly and keep the user in their own surface. A voluntary standard whose main benefit accrues to the publisher, at some cost to the platform’s control of the experience, has weak adoption incentives on the side that would have to honour it. That is the gravitational field every efficacy result will be measured against.
The prior these four facts compose is strong and negative: llms.txt is
probably not doing much, at least not through bulk crawling. But notice what the
prior does not establish. It does not measure whether an on-demand,
user-triggered fetch (the ChatGPT-User / Claude-User / Perplexity-User class of
agent) reads and uses the file when a user references a site. It does not
distinguish “not fetched” from “fetched and ignored.” And it does not test the
one causal claim that matters to a publisher: holding content constant, does
the presence of llms.txt change what an engine cites? Those gaps are what the
protocol exists to close.
§2 · Hypotheses
Stated in advance, mutually exclusive where they can be, each with an explicit falsification condition.
- H0: Null. The presence and content of llms.txt produce no measurable difference in what an answer engine retrieves, prioritises, or cites, versus an identical site without the file. Falsified by any arm showing a statistically reliable citation or retrieval delta attributable to the file.
- H1: On-demand retrieval effect. User-triggered fetch agents read llms.txt when a user references the site and use it to prioritise which pages to ground on, producing a measurable citation delta in on-demand contexts even if bulk crawlers ignore the file. Falsified by on-demand agents showing no fetch of the file and no citation delta.
- H2: Discovery/prioritisation effect. Even absent a direct fetch of llms.txt at answer time, sites carrying a well-formed file see improved prioritisation of the listed pages in grounded answers, via indirect pathways (e.g. third-party aggregation of llms.txt content into indices the engines do consume). Falsified by no prioritisation difference between listed and unlisted pages holding all other signals constant.
§3 · The protocol
Two arms, one passive and cheap, one controlled and dependency-gated.
Arm A — Passive server-log observation (runnable now). Across a set of
consenting sites that ship llms.txt, instrument the server logs to record every
request for /llms.txt, /llms-full.txt, and /robots.txt, tagged by declared
user-agent and verified against published crawler IP ranges. Over a fixed window,
measure: fetch frequency per agent, the read/write ratio (what share of shipping
sites see any AI fetch), and — the discriminating comparison — the ratio of
llms.txt fetches to robots.txt fetches per agent, which normalises for how
often each crawler visits at all. This arm produces a first-party version of the
“~97% zero requests” figure and, crucially, separates never fetched from
fetched rarely.
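The agent-normalised ratio in Arm A can be sketched as follows. This is an illustration under stated assumptions, not the instrumentation itself: it assumes logs in Combined Log Format, and the IP ranges in `PUBLISHED_RANGES` are placeholder documentation blocks (RFC 5737) standing in for whatever ranges each crawler operator actually publishes. The function names are hypothetical.

```python
import ipaddress
import re
from collections import Counter

# Combined Log Format (an assumption about the server's log layout).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?:GET|HEAD) (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

# PLACEHOLDER ranges from the RFC 5737 documentation blocks, not real crawler
# ranges. Replace with the ranges each operator publishes.
PUBLISHED_RANGES = {
    "GPTBot": [ipaddress.ip_network("192.0.2.0/24")],
    "Claude-User": [ipaddress.ip_network("198.51.100.0/24")],
}
TRACKED_PATHS = ("/llms.txt", "/llms-full.txt", "/robots.txt")


def verified_agent(user_agent, ip):
    """Return the agent name only if the UA token and source IP both match."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return None
    for agent, networks in PUBLISHED_RANGES.items():
        if agent.lower() in user_agent.lower():
            return agent if any(addr in net for net in networks) else None
    return None


def count_fetches(log_lines):
    """Count verified fetches of the tracked files, keyed by (agent, path)."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue
        path = match["path"].split("?", 1)[0]
        agent = verified_agent(match["ua"], match["ip"])
        if agent and path in TRACKED_PATHS:
            counts[(agent, path)] += 1
    return counts


def llms_to_robots_ratio(counts):
    """llms.txt fetches per robots.txt fetch; None when robots.txt was never fetched."""
    ratios = {}
    for agent in PUBLISHED_RANGES:
        robots = counts[(agent, "/robots.txt")]
        ratios[agent] = counts[(agent, "/llms.txt")] / robots if robots else None
    return ratios


def make_line(ip, path, ua):
    """Build one synthetic log line for the demo below."""
    return f'{ip} - - [01/Jan/2026:00:00:00 +0000] "GET {path} HTTP/1.1" 200 120 "-" "{ua}"'


sample = (
    [make_line("192.0.2.10", "/robots.txt", "GPTBot/1.0")] * 4
    + [make_line("192.0.2.10", "/llms.txt", "GPTBot/1.0")]
    + [make_line("203.0.113.5", "/llms.txt", "GPTBot/1.0")]  # spoofed UA, wrong range
)
ratios = llms_to_robots_ratio(count_fetches(sample))
print(ratios)
```

Returning `None` rather than zero when an agent never requested robots.txt keeps "this crawler did not visit at all" separate from "this crawler visited and skipped llms.txt", which is the distinction the arm is meant to preserve.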
Arm B — Controlled canary-claim probe (dependency-gated). This is the arm
that tests causation, and the arm that needs live-model access. Construct matched
page pairs that are identical in content, apart from any seeded canary text, and differ only in whether a
well-formed llms.txt lists and describes the page. Seed each with a distinct,
unique canary claim — a specific, verifiable, otherwise-unattested assertion
that can be searched for in model outputs with no false positives. After a
crawl-and-index settling period, probe a panel of answer engines spanning the
three agent classes — bulk-trained closed-book, search-grounded, and on-demand
fetch — with queries whose correct answer is the canary claim, and classify each
response using the four-class citation taxonomy from the companion audit
(verbatim cite / paraphrase-with-source / silent absorption / contradiction).
The dependent variable is the citation-rate delta between the llms.txt-listed
and unlisted arms of each matched pair. Controls: randomised assignment of which
twin carries the file, counterbalanced canary claims, a no-llms.txt control
cohort, and blinded classification.
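The assignment and the dependent variable can be sketched as below. This is a minimal illustration, not the scoring pipeline: the function names are hypothetical, and treating "verbatim cite" and "paraphrase-with-source" as the two classes that count as a citation is an assumption made here for the example, since the protocol text does not fix that mapping.

```python
import random

# ASSUMPTION: these two classes count as "cited"; silent absorption and
# contradiction do not. The protocol itself does not state this mapping.
CITED = {"verbatim cite", "paraphrase-with-source"}


def assign_twins(pair_ids, seed):
    """Randomly choose which twin ('a' or 'b') of each pair is listed in llms.txt.

    A fixed seed makes the assignment reproducible, so it can be committed
    before any probing starts.
    """
    rng = random.Random(seed)
    return {pair_id: rng.choice(("a", "b")) for pair_id in pair_ids}


def citation_rate_delta(outcomes):
    """Citation rate of listed twins minus citation rate of unlisted twins.

    outcomes: one (listed_class, unlisted_class) tuple per matched pair.
    """
    if not outcomes:
        raise ValueError("need at least one matched pair")
    listed = sum(listed_class in CITED for listed_class, _ in outcomes)
    unlisted = sum(unlisted_class in CITED for _, unlisted_class in outcomes)
    return (listed - unlisted) / len(outcomes)


# Synthetic classifications for four pairs, for illustration only.
demo = [
    ("verbatim cite", "silent absorption"),
    ("paraphrase-with-source", "verbatim cite"),
    ("silent absorption", "silent absorption"),
    ("verbatim cite", "contradiction"),
]
print(assign_twins(["p1", "p2", "p3"], seed=7))
print(citation_rate_delta(demo))
```

Here three of four listed twins and one of four unlisted twins count as cited, so the delta is 0.5. The numbers are invented to show the arithmetic and say nothing about what a real run would find.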
What would make each result trustworthy. Arm A is trustworthy if it
replicates across sites and windows and if the agent-normalised ratio is stable.
Arm B is trustworthy only with pre-committed matched pairs, blinded scoring, and
a control cohort — without those, a citation delta could be ordinary content or
authority variance masquerading as an llms.txt effect, which is exactly the
error most informal “I added llms.txt and traffic went up” reports make.
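One way to make "statistically reliable" concrete for a matched-pair design is an exact McNemar test on the discordant pairs, meaning pairs where exactly one twin was cited. The protocol as written does not name a test, so this choice is an assumption offered as a sketch; `exact_mcnemar_p` is a hypothetical helper, and the counts in the example are synthetic.

```python
from math import comb


def exact_mcnemar_p(listed_only, unlisted_only):
    """Two-sided exact McNemar p-value from discordant pair counts.

    listed_only:   pairs where only the llms.txt-listed twin was cited
    unlisted_only: pairs where only the unlisted twin was cited
    Under the null, each discordant pair is a fair coin flip, so the smaller
    count follows Binomial(n, 0.5).
    """
    n = listed_only + unlisted_only
    if n == 0:
        return 1.0  # no discordant pairs means no evidence either way
    k = min(listed_only, unlisted_only)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Synthetic example: 8 pairs favour the listed twin, 1 favours the unlisted twin.
print(exact_mcnemar_p(8, 1))
```

Concordant pairs, where both twins or neither twin were cited, drop out of the test entirely. That matches the design intent: content and authority effects that lift both twins equally cannot produce a result.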
§4 · Why publish the protocol before the data
Two reasons, one principled and one practical, and they point the same way.
The principled reason is that preregistration is the cheapest available
defence against the field’s dominant failure mode — motivated reasoning after
the fact. The llms.txt conversation is already full of post-hoc stories in both
directions: adopters who credit the file for gains it did not cause, and cynics
who dismiss it without a controlled test. Writing the hypotheses, the metric, and
the a-priori prediction down first means the eventual number cannot be quietly
reshaped to fit whichever narrative I would prefer. A wrong prediction I recorded
in advance is a contribution; a right prediction I could only produce after
seeing the data is not.
The practical reason is that the controlled arm has a stated dependency I have not met. Arm B requires probing production answer engines at volume across a model panel — live-model API access with an associated budget — which this project has explicitly flagged as an open, owner-side dependency rather than something already in place. Rather than let that dependency delay the useful part — the design, the prior, and the practitioner guidance — I am shipping those now and deferring only the measurement that genuinely requires the access.
§5 · What a practitioner should do today
The protocol is deferred; the decision is not. A publisher asking “should I ship
llms.txt?” does not need my Arm B results to act rationally, because the
cost-benefit is lopsided enough to decide under the current prior.
- Ship it: the downside is negligible. A well-formed llms.txt costs minutes to write, cannot hurt your conventional SEO, and positions you if adoption by the engines does materialise. Expected value is mildly positive even if the current efficacy is near zero, because the option is cheap and the downside is bounded.
- Put no enforcement weight on it. This is the load-bearing instruction. llms.txt is a content map, not access control; it does not stop crawling, training, or citation of anything. Anyone treating it as a way to prevent AI use of their content has made the category error the whole field keeps making, and the crawler-trust gap documented elsewhere on this site shows that even robots.txt, which crawlers do read, is a signal rather than a fence.
- Pair it with the artefacts that carry more weight. Ship robots.txt with explicit AI-crawler rules for the enforcement signal, llms.txt for the discovery map, and, where crawler abuse is a real risk, server-side controls underneath both. The crawler-policy tool on this site generates the set together for exactly this reason: each artefact does one job, and none of them does the job the others do.[5]
- Do not credit llms.txt for gains you did not isolate. If you add the file and citations rise, resist the story until you have held content constant. The most common llms.txt “success” report is a confounded one: the file shipped alongside a content refresh, a new page, or an authority gain that did the real work. That confound is the entire reason Arm B is designed the way it is.
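The "signal rather than a fence" point in the list above can be made concrete with Python's standard-library robots.txt parser. The rules below are an illustrative policy, not a recommendation, and example.com and SomeOtherBot are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Illustrative policy: ask one named AI crawler to stay out, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

gptbot_allowed = parser.can_fetch("GPTBot", "https://example.com/essay")
other_allowed = parser.can_fetch("SomeOtherBot", "https://example.com/essay")
print(gptbot_allowed, other_allowed)
```

The parser only reports what the file requests. Whether a crawler runs an equivalent check before fetching is up to the crawler, which is why the server-side controls sit underneath both files.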
The reason to run this test at all — rather than continuing to argue about the file from priors — is that “probably does nothing” and “measurably does nothing in bulk crawling but something in on-demand fetch” are different worlds for a practitioner deciding where to spend effort, and only a controlled probe can tell them apart. Until Arm B runs, the intellectually honest position is the one this note takes: a strong negative prior, a falsifiable prediction on the record, and a design specific enough that someone — me, once the dependency is met, or a better-resourced team before then — can prove me wrong in public.
References
- Howard, J. (Answer.AI) (2024). The /llms.txt proposal. llmstxt.org. — Original specification, September 2024. Frames the file as a Markdown content map for context-limited models — explicitly a discovery aid, not access control.
- Ahrefs (2026). We Analyzed 137K Sites: 97% of llms.txt Files Never Get Read. Ahrefs blog, June 2026. — Basis for §1: ~97% of llms.txt files across 137K domains received zero requests; the few fetches were led by GPTBot and Claude-Code, not answer-engine retrieval agents.
- BuiltWith (2025). llms.txt usage statistics. BuiltWith technology tracking. — Adoption above 844,000 sites by late October 2025 — the belief metric §1 separates from efficacy.
- Illyes, G. & Mueller, J. (Google) (2025). Public statements on Google and llms.txt support. Google Search Central, 2025. — Google's stated non-support; the keywords-meta-tag comparison cited in §1.
- Sasson, G. (2026). A taxonomy of LLM citation behaviour across 14 frontier models. Algoholic, Vol. III, Essay 03. — Source of the four-class citation coding used as Arm B's dependent-variable rubric.
- Sasson, G. (2026). A GEO threat model: the attack surface of machine-mediated citation. Algoholic, Vol. III, Essay 11. — Companion piece; shares the live-model-access dependency that gates this protocol's Arm B.
- Sasson, G. (2026). The AI Crawler Policy Generator. Algoholic tools. — Generates robots.txt / llms.txt / ai.txt as a set; encodes the same content-map-not-access-control framing §5 recommends.
Footnotes
1. BuiltWith’s tracking put llms.txt adoption above 844,000 sites by late October 2025. Adoption is a measure of belief, not of efficacy; the two are being conflated across most of the practitioner writing on the topic, which is precisely the gap this note targets. ↩
2. Howard, J. / Answer.AI, The /llms.txt proposal (llmstxt.org, September 2024). The specification frames the file as an aid for models with constrained context to locate and prioritise a site’s key documents: a content map, explicitly not a robots-style directive. ↩
3. Ahrefs, We Analyzed 137K Sites: 97% of llms.txt Files Never Get Read (June 2026). Among the ~3% of files fetched at all, named AI tools were roughly a fifth of requests, led by GPTBot and Claude-Code; dedicated retrieval agents (ChatGPT, Perplexity) barely appeared. The load-bearing claim is not that the file is never fetched but that it is fetched rarely and mostly by agents whose fetch does not feed answer-engine citation, which is what makes the enforcement floor plausibly near zero for a publisher’s actual purpose. ↩
4. The “~97% receive zero requests” figure is Ahrefs’ June 2026 finding across 137,000 domains (May 2026 traffic window). I cite it as an order-of-magnitude indicator of the read/write asymmetry, not a precise constant; the preregistered log arm below is designed to produce a first-party version of exactly this number. ↩
5. Sasson, The AI Crawler Policy Generator (algoholic.com/tools/crawler-policy). The tool’s own copy states the same limitation this note argues for: llms.txt is a content map for language models, not an access control, and does not restrict crawling; use robots.txt plus server rules for that. ↩
