Field note Essay 12 · Vol. III · GEO · AI Search · llms.txt · methodology · Published July 3, 2026

Does llms.txt do anything? A preregistered efficacy protocol.

Over 844,000 sites have shipped an llms.txt file, yet an independent log study of 137,000 sites found that ~97% of the files it tracked are never read. Google has said it will not support the file, and where it is fetched at all, the fetches come mostly from training and coding agents, not the answer engines that shape citations. That is a strong prior, not a verdict. This note does the thing the debate is missing: it preregisters the controlled test that would actually settle whether llms.txt changes model behaviour, states the hypotheses and the falsification conditions in advance, and publishes the a-priori prediction before the data exists. The live-model probing the test requires is a stated, unmet dependency, and a prediction dressed as a result is exactly the dishonesty this archive exists to avoid.

There is a recurring pattern in search practice: a plausible artefact appears, a plausible story attaches to it, and adoption races ahead of evidence by a year or more. Meta keywords had it. Authorship markup had it. llms.txt is having it now. The file is easy to write, the story (“help the models find your best content”) is intuitively appealing, and by late 2025 the adoption counters read in the hundreds of thousands.¹ What the field skipped, as it usually does, is the boring middle step: does it work? Not “is it reasonable,” not “did a vendor bless it,” but does adding the file measurably change what an answer engine retrieves, prioritises, or cites about your site? This note is my attempt not to skip that step and, because the clean version of the test has a dependency I have not yet met, to do the honest thing: preregister it rather than pretend I have already run it.

§1 · The prior: what is already known

Four facts are established well enough to build on, each cited to its source rather than asserted.

Provenance. llms.txt was proposed in September 2024 by Jeremy Howard (Answer.AI) as a Markdown file at a site’s root that lists and describes its key pages, so a language model with limited context can find and prioritise the important material without parsing full HTML.² Note the design intent from the start: it is a discovery and prioritisation aid, a content map, never an access-control mechanism. Much of the confusion in the field comes from treating it as the latter.
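
To make the content-map framing concrete, here is a minimal Python sketch that renders a file in the shape the llmstxt.org proposal describes: an H1 title, a blockquote summary, and H2 sections of Markdown links with short descriptions. The site name, URL, and the `render_llms_txt` helper are hypothetical, invented purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Page:
    title: str
    url: str
    note: str


def render_llms_txt(site: str, summary: str, sections: Dict[str, List[Page]]) -> str:
    """Render a Markdown content map: H1 title, blockquote summary, H2 link lists."""
    lines = [f"# {site}", "", f"> {summary}", ""]
    for heading, pages in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for page in pages:
            lines.append(f"- [{page.title}]({page.url}): {page.note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site and page, for illustration only.
example = render_llms_txt(
    "Example Site",
    "Hypothetical publisher used only for illustration.",
    {
        "Research": [
            Page(
                "Efficacy protocol",
                "https://example.com/protocol.md",
                "Preregistered test design",
            )
        ]
    },
)
print(example)
```

Note what the output does not contain: there is no directive of any kind, only titles, links, and descriptions. That is the whole point of the format as proposed.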

Vendor stance. No major model provider has committed to consuming the file. The most thorough server-log analysis to date, Ahrefs’ study of 137,000 sites, found that only ~3% of llms.txt files receive any requests at all, and that among the fetches that do occur, the readers are led by OpenAI’s GPTBot and Anthropic’s Claude-Code, with the search-grounded retrieval agents that actually shape citations (ChatGPT, Perplexity) barely registering.³ So the file is not quite unread: it is read rarely, and mostly by the wrong agents for a publisher’s purpose. Google was explicit: in mid-2025 its representatives stated that Google does not support llms.txt and has no plans to, with one publicly comparing it to the long-discredited keywords meta tag.

Observed traffic. The same analysis reports that the overwhelming majority of sites that ship llms.txt receive no requests for it at all: Ahrefs put the share receiving zero requests at roughly 97% across 137,000 domains.⁴ Whatever the exact figure, the direction is consistent: the file is being written far more than it is being read by machines.

The structural reason. Underneath the vendor silence is an incentive conflict, not an oversight. Publishers want AI systems to send users to their sites; the platforms increasingly want to answer directly and keep the user in their own surface. A voluntary standard whose main benefit accrues to the publisher, at some cost to the platform’s control of the experience, has weak adoption incentives on the side that would have to honour it. That is the gravitational field every efficacy result will be measured against.

The prior these four facts compose is strong and negative: llms.txt is probably not doing much, at least not through bulk crawling. But notice what the prior does not establish. It does not measure whether an on-demand, user-triggered fetch (the ChatGPT-User / Claude-User / Perplexity-User class of agent) reads and uses the file when a user references a site. It does not distinguish “not fetched” from “fetched and ignored.” And it does not test the one causal claim that matters to a publisher: holding content constant, does the presence of llms.txt change what an engine cites? Those gaps are what the protocol exists to close.

§2 · Hypotheses

Stated in advance, mutually exclusive where they can be, each with an explicit falsification condition.

  • H0 — Null. The presence and content of llms.txt produce no measurable difference in what an answer engine retrieves, prioritises, or cites, versus an identical site without the file. Falsified by any arm showing a statistically reliable citation or retrieval delta attributable to the file.
  • H1 — On-demand retrieval effect. User-triggered fetch agents read llms.txt when a user references the site and use it to prioritise which pages to ground on, producing a measurable citation delta in on-demand contexts even if bulk crawlers ignore the file. Falsified by on-demand agents showing no fetch of the file and no citation delta.
  • H2 — Discovery/prioritisation effect. Even absent a direct fetch of llms.txt at answer time, sites carrying a well-formed file see improved prioritisation of the listed pages in grounded answers, via indirect pathways (e.g. third-party aggregation of llms.txt content into indices the engines do consume). Falsified by no prioritisation difference between listed and unlisted pages holding all other signals constant.

§3 · The protocol

Two arms, one passive and cheap, one controlled and dependency-gated.

Arm A — Passive server-log observation (runnable now). Across a set of consenting sites that ship llms.txt, instrument the server logs to record every request for /llms.txt, /llms-full.txt, and /robots.txt, tagged by declared user-agent and verified against published crawler IP ranges. Over a fixed window, measure: fetch frequency per agent, the read/write ratio (what share of shipping sites see any AI fetch), and — the discriminating comparison — the ratio of llms.txt fetches to robots.txt fetches per agent, which normalises for how often each crawler visits at all. This arm produces a first-party version of the “~97% zero requests” figure and, crucially, separates never fetched from fetched rarely.
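
The Arm A measurements can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the instrumentation itself: it assumes logs in Combined Log Format, matches agents by substring on the declared user-agent, and skips the IP-range verification step entirely. The token list reuses agent names mentioned in this note; the sample log lines and user-agent strings are synthetic.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

# Assumed Combined Log Format:
# ip - - [time] "METHOD /path HTTP/x" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

# Agent names taken from this note; extend for a real deployment.
AGENT_TOKENS = ["GPTBot", "ChatGPT-User", "Claude-Code", "Claude-User", "Perplexity-User"]
TRACKED_PATHS = {"/llms.txt", "/llms-full.txt", "/robots.txt"}


def classify_agent(user_agent: str) -> Optional[str]:
    """Return the first known agent token found in the declared user-agent."""
    lowered = user_agent.lower()
    for token in AGENT_TOKENS:
        if token.lower() in lowered:
            return token
    return None


def fetch_counts(log_lines: Iterable[str]) -> Counter:
    """Count requests per (agent, path) for the tracked files only."""
    counts: Counter = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match:
            continue
        path = match.group("path").split("?", 1)[0]
        agent = classify_agent(match.group("ua"))
        if agent and path in TRACKED_PATHS:
            counts[(agent, path)] += 1
    return counts


def llms_to_robots_ratio(counts: Counter) -> Dict[str, Optional[float]]:
    """llms.txt fetches divided by robots.txt fetches, per agent.

    None means the agent never fetched robots.txt, so the ratio is undefined.
    """
    ratios: Dict[str, Optional[float]] = {}
    for agent in {agent for agent, _ in counts}:
        robots = counts[(agent, "/robots.txt")]
        llms = counts[(agent, "/llms.txt")]
        ratios[agent] = llms / robots if robots else None
    return ratios


# Synthetic log lines (documentation IP range, made-up user-agent strings).
SAMPLE_LOG = [
    '203.0.113.5 - - [01/May/2026:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.5 - - [02/May/2026:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.5 - - [02/May/2026:10:00:05 +0000] "GET /llms.txt HTTP/1.1" 200 900 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.9 - - [03/May/2026:09:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "Perplexity-User/1.0"',
    '203.0.113.7 - - [03/May/2026:09:30:00 +0000] "GET /llms.txt HTTP/1.1" 200 900 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

ratios = llms_to_robots_ratio(fetch_counts(SAMPLE_LOG))
print(ratios)
```

Dividing by robots.txt fetches is what separates an agent that rarely visits from one that visits often but skips llms.txt: both produce few llms.txt requests, but only the second produces a ratio near zero.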

Arm B — Controlled canary-claim probe (dependency-gated). This is the arm that tests causation, and the arm that needs live-model access. Construct matched page pairs that are byte-identical in content but differ only in whether a well-formed llms.txt lists and describes the page. Seed each with a distinct, unique canary claim — a specific, verifiable, otherwise-unattested assertion that can be searched for in model outputs with no false positives. After a crawl-and-index settling period, probe a panel of answer engines spanning the three agent classes — bulk-trained closed-book, search-grounded, and on-demand fetch — with queries whose correct answer is the canary claim, and classify each response using the four-class citation taxonomy from the companion audit (verbatim cite / paraphrase-with-source / silent absorption / contradiction). The dependent variable is the citation-rate delta between the llms.txt-listed and unlisted arms of each matched pair. Controls: randomised assignment of which twin carries the file, counterbalanced canary claims, a no-llms.txt control cohort, and blinded classification.
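
The randomisation and counterbalancing controls can be pre-committed mechanically. The sketch below is one possible reading, not the protocol's fixed design: it assumes each pair has two canary claims (one per twin), uses a fixed seed so the plan can be published before any probing, and balances which claim lands on the listed twin. The `Assignment` type, the `assign_pairs` helper, the pair IDs, and the claim strings are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Assignment:
    pair_id: str
    listed_twin: str       # twin whose page is listed in llms.txt
    unlisted_twin: str
    listed_canary: str     # canary claim seeded on the listed twin
    unlisted_canary: str


def assign_pairs(
    pair_ids: Sequence[str],
    canaries: Dict[str, Tuple[str, str]],
    seed: int,
) -> List[Assignment]:
    """Randomise which twin carries the file and counterbalance canary claims.

    canaries maps pair_id -> (claim_a, claim_b). Exactly half of the pairs put
    claim_a on the listed twin, so claim wording is not confounded with the
    llms.txt condition.
    """
    if len(pair_ids) % 2:
        raise ValueError("use an even number of pairs to counterbalance exactly")
    rng = random.Random(seed)  # fixed seed: the plan is reproducible
    claim_a_listed = [True, False] * (len(pair_ids) // 2)
    rng.shuffle(claim_a_listed)
    plan = []
    for pair_id, a_listed in zip(pair_ids, claim_a_listed):
        twins = ["A", "B"]
        rng.shuffle(twins)  # random choice of the twin that gets the file
        claim_a, claim_b = canaries[pair_id]
        listed, unlisted = (claim_a, claim_b) if a_listed else (claim_b, claim_a)
        plan.append(Assignment(pair_id, twins[0], twins[1], listed, unlisted))
    return plan


# Hypothetical pair IDs and placeholder claims.
pair_ids = ["p1", "p2", "p3", "p4"]
canaries = {pid: (f"{pid}-claim-a", f"{pid}-claim-b") for pid in pair_ids}
plan = assign_pairs(pair_ids, canaries, seed=12)
for row in plan:
    print(row)
```

Publishing the seed alongside the protocol lets anyone regenerate the plan and confirm that the assignment was not adjusted after the results came in.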

What would make each result trustworthy. Arm A is trustworthy if it replicates across sites and windows and if the agent-normalised ratio is stable. Arm B is trustworthy only with pre-committed matched pairs, blinded scoring, and a control cohort — without those, a citation delta could be ordinary content or authority variance masquerading as an llms.txt effect, which is exactly the error most informal “I added llms.txt and traffic went up” reports make.
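
One way to decide whether a citation delta is statistically reliable, given matched pairs, is a paired sign-flip permutation test. The note does not commit to a specific test, so treat this as an assumption on my part: a sketch of one reasonable choice. It also assumes each pair yields a citation rate per arm, meaning the share of probes classified as a citation, and which of the four taxonomy classes count as a citation is left to the scoring rubric. The input numbers are invented to exercise the code and are not results.

```python
import random
from typing import List, Sequence, Tuple


def citation_rate_delta(
    listed_rates: Sequence[float], unlisted_rates: Sequence[float]
) -> Tuple[float, List[float]]:
    """Mean per-pair difference in citation rate (listed minus unlisted)."""
    if len(listed_rates) != len(unlisted_rates) or not listed_rates:
        raise ValueError("need one rate per arm for every matched pair")
    diffs = [l - u for l, u in zip(listed_rates, unlisted_rates)]
    return sum(diffs) / len(diffs), diffs


def sign_flip_p_value(
    diffs: Sequence[float], n_permutations: int = 10_000, seed: int = 0
) -> float:
    """Two-sided paired permutation test.

    Under the null, the listed/unlisted labels within a pair are exchangeable,
    so each per-pair difference is equally likely to have either sign.
    """
    rng = random.Random(seed)
    observed = abs(sum(diffs) / len(diffs))
    extreme = 0
    for _ in range(n_permutations):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(flipped) / len(flipped)) >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (extreme + 1) / (n_permutations + 1)


# Invented inputs: 12 pairs where the listed twin is cited in half the probes.
delta, diffs = citation_rate_delta([0.5] * 12, [0.0] * 12)
p_effect = sign_flip_p_value(diffs)

# Invented inputs: 12 pairs with no difference at all.
null_delta, null_diffs = citation_rate_delta([0.25] * 12, [0.25] * 12)
p_null = sign_flip_p_value(null_diffs)

print(delta, p_effect, null_delta, p_null)
```

The test works on within-pair differences, which is why the matched-pair design matters: content and authority variance between pairs cancels out, and only the difference attributable to the file remains.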

§4 · Why publish the protocol before the data

Two reasons, one principled and one practical, and they point the same way.

The principled reason is that preregistration is the cheapest available defence against the field’s dominant failure mode — motivated reasoning after the fact. The llms.txt conversation is already full of post-hoc stories in both directions: adopters who credit the file for gains it did not cause, and cynics who dismiss it without a controlled test. Writing the hypotheses, the metric, and the a-priori prediction down first means the eventual number cannot be quietly reshaped to fit whichever narrative I would prefer. A wrong prediction I recorded in advance is a contribution; a right prediction I could only produce after seeing the data is not.

The practical reason is that the controlled arm has a stated dependency I have not met. Arm B requires probing production answer engines at volume across a model panel — live-model API access with an associated budget — which this project has explicitly flagged as an open, owner-side dependency rather than something already in place.

Rather than let that dependency delay the useful part — the design, the prior, and the practitioner guidance — I am shipping those now and deferring only the measurement that genuinely requires the access.

§5 · What a practitioner should do today

The protocol is deferred; the decision is not. A publisher asking “should I ship llms.txt?” does not need my Arm B results to act rationally, because the cost-benefit is lopsided enough to decide under the current prior.

  1. Ship it — the downside is negligible. A well-formed llms.txt costs minutes to write, cannot hurt your conventional SEO, and positions you if adoption by the engines does materialise. Expected value is mildly positive even if the current efficacy is near zero, because the option is cheap and the downside is bounded.
  2. Put no enforcement weight on it. This is the load-bearing instruction. llms.txt is a content map, not access control; it does not stop crawling, training, or citation of anything. Anyone treating it as a way to prevent AI use of their content has made the category error the whole field keeps making — and the crawler-trust gap documented elsewhere on this site shows that even robots.txt, which crawlers do read, is a signal rather than a fence.
  3. Pair it with the artefacts that carry more weight. Ship robots.txt with explicit AI-crawler rules for the enforcement signal, llms.txt for the discovery map, and, where crawler abuse is a real risk, server-side controls underneath both. The crawler-policy tool on this site generates the set together for exactly this reason: each artefact does one job, and none of them does the job the others do.⁵
  4. Do not credit llms.txt for gains you did not isolate. If you add the file and citations rise, resist the story until you have held content constant. The most common llms.txt “success” report is a confounded one — the file shipped alongside a content refresh, a new page, or an authority gain that did the real work. That confound is the entire reason Arm B is designed the way it is.
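
The division of labour in steps 2 and 3 can be checked with Python's standard-library robots.txt parser. This is a small sketch with a hypothetical policy: the `/drafts/` path and the example.com URLs are made up, and GPTBot is used only because this note already names it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: ask GPTBot to stay out of /drafts/, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /drafts/

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# These answers describe what the file requests of a compliant crawler.
# Nothing here blocks a crawler that chooses to ignore robots.txt.
gptbot_drafts = parser.can_fetch("GPTBot", "https://example.com/drafts/post")
gptbot_research = parser.can_fetch("GPTBot", "https://example.com/research/")
other_drafts = parser.can_fetch("SomeOtherBot", "https://example.com/drafts/post")

print(gptbot_drafts, gptbot_research, other_drafts)
```

llms.txt has no counterpart to `Disallow`, so there is nothing comparable to evaluate for it. Crawl rules belong in robots.txt, and anything that must actually hold belongs in server-side controls.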

The reason to run this test at all — rather than continuing to argue about the file from priors — is that “probably does nothing” and “measurably does nothing in bulk crawling but something in on-demand fetch” are different worlds for a practitioner deciding where to spend effort, and only a controlled probe can tell them apart. Until Arm B runs, the intellectually honest position is the one this note takes: a strong negative prior, a falsifiable prediction on the record, and a design specific enough that someone — me, once the dependency is met, or a better-resourced team before then — can prove me wrong in public.

References

  1. Howard, J. (Answer.AI) (2024). The /llms.txt proposal. llmstxt.org. — Original specification, September 2024. Frames the file as a Markdown content map for context-limited models — explicitly a discovery aid, not access control.
  2. Ahrefs (2026). We Analyzed 137K Sites: 97% of llms.txt Files Never Get Read. Ahrefs blog, June 2026. — Basis for §1: ~97% of llms.txt files across 137K domains received zero requests; the few fetches were led by GPTBot and Claude-Code, not answer-engine retrieval agents.
  3. BuiltWith (2025). llms.txt usage statistics. BuiltWith technology tracking. — Adoption above 844,000 sites by late October 2025 — the belief metric §1 separates from efficacy.
  4. Illyes, G. & Mueller, J. (Google) (2025). Public statements on Google and llms.txt support. Google Search Central, 2025. — Google's stated non-support; the keywords-meta-tag comparison cited in §1.
  5. Sasson, G. (2026). A taxonomy of LLM citation behaviour across 14 frontier models. Algoholic, Vol. III, Essay 03. — Source of the four-class citation coding used as Arm B's dependent-variable rubric.
  6. Sasson, G. (2026). A GEO threat model: the attack surface of machine-mediated citation. Algoholic, Vol. III, Essay 11. — Companion piece; shares the live-model-access dependency that gates this protocol's Arm B.
  7. Sasson, G. (2026). The AI Crawler Policy Generator. Algoholic tools. — Generates robots.txt / llms.txt / ai.txt as a set; encodes the same content-map-not-access-control framing §5 recommends.

Footnotes

  1. BuiltWith’s tracking put llms.txt adoption above 844,000 sites by late October 2025. Adoption is a measure of belief, not of efficacy; the two are being conflated across most of the practitioner writing on the topic, which is precisely the gap this note targets.

  2. Howard, J. / Answer.AI, The /llms.txt proposal (llmstxt.org, September 2024). The specification frames the file as an aid for models with constrained context to locate and prioritise a site’s key documents — a content map, explicitly not a robots-style directive.

  3. Ahrefs, We Analyzed 137K Sites: 97% of llms.txt Files Never Get Read (June 2026). Among the ~3% of files fetched at all, named AI tools were roughly a fifth of requests, led by GPTBot and Claude-Code; dedicated retrieval agents (ChatGPT, Perplexity) barely appeared. The load-bearing claim is not that the file is never fetched but that it is fetched rarely and mostly by agents whose fetch does not feed answer-engine citation, which is what makes the file’s practical effect plausibly near zero for a publisher’s actual purpose.

  4. The “~97% receive zero requests” figure is Ahrefs’ June 2026 finding across 137,000 domains (May 2026 traffic window). I cite it as an order-of-magnitude indicator of the read/write asymmetry, not a precise constant; the preregistered log arm (Arm A, §3) is designed to produce a first-party version of exactly this number.

  5. Sasson, The AI Crawler Policy Generator (algoholic.com/tools/crawler-policy). The tool’s own copy states the same limitation this note argues for: llms.txt is a content map for language models, not an access control, and does not restrict crawling — use robots.txt plus server rules for that.

Version v1.0
Cite as Sasson, G. (2026). Does llms.txt do anything? A preregistered efficacy protocol. Algoholic, Vol. III, Essay 12, v1.0. https://algoholic.com/research/llms-txt-efficacy-protocol
Gilad Sasson

aka Algoholic · גלעד ששון

Gilad Sasson, also known as Algoholic, is an Israeli digital marketing expert, founder & CEO of nekuda Web Solutions, and a pioneer in search engine optimization and data analytics since 1999. Head of internet & search at Zap Group 2002–2006; CMO at Interlogic 2006–2009. Speaker at SMX Israel, TNW Amsterdam, Web Summit Dublin, DMIEXPO.