For most of the twenty-seven years I have worked in search, security and marketing were different departments that met only at the incident review. The marketer wanted to be found; the security team wanted to not be breached; the two goals shared a domain name and little else. Generative retrieval collapses that separation. The mechanism that now decides whether your finding reaches a buyer — a model retrieving passages and grounding an answer in them — is the same mechanism an adversary manipulates to put words in the model’s mouth. The channel and the attack surface are the same surface. A field that has spent two years learning to optimise for machine-mediated citation has been, without quite noticing, learning to operate an asset it has not learned to defend.
This paper is the defender’s document I wanted and could not find. It is deliberately a threat model and not an audit: I am not reporting a measured breach rate across a panel of sites, because the honest measurement — probing production answer engines with controlled poison and watching what they repeat — requires live-model budget and a careful ethics boundary this piece does not pretend to have crossed.1 What I can do rigorously is the structural work security actually runs on: name the assets, the adversaries, and the trust boundaries; enumerate the attack classes and anchor each to the published literature where it exists and to a clearly-labelled constructed scenario where it does not; and derive a control set that follows from the structure rather than from wishful thinking. The roadmap is conventional for a threat model: the frame (§1), the seven attack classes (§2), why the SEO defences fail to transfer (§3), the defender’s playbook (§4), three serious objections (§5), and an accounting of what the model still gets wrong (§6).
§1 · The frame: assets, adversaries, trust boundaries
A threat model is worth nothing until it says three things plainly: what is worth protecting, who benefits from harming it, and where control changes hands.
The asset is the entity-to-claim binding. In a ranked-link world the asset was a position: a URL sitting at rank one for a query, observable in a SERP, defended by relevance and links. In a generative-retrieval world the asset is subtler and more valuable — it is the binding between a claim and your name inside the model’s answer. When a user asks a buying-stage question and the engine answers “according to Sasson’s 2026 analysis, statement-level framing raises attribution roughly threefold,” the asset that just paid off was not a ranking; it was the model’s willingness to route a finding through your entity. Everything an adversary wants to do here reduces to attacking that binding: break it (your claim, no name), transfer it (your claim, their name), or poison it (a false claim, your name).
The adversaries are more varied than “spammers.” SEO trained the field to model one attacker — the low-quality operator gaming a ranking signal. The grounding layer has at least five, and they want different things:
- The competitor wants the contradiction class: your claim surfaced and then beaten by theirs, or your name quietly swapped for theirs on a shared finding. Their goal is attribution share, and it is a zero-sum fight over the same buyer’s first machine-mediated impression.
- The reputation attacker wants a false claim bound to your entity — a fabricated “known issue,” a manufactured controversy, a poisoned biographical fact — surfaced with enough sourcing that the model repeats it as grounded.
- The scam / fraud operator wants the answer engine to emit their payload in a trusted voice: a fake support number, a malicious “official” download, a counterfeit policy. This is the class the published answer-engine-poisoning work targets directly.2
- The data harvester wants your content in a training or retrieval corpus regardless of your stated policy — the crawler-trust gap, below.
- The model-integrity researcher / state actor sits at the tail: interested in corrupting what a population of users believes about a topic at the retrieval layer, for which a commercial site is collateral rather than target.
The trust boundary is the retrieval-and-grounding step, and it is drawn in a place the publisher cannot see. In a classic web request the boundary is your server: you decide what to serve to whom. In generative retrieval the decisive boundary is inside the model’s pipeline — the moment untrusted, publicly authored text (yours, your competitor’s, an attacker’s) is pulled into the context window and treated, by a model with no reliable provenance signal, as material to ground an answer in. Everything downstream of that boundary — which passage wins, whose name survives synthesis, whether an embedded instruction is obeyed — is executed on the model provider’s side, under their heuristics, with no ranking to inspect and no referee to appeal to. The publisher controls the input to that boundary and nothing past it. The entire defensive playbook in §4 is a consequence of that one fact.
§2 · The attack classes
Seven classes cover the surface. They are ordered roughly from cheapest and best-documented to most speculative.
2.1 · Retrieval / corpus poisoning
The best-documented attack, and the most alarming, needs no access to any model or index — only the ability to publish. The published PoisonedRAG work showed that injecting a small number of crafted documents into a retrieval corpus of millions can drive a targeted query toward attacker-chosen output at high success rates — on the order of five injected texts and ~90% success for a specific target question in their setup.3 The mechanism is the asymmetry at the heart of RAG security: an attacker does not need their document to be the best answer or the highest-ranked page. They need it to be among the handful retrieved for the trigger query, because once it is in the context window it competes for the model’s grounding on roughly even terms with far more authoritative material.
For a commercial publisher the defensive reading is specific: your exposure is highest on narrow, high-intent, thinly-covered queries — exactly the bottom-of-funnel questions where a citation is worth the most and where a handful of adversarial pages can dominate the retrievable set. Broad, well-covered topics are comparatively self-defending because authentic coverage crowds the retrieval slots; your defensible niche query is the soft target.
2.2 · Indirect prompt injection through retrieved content
Where poisoning corrupts which passage is grounded on, indirect prompt injection corrupts what the model does once the passage is in context. The foundational demonstration — Greshake and colleagues’ work on compromising LLM-integrated applications — showed that instructions hidden in third-party content an application retrieves can hijack the model’s behaviour without the attacker ever touching the user or the prompt.4 OWASP now catalogues prompt injection as LLM01:2025, the top entry in its generative-AI risk list. For an answer engine grounding on live web pages, the retrieved page is the untrusted third-party content: an instruction embedded in a page — visible text, an alt attribute, off-screen markup — can attempt to steer the synthesised answer, suppress a competitor, or emit an attacker’s payload, to the extent the provider’s defences fail to strip it.
The publisher-side lesson is double-edged. Defensively, your own pages should not be a vector — user-generated content, syndicated blocks, and third-party embeds on your domain can carry an injection that harms your readers when an engine grounds on your page. On the hygiene side, you cannot rely on the engine to sanitise perfectly, and you should assume any field where you accept outside text is a place an attacker can try to reach the model through you.
2.3 · Citation hijacking (attribution reassignment)
This is the class that turns the taxonomy of citation behaviour into a weapon. In the companion audit paper I characterised silent absorption — your claim repeated as fact with the source evaporated — and mis-attribution, where a verbatim lift carries a footnote pointing at the wrong, usually higher-authority, source.5 Citation hijacking is the adversarial exploitation of that same synthesis behaviour: an attacker publishes the same claim you own, wrapped in stronger sourcing signals (a fabricated dataset, a denser reference apparatus, a more authoritative-looking entity), so that when the model synthesises, their name is the one that survives the attribution step. You are not out-ranked; you are out-sourced, and the credit for your own finding is reassigned inside a forward pass you never observe.
2.4 · Entity-graph spoofing
Answer engines increasingly resolve who said something through a knowledge
graph — Wikidata, Wikipedia, schema sameAs links, and the model’s own learned
entity representations. Poisoning that resolution is a durable attack, because a
corrupted entity fact propagates into every answer that touches the entity. The
classes range from the crude (creating a near-duplicate entity to split your
citation share) to the subtle (editing a low-watch knowledge-base field so the
model resolves your name to the wrong affiliation, credential, or claim). This
class is slow but sticky: unlike a poisoned passage, which a re-crawl can
displace, a corrupted entity binding persists until the underlying graph is
corrected — which is precisely why claiming and hardening your own entity
node is a defensive act and not merely a vanity one.
2.5 · Contradiction injection
The mirror image of citation hijacking. Rather than steal your attribution, the adversary manufactures a competing claim engineered to trigger the model’s contradiction behaviour — the mode where, facing two colliding sources, the model defaults to the more institutionally defensible one and yours loses outright. In regulated verticals, where models are tuned to prefer authoritative sources, a well-sourced contradiction is disproportionately effective: it does not have to be true, only more defensible-looking than your version at the moment of synthesis. The defence is unglamorous and is the same across half this list — be the better-sourced side of every contradiction that matters to your funnel, and monitor for the ones you are losing.
2.6 · Training-data capture and IP leakage
Not every attack is against a live answer; some are against the corpus that
trains the next model. Two failure modes matter to a publisher. First,
unconsented training capture: your content absorbed into a training set
regardless of your stated policy, after which your framing can surface as the
model’s uncredited “common knowledge” with no retrievable source to point at —
influence fully divorced from credit, and permanently. Second, extraction:
the published literature has shown that production models can be induced to
regurgitate memorised training text,6 so proprietary material that
enters a training corpus is not safely abstracted away — under the right prompt
it can come back out. The control surface here is weak and mostly upstream
(licensing, Google-Extended/Applebot-Extended-style training opt-outs where
honoured, and the enforcement caveats of §2.7), which is exactly why the
policy layer — the artefacts the crawler-policy tool generates — matters even
though it is only a signal.
2.7 · Abuse of the crawler-trust gap
The final class is not a model attack at all — it is the enforcement gap
underneath every policy control in the previous six. Publishers express crawler
policy through robots.txt, llms.txt, and headers, and those artefacts assume
the crawler identifies itself honestly and obeys. In August 2025 that
assumption was publicly broken: Cloudflare documented that a major answer engine
continued to retrieve content from sites that had explicitly blocked its
declared crawlers, using an undeclared, browser-impersonating fetcher and
rotating IPs, and de-listed it as a verified bot in response.7 The
provider disputed the framing — arguing user-driven, on-demand fetching is
categorically different from bulk crawling — and that dispute is itself the
point: the policy layer is a request, not a fence. A threat model that
treated robots.txt/llms.txt as access control would be wrong; they are
declarations of intent whose enforcement depends on the counterparty’s
good faith and, failing that, on server-side controls (WAF rules, verified-bot
allowlists, rate limits) that live below the policy layer entirely.
§3 · Why the SEO defences do not transfer
Practitioners reach for the controls they know, and almost none of them apply.
Rank monitoring assumes a visible ranking. The grounding layer has no SERP. There is no position to watch, no “you dropped to page two” signal. The nearest equivalent — repeatedly probing answer engines and classifying what they repeat about your claims — is a monitoring discipline that has to be built, not a dashboard you already own. You cannot defend a position you cannot observe, and the position here is only observable by active probing.
The disavow-and-report reflex assumes a referee. SEO defence ultimately appeals to a platform that adjudicates spam and can be petitioned. The grounding layer offers no such counter: there is no “report this hallucinated citation” queue with an SLA, no disavow file for a poisoned passage, no mechanism to contest a mis-attribution inside a model’s synthesis. Redress, where it exists, is indirect — correct the corpus, correct the entity graph, out-source the contradiction — and slow.
Access control assumes enforcement, and §2.7 just dismantled that. The
robots.txt mental model — “I disallowed it, therefore it will not fetch me” —
is precisely the assumption the crawler-trust gap violates. Treating a policy
signal as a control is the single most common category error I see, and it is
dangerous because it produces a feeling of protection with none of the
substance.
The uncomfortable synthesis: the SEO defensive toolkit was built for a world with an observable ranking and an appealable referee, and the grounding layer has neither. The defences that transfer are the ones that were never really about ranking — provenance, entity hygiene, and the boring discipline of being the best-sourced version of every claim you care about.
§4 · The defender’s playbook
What follows is the subset of controls that (a) follow from the trust-boundary structure and (b) a publisher can execute without model-provider cooperation. None of them “secure” the grounding layer — that is not a thing a publisher can do — but together they raise the cost of every attack in §2 and shorten the time to detect the ones that land.
- Win the provenance contest on the claims that matter. Since retrieval selects a short passage set and synthesis rewards the more defensible source, the durable defence against poisoning, hijacking, and contradiction is the same move: be the most sourced, most quantified, most dated, most entity-anchored version of your own claim in the corpus. This is the editorial discipline the companion audit paper quantified as the attribution lever; here it doubles as the primary security control. Provenance is armour.
- Claim and harden your entity node. Against entity-graph spoofing (§2.4),
a clean, well-linked, self-consistent entity — a maintained Wikidata node,
coherent
sameAsgraph, consistent name/affiliation/credential across your properties — is what the model resolves to. An unclaimed or inconsistent entity is an open field for a duplicate or a corrupted fact. - Treat every field that accepts outside text as an injection surface. Against §2.2, sanitise and constrain user-generated content, syndicated blocks, and third-party embeds on your own domain, so your pages cannot become the vector that carries an injection to a reader through an answer engine.
- Put real enforcement under the policy layer. Ship
robots.txt/llms.txtas honest declarations — the crawler-policy tool exists for exactly this — but do not mistake them for a fence. Where crawler abuse is a real risk, the controls that actually bite live below the policy layer: verified-bot allowlists, WAF rules, and rate limits, as the Cloudflare case demonstrated. - Build the monitor you do not yet have. The single largest capability gap for every defender is detection: you cannot see a poisoned passage displace yours, a name get swapped, or a contradiction start winning, unless you are actively probing the answer engines on your high-value queries and classifying the results over time. A recurring probe-and-classify loop on your funnel’s trigger queries is the smoke detector for this entire threat model — and, not incidentally, the honest version of it requires the live-model access this site has flagged as an open dependency.
§5 · Objections, steelmanned
“This is fear-marketing for consulting.” The strongest version: threat models sell services, the attacks above are mostly demonstrated in labs rather than observed at commercial scale, and a publisher’s realistic risk is low relative to the effort of defending. I concede the scale point — I have deliberately not claimed a measured breach rate, because I do not have one — and I concede that for a broad, well-covered topic, authentic coverage is a real defence. What survives the objection is the structure: the attacks are cheap where it matters (thin, high-intent queries), the controls are things a serious publisher should do anyway (provenance, entity hygiene, monitoring), and “low prevalence today” is a weak reassurance for a surface whose economic value is rising. A threat model is insurance reasoning, not a claim that the house is currently on fire.
“The model providers will fix this.” Perhaps — indirect-prompt-injection defences, provenance signals, and retrieval hardening are active research, and some attacks in §2 will be mitigated provider-side. But the publisher’s dependency is exactly the problem: every control that lives on the provider’s side is one the publisher neither operates nor can verify, and the incentives are not fully aligned. Betting your defensibility on someone else’s roadmap is a business decision, not a security control.
“You are overstating the crawler-trust gap from a single dispute.” Fair — one documented case is not a base rate, and the parties disagreed about characterisation. But the structural claim needs only one existence proof: declared-crawler blocks can be evaded, therefore policy artefacts cannot be treated as enforcement. The frequency is an open empirical question; the architectural conclusion is not.
§6 · What this model still gets wrong
An honest threat model names its own gaps. Three matter.
First, it is unmeasured at the point that matters most. The prevalence and per-vertical base rates of every attack in §2 against production answer engines are exactly the numbers a defender most wants and this paper most conspicuously lacks. That is a deliberate honesty, not an omission I can wave away: closing it requires the controlled live-model probing that the companion protocol specifies, and until that runs, the risk ordering here is better-grounded than the risk magnitude.
Second, the boundary between “attack” and “aggressive optimisation” is genuinely fuzzy. Citation hijacking via denser provenance (§2.3) and winning a contradiction with better sourcing (§4.1) are, mechanically, the same move — one framed as offence, one as defence. A model that cannot cleanly separate adversarial from legitimate use of the same lever is telling you something true about the domain: on this surface, the best defence and the sharpest attack are often indistinguishable at the level of the artefact, and only intent differs.
Third, it is a snapshot of a fast-moving target. Retrieval architectures, grounding heuristics, provenance signalling, and crawler-enforcement norms are all in motion; a class that is cheap today may be closed by a provider next quarter, and a class I have under-weighted may dominate. The frame — assets, adversaries, trust boundaries — is durable; the specific cost of each attack is not. Treat the ordering as current-as-of-2026 and the method as the thing worth keeping.
The reason to publish an admittedly-incomplete threat model rather than wait for the measurements is the same reason security teams write threat models before the breach: the value is in naming the surface early enough to defend it, and in being specific enough to be wrong in public and corrected. The practitioner who reproduces one of these attack classes against a production engine and reports that my cost ordering is backwards will have done the field a more useful service than the one who nods along. The archive exists to be corrected.
References
- Greshake, K., Abdelnabi, S., Mishra, S., Endres, C., Holz, T., & Fritz, M. (2023). Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection. ACM AISec 2023. — The foundational indirect-prompt-injection paper; names the data-vs-instructions confusion behind §2.2 and answer-engine poisoning.
- Zou, W., Geng, R., Wang, B., & Jia, J. (2025). PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models. USENIX Security 2025. — Source of the passage-scale poisoning result (order of ~5 texts, high targeted-query success) anchoring §2.1.
- OWASP GenAI Security Project (2025). LLM01:2025 Prompt Injection. OWASP Top 10 for LLM Applications. — Prompt injection catalogued as the #1 generative-AI application risk.
- Cloudflare (2025). Perplexity is using stealth, undeclared crawlers to evade website no-crawl directives. The Cloudflare Blog, August 2025. — Existence proof for the crawler-trust gap in §2.7 — declared-crawler blocks were evaded across tens of thousands of domains.
- Carlini, N., Jagielski, M., Choquette-Choo, C. A., et al. (2024). Poisoning Web-Scale Training Datasets is Practical. IEEE S&P 2024. — Training-corpus poisoning is economically feasible — the upstream analogue of §2.1 and the basis for §2.6's capture concern.
- Nasr, M., Carlini, N., Hayase, J., et al. (2023). Scalable Extraction of Training Data from (Production) Language Models. arXiv preprint. — Memorised training text is extractable from production models — the IP-leakage corollary in §2.6.
- Liu, N. F., Lin, K., Hewitt, J., et al. (2024). Lost in the Middle: How Language Models Use Long Contexts. Transactions of the ACL, Volume 12. — Position effects in context use — why a retrieved passage's placement, not just its presence, shapes grounding.
- Ahrefs (2026). We Analyzed 137K Sites: 97% of llms.txt Files Never Get Read. Ahrefs blog, June 2026. — ~97% of llms.txt files across 137K domains received zero requests; the few fetches were led by GPTBot and Claude-Code, not answer-engine retrieval agents. The enforcement backdrop for §2.7 and the companion protocol.
- Sasson, G. (2026). A taxonomy of LLM citation behaviour across 14 frontier models. Algoholic, Vol. III, Essay 03. — The behaviour classes (silent absorption, mis-attribution, contradiction) that §2.3 and §2.5 weaponise.
- Sasson, G. (2026). Statement-level visibility, or: why ranking a page no longer matters. Algoholic, Vol. III, Essay 04. — The asset defended in §1 — the entity-to-claim binding — defined at length.
- Sasson, G. (2026). The AI Crawler Policy Generator. Algoholic tools. — The policy-artefact tool referenced in §4.4; ships robots.txt / llms.txt / ai.txt as honest declarations, not fences.
Footnotes
-
This matters for how the paper should be read and cited. Every quantified attack figure below is attributed to a named external study run by other researchers under their own methodology; the numbers are theirs, not a measurement I am claiming to have reproduced. The companion protocol — Does llms.txt do anything? — is written the same way, on purpose: a preregistered design published before the data exists, so the claims stay falsifiable and no reader mistakes a prediction for a result. ↩
-
“Answer-engine poisoning” is the practitioner name for indirect prompt injection aimed at the retrieval-and-grounding layer of public answer engines — Google’s AI Overviews, ChatGPT search, Perplexity, Bing Copilot — rather than at one private chatbot. The attacker publishes content engineered to be retrieved and cited, so the model repeats the attacker’s misinformation, scam details, or embedded instructions to whoever asks the triggering question. Unlike a classic SEO spam play, it does not need to rank; it needs to be retrievable. ↩
-
Zou, Geng, Wang & Jia, PoisonedRAG (USENIX Security 2025). The exact figures are setup-dependent — corpus, retriever, and target query all move them — and I cite the order of magnitude, not a universal constant. The load-bearing point for a defender is qualitative and robust across the poisoning literature: retrieval-stage attacks are passage-scale, not corpus-scale, which is what makes them cheap. ↩
-
Greshake et al., Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection (2023). The paper predates today’s answer engines but names the exact confusion — retrieved data treated as instructions — that answer-engine poisoning exploits at scale. ↩
-
Sasson, A taxonomy of LLM citation behaviour across 14 frontier models (Algoholic, Vol. III). Silent absorption and mis-attribution are described there as outcomes of ordinary synthesis; here they are the target of a deliberate attribution-reassignment attack. ↩
-
Nasr et al., Scalable Extraction of Training Data from (Production) Language Models (2023). The relevance to a publisher is not the headline attack but the corollary: content that enters a training corpus is not reliably “abstracted”; memorisation is real and extractable, so training capture is an IP question, not only an attribution one. ↩
-
Cloudflare, Perplexity is using stealth, undeclared crawlers to evade website no-crawl directives (blog.cloudflare.com, August 2025). Cited here for the structural fact it established — declared-crawler blocks can be evaded — not to adjudicate the parties’ competing characterisations. The defensive corollary stands regardless of who was right: policy artefacts are signals, and signals need enforcement underneath them. ↩
