A GEO threat model: the attack surface of machine-mediated citation.

Abstract

Search optimisation has always had an adversarial edge — link spam, cloaking, negative SEO — but the adversary was always trying to move a ranking. The substrate has changed underneath the fight. When a generative answer engine resolves a question, it does not return a ranked list for a human to judge; it retrieves a handful of passages, grounds a synthesised answer in them, and decides — inside a single forward pass the publisher never sees — which claim to repeat and whose name, if anyone’s, to attach. That decision is now an asset. This paper models it as one. I define the assets worth protecting (the entity-to-claim binding and the retrieval trust it depends on), the adversaries who benefit from corrupting it, and the trust boundaries where the corruption happens. I then enumerate seven attack classes against the retrieval-and-grounding layer, each tied to published work where published work exists — answer-engine and retrieval poisoning, indirect prompt injection through retrieved content, citation hijacking, entity-graph spoofing, contradiction injection, training-data capture, and abuse of the crawler-trust gap. The central argument is defensive and uncomfortable: the controls that protected a ranking do not transfer to this layer, the most effective attacks require no privileged access — only the ability to publish a document that gets retrieved — and a publisher’s realistic security posture is provenance, monitoring, and statement-level hygiene, plus an honest accounting of the surface no publisher controls.

Key claims the scannable version

GEO is an attack surface, not just a growth channel. The retrieval-and-grounding layer that decides which claim an answer engine repeats is a manipulable asset; treating it as pure marketing upside ignores that the same levers that earn you a citation can be used against you.
The cheapest effective attack needs no access — only publication. In the published RAG-poisoning literature, injecting roughly five crafted documents into a corpus of millions drives targeted-query attack success toward ~90%, because a retriever only has to retrieve the malicious passage, not rank it above everything else.
Indirect prompt injection is the #1 catalogued LLM risk, and retrieved web content is a delivery vehicle. OWASP lists prompt injection as LLM01:2025; any answer engine that grounds on live third-party pages can be fed instructions inside the content it retrieves.
Citation hijacking separates influence from credit. An attacker (or an unlucky competitor) can get your claim repeated with someone else’s name attached; the model’s synthesis step is where attribution is silently reassigned.
The crawler-trust gap is now a documented reality, not a hypothesis. The 2025 Cloudflare–Perplexity dispute showed declared crawlers can be blocked while undeclared fetchers retrieve the content anyway — so robots.txt and llms.txt are policy signals, not enforcement.
SEO’s defensive playbook does not transfer. Disavow files, spam reports, and rank monitoring assume a ranking you can observe and a referee you can appeal to; the grounding layer offers neither a visible ranking nor an appeals process.
The defensible position is provenance + monitoring + statement-level hygiene. You cannot firewall the model’s context window, but you can make your claims the most sourced, most dated, most entity-anchored version in the corpus, and you can watch for the moment a poisoned or hijacked variant displaces them.

For most of the twenty-seven years I have worked in search, security and marketing were different departments that met only at the incident review. The marketer wanted to be found; the security team wanted to not be breached; the two goals shared a domain name and little else. Generative retrieval collapses that separation. The mechanism that now decides whether your finding reaches a buyer — a model retrieving passages and grounding an answer in them — is the same mechanism an adversary manipulates to put words in the model’s mouth. The channel and the attack surface are the same surface. A field that has spent two years learning to optimise for machine-mediated citation has been, without quite noticing, learning to operate an asset it has not learned to defend.

This paper is the defender’s document I wanted and could not find. It is deliberately a threat model and not an audit: I am not reporting a measured breach rate across a panel of sites, because the honest measurement — probing production answer engines with controlled poison and watching what they repeat — requires live-model budget and a careful ethics boundary this piece does not pretend to have crossed.¹ What I can do rigorously is the structural work security actually runs on: name the assets, the adversaries, and the trust boundaries; enumerate the attack classes and anchor each to the published literature where it exists and to a clearly-labelled constructed scenario where it does not; and derive a control set that follows from the structure rather than from wishful thinking. The roadmap is conventional for a threat model: the frame (§1), the seven attack classes (§2), why the SEO defences fail to transfer (§3), the defender’s playbook (§4), three serious objections (§5), and an accounting of what the model still gets wrong (§6).

§1 · The frame: assets, adversaries, trust boundaries

A threat model is worth nothing until it says three things plainly: what is worth protecting, who benefits from harming it, and where control changes hands.

The asset is the entity-to-claim binding. In a ranked-link world the asset was a position: a URL sitting at rank one for a query, observable in a SERP, defended by relevance and links. In a generative-retrieval world the asset is subtler and more valuable — it is the binding between a claim and your name inside the model’s answer. When a user asks a buying-stage question and the engine answers “according to Sasson’s 2026 analysis, statement-level framing raises attribution roughly threefold,” the asset that just paid off was not a ranking; it was the model’s willingness to route a finding through your entity. Everything an adversary wants to do here reduces to attacking that binding: break it (your claim, no name), transfer it (your claim, their name), or poison it (a false claim, your name).

The adversaries are more varied than “spammers.” SEO trained the field to model one attacker — the low-quality operator gaming a ranking signal. The grounding layer has at least five, and they want different things:

The competitor wants the contradiction class: your claim surfaced and then beaten by theirs, or your name quietly swapped for theirs on a shared finding. Their goal is attribution share, and it is a zero-sum fight over the same buyer’s first machine-mediated impression.
The reputation attacker wants a false claim bound to your entity — a fabricated “known issue,” a manufactured controversy, a poisoned biographical fact — surfaced with enough sourcing that the model repeats it as grounded.
The scam / fraud operator wants the answer engine to emit their payload in a trusted voice: a fake support number, a malicious “official” download, a counterfeit policy. This is the class the published answer-engine-poisoning work targets directly.²
The data harvester wants your content in a training or retrieval corpus regardless of your stated policy — the crawler-trust gap, below.
The model-integrity researcher / state actor sits at the tail: interested in corrupting what a population of users believes about a topic at the retrieval layer, for which a commercial site is collateral rather than target.

The trust boundary is the retrieval-and-grounding step, and it is drawn in a place the publisher cannot see. In a classic web request the boundary is your server: you decide what to serve to whom. In generative retrieval the decisive boundary is inside the model’s pipeline — the moment untrusted, publicly authored text (yours, your competitor’s, an attacker’s) is pulled into the context window and treated, by a model with no reliable provenance signal, as material to ground an answer in.

Everything downstream of that boundary — which passage wins, whose name survives synthesis, whether an embedded instruction is obeyed — is executed on the model provider’s side, under their heuristics, with no ranking to inspect and no referee to appeal to. The publisher controls the input to that boundary and nothing past it. The entire defensive playbook in §4 is a consequence of that one fact.

§2 · The attack classes

Seven classes cover the surface. They are ordered roughly from cheapest and best-documented to most speculative.

2.1 · Retrieval / corpus poisoning

The best-documented attack, and the most alarming, needs no access to any model or index — only the ability to publish. The published PoisonedRAG work showed that injecting a small number of crafted documents into a retrieval corpus of millions can drive a targeted query toward attacker-chosen output at high success rates — on the order of five injected texts and ~90% success for a specific target question in their setup.³ The mechanism is the asymmetry at the heart of RAG security: an attacker does not need their document to be the best answer or the highest-ranked page. They need it to be among the handful retrieved for the trigger query, because once it is in the context window it competes for the model’s grounding on roughly even terms with far more authoritative material.

For a commercial publisher the defensive reading is specific: your exposure is highest on narrow, high-intent, thinly-covered queries — exactly the bottom-of-funnel questions where a citation is worth the most and where a handful of adversarial pages can dominate the retrievable set. Broad, well-covered topics are comparatively self-defending because authentic coverage crowds the retrieval slots; your defensible niche query is the soft target.

2.2 · Indirect prompt injection through retrieved content

Where poisoning corrupts which passage is grounded on, indirect prompt injection corrupts what the model does once the passage is in context. The foundational demonstration — Greshake and colleagues’ work on compromising LLM-integrated applications — showed that instructions hidden in third-party content an application retrieves can hijack the model’s behaviour without the attacker ever touching the user or the prompt.⁴ OWASP now catalogues prompt injection as LLM01:2025, the top entry in its generative-AI risk list. For an answer engine grounding on live web pages, the retrieved page is the untrusted third-party content: an instruction embedded in a page — visible text, an alt attribute, off-screen markup — can attempt to steer the synthesised answer, suppress a competitor, or emit an attacker’s payload, to the extent the provider’s defences fail to strip it.

The publisher-side lesson is double-edged. Defensively, your own pages should not be a vector — user-generated content, syndicated blocks, and third-party embeds on your domain can carry an injection that harms your readers when an engine grounds on your page. On the hygiene side, you cannot rely on the engine to sanitise perfectly, and you should assume any field where you accept outside text is a place an attacker can try to reach the model through you.

2.3 · Citation hijacking (attribution reassignment)

This is the class that turns the taxonomy of citation behaviour into a weapon. In the companion audit paper I characterised silent absorption — your claim repeated as fact with the source evaporated — and mis-attribution, where a verbatim lift carries a footnote pointing at the wrong, usually higher-authority, source.⁵ Citation hijacking is the adversarial exploitation of that same synthesis behaviour: an attacker publishes the same claim you own, wrapped in stronger sourcing signals (a fabricated dataset, a denser reference apparatus, a more authoritative-looking entity), so that when the model synthesises, their name is the one that survives the attribution step. You are not out-ranked; you are out-sourced, and the credit for your own finding is reassigned inside a forward pass you never observe.

2.4 · Entity-graph spoofing

Answer engines increasingly resolve who said something through a knowledge graph — Wikidata, Wikipedia, schema sameAs links, and the model’s own learned entity representations. Poisoning that resolution is a durable attack, because a corrupted entity fact propagates into every answer that touches the entity. The classes range from the crude (creating a near-duplicate entity to split your citation share) to the subtle (editing a low-watch knowledge-base field so the model resolves your name to the wrong affiliation, credential, or claim). This class is slow but sticky: unlike a poisoned passage, which a re-crawl can displace, a corrupted entity binding persists until the underlying graph is corrected — which is precisely why claiming and hardening your own entity node is a defensive act and not merely a vanity one.

2.5 · Contradiction injection

The mirror image of citation hijacking. Rather than steal your attribution, the adversary manufactures a competing claim engineered to trigger the model’s contradiction behaviour — the mode where, facing two colliding sources, the model defaults to the more institutionally defensible one and yours loses outright. In regulated verticals, where models are tuned to prefer authoritative sources, a well-sourced contradiction is disproportionately effective: it does not have to be true, only more defensible-looking than your version at the moment of synthesis. The defence is unglamorous and is the same across half this list — be the better-sourced side of every contradiction that matters to your funnel, and monitor for the ones you are losing.

2.6 · Training-data capture and IP leakage

Not every attack is against a live answer; some are against the corpus that trains the next model. Two failure modes matter to a publisher. First, unconsented training capture: your content absorbed into a training set regardless of your stated policy, after which your framing can surface as the model’s uncredited “common knowledge” with no retrievable source to point at — influence fully divorced from credit, and permanently. Second, extraction: the published literature has shown that production models can be induced to regurgitate memorised training text,⁶ so proprietary material that enters a training corpus is not safely abstracted away — under the right prompt it can come back out. The control surface here is weak and mostly upstream (licensing, Google-Extended/Applebot-Extended-style training opt-outs where honoured, and the enforcement caveats of §2.7), which is exactly why the policy layer — the artefacts the crawler-policy tool generates — matters even though it is only a signal.

2.7 · Abuse of the crawler-trust gap

The final class is not a model attack at all — it is the enforcement gap underneath every policy control in the previous six. Publishers express crawler policy through robots.txt, llms.txt, and headers, and those artefacts assume the crawler identifies itself honestly and obeys. In August 2025 that assumption was publicly broken: Cloudflare documented that a major answer engine continued to retrieve content from sites that had explicitly blocked its declared crawlers, using an undeclared, browser-impersonating fetcher and rotating IPs, and de-listed it as a verified bot in response.⁷ The provider disputed the framing — arguing user-driven, on-demand fetching is categorically different from bulk crawling — and that dispute is itself the point: the policy layer is a request, not a fence. A threat model that treated robots.txt/llms.txt as access control would be wrong; they are declarations of intent whose enforcement depends on the counterparty’s good faith and, failing that, on server-side controls (WAF rules, verified-bot allowlists, rate limits) that live below the policy layer entirely.

§3 · Why the SEO defences do not transfer

Practitioners reach for the controls they know, and almost none of them apply.

Rank monitoring assumes a visible ranking. The grounding layer has no SERP. There is no position to watch, no “you dropped to page two” signal. The nearest equivalent — repeatedly probing answer engines and classifying what they repeat about your claims — is a monitoring discipline that has to be built, not a dashboard you already own. You cannot defend a position you cannot observe, and the position here is only observable by active probing.

The disavow-and-report reflex assumes a referee. SEO defence ultimately appeals to a platform that adjudicates spam and can be petitioned. The grounding layer offers no such counter: there is no “report this hallucinated citation” queue with an SLA, no disavow file for a poisoned passage, no mechanism to contest a mis-attribution inside a model’s synthesis. Redress, where it exists, is indirect — correct the corpus, correct the entity graph, out-source the contradiction — and slow.

Access control assumes enforcement, and §2.7 just dismantled that. The robots.txt mental model — “I disallowed it, therefore it will not fetch me” — is precisely the assumption the crawler-trust gap violates. Treating a policy signal as a control is the single most common category error I see, and it is dangerous because it produces a feeling of protection with none of the substance.

The uncomfortable synthesis: the SEO defensive toolkit was built for a world with an observable ranking and an appealable referee, and the grounding layer has neither. The defences that transfer are the ones that were never really about ranking — provenance, entity hygiene, and the boring discipline of being the best-sourced version of every claim you care about.

§4 · The defender’s playbook

What follows is the subset of controls that (a) follow from the trust-boundary structure and (b) a publisher can execute without model-provider cooperation. None of them “secure” the grounding layer — that is not a thing a publisher can do — but together they raise the cost of every attack in §2 and shorten the time to detect the ones that land.

Win the provenance contest on the claims that matter. Since retrieval selects a short passage set and synthesis rewards the more defensible source, the durable defence against poisoning, hijacking, and contradiction is the same move: be the most sourced, most quantified, most dated, most entity-anchored version of your own claim in the corpus. This is the editorial discipline the companion audit paper quantified as the attribution lever; here it doubles as the primary security control. Provenance is armour.
Claim and harden your entity node. Against entity-graph spoofing (§2.4), a clean, well-linked, self-consistent entity — a maintained Wikidata node, coherent sameAs graph, consistent name/affiliation/credential across your properties — is what the model resolves to. An unclaimed or inconsistent entity is an open field for a duplicate or a corrupted fact.
Treat every field that accepts outside text as an injection surface. Against §2.2, sanitise and constrain user-generated content, syndicated blocks, and third-party embeds on your own domain, so your pages cannot become the vector that carries an injection to a reader through an answer engine.
Put real enforcement under the policy layer. Ship robots.txt/llms.txt as honest declarations — the crawler-policy tool exists for exactly this — but do not mistake them for a fence. Where crawler abuse is a real risk, the controls that actually bite live below the policy layer: verified-bot allowlists, WAF rules, and rate limits, as the Cloudflare case demonstrated.
Build the monitor you do not yet have. The single largest capability gap for every defender is detection: you cannot see a poisoned passage displace yours, a name get swapped, or a contradiction start winning, unless you are actively probing the answer engines on your high-value queries and classifying the results over time. A recurring probe-and-classify loop on your funnel’s trigger queries is the smoke detector for this entire threat model — and, not incidentally, the honest version of it requires the live-model access this site has flagged as an open dependency.
The monitoring control is the one place where the defence and the measurement of the threat are the same instrument. Until it runs on live models, every prevalence number in this space — including the ones I declined to invent above — stays an estimate. That is a limitation, stated plainly, not a hedge.

§5 · Objections, steelmanned

“This is fear-marketing for consulting.” The strongest version: threat models sell services, the attacks above are mostly demonstrated in labs rather than observed at commercial scale, and a publisher’s realistic risk is low relative to the effort of defending. I concede the scale point — I have deliberately not claimed a measured breach rate, because I do not have one — and I concede that for a broad, well-covered topic, authentic coverage is a real defence. What survives the objection is the structure: the attacks are cheap where it matters (thin, high-intent queries), the controls are things a serious publisher should do anyway (provenance, entity hygiene, monitoring), and “low prevalence today” is a weak reassurance for a surface whose economic value is rising. A threat model is insurance reasoning, not a claim that the house is currently on fire.

“The model providers will fix this.” Perhaps — indirect-prompt-injection defences, provenance signals, and retrieval hardening are active research, and some attacks in §2 will be mitigated provider-side. But the publisher’s dependency is exactly the problem: every control that lives on the provider’s side is one the publisher neither operates nor can verify, and the incentives are not fully aligned. Betting your defensibility on someone else’s roadmap is a business decision, not a security control.

“You are overstating the crawler-trust gap from a single dispute.” Fair — one documented case is not a base rate, and the parties disagreed about characterisation. But the structural claim needs only one existence proof: declared-crawler blocks can be evaded, therefore policy artefacts cannot be treated as enforcement. The frequency is an open empirical question; the architectural conclusion is not.

§6 · What this model still gets wrong

An honest threat model names its own gaps. Three matter.

First, it is unmeasured at the point that matters most. The prevalence and per-vertical base rates of every attack in §2 against production answer engines are exactly the numbers a defender most wants and this paper most conspicuously lacks. That is a deliberate honesty, not an omission I can wave away: closing it requires the controlled live-model probing that the companion protocol specifies, and until that runs, the risk ordering here is better-grounded than the risk magnitude.

Second, the boundary between “attack” and “aggressive optimisation” is genuinely fuzzy. Citation hijacking via denser provenance (§2.3) and winning a contradiction with better sourcing (§4.1) are, mechanically, the same move — one framed as offence, one as defence. A model that cannot cleanly separate adversarial from legitimate use of the same lever is telling you something true about the domain: on this surface, the best defence and the sharpest attack are often indistinguishable at the level of the artefact, and only intent differs.

Third, it is a snapshot of a fast-moving target. Retrieval architectures, grounding heuristics, provenance signalling, and crawler-enforcement norms are all in motion; a class that is cheap today may be closed by a provider next quarter, and a class I have under-weighted may dominate. The frame — assets, adversaries, trust boundaries — is durable; the specific cost of each attack is not. Treat the ordering as current-as-of-2026 and the method as the thing worth keeping.

The reason to publish an admittedly-incomplete threat model rather than wait for the measurements is the same reason security teams write threat models before the breach: the value is in naming the surface early enough to defend it, and in being specific enough to be wrong in public and corrected. The practitioner who reproduces one of these attack classes against a production engine and reports that my cost ordering is backwards will have done the field a more useful service than the one who nods along. The archive exists to be corrected.

In summary the eight points to remember

GEO is an attack surface, not just a growth channel — the entity-to-claim binding an answer engine produces is a manipulable asset, and the same levers that earn a citation can be turned against you.
The cheapest effective attack needs no access, only publication — passage-scale poisoning (single-digit crafted documents in the published literature) can bias a thin-coverage query because a retriever only has to retrieve the passage, not rank it first.
Indirect prompt injection is the top catalogued LLM risk — OWASP LLM01:2025 — and any engine grounding on live third-party pages can be fed instructions inside retrieved content.
Citation hijacking separates influence from credit — an attacker can get your claim repeated under someone else’s name by winning the synthesis step’s attribution slot with denser provenance.
The crawler-trust gap is documented, not hypothetical — the 2025 Cloudflare–Perplexity dispute established that declared-crawler blocks can be evaded, so robots.txt/llms.txt are signals, not fences.
SEO’s defences do not transfer — there is no visible ranking to monitor, no referee to appeal to, and no enforcement under the policy layer.
The defensible position is provenance + entity hygiene + monitoring — be the best-sourced version of every claim that matters, harden your entity node, and build the probe-and-classify loop that lets you see an attack land.

References

Greshake, K., Abdelnabi, S., Mishra, S., Endres, C., Holz, T., & Fritz, M. (2023). Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection. ACM AISec 2023. — The foundational indirect-prompt-injection paper; names the data-vs-instructions confusion behind §2.2 and answer-engine poisoning.
Zou, W., Geng, R., Wang, B., & Jia, J. (2025). PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models. USENIX Security 2025. — Source of the passage-scale poisoning result (order of ~5 texts, high targeted-query success) anchoring §2.1.
OWASP GenAI Security Project (2025). LLM01:2025 Prompt Injection. OWASP Top 10 for LLM Applications. — Prompt injection catalogued as the #1 generative-AI application risk.
Cloudflare (2025). Perplexity is using stealth, undeclared crawlers to evade website no-crawl directives. The Cloudflare Blog, August 2025. — Existence proof for the crawler-trust gap in §2.7 — declared-crawler blocks were evaded across tens of thousands of domains.
Carlini, N., Jagielski, M., Choquette-Choo, C. A., et al. (2024). Poisoning Web-Scale Training Datasets is Practical. IEEE S&P 2024. — Training-corpus poisoning is economically feasible — the upstream analogue of §2.1 and the basis for §2.6's capture concern.
Nasr, M., Carlini, N., Hayase, J., et al. (2023). Scalable Extraction of Training Data from (Production) Language Models. arXiv preprint. — Memorised training text is extractable from production models — the IP-leakage corollary in §2.6.
Liu, N. F., Lin, K., Hewitt, J., et al. (2024). Lost in the Middle: How Language Models Use Long Contexts. Transactions of the ACL, Volume 12. — Position effects in context use — why a retrieved passage's placement, not just its presence, shapes grounding.
Ahrefs (2026). We Analyzed 137K Sites: 97% of llms.txt Files Never Get Read. Ahrefs blog, June 2026. — ~97% of llms.txt files across 137K domains received zero requests; the few fetches were led by GPTBot and Claude-Code, not answer-engine retrieval agents. The enforcement backdrop for §2.7 and the companion protocol.
Sasson, G. (2026). A taxonomy of LLM citation behaviour across 14 frontier models. Algoholic, Vol. III, Essay 03. — The behaviour classes (silent absorption, mis-attribution, contradiction) that §2.3 and §2.5 weaponise.
Sasson, G. (2026). Statement-level visibility, or: why ranking a page no longer matters. Algoholic, Vol. III, Essay 04. — The asset defended in §1 — the entity-to-claim binding — defined at length.
Sasson, G. (2026). The AI Crawler Policy Generator. Algoholic tools. — The policy-artefact tool referenced in §4.4; ships robots.txt / llms.txt / ai.txt as honest declarations, not fences.

This matters for how the paper should be read and cited. Every quantified attack figure below is attributed to a named external study run by other researchers under their own methodology; the numbers are theirs, not a measurement I am claiming to have reproduced. The companion protocol — Does llms.txt do anything? — is written the same way, on purpose: a preregistered design published before the data exists, so the claims stay falsifiable and no reader mistakes a prediction for a result. ↩
“Answer-engine poisoning” is the practitioner name for indirect prompt injection aimed at the retrieval-and-grounding layer of public answer engines — Google’s AI Overviews, ChatGPT search, Perplexity, Bing Copilot — rather than at one private chatbot. The attacker publishes content engineered to be retrieved and cited, so the model repeats the attacker’s misinformation, scam details, or embedded instructions to whoever asks the triggering question. Unlike a classic SEO spam play, it does not need to rank; it needs to be retrievable. ↩
Zou, Geng, Wang & Jia, PoisonedRAG (USENIX Security 2025). The exact figures are setup-dependent — corpus, retriever, and target query all move them — and I cite the order of magnitude, not a universal constant. The load-bearing point for a defender is qualitative and robust across the poisoning literature: retrieval-stage attacks are passage-scale, not corpus-scale, which is what makes them cheap. ↩
Greshake et al., Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection (2023). The paper predates today’s answer engines but names the exact confusion — retrieved data treated as instructions — that answer-engine poisoning exploits at scale. ↩
Sasson, A taxonomy of LLM citation behaviour across 14 frontier models (Algoholic, Vol. III). Silent absorption and mis-attribution are described there as outcomes of ordinary synthesis; here they are the target of a deliberate attribution-reassignment attack. ↩
Nasr et al., Scalable Extraction of Training Data from (Production) Language Models (2023). The relevance to a publisher is not the headline attack but the corollary: content that enters a training corpus is not reliably “abstracted”; memorisation is real and extractable, so training capture is an IP question, not only an attribution one. ↩
Cloudflare, Perplexity is using stealth, undeclared crawlers to evade website no-crawl directives (blog.cloudflare.com, August 2025). Cited here for the structural fact it established — declared-crawler blocks can be evaded — not to adjudicate the parties’ competing characterisations. The defensive corollary stands regardless of who was right: policy artefacts are signals, and signals need enforcement underneath them. ↩

Gilad Sasson

aka Algoholic · גלעד ששון

Gilad Sasson, also known as Algoholic, is an Israeli digital marketing expert, founder & CEO of nekuda Web Solutions, and a pioneer in search engine optimization and data analytics since 1999. Head of internet & search at Zap Group 2002–2006; CMO at Interlogic 2006–2009. Speaker at SMX Israel, TNW Amsterdam, Web Summit Dublin, DMIEXPO.

LinkedIn @algoholic Work with me →