The working archive · since 1999

Forty essays, twenty-seven years, one continuous thread.

From AltaVista keyword density to LLM citation behaviour. The full archive of published research, in reverse chronological order. Two pieces a week: one field note, one working paper.

Published essays

Volumes

~700

In the wider archive

2/wk

Note + paper cadence

III

Volume III · current · 2025 → 2026

Generative retrieval & statement-level visibility.

The substrate shifted. We rebuild the framework from the ground up — claim-level scoring, LLM citation behaviour, applied measurement.

Current · 10 essays

Essay 12

Does llms.txt do anything? A preregistered efficacy protocol.

Over 844,000 sites have shipped an llms.txt file, and an independent 137K-site log study found that ~97% of them are never read. Google has said it will not support the file, and where it is fetched at all, it is mostly by training and coding agents, not the answer engines that shape citations. That is a strong prior, not a verdict. This note does the thing the debate is missing: it preregisters the controlled test that would actually settle whether llms.txt changes model behaviour, states the hypotheses and the falsification conditions in advance, and publishes the a-priori prediction before the data exists, because the live-model probing the test requires is a stated, unmet dependency, and a prediction dressed as a result is exactly the dishonesty this archive exists to avoid.

GEO · AI Search · llms.txt · methodology

A GEO threat model: the attack surface of machine-mediated citation.

When a language model decides which claim to repeat and whose name to attach, that decision becomes an asset worth attacking. This paper treats generative-engine optimisation as a security problem rather than a marketing one: it enumerates the adversaries, the trust boundaries, and the seven attack classes against the retrieval-and-grounding layer that now mediates a user's first exposure to a topic — from answer-engine poisoning (five injected documents, ~90% success in the published literature) to citation hijacking, entity-graph spoofing, and the crawler-trust gap that the Cloudflare–Perplexity dispute made public. The controls that defended a page ranking do not transfer. The defensible position is provenance, monitoring, and statement-level discipline — and a frank admission of what a publisher cannot control at all.

cyber · GEO · AI Search · threat model

Statement-level visibility, or: why ranking a page no longer matters.

The unit of competition has shifted from the page to the claim. We define statement-level visibility formally, propose a three-operation measurement framework (extract, probe, compare), and present 90-day data across 3,200 documents and 14 frontier models showing a structural durability gap between sourced, quantified statements and conventional commercial copy.

GEO · retrieval · measurement

A taxonomy of LLM citation behaviour across 14 frontier models.

What gets cited, what gets paraphrased, what disappears. A controlled audit across GPT-5 Pro, Claude Sonnet 4.6, Gemini 2.5 Pro, and eleven others, with results from 47,800 probe runs over 90 days. The variance is structural, not stochastic, and it points at a single actionable lever: epistemic framing of the claim.

AI Search · methodology · GEO

GEO is not SEO with prompts. A position paper.

Most of what the industry currently calls Generative Engine Optimization is recycled SEO advice rebadged for a moment of vendor opportunism. Some of it is genuinely new. The position of this paper is that the difference is structural and operationally consequential: GEO is a measurement discipline on a substrate that operates below the page, while SEO was an authority-and-relevance discipline on a substrate that operated at the page. We taxonomise the continuities and the discontinuities, name the strongest counter-positions in the field, and stake out a defensible distinction the practice can stand on.

GEO · position · foundations

Ranking ≠ retrieval ≠ generation. A decomposition.

Three operations, routinely conflated, with three different failure modes. We separate them with notation, with worked examples, and with a per-operation audit of which SEO-era heuristics still apply, which have flipped sign, and which were always proxies for what we are now able to measure directly. The decomposition is the prerequisite for every measurement framework in the rest of the volume.

foundations · research · retrieval

How LLMs read right-to-left: retrieval in Hebrew and Arabic.

Frontier transformer models do not stumble on niqqud or agglutination the way regex tokenisers did, but they inherit a deeper distributional disadvantage: roughly a thousand English documents per Hebrew one in the training corpus. We measure the RTL penalty empirically across fourteen models, decompose it into tokenization, embedding-density, and entity-grounding components, and show that a publisher with twenty years of Hebrew technical writing is structurally positioned to convert the disadvantage into an incumbency moat.

RTL · AI Search · Hebrew · tokenization

Entity disambiguation, for humans who share a name.

Search 'Gilad Sasson' and the index returns at least four people: an SEO consultant, a rabbinic scholar, a chemical-engineering researcher, and an eleven-year-old footballer. To a vector model, we occupy overlapping regions of space, and asked for a biography of any one of us it will average us together — confidently, fluently, wrongly. This note is about why hallucination of this kind is not random, why editorial cannot fix it, and what the structural fix actually looks like.

Chunking is the new pagination.

Before a model ever reads your page, a retrieval system has already cut it into pieces. Those pieces — chunks — are the real unit that gets embedded, searched, and pulled into context. If a claim and the qualifier that makes it true land in different chunks, the claim arrives at the model orphaned. This note is the editorial discipline that follows from that fact, with examples of what fails and what passes.

retrieval · technical · GEO
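The orphaned-claim failure described in the chunking note can be sketched in a few lines. This is a minimal illustration, not the note's own method: it assumes a naive fixed-size word chunker with no overlap, and the names `chunk_words` and `co_located`, the chunk sizes, and the sample sentence are all hypothetical, invented for the example.

```python
def chunk_words(text: str, size: int) -> list[str]:
    """Split text into consecutive chunks of at most `size` words (no overlap)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def co_located(text: str, claim: str, qualifier: str, size: int) -> bool:
    """Return True if at least one chunk contains both the claim and its qualifier.

    Assumption: plain substring matching. A phrase that is itself split
    across a chunk boundary will not be found in any chunk.
    """
    return any(claim in chunk and qualifier in chunk
               for chunk in chunk_words(text, size))


# Invented sample copy: a claim followed by the qualifier that makes it true.
text = ("Our tool cut crawl errors by 40 percent. "
        "That figure applies only to sites under 500 pages.")
claim = "cut crawl errors by 40 percent"
qualifier = "only to sites under 500 pages"

small_chunks_ok = co_located(text, claim, qualifier, size=8)
large_chunks_ok = co_located(text, claim, qualifier, size=20)
```

With 8-word chunks the claim and its qualifier land in different pieces; with 20-word chunks they travel together. Real retrieval systems chunk by tokens, sentences, or structure, often with overlap, so the boundary positions will differ, but the editorial check is the same.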

Reproducibility as a ranking signal.

Run the same query three times and a model will not give you the same answer three times — decoding is stochastic. But some claims survive the variance, reappearing run after run and paraphrase after paraphrase. Those claims are functionally more visible than the ones that flicker. Reproducibility, then, is a property worth measuring and worth engineering. This note defines the metric, explains why some claims reproduce and others don't, and proposes a workable per-claim reproducibility score that behaves — usefully — like a rank tracker for the generative era.

methodology · GEO · measurement
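One simple way to picture a per-claim reproducibility score is as the fraction of repeated runs in which the claim reappears. The sketch below is an assumption, not the metric the note defines: `reproducibility_score` is a hypothetical name, the sample answers are invented, and matching is a case-insensitive substring test, which ignores the paraphrase case the note also cares about.

```python
def reproducibility_score(claim: str, runs: list[str]) -> float:
    """Fraction of runs whose answer text contains the claim.

    Assumption: case-insensitive substring matching stands in for a real
    claim matcher; paraphrased restatements are counted as misses.
    """
    if not runs:
        raise ValueError("need at least one run")
    needle = claim.casefold()
    hits = sum(1 for answer in runs if needle in answer.casefold())
    return hits / len(runs)


# Invented answers standing in for three runs of the same query.
runs = [
    "Decoding is stochastic, so answers vary between runs.",
    "Answers vary because decoding is stochastic.",
    "The answer depends on the retrieved context.",
]
score = reproducibility_score("decoding is stochastic", runs)
```

Here the claim appears in two of three runs. Tracked per claim over time, a number like this can be read the way a rank tracker was read: a stable high score suggests a durable claim, a low or erratic one suggests a claim that flickers.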

Mar 2026

6 min

Field note

Volume II · 2020 → 2024

The transformer years.

BERT, MUM, helpful-content. The four-year interregnum when search learned to read — and most practitioners did not notice the shift had happened.

Complete · 2 essays

Essay 02

Panda, Penguin, BERT: a field guide to twenty years of correction.

Each major Google update corrected a specific exploit. Read in sequence — from a Tel Aviv office, over twenty-five years of practitioner work — they trace a single trajectory: away from manipulable surface signals, toward something much closer to comprehension. This field guide reads the updates as one continuous movement, because that movement is the only reliable predictor of what comes next. AI Overviews and generative retrieval are not a departure; they are its logical terminus.

algorithms · history · foundations

The link graph is not the trust graph anymore.

PageRank's genius was a proxy: it could not measure trust directly, so it measured links, and links correlated with trust well enough to build an empire. The proxy was always an approximation, and approximations drift. A language model does not need the link proxy — it has co-occurrence, citation context, entity consistency, and the sheer statistical weight of how often a claim is repeated by sources it already trusts. The link graph and the trust graph were always different objects; for two decades they were close enough to conflate. This note is about why they are diverging now, and what to do about it.

Volume I · 2009 → 2019

The ten-year warm-up.

Pre-transformer search. Panda, Penguin, manual penalties, the link-graph era. Where the apprenticeship happened.

Complete · 1 essay

Essay 10

What optimising for AltaVista taught me about LLMs.

Before PageRank swallowed the index, ranking was about presence and proximity — a five-engine market in which each engine weighted text differently and no link economy had yet arrived to launder weak pages into strong rankings. A quarter-century later, some of those discarded instincts are suddenly useful again. This is the memoir of a beginning, told for what it transferred and for what it didn't.

history · foundations · memoir

Aug 2011

26 min

Working paper

Showing 13 published · 3 volumes · 1999–2026 · two new every week