The working archive · since 1999

Forty essays, twenty-seven years, one continuous thread.

From AltaVista keyword density to LLM citation behaviour. The full archive of published research, in reverse chronological order. Two pieces a week: one field note, one working paper.

11 published essays · 3 volumes · ~700 in the wider archive · 2/wk note + paper cadence
Volume III · current · 2025 → 2026

Generative retrieval & statement-level visibility.

The substrate shifted. We rebuild the framework from the ground up: claim-level scoring, LLM citation behaviour, and applied measurement.

Current · 8 essays
Essay 04

Statement-level visibility, or: why ranking a page no longer matters.

The unit of competition has shifted from the page to the claim. We define statement-level visibility formally, propose a three-operation measurement framework (extract, probe, compare), and present 90-day data across 3,200 documents and 14 frontier models showing a structural durability gap between sourced, quantified statements and conventional commercial copy.

GEO · retrieval · measurement
May 2026
25 min
Working paper
★ this week
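The three-operation framework in Essay 04 (extract, probe, compare) can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not the paper's method: a naive sentence split stands in for claim extraction, `probe` is a hypothetical callable returning one answer per model, and token recall above an assumed 0.8 threshold stands in for the comparison step.

```python
import re
from typing import Callable


def extract(document: str) -> list[str]:
    """Extract: naive sentence split, standing in for real claim extraction."""
    return [s for s in re.split(r"(?<=[.!?])\s+", document.strip()) if s]


def tokens(text: str) -> set[str]:
    """Lowercased word set used for the (assumed) overlap comparison."""
    return set(re.findall(r"\w+", text.lower()))


def compare(statement: str, answer: str, threshold: float = 0.8) -> bool:
    """Compare: does the answer contain at least `threshold` of the statement's words?"""
    stmt = tokens(statement)
    return bool(stmt) and len(stmt & tokens(answer)) / len(stmt) >= threshold


def statement_visibility(
    document: str, probe: Callable[[str], list[str]]
) -> dict[str, float]:
    """Probe each extracted statement; score = share of answers that carry it."""
    scores: dict[str, float] = {}
    for statement in extract(document):
        answers = probe(statement)  # hypothetical: one answer per model
        hits = sum(compare(statement, a) for a in answers)
        scores[statement] = hits / len(answers) if answers else 0.0
    return scores


# Usage with canned answers in place of live model calls.
doc = "Plan A costs 12 dollars per month. Our service is the best."
canned = {
    "Plan A costs 12 dollars per month.": [
        "Plan A costs 12 dollars per month.",
        "It is about 12 dollars.",
    ],
    "Our service is the best.": ["Several vendors offer this."],
}
print(statement_visibility(doc, lambda s: canned[s]))
```

In this toy run the quantified statement is carried by one of its two canned answers (0.5) and the superlative by none (0.0). The threshold and the word-overlap proxy are the weakest parts of the sketch; a real comparison step would need to handle paraphrase.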
Essay 03

A taxonomy of LLM citation behavior across 14 frontier models.

What gets cited, what gets paraphrased, what disappears. A controlled audit across GPT-5 Pro, Claude Sonnet 4.6, Gemini 2.5 Pro, and eleven others, with results from 47,800 probe runs over 90 days. The variance is structural, not stochastic, and it points at a single actionable lever: epistemic framing of the claim.

AI Search · methodology · GEO
May 2026
30 min
Working paper
Essay 06

GEO is not SEO with prompts. A position paper.

Most of what the industry currently calls Generative Engine Optimization is recycled SEO advice rebadged for a moment of vendor opportunism. Some of it is genuinely new. The position of this paper is that the difference is structural and operationally consequential: GEO is a measurement discipline on a substrate that operates below the page, while SEO was an authority-and-relevance discipline on a substrate that operated at the page. We taxonomise the continuities and the discontinuities, name the strongest counter-positions in the field, and stake out a defensible distinction the practice can stand on.

GEO · position · foundations
May 2026
34 min
Working paper
Essay 01

Ranking ≠ retrieval ≠ generation. A decomposition.

Three operations, routinely conflated, with three different failure modes. We separate them with notation, with worked examples, and with a per-operation audit of which SEO-era heuristics still apply, which have flipped sign, and which were always proxies for what we are now able to measure directly. The decomposition is the prerequisite for every measurement framework in the rest of the volume.

foundations · research · retrieval
Apr 2026
33 min
Working paper
Essay 05

How LLMs read right-to-left: retrieval in Hebrew and Arabic.

Frontier transformer models do not stumble on niqqud or agglutination the way regex tokenisers did, but they inherit a deeper distributional disadvantage: roughly a thousand English documents per Hebrew one in the training corpus. We measure the RTL penalty empirically across fourteen models, decompose it into tokenization, embedding-density, and entity-grounding components, and show that a publisher with twenty years of Hebrew technical writing is structurally positioned to convert the disadvantage into an incumbency moat.

RTL · AI Search · Hebrew · tokenization
Apr 2026
31 min
Working paper
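One crude, tokenizer-independent way to see where a tokenization penalty can begin: byte-level BPE tokenizers (the GPT-2 family and its descendants) start from UTF-8 bytes, and every Hebrew letter (U+05D0 to U+05EA) encodes to two bytes where an ASCII letter takes one. The sketch below measures only that starting cost. It is an assumption-laden proxy, not the essay's decomposition; actual token counts depend on each model's learned merges.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Average UTF-8 bytes per code point: the raw units a byte-level tokenizer starts from."""
    if not text:
        raise ValueError("text must be non-empty")
    return len(text.encode("utf-8")) / len(text)


samples = {
    "English": "hello world",
    # "shalom olam" written with escapes so the exact code points are unambiguous.
    "Hebrew": "\u05e9\u05dc\u05d5\u05dd \u05e2\u05d5\u05dc\u05dd",
    # "shalom" with niqqud: shin, shin dot, qamats, lamed, vav, holam, final mem.
    "Hebrew with niqqud": "\u05e9\u05c1\u05b8\u05dc\u05d5\u05b9\u05dd",
}
for label, text in samples.items():
    print(f"{label:<20} {utf8_bytes_per_char(text):.2f}")
```

Pointed text is costlier per letter still, because each niqqud mark is its own two-byte code point on top of the letter it modifies.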
Essay 06

Entity disambiguation, for humans who share a name.

Search 'Gilad Sasson' and the index returns at least four people: an SEO consultant, a rabbinic scholar, a chemical-engineering researcher, and an eleven-year-old footballer. To a vector model, we occupy overlapping regions of embedding space, and, asked for a biography of any one of us, it will average us together: confidently, fluently, wrongly. This note is about why hallucination of this kind is not random, why editorial work cannot fix it, and what the structural fix actually looks like.

entities · GEO · schema
Apr 2026
6 min
Field note
Essay 07

Chunking is the new pagination.

Before a model ever reads your page, a retrieval system has already cut it into pieces. Those pieces, called chunks, are the real unit that gets embedded, searched, and pulled into context. If a claim and the qualifier that makes it true land in different chunks, the claim arrives at the model orphaned. This note sets out the editorial discipline that follows from that fact, with examples of what fails and what passes.

retrieval · technical · GEO
Mar 2026
7 min
Field note
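The orphaned-claim failure can be reproduced with the simplest possible splitter. The sketch assumes a fixed window of whitespace-separated words with no overlap. Production pipelines typically count tokens rather than words and often overlap adjacent chunks. `co_located` is a hypothetical check written for illustration, not a standard metric.

```python
def chunk_words(text: str, size: int) -> list[str]:
    """Split text into consecutive, non-overlapping windows of `size` words."""
    if size <= 0:
        raise ValueError("size must be positive")
    words = text.split()
    return [" ".join(words[i : i + size]) for i in range(0, len(words), size)]


def co_located(chunks: list[str], claim: str, qualifier: str) -> bool:
    """True if at least one chunk holds both the claim and its qualifier."""
    return any(claim in c and qualifier in c for c in chunks)


orphaned = "Plan B cuts latency by 40 percent. This holds only for cached reads."
fused = "Plan B cuts cached-read latency by 40 percent."

for text in (orphaned, fused):
    chunks = chunk_words(text, size=8)
    print(chunks, co_located(chunks, "40 percent", "cached"))
```

With an eight-word window the first version splits the figure from its condition; the second passes not because the window changed but because the qualifier was written into the claim itself.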
Essay 08

Reproducibility as a ranking signal.

Run the same query three times and a model will not reliably give you the same answer three times, because decoding is stochastic. But some claims survive the variance, reappearing run after run and paraphrase after paraphrase. Those claims are functionally more visible than the ones that flicker. Reproducibility, then, is a property worth measuring and worth engineering. This note defines the metric, explains why some claims reproduce and others don't, and proposes a workable per-claim reproducibility score that behaves, usefully, like a rank tracker for the generative era.

methodology · GEO · measurement
Mar 2026
6 min
Field note
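A per-claim reproducibility score can be sketched as the share of (paraphrase, run) answers in which the claim reappears. The sketch assumes each claim is represented by a hypothetical set of key terms that must all appear in an answer; the note's actual matching rule may differ, and the canned answers stand in for live model calls.

```python
import re


def appears(key_terms: set[str], answer: str) -> bool:
    """A claim 'appears' if every lowercase key term is a word in the answer."""
    return key_terms <= set(re.findall(r"\w+", answer.lower()))


def reproducibility(key_terms: set[str], runs_by_paraphrase: dict[str, list[str]]) -> float:
    """Share of all (paraphrase, run) answers in which the claim reappears."""
    answers = [a for runs in runs_by_paraphrase.values() for a in runs]
    if not answers:
        raise ValueError("need at least one answer")
    return sum(appears(key_terms, a) for a in answers) / len(answers)


runs = {
    "How much does Plan B cut latency?": [
        "Plan B cuts cached-read latency by 40 percent.",
        "About 40 percent on cached reads.",
        "It is faster.",
    ],
    "Is Plan B faster?": [
        "Yes, by 40 percent for cached reads.",
        "Yes, noticeably.",
        "Yes, 40 percent faster.",
    ],
}
claims = {
    "40% on cached reads": {"40", "percent", "cached"},
    "fastest on the market": {"fastest"},
}
# Sorting by score gives the rank-tracker view: most reproducible claim first.
ranked = sorted(claims, key=lambda c: reproducibility(claims[c], runs), reverse=True)
for claim in ranked:
    print(f"{claim:<24} {reproducibility(claims[claim], runs):.2f}")
```

Here the sourced, quantified claim reappears in three of six answers (0.50) and the superlative in none (0.00). Averaging over paraphrases as well as runs is the design choice that separates a claim that survives rewording from one that only survives resampling.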
Volume II · 2020 → 2024

The transformer years.

BERT, MUM, the helpful-content update. The four-year interregnum when search learned to read, and most practitioners did not notice the shift had happened.

Complete · 2 essays