The working archive · since 1999
Forty essays, twenty-seven years, one continuous thread.
From AltaVista keyword density to LLM citation behaviour. The full archive of published research, in reverse chronological order. Two pieces a week: one field note, one working paper.
Generative retrieval & statement-level visibility.
The substrate shifted. We rebuild the framework from the ground up — claim-level scoring, LLM citation behaviour, applied measurement.
Statement-level visibility, or: why ranking a page no longer matters.
The unit of competition has shifted from the page to the claim. We define statement-level visibility formally, propose a three-operation measurement framework (extract, probe, compare), and present 90-day data across 3,200 documents and 14 frontier models showing a structural durability gap between sourced, quantified statements and conventional commercial copy.
A taxonomy of LLM citation behaviour across 14 frontier models.
What gets cited, what gets paraphrased, what disappears. A controlled audit across GPT-5 Pro, Claude Sonnet 4.6, Gemini 2.5 Pro, and eleven others, with results from 47,800 probe runs over 90 days. The variance is structural, not stochastic, and it points at a single actionable lever: epistemic framing of the claim.
GEO is not SEO with prompts. A position paper.
Most of what the industry currently calls Generative Engine Optimization is recycled SEO advice rebadged for a moment of vendor opportunism. Some of it is genuinely new. The position of this paper is that the difference is structural and operationally consequential: GEO is a measurement discipline on a substrate that operates below the page, while SEO was an authority-and-relevance discipline on a substrate that operated at the page. We taxonomise the continuities and the discontinuities, name the strongest counter-positions in the field, and stake out a defensible distinction the practice can stand on.
Ranking ≠ retrieval ≠ generation. A decomposition.
Three operations, routinely conflated, with three different failure modes. We separate them with notation, with worked examples, and with a per-operation audit of which SEO-era heuristics still apply, which have flipped sign, and which were always proxies for what we are now able to measure directly. The decomposition is the prerequisite for every measurement framework in the rest of the volume.
How LLMs read right-to-left: retrieval in Hebrew and Arabic.
Frontier transformer models do not stumble on niqqud or agglutination the way regex tokenisers did, but they inherit a deeper distributional disadvantage: roughly a thousand English documents per Hebrew one in the training corpus. We measure the RTL penalty empirically across fourteen models, decompose it into tokenisation, embedding-density, and entity-grounding components, and show that a publisher with twenty years of Hebrew technical writing is structurally positioned to convert the disadvantage into an incumbency moat.
Entity disambiguation, for humans who share a name.
Search 'Gilad Sasson' and the index returns at least four people: an SEO consultant, a rabbinic scholar, a chemical-engineering researcher, and an eleven-year-old footballer. To a vector model, we occupy overlapping regions of embedding space; asked for a biography of any one of us, it will average us together: confidently, fluently, wrongly. This note is about why hallucination of this kind is not random, why editorial work cannot fix it, and what the structural fix actually looks like.
Chunking is the new pagination.
Before a model ever reads your page, a retrieval system has already cut it into pieces. Those pieces, called chunks, are the real unit that gets embedded, searched, and pulled into context. If a claim and the qualifier that makes it true land in different chunks, the claim arrives at the model orphaned. This note sets out the editorial discipline that follows from that fact, with examples of what fails and what passes.
Reproducibility as a ranking signal.
Run the same query three times and a model will rarely give you the same answer three times, because decoding is typically stochastic. But some claims survive the variance, reappearing run after run and paraphrase after paraphrase. Those claims are functionally more visible than the ones that flicker. Reproducibility, then, is a property worth measuring and worth engineering. This note defines the metric, explains why some claims reproduce and others don't, and proposes a workable per-claim reproducibility score that behaves, usefully, like a rank tracker for the generative era.
The transformer years.
BERT, MUM, helpful-content. The four-year interregnum when search learned to read — and most practitioners did not notice the shift had happened.
Panda, Penguin, BERT: a field guide to twenty years of correction.
Each major Google update corrected a specific exploit. Read in sequence — from a Tel Aviv office, over twenty-five years of practitioner work — they trace a single trajectory: away from manipulable surface signals, toward something much closer to comprehension. This field guide reads the updates as one continuous movement, because that movement is the only reliable predictor of what comes next. AI Overviews and generative retrieval are not a departure; they are its logical terminus.
The link graph is not the trust graph anymore.
PageRank's genius was a proxy: it could not measure trust directly, so it measured links, and links correlated with trust well enough to build an empire. The proxy was always an approximation, and approximations drift. A language model does not need the link proxy — it has co-occurrence, citation context, entity consistency, and the sheer statistical weight of how often a claim is repeated by sources it already trusts. The link graph and the trust graph were always different objects; for two decades they were close enough to conflate. This note is about why they are diverging now, and what to do about it.
The ten-year warm-up.
Pre-transformer search. Panda, Penguin, manual penalties, the link-graph era. Where the apprenticeship happened.