The vocabulary of the moment treats ranking, retrieval, and generation as near-synonyms — three words for “the thing the AI does.” They are not synonyms. They are three distinct operations, chained, and each fails in its own way. If you cannot say which one you are optimising, you are optimising none of them.
Three operations
- Ranking orders a known set of documents by predicted relevance. Classical search. The output is a list.
- Retrieval selects a subset of chunks from a corpus to place in a model’s context window. The output is a set of passages — and the passages are spans, not pages.
- Generation composes an answer conditioned on the retrieved context. The output is new text that may or may not faithfully represent any source.
A claim can win at retrieval and lose at generation: it gets pulled into context but is paraphrased past attribution. It can win at generation and lose at ranking: the model reproduces your sentence, but the document never surfaces in a ranked list. These are different battles.
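To make the separation concrete, here is a minimal sketch of the three operations as three functions with three different output types. Everything in it is an illustrative assumption rather than a description of any real system: the `Passage` type, the word-overlap score, the sentence-level chunking, and the `generate` stand-in, which simply joins passages where a real system would call a language model.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    doc_id: str  # where the span came from
    text: str    # the span itself, not the whole page


def overlap(query: str, text: str) -> int:
    """Toy relevance score: count of distinct query words found in the text."""
    def words(s: str) -> set[str]:
        return set(re.findall(r"\w+", s.lower()))
    return len(words(query) & words(text))


def rank(query: str, docs: dict[str, str]) -> list[str]:
    """Ranking: order a known set of documents. The output is a list of ids."""
    return sorted(docs, key=lambda d: (-overlap(query, docs[d]), d))


def retrieve(query: str, docs: dict[str, str], k: int = 2) -> list[Passage]:
    """Retrieval: select k spans (here, sentences) for the context window."""
    spans = [
        Passage(doc_id, sentence.strip())
        for doc_id, body in docs.items()
        for sentence in body.split(".")
        if sentence.strip()
    ]
    spans.sort(key=lambda p: (-overlap(query, p.text), p.doc_id, p.text))
    return spans[:k]


def generate(context: list[Passage]) -> str:
    """Generation: compose new text from the retrieved context.

    Hypothetical stand-in for a model: it joins the passages and drops
    every doc_id, which is how a claim ends up reproduced without credit.
    """
    return ". ".join(p.text for p in context) + "."


docs = {
    "guide": "Ranking orders documents. Retrieval selects spans. Generation writes text.",
    "note": "Retrieval selects spans from a corpus.",
}
query = "retrieval spans corpus"

print(rank(query, docs))      # a list of document ids
context = retrieve(query, docs)
print(context)                # a set of spans, each still tied to a doc_id
print(generate(context))      # new text, with the doc_ids gone
```

The point of the sketch is the signatures, not the scoring: `rank` returns documents, `retrieve` returns spans, and `generate` returns text that no longer carries its sources. Any measurement has to say which of those three outputs it is looking at.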
Why the decomposition matters
Each operation has a different lever:
| Operation | Lever | Failure mode |
|---|---|---|
| Ranking | classical relevance + authority | buried in the list |
| Retrieval | chunkable, self-contained claims | selected, but stripped of the context that gave it meaning |
| Generation | epistemic framing, attributability | reproduced without credit |
Most “GEO” (generative engine optimisation) advice targets ranking, the operation that matters least in a generative answer, because ranking is the one we know how to influence. The leverage has moved downstream, to retrieval and generation, where the field has almost no established craft. That gap is the opportunity.
A program
The rest of this volume builds instruments for the downstream operations: retrieval-survival tests, attribution-rate measurement, and reproducibility scoring across models. We begin with notation because conflation is the error, and notation is how you stop conflating.
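As a rough preview of what such instruments might compute, the sketch below gives one possible reading of two of them. The definitions are assumptions made for illustration, not the ones this volume goes on to develop: "survival" is taken to mean that a claim appears verbatim in at least one retrieved passage of a run, and "attribution" that an answer which used a source also cites it. The names `Answer`, `retrieval_survival_rate`, and `attribution_rate` are hypothetical, and knowing which sources an answer actually used is itself assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Answer:
    used: frozenset[str]   # source ids whose content the answer drew on (assumed known)
    cited: frozenset[str]  # source ids the answer explicitly credits


def retrieval_survival_rate(claim: str, runs: list[list[str]]) -> float:
    """Fraction of runs in which the claim appears verbatim
    (case-insensitively) in at least one retrieved passage."""
    if not runs:
        return 0.0
    needle = claim.lower()
    hits = sum(any(needle in passage.lower() for passage in run) for run in runs)
    return hits / len(runs)


def attribution_rate(source: str, answers: list[Answer]) -> float:
    """Among answers that used the source, the fraction that cite it.

    Returns 0.0 when no answer used the source (a convention chosen here).
    """
    used = [a for a in answers if source in a.used]
    if not used:
        return 0.0
    return sum(source in a.cited for a in used) / len(used)
```

Verbatim matching is the crudest possible survival test, since a paraphrased claim counts as lost. It is chosen here only because it keeps the retrieval measurement separate from the generation measurement.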
