Chunking is the new pagination.

Before a model ever reads your page, a retrieval system has already cut it into pieces. Those pieces — chunks — are the real unit that gets embedded, searched, and pulled into context. The whole document is gone by the time the language model sees it. What arrives is a handful of spans, often 256 to 1,024 tokens each, with no awareness of what surrounded them on the original page.¹

If a claim and the evidence that supports it land in different chunks, the claim arrives at the model orphaned — or it does not arrive at all. This is the single most underrated editorial constraint of the generative-retrieval era, and there is no precedent for it in classical SEO. Pagination used to be about user patience. Chunking is about epistemic completeness per fragment: each retrievable span has to stand on its own, because the system that selects it has no way to know what the next span was going to say.

What the chunker actually does

A retrieval pipeline takes your document, splits it, embeds each chunk into a vector, and stores those vectors in an index. When a query comes in, the system embeds the query the same way and pulls the top-k chunks by cosine similarity. Those chunks — three to ten of them, usually — are what the model sees.
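The embed-and-rank step can be sketched in a few lines. This is a toy, not a production retriever: the `embed` function below is a bag-of-words stand-in for a dense embedding model, and the function names and sample chunks are assumptions made for illustration. What it does show accurately is the selection logic: chunks are scored one at a time against the query, with no knowledge of their neighbors.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a dense vector model)."""
    return Counter(re.findall(r"[a-z0-9%]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Hypothetical chunks: the claim and its qualifier were split apart upstream.
chunks = [
    "We saw a 38% improvement.",
    "The test ran across three regions over six weeks.",
    "Our office dog is named Biscuit.",
]
print(top_k("how much improvement did you see", chunks, k=1))
```

With `k=1` only the claim chunk comes back; the chunk holding the test conditions shares no terms with the query and is never shown to the model.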

The splitter is mechanical, not editorial. The two most common strategies in 2026 production use are:

Fixed-size splitting with character or token caps and configurable overlap. Fastest, dumbest, most common. Will cut through a sentence if the sentence happens to span the boundary.
Recursive splitting by structural delimiters — paragraphs, then sentences, then words — preferring “natural” breaks. Better, but still greedy: it will break a paragraph at the last sentence boundary that fits the budget, even if that splits the claim from its qualifier.
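A minimal sketch of both strategies, assuming character budgets rather than token budgets. It is a simplified illustration of the two ideas, not LangChain's or any vendor's implementation; the function names, the separator list, and the 60-character budget are assumptions chosen to make the boundary visible on a two-sentence example.

```python
def fixed_size_split(text: str, size: int, overlap: int = 0) -> list[str]:
    """Cut every `size` characters, re-reading `overlap` characters each time."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reached the end of the text
        start += size - overlap
    return chunks


def recursive_split(text: str, size: int, separators=("\n\n", ". ", " ")) -> list[str]:
    """Greedy split: coarsest separator first, recursing on pieces still too long."""
    if len(text) <= size:
        return [text.strip()] if text.strip() else []
    if not separators:
        return fixed_size_split(text, size)  # nothing left to split on
    sep, finer = separators[0], separators[1:]
    parts = text.split(sep)
    # Keep each separator attached to the piece it ended.
    pieces = [p + sep for p in parts[:-1]] + [parts[-1]]
    chunks, current = [], ""
    for piece in pieces:
        if len(current) + len(piece) <= size:
            current += piece  # still fits: keep packing
            continue
        if current.strip():
            chunks.append(current.strip())
        current = ""
        if len(piece) <= size:
            current = piece
        else:
            chunks.extend(recursive_split(piece, size, finer))
    if current.strip():
        chunks.append(current.strip())
    return chunks


text = "We saw a 38% improvement. The test ran across three regions over six weeks."
print(fixed_size_split(text, 60))
print(recursive_split(text, 60))
```

The fixed-size splitter cuts mid-sentence. The recursive splitter finds the sentence boundary, which reads better but still places the claim and its test conditions in different chunks.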

A semantic splitter that respects argument structure exists in the research literature² and ships in some enterprise vendors’ products, but the long tail of production RAG pipelines still runs one of the two mechanical strategies. Write for the mechanical case, because that is what most readers’ infrastructure will use.

The three failure modes

Across the audit dataset described in the statement-level visibility essay (Sasson, 2026), failures in which a chunk boundary separated a claim from its support sorted into three patterns. They are not exotic; they are the default behavior of well-written marketing copy interacting with a 1,000-character splitter.

1. The orphaned qualifier. The claim sits at the end of paragraph N; the qualifier that makes it true (the time bound, the population sampled, the caveat) sits at the start of paragraph N+1. The splitter cuts between them. The model retrieves the claim without the qualifier, and one of two things happens next: it surfaces the unqualified claim and gets it wrong, or it hedges so hard the claim becomes useless.

2. The lost antecedent. The claim contains “this” or “that” or “the study” referring to something defined in an earlier paragraph. The earlier paragraph is in a different chunk. The model retrieves a confident statement about something it cannot resolve. Usually it absorbs the claim as ambient fact and drops the citation.

3. The split table. Tables and figures are the worst chunking surface in current systems. Most splitters treat them as continuous text. A five-row comparison table gets cut after row three, and the model retrieves half of a comparison and presents it as the whole.

Editorial pattern	Survives	Failure mode
Claim + qualifier in the same sentence	94%	rare
Claim + qualifier in adjacent sentences, same paragraph	87%	occasional orphaning
Claim + qualifier in adjacent paragraphs	61%	systematic orphaning
Claim refers to an antecedent two paragraphs back	43%	lost antecedent
Claim depends on a table for context	28%	split table

Fig. 1. Chunk-survival rates by editorial pattern, from the 47,800-probe audit. 'Survives' means the claim and its qualifier landed in the same retrieved chunk in at least 80% of probes.

The editorial discipline that fixes it

Three rules, in priority order, that move a document from the bottom rows of that table to the top:

1. Front-load the claim and its qualifier into the same sentence. Not the same paragraph, the same sentence. Where you used to write “We saw a 38% improvement. The test ran across three regions over six weeks,” you now write “Across three regions over six weeks of testing, we saw a 38% improvement.” The qualifier travels with the claim instead of next to it.
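The difference between the two phrasings can be checked mechanically. The sketch below assumes a sentence-packing splitter (a rough stand-in for the recursive strategy) and a deliberately tiny 74-character budget, so that two sentences are enough to force a boundary. The function names are hypothetical. Note one simplification: a sentence longer than the budget is kept whole here, whereas a fixed-size splitter would still cut through it.

```python
import re


def pack_sentences(text: str, budget: int) -> list[str]:
    """Greedily pack whole sentences into chunks of at most `budget` characters.
    A single sentence longer than the budget is kept whole (a simplification)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > budget:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def travels_together(chunks: list[str], claim: str, qualifier: str) -> bool:
    """True if at least one chunk contains both the claim and its qualifier."""
    return any(claim in c and qualifier in c for c in chunks)


before = "We saw a 38% improvement. The test ran across three regions over six weeks."
after = "Across three regions over six weeks of testing, we saw a 38% improvement."

for label, text in (("before", before), ("after", after)):
    chunks = pack_sentences(text, budget=74)
    print(label, travels_together(chunks, "38% improvement", "six weeks"))
```

In the two-sentence version the qualifier lands in its own chunk; in the one-sentence version there is no sentence boundary for the splitter to use.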

2. Resolve pronouns and references in the same chunk where they appear. If a sentence uses “this study,” the paragraph it sits in has to name the study. If a paragraph uses “as discussed above,” the discussion has to be in the same chunk. The discipline reads, at first, as redundant. It is not; it is chunk-local self-containment, and the model rewards it.
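One way to apply this rule is a small lint over the chunks your splitter produces. The sketch below is a heuristic under stated assumptions: you supply the mapping from each referring phrase to the name it should resolve to, and the check is plain substring matching. The function name, the sample chunks, and the “Harbor Survey” study are all invented for illustration.

```python
def flag_unresolved(chunks: list[str], antecedents: dict[str, str]) -> list[tuple[int, str]]:
    """Return (chunk index, phrase) wherever a referring phrase appears
    without the name it refers to in the same chunk."""
    flagged = []
    for index, chunk in enumerate(chunks):
        lowered = chunk.lower()
        for phrase, name in antecedents.items():
            if phrase.lower() in lowered and name.lower() not in lowered:
                flagged.append((index, phrase))
    return flagged


# Hypothetical chunks: the study is named in one chunk and referenced in the next.
chunks = [
    "The 2025 Harbor Survey polled 1,200 retail buyers.",
    "This study found that most buyers compare prices on mobile.",
]
print(flag_unresolved(chunks, {"this study": "Harbor Survey"}))
```

The fix for a flagged chunk is editorial, not technical: rewrite “this study” as the study's name so the chunk resolves on its own.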

3. Treat tables as atomic units. Either keep tables under the chunker’s budget (usually ~800 tokens including the header and caption) or pre-render them as separate, self-contained chunks with their own captions. A table whose caption restates the comparison being made survives splitting better than a table whose meaning depends on the prose above it.
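A sketch of what pre-rendering a table as self-contained chunks might look like. Several things here are assumptions rather than anything the audit prescribes: tokens are approximated by whitespace-separated words, the pipe-delimited rendering is arbitrary, and the fallback of repeating the caption and header on every row group is one possible design. The sample rows are the first two rows of Fig. 1.

```python
def count_tokens(text: str) -> int:
    """Rough token estimate: whitespace-separated words (real tokenizers differ)."""
    return len(text.split())


def table_to_chunks(caption: str, header: list[str], rows: list[list[str]],
                    budget: int = 800) -> list[str]:
    """Render a table as one chunk if it fits the budget; otherwise as row groups
    that each repeat the caption and header so no piece depends on another."""
    def render(group, part=None):
        title = caption if part is None else f"{caption} (part {part})"
        lines = [title, " | ".join(header)] + [" | ".join(row) for row in group]
        return "\n".join(lines)

    whole = render(rows)
    if count_tokens(whole) <= budget:
        return [whole]

    chunks, group = [], []
    for row in rows:
        part = len(chunks) + 1
        if group and count_tokens(render(group + [row], part)) > budget:
            chunks.append(render(group, part))
            group = []
        group.append(row)  # a single oversized row is still kept whole
    if group:
        chunks.append(render(group, len(chunks) + 1))
    return chunks


caption = "Chunk-survival rates by editorial pattern"
header = ["Editorial pattern", "Survives", "Failure mode"]
rows = [
    ["Claim + qualifier in the same sentence", "94%", "rare"],
    ["Claim + qualifier in adjacent sentences, same paragraph", "87%", "occasional orphaning"],
]
for chunk in table_to_chunks(caption, header, rows, budget=30):
    print(chunk, end="\n\n")
```

The 30-token budget is artificially small to force the split. At a realistic budget the whole table renders as a single chunk with its caption attached.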

A note on what to do with existing content

The temptation, given the above, is to rewrite everything. Don’t. The revision priority is the same priority the visibility framework uses: high-value claims first. For any document already published, run the audit once — extract the claims, identify which ones currently span a chunk boundary under your most likely splitter, and rewrite those ten or twenty sentences. That is 80% of the lift at 5% of the work, and the structural compounding from the rewritten claims will outpace any new content you would have written in the same time.
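The boundary check in that audit can be sketched as follows. The sketch assumes a fixed-size character splitter with no overlap, and it assumes the claim and qualifier pairs were extracted by hand and appear verbatim in the document. The `Finding` type and `audit` function are hypothetical names, not part of any published tool.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    claim: str
    qualifier: str
    first_chunk: int  # index of the chunk where the pair starts
    last_chunk: int   # index of the chunk where the pair ends


def audit(document: str, pairs: list[tuple[str, str]], chunk_size: int) -> list[Finding]:
    """Flag claim/qualifier pairs that straddle a fixed-size chunk boundary."""
    findings = []
    for claim, qualifier in pairs:
        c, q = document.find(claim), document.find(qualifier)
        if c < 0 or q < 0:
            continue  # pair not present verbatim; skip rather than guess
        start = min(c, q)
        end = max(c + len(claim), q + len(qualifier)) - 1
        first, last = start // chunk_size, end // chunk_size
        if first != last:
            findings.append(Finding(claim, qualifier, first, last))
    return findings


document = "We saw a 38% improvement. The test ran across three regions over six weeks."
pairs = [("38% improvement", "six weeks")]
for finding in audit(document, pairs, chunk_size=60):
    print(f"rewrite: {finding.claim!r} is separated from {finding.qualifier!r}")
```

Running the same check at several plausible chunk sizes gives a short list of sentences worth rewriting, which is the ten-or-twenty-sentence worklist described above.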

The page is no longer the artifact. The chunk is. Write so that any single chunk, read alone, still tells the truth.³

References

Karpukhin, V., Oğuz, B., Min, S., et al. (2020). Dense Passage Retrieval for Open-Domain Question Answering. EMNLP 2020. — The DPR paper — the retrieval primitive that makes chunk boundaries load-bearing.
Liu, N. F., Lin, K., Hewitt, J., et al. (2024). Lost in the Middle: How Language Models Use Long Contexts. Transactions of the ACL, Volume 12. — Reinforces the front-load discipline: models under-weight the middle of any chunk they do retrieve.
Lewis, P., Perez, E., Piktus, A., et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS 2020. — RAG architecture — the pipeline this note is editorial-side advice for.
Kamradt, G. (2024). 5 Levels of Text Splitting for Retrieval. Greg Kamradt / Data Independent, video and notebook. — Practical taxonomy of chunking strategies from fixed-size to semantic.
LangChain (2024). RecursiveCharacterTextSplitter documentation. LangChain docs. — Reference implementation for the default chunker most production RAG pipelines inherit.
Sasson, G. (2026). Statement-level visibility, or: why ranking a page no longer matters. Algoholic, Vol. III, Essay 04. — The audit dataset cited in §3.
Sasson, G. (2026). Ranking ≠ retrieval ≠ generation. A decomposition. Algoholic, Vol. III, Essay 01. — Chunking is the operation that 'retrieval' performs on your document; the decomposition that frames it.

Footnotes

¹ The 256–1,024 range is the operating window most production RAG systems sit in as of mid-2026. LangChain’s RecursiveCharacterTextSplitter is commonly configured at around 1,000 characters in tutorials and examples; LlamaIndex defaults to 1,024 tokens with 200-token overlap; Pinecone’s reference implementations cluster around 512. The exact number matters less than the fact that some boundary will fall through your paragraph whether you place it deliberately or not.
² See, e.g., Greg Kamradt’s SemanticChunker and the line of research on discourse-aware segmentation. The premise (split where the topic shifts, not where the token budget runs out) is correct; the production adoption is not yet there. Write for the chunker you have, not the chunker you want.
³ This is not new advice for anyone who has ever written for the inverted pyramid of news, or for the lede-paragraph discipline of long-form editorial. The novelty is that the model is now the reader you are disciplining for, and the model has no patience for “I will explain that in a moment.” It will not be there in a moment.

Gilad Sasson

aka Algoholic · גלעד ששון

Gilad Sasson, also known as Algoholic, is an Israeli digital marketing expert, founder & CEO of nekuda Web Solutions, and a pioneer in search engine optimization and data analytics since 1999. Head of internet & search at Zap Group 2002–2006; CMO at Interlogic 2006–2009. Speaker at SMX Israel, TNW Amsterdam, Web Summit Dublin, DMIEXPO.

