Open data · CC BY 4.0 · 2026-Q2 cut

The data behind the claims.

Every empirical figure in the archive traces to an aggregate dataset, published here so you can check the work, not just the conclusion. Figures are panel means across the fourteen-model panel, with a ±3-point uncertainty band from the claim-matching classifier. Full methodology →

CSV · Download ↓

retention-curves-2026q2.csv

Panel-mean citation retention by content type across days 0 / 7 / 14 / 30.

Backs: Statement-level visibility, Fig. 1

CSV · Download ↓

survival-multipliers-2026q2.csv

Per-property survival multipliers — quantification, source anchoring, entity specificity, qualifier proximity, and their compounds.

Backs: Statement-level visibility §6 · Taxonomy §6

MD · Download ↓

probe-protocol.md

The extract → probe → compare procedure, as a replicator's checklist.

Backs: All empirical papers

MD · Download ↓

extraction-rubric.md

How a document is decomposed into atomic statements — splitting rules, tagging schema, the slot test.

Backs: All empirical papers
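
For readers who want to open the two CSV files programmatically, here is a minimal Python sketch that reads a file and reports its columns without assuming a schema. The column names and values in the sample are hypothetical placeholders, not taken from the published files; check the actual header row before relying on any name.

```python
import csv
import io
from typing import Iterable


def read_aggregate_csv(lines: Iterable[str]) -> tuple[list[str], list[dict[str, str]]]:
    """Return (column names, rows) from an aggregate CSV given as an iterable of lines."""
    reader = csv.DictReader(lines)
    rows = list(reader)  # each row is a dict keyed by the header names
    return list(reader.fieldnames or []), rows


# Hypothetical sample: the real files may use different column names,
# and these values are placeholders, not published figures.
sample = io.StringIO(
    "content_type,day_0,day_7,day_14,day_30\n"
    "example_type,1.0,0.5,0.25,0.125\n"
)
columns, rows = read_aggregate_csv(sample)
print(columns)
```

To read a downloaded file instead of the sample, pass an open file handle, for example `open("retention-curves-2026q2.csv", newline="", encoding="utf-8")`. All values come back as strings, so convert them to `float` once you know which columns are numeric.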

Aggregates, not raw rows. Each cell summarises hundreds to thousands of probe runs. The numbers are panel means, weighted by model traffic share where known and uniformly otherwise.
±3-point uncertainty band. Headline figures inherit the claim-matching classifier's error rate (~93% precision / ~89% recall against human spot-checks, κ = 0.81).
Per-probe logs are request-only. The raw response logs are retained privately for one year and shared with serious replication efforts under a non-redistribution agreement, to respect the model providers' terms of service. Request them →
Corrections are credited. A failed replication or a spotted error is a research contribution — acknowledged by name in the next paper revision. Write →
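
The weighting rule and the uncertainty band described above can be sketched in Python. This is one possible reading, not the archive's actual pipeline: it assumes a cell falls back to a uniform mean whenever any model in it lacks a traffic share, and it treats the ±3-point band as a simple absolute tolerance in percentage points. The function names `panel_mean` and `within_band` are hypothetical.

```python
from typing import Mapping, Optional


def panel_mean(values: Mapping[str, float],
               shares: Optional[Mapping[str, float]] = None) -> float:
    """Mean over models: share-weighted when every model has a share, else uniform.

    Assumption: a single missing share makes the whole cell fall back to
    uniform weighting. The published rule may handle partial shares differently.
    """
    if not values:
        raise ValueError("no model values")
    if shares is not None and all(m in shares for m in values):
        total = sum(shares[m] for m in values)
        if total > 0:
            return sum(values[m] * shares[m] for m in values) / total
    return sum(values.values()) / len(values)


def within_band(published: float, replicated: float, band: float = 3.0) -> bool:
    """True if a replicated figure lies within `band` points of the published one."""
    return abs(published - replicated) <= band
```

A replicator could compute `panel_mean` over their own per-model results and pass it to `within_band` alongside the published cell to see whether a difference exceeds the stated band.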

Cite as

Sasson, G. (2026). Algoholic open data — statement-level visibility datasets, 2026-Q2 cut. https://algoholic.com/data