Open data · CC BY 4.0 · 2026-Q2 cut
The data behind the claims.
Every empirical figure in the archive traces to an aggregate dataset, published here so you can check the work, not just the conclusion. Figures are panel means across the fourteen-model panel, with a ±3-point uncertainty band inherited from the claim-matching classifier. Full methodology →
CSV · Download ↓
retention-curves-2026q2.csv
Panel-mean citation retention by content type across days 0 / 7 / 14 / 30.
Backs: Statement-level visibility, Fig. 1
CSV · Download ↓
survival-multipliers-2026q2.csv
Per-property survival multipliers — quantification, source anchoring, entity specificity, qualifier proximity, and their compounds.
Backs: Statement-level visibility §6 · Taxonomy §6
MD · Download ↓
probe-protocol.md
The extract → probe → compare procedure, as a replicator's checklist.
Backs: All empirical papers
MD · Download ↓
extraction-rubric.md
How a document is decomposed into atomic statements — splitting rules, tagging schema, the slot test.
Backs: All empirical papers
- Aggregates, not raw rows. Each cell summarises hundreds to thousands of probe runs. The numbers are panel means, weighted by model traffic share where known and uniformly otherwise.
- ±3-point uncertainty band. Headline figures inherit the claim-matching classifier's error rate (~93% precision / ~89% recall against human spot-checks, κ = 0.81).
- Per-probe logs are request-only. The raw response logs are retained privately for one year and shared with serious replication efforts under a non-redistribution agreement, to respect the model providers' terms of service. Request them →
- Corrections are credited. A failed replication or a spotted error is a research contribution — acknowledged by name in the next paper revision. Write →
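The weighting rule in the first note above (traffic share where known, uniform otherwise) can be sketched as follows. This is one possible reading, not the published pipeline: it assumes the fallback is all-or-nothing per cell, so a cell is share-weighted only when every model in it has a known share. The function name `panel_mean`, the model labels, and all numbers are hypothetical and for illustration only.

```python
from typing import Mapping, Optional


def panel_mean(values: Mapping[str, float],
               traffic_share: Optional[Mapping[str, float]] = None) -> float:
    """Mean of per-model values for one cell.

    Assumption: share-weighted only if every model in the cell has a
    known traffic share; otherwise every model gets equal weight.
    """
    if not values:
        raise ValueError("need at least one model value")

    if traffic_share is not None and all(m in traffic_share for m in values):
        weights = {m: traffic_share[m] for m in values}
    else:
        weights = {m: 1.0 for m in values}  # uniform fallback

    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")

    # Normalise by the total so shares need not sum to exactly 1.
    return sum(values[m] * weights[m] for m in values) / total


# Hypothetical per-model retention figures and traffic shares.
retention = {"model_a": 0.60, "model_b": 0.40, "model_c": 0.50}
shares = {"model_a": 0.5, "model_b": 0.3, "model_c": 0.2}

weighted = panel_mean(retention, shares)  # shares known for all models
uniform = panel_mean(retention)           # no shares: plain average
```

Under this reading the two means can differ for the same cell, so a replication should record which weighting applied to each figure before comparing against the published aggregates.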
Cite as
Sasson, G. (2026). Algoholic open data — statement-level visibility datasets, 2026-Q2 cut. https://algoholic.com/data