Open data · CC BY 4.0 · 2026-Q2 cut

The data behind the claims.

Every empirical figure in the archive traces to an aggregate dataset, published here so you can check the work, not just the conclusion. Figures are panel means across the fourteen-model panel, with a ±3-point uncertainty band from the claim-matching classifier. Full methodology →

  • Aggregates, not raw rows. Each cell summarises hundreds to thousands of probe runs. The numbers are panel means, weighted by model traffic share where known and uniformly otherwise.
  • ±3-point uncertainty band. Headline figures inherit the claim-matching classifier's error rate (~93% precision / ~89% recall against human spot-checks, κ = 0.81).
  • Per-probe logs are request-only. The raw response logs are retained privately for one year and shared with serious replication efforts under a non-redistribution agreement, to respect the model providers' terms of service. Request them →
  • Corrections are credited. A failed replication or a spotted error is a research contribution — acknowledged by name in the next paper revision. Write →
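
To make the aggregation and the band concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the pipeline behind the datasets: the function names, the model labels, and the rule "use traffic weights only when a share is known for every model in the cell, otherwise weight uniformly" are assumptions, since the exact fallback is defined in the methodology rather than on this page. The overlap check likewise treats the ±3-point band as a simple interval around each figure.

```python
from typing import Mapping, Optional

BAND = 3.0  # the ±3-point uncertainty band quoted above, in points


def panel_mean(
    scores: Mapping[str, float],
    traffic_share: Optional[Mapping[str, float]] = None,
) -> float:
    """Mean of per-model scores for one cell.

    Assumption: traffic weights are used only when a share is known for
    every model in the cell; otherwise every model counts equally.
    """
    if not scores:
        raise ValueError("scores is empty")
    if traffic_share is not None and all(m in traffic_share for m in scores):
        total = sum(traffic_share[m] for m in scores)
        if total > 0:
            # Normalise by the total so shares need not sum to 1.
            return sum(scores[m] * traffic_share[m] for m in scores) / total
    # Fallback: uniform weights.
    return sum(scores.values()) / len(scores)


def bands_overlap(a: float, b: float, band: float = BAND) -> bool:
    """True when the intervals [a - band, a + band] and [b - band, b + band] overlap."""
    return abs(a - b) <= 2 * band


# Hypothetical two-model cell, scores in points.
cell = {"model_a": 40.0, "model_b": 60.0}
uniform = panel_mean(cell)
weighted = panel_mean(cell, {"model_a": 3.0, "model_b": 1.0})
```

Under this reading, two headline figures less than six points apart have overlapping bands, so a difference of that size is not distinguishable from classifier error alone.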

Cite as

Sasson, G. (2026). Algoholic open data — statement-level visibility datasets, 2026-Q2 cut. https://algoholic.com/data