Statistical Hallucination
Detects statistics that appear fabricated, such as percentages that do not add up to 100%, impossible p-values, or numerical claims that contradict each other within the same text.
Technical description
Extracts all numerical claims and statistical assertions from the text and checks them for internal consistency. Flagged patterns include percentage lists that sum to a value other than 100%, contradictory numerical claims across sections, statistically impossible values (p-values below 0 or above 1, correlation coefficients outside [-1, 1]), sample sizes that change without explanation, and effect sizes inconsistent with the reported test statistics.
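As an illustration of the range checks described above, here is a minimal Python sketch. The regular expressions and the function name `find_impossible_values` are hypothetical and not taken from the detector itself; they only cover simple notations such as "p = 0.03" or "r = -0.45".

```python
import re

# Hypothetical patterns (assumptions, not the detector's own):
# match reports such as "p = 0.03", "p < .001", "r = -0.45".
P_VALUE = re.compile(r"\bp\s*[=<>]\s*(-?\d*\.?\d+)", re.IGNORECASE)
CORRELATION = re.compile(r"\br\s*=\s*(-?\d*\.?\d+)")


def find_impossible_values(text: str) -> list[str]:
    """Describe every reported p-value or correlation outside its valid range."""
    problems = []
    for match in P_VALUE.finditer(text):
        p = float(match.group(1))
        if not 0.0 <= p <= 1.0:  # a probability must lie in [0, 1]
            problems.append(f"impossible p-value: {p}")
    for match in CORRELATION.finditer(text):
        r = float(match.group(1))
        if not -1.0 <= r <= 1.0:  # a correlation must lie in [-1, 1]
            problems.append(f"impossible correlation: {r}")
    return problems


sample = "The effect was significant (p = 1.2) with r = -1.4 and p < .001."
for problem in find_impossible_values(sample):
    print(problem)
```

A real extractor would likely need more notations (for example "P-value of 0.03" or Greek-letter coefficients), so this should be read as a starting point only.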
How it works
Layer 1 (deterministic): Extracts all numbers and their contexts using patterns. Identifies percentage lists and checks sums. Validates statistical values against mathematical constraints (0 <= p <= 1, -1 <= r <= 1). Cross-references sample sizes across sections. Checks arithmetic relationships between reported means, SDs, and test statistics.
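Two of the steps above can be sketched in Python under stated assumptions. The percentage check assumes the caller passes in a single list whose items are meant to partition a whole. The mean check uses the GRIM test (Brown and Heathers, 2017) as one known example of an arithmetic consistency check; the source does not say which specific tests the detector applies, and the names `percentage_sum_gap` and `grim_consistent` are hypothetical.

```python
import re

# Hypothetical pattern: a number immediately followed by a percent sign.
PERCENT = re.compile(r"(\d+(?:\.\d+)?)\s*%")


def percentage_sum_gap(breakdown: str) -> float:
    """Return how far the percentages in one list are from summing to 100."""
    values = [float(v) for v in PERCENT.findall(breakdown)]
    return round(sum(values) - 100.0, 6)


def grim_consistent(mean: float, n: int, decimals: int = 2) -> bool:
    """GRIM-style check: can n integer scores average to the reported mean?"""
    target = round(mean, decimals)
    approx_total = int(mean * n)
    # The true sum of integer scores must be a whole number close to mean * n,
    # so test the nearest candidates and see whether any rounds to the report.
    for total in (approx_total - 1, approx_total, approx_total + 1):
        if round(total / n, decimals) == target:
            return True
    return False


print(percentage_sum_gap("Chrome 45%, Firefox 30%, Safari 35%"))
print(grim_consistent(5.19, 28))
print(grim_consistent(5.18, 28))
```

The GRIM check only applies when the underlying data are integers (such as Likert responses), and Python's `round` uses round-half-to-even on binary floats, so values sitting exactly on a rounding boundary may need more careful handling than this sketch provides.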
Why this matters
AI models generate plausible-sounding but mathematically impossible statistics because they do not perform actual calculations; they predict likely-looking numbers. This leads to percentages that do not sum correctly, impossible p-values, and internally contradictory numerical claims. These hallucinated statistics are dangerous because they look superficially credible.
Score thresholds
- 0-1: All statistics are internally consistent
- 2-3: Minor numerical inconsistencies detected
- 4-5: Multiple impossible or contradictory statistics found
Limitations
Some apparent inconsistencies may be due to rounding in the original publication. Different subsample analyses may legitimately yield different numbers. Complex statistical procedures may produce values that appear inconsistent without full context.
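The rounding caveat can be made concrete. If each percentage in a list was rounded to a fixed number of decimals, each item can be off by at most half a unit in the last reported place, so the total can drift from 100 by at most that amount times the number of items. The Python sketch below applies this bound; the function names and the idea of using it as a tolerance are illustrative assumptions, not a description of the detector's actual behavior.

```python
def rounding_tolerance(item_count: int, decimals: int = 0) -> float:
    """Upper bound on the drift from 100 that rounding alone can cause."""
    # Each rounded value is off by at most half a unit in the last reported place.
    return item_count * 0.5 * 10 ** (-decimals)


def sum_is_plausible(percentages: list[float], decimals: int = 0) -> bool:
    """True if the gap from 100 could be explained by rounding alone."""
    gap = abs(sum(percentages) - 100.0)
    # Small epsilon guards against floating-point noise at the boundary.
    return gap <= rounding_tolerance(len(percentages), decimals) + 1e-9


print(sum_is_plausible([33, 33, 33]))   # three thirds, each rounded down
print(sum_is_plausible([45, 30, 35]))   # gap of 10 is far beyond rounding
```

Three equal shares reported as 33% each sum to 99%, which falls inside the bound of 1.5 points for three whole-number items, whereas a list summing to 110% does not.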
References
- Nuijten MB, Hartgerink CHJ, van Assen MALM, Epskamp S, Wicherts JM. (2016). The prevalence of statistical reporting errors in psychology (1985-2013). Behavior Research Methods
- Brown NJL, Heathers JAJ. (2017). The GRIM test: a simple technique detects numerous anomalies in the reporting of results in psychology. Social Psychological and Personality Science
- Crone G, Green CD. (2025). Tools of the data detective: a review of statistical methods to detect data and result anomalies in psychology. Theory & Psychology
- Head ML, Holman L, Lanfear R, Kahn AT, Jennions MD. (2015). The extent and consequences of p-hacking in science. PLOS Biology
- Barabesi L, Cerioli A, Cerasa A, Perrotta D. (2025). Robust inference under Benford's law. arXiv preprint arXiv:2507.08650
- Huang S, Peng Y, Qu L. (2026). TAB-AUDIT: detecting AI-fabricated scientific tables via multi-view likelihood mismatch. arXiv preprint arXiv:2603.19712