ResAIKit
Research Integrity Toolkit
S9 | Statistical analysis | Statistical Consistency | Layer 1 (Deterministic)

Benford Test (Stats)

Tests whether the leading digit of the numbers reported in an article follows Benford's Law, the pattern by which naturally occurring data spanning many orders of magnitude start with a 1 far more often than with a 9. Invented data often fail this pattern because people pick starting digits more evenly than nature does. The indicator measures the gap between the observed leading-digit distribution and Benford's expectation, but only when the data span enough orders of magnitude for the law to apply.
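As a quick illustration of how steep that pattern is, the expected share of each leading digit can be computed directly from Benford's formula, log10(1 + 1/d). The name BENFORD_FIRST is chosen for this sketch and is not taken from the toolkit.

```python
import math

# Benford's expected probability for each leading digit d = 1..9.
# BENFORD_FIRST is an illustrative name, not part of the toolkit.
BENFORD_FIRST = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for digit, prob in BENFORD_FIRST.items():
    print(f"{digit}: {prob:.3f}")
```

Under the law, a leading 1 is expected in roughly 30% of values and a leading 9 in under 5%, and the nine probabilities sum to one.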

Technical description

A deterministic test of the first significant digit of the article's numbers against Benford's Law, which gives leading digit d the probability log10(1 + 1/d). Because the extractor pools every standalone number, S9 first removes the classes that are not Benford-distributed free measurements: non-positive and sub-1 values (p-values, alpha levels, proportions, and correlations, which cluster below one), four-digit integers from 1900 to 2100 (treated as years), and percentages (a number immediately followed by a percent sign). It then applies a strict precondition: the surviving values must number at least twenty and span at least two orders of magnitude (max/min >= 100), because the first-digit law holds only over a wide multiplicative range. When the precondition is met, the shared Benford routine forms the observed proportions for digits 1 to 9 and computes the mean absolute deviation (MAD) from Benford's expectation and a chi-squared p-value. It also computes the second-digit MAD, the more robust digit test of Diekmann and Nigrini, which is used to corroborate a first-digit non-conformity. The test runs only on documents classified as articles; any other document receives a neutral score, and an article with no data or insufficient spread receives a neutral no-signal result.
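The filter and precondition described above can be sketched as follows. This is a minimal illustration, not the toolkit's extractor: the regular expression NUMBER_RE and the functions filter_measurements and meets_precondition are hypothetical names, and the sketch works from raw text rather than from pooled numbers with positions.

```python
import re

# Hypothetical stand-in for the extractor: a standalone number, optionally
# signed, with an optional percent sign immediately after it.
NUMBER_RE = re.compile(r"(?<![\w.])(-?\d+(?:\.\d+)?)(%?)")

def filter_measurements(text):
    """Keep only values that plausibly behave like free measurements."""
    kept = []
    for token, percent in NUMBER_RE.findall(text):
        value = float(token)
        if percent:
            continue  # percentages
        if value < 1:
            continue  # non-positive and sub-1 values (p-values, proportions)
        if token.isdigit() and len(token) == 4 and 1900 <= value <= 2100:
            continue  # four-digit integers treated as years
        kept.append(value)
    return kept

def meets_precondition(values, min_count=20, min_ratio=100.0):
    """At least twenty values spanning at least two orders of magnitude."""
    if len(values) < min_count:
        return False
    return max(values) / min(values) >= min_ratio

sample = "n = 120, mean 45.2, 35% of the 2019 cohort, p = 0.03, area 15400"
values = filter_measurements(sample)
print(values, meets_precondition(values))
```

In this example only 120, 45.2, and 15400 survive the filter, and the precondition fails because three values are far fewer than twenty, so the indicator would stay neutral.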

How it works

Layer 1 (deterministic): the reported numbers are filtered to plausible measurements by dropping non-positive and sub-1 values, four-digit years, and percentages (detected from context when number positions are available). The spread of what remains is the ratio of the largest value to the smallest; with fewer than twenty values or a spread below a hundredfold, the indicator returns a neutral skip (reason insufficient_magnitude_span). Otherwise the first significant digit of each value is extracted, and the MAD and chi-squared p-value against Benford are computed. The score is set by MAD band: <0.006 gives 0, 0.006-0.012 gives 1.0, 0.012-0.015 gives 2.5, and >=0.015 gives 4.0. A chi-squared p<0.01 adds 1.0. When the first-digit MAD is already at or above the concerning band (>=0.012) and the second-digit MAD also deviates (>=0.012), a further 0.5 is added (Diekmann, Nigrini). The total is capped at 5.0. Findings draw examples from values leading with 7, 8, or 9. Metadata records mad, chi2_p, second_digit_mad, and numbers_analyzed.
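The digit extraction, the two statistics, and the scoring bands can be sketched as below. This is an illustration under stated assumptions, not the shared Benford routine itself. The function names are hypothetical, band lower bounds are assumed to be inclusive, the chi-squared test is assumed to use 8 degrees of freedom (nine digit classes), and the second-digit MAD is taken as an input rather than computed.

```python
import math
from collections import Counter

# Expected first-digit proportions for digits 1..9.
BENFORD_FIRST = [math.log10(1 + 1 / d) for d in range(1, 10)]

def first_digit(value):
    """First significant digit of a positive number."""
    # Scientific notation places the first significant digit at index 0.
    return int(f"{value:.15e}"[0])

def benford_stats(values):
    """Return (MAD, chi-squared p-value) against Benford's first-digit law."""
    n = len(values)
    counts = Counter(first_digit(v) for v in values)
    observed = [counts.get(d, 0) / n for d in range(1, 10)]
    mad = sum(abs(o - e) for o, e in zip(observed, BENFORD_FIRST)) / 9
    chi2 = sum((counts.get(d, 0) - n * e) ** 2 / (n * e)
               for d, e in zip(range(1, 10), BENFORD_FIRST))
    # Closed-form chi-squared survival function for 8 degrees of freedom
    # (assumed here; the closed form is exact for even degrees of freedom).
    h = chi2 / 2
    p_value = math.exp(-h) * (1 + h + h ** 2 / 2 + h ** 3 / 6)
    return mad, p_value

def benford_score(mad, chi2_p, second_digit_mad=None):
    """Map the statistics to the 0-5 score using the bands in the text."""
    if mad < 0.006:
        score = 0.0
    elif mad < 0.012:
        score = 1.0
    elif mad < 0.015:
        score = 2.5
    else:
        score = 4.0
    if chi2_p < 0.01:
        score += 1.0
    if mad >= 0.012 and second_digit_mad is not None and second_digit_mad >= 0.012:
        score += 0.5
    return min(score, 5.0)

# Evenly spread leading digits, as invented data often show.
flat = [d * 10 ** k for d in range(1, 10) for k in range(10)]
mad, p = benford_stats(flat)
print(round(mad, 4), p < 0.01, benford_score(mad, p))
```

The flat sample gives every leading digit the same share, which lands well inside the top MAD band and also fails the chi-squared test, so the sketch returns the capped score of 5.0.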

Why this matters

Benford's Law is the best-known statistical fingerprint of naturally generated numbers, catalogued by Benford across data ranging from river areas to physical constants. Its forensic value lies in the difficulty of faking it: people inventing figures spread leading digits too evenly, producing an excess of middle and high digits where nature declines steeply from 1. Nigrini turned this into an audit tool with the MAD conformity bands used here. Diekmann showed that the first digits of regression coefficients can reveal fabrication, but also that the first-digit test is fragile and applies only to wide-ranging data, so naive use invites both false alarms and missed cases. S9 follows that guidance by refusing to judge data spanning less than two orders of magnitude.

Score thresholds

0-1
Leading digits conform to Benford's Law within the close or acceptable bands.
2-3
A concerning deviation, with a mean absolute deviation between 0.012 and 0.015.
4-5
A non-conforming leading-digit distribution (MAD at or above 0.015), the upper end reached when the chi-squared test is also highly significant.

Limitations

Benford's first-digit law applies only to data from a wide multiplicative range, so the indicator stays silent unless the surviving values span at least two orders of magnitude. Even then, a single article's numbers are a small, heterogeneous sample compared with the large natural datasets where Benford holds cleanly. The filter removes non-positive and sub-1 values, four-digit years, and percentages, but other non-measurement values such as counts, sample sizes, degrees of freedom, and test statistics can remain and distort the leading-digit distribution. Excluding all sub-1 values also drops the occasional genuine sub-1 measurement (Benford is scale-invariant and would accept it); this is a deliberate trade that removes the clustered p-values and proportions dominating that range. The first-digit test is the weaker digit test, with the second digit often more diagnostic, so a deviation is a screening signal, not proof. The conformity bands are Nigrini's first-digit thresholds and should be read as directional guidance rather than exact cut-offs. The test runs only on documents classified as articles. The terminal-digit test on the same numbers is S8, and Benford analysis of individual-patient data is D22.

References

  1. Benford F. (1938). The law of anomalous numbers. Proceedings of the American Philosophical Society 78(4):551-572
  2. Diekmann A. (2007). Not the First Digit! Using Benford's Law to Detect Fraudulent Scientific Data. Journal of Applied Statistics 34(3):321-329
  3. Nigrini MJ. (2012). Benford's Law: Applications for Forensic Accounting, Auditing, and Fraud Detection. Wiley, Hoboken NJ
  4. Horton J, Krishnakumar D, Wood A. (2020). Detecting academic fraud using Benford law: The case of Professor James Hunton. Research Policy 49(8):104084
  5. Crone G, Green CD. (2025). Tools of the data detective: A review of statistical methods to detect data and result anomalies in psychology. Theory & Psychology 35(3):359-380
  6. Bordewijk EM, Li W, van Eekelen R, Wang R, Showell M, Mol BW, van Wely M. (2021). Methods to assess research misconduct in health-related research: A scoping review. Journal of Clinical Epidemiology 136:189-202
  7. Hunter KE, Aberoumand M, Libesman S, et al. (2024). The Individual Participant Data Integrity Tool for assessing the integrity of randomised trials. Research Synthesis Methods 15(6):917-939
  8. Wilkinson J, Heal C, Antoniou GA, et al. (2024). A survey of experts to identify methods to detect problematic studies: stage 1 of the INveStigating ProblEmatic Clinical Trials in Systematic Reviews project. Journal of Clinical Epidemiology 175:111512