ResAIKit
Research Integrity Toolkit
D22 | Statistical analysis | Fabrication Detection | Layer 1 (Deterministic)

Benford IPD (1st + 2nd Digit)

Applies Benford's Law to the raw participant-level numbers, checking both the first and the second significant digit. In data spanning several orders of magnitude, the leading digit is 1 about 30 percent of the time, and the frequency falls off logarithmically to under 5 percent for 9. The second digit follows a much flatter version of the same law. Fabricated data often fail one or both checks, because invented or model-generated numbers do not inherit the logarithmic digit structure of real measurements. The indicator pools the individual-patient values and measures how far the first- and second-digit distributions stray from Benford's expectations. It runs only on articles.
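The expected digit frequencies behind this description follow directly from Benford's formulas. The sketch below is illustrative only, and the function names are hypothetical rather than part of the toolkit. It gives roughly 0.301 for a leading 1 and 0.046 for a leading 9, while second-digit frequencies span only about 0.120 (digit 0) to 0.085 (digit 9), which is the gentler pattern described above.

```python
import math


def benford_first_digit() -> dict[int, float]:
    """Expected share of each leading digit d = 1..9: log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def benford_second_digit() -> dict[int, float]:
    """Expected share of each second digit d = 0..9.

    Sums the joint first-two-digit probability log10(1 + 1/(10k + d))
    over every possible leading digit k = 1..9.
    """
    return {
        d: sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))
        for d in range(10)
    }


if __name__ == "__main__":
    first = benford_first_digit()
    second = benford_second_digit()
    print("digit  first  second")
    for d in range(10):
        # 0 cannot be a leading digit, so its first-digit cell is blank.
        cell = f"{first[d]:.3f}" if d in first else "  -  "
        print(f"{d:>5}  {cell}  {second[d]:.3f}")
```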

Technical description

A deterministic Benford screen on the pooled numeric values of individual-patient data (IPD), testing the first and second significant digits, on articles only. It pools all numeric values, keeps those with absolute value of at least ten (so that every retained value has at least two integer digits and therefore a meaningful second digit), and requires at least one hundred qualifying values for full confidence, or at least thirty with reduced confidence. It extracts the first and second significant digits and compares their distributions with Benford's expectations: P(D1 = d) = log10(1 + 1/d) for d = 1 to 9, and P(D2 = d) = the sum over leading digits k = 1 to 9 of log10(1 + 1/(10k + d)) for d = 0 to 9. It computes the first-digit mean absolute deviation (MAD) and a normalised chi-squared for each digit. The score is driven by the first-digit MAD, with an increment added when the second digit also departs, since failing both at once is far less likely under honest data.
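A minimal sketch of the filtering, extraction and deviation steps, assuming the pooled IPD values arrive as a flat list of Python numbers; all helper names are hypothetical. Digits are read from the decimal representation rather than by dividing by powers of ten, because floating-point arithmetic can misreport the second digit (for 120, computing 120 / 100 and then (1.2 - 1) * 10 yields 1.9999999999999996, which truncates to 1). The toolkit's chi-squared normalisation is not stated, so this sketch returns the raw Pearson statistic.

```python
import math
from decimal import Decimal

# Benford expected proportions: first digit d = 1..9, second digit d = 0..9.
FIRST_EXPECTED = [math.log10(1 + 1 / d) for d in range(1, 10)]
SECOND_EXPECTED = [
    sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10)) for d in range(10)
]


def qualifying_values(pooled):
    """Keep finite values with |x| >= 10, as the indicator describes."""
    return [v for v in pooled if math.isfinite(v) and abs(v) >= 10]


def sample_confidence(n):
    """'full' at >= 100 values, 'reduced' at 30 to 99, None when too few to score."""
    if n >= 100:
        return "full"
    if n >= 30:
        return "reduced"
    return None


def first_two_digits(value):
    """First and second significant digits, read from the decimal representation."""
    # Decimal stores the coefficient without leading zeros: 120.0 -> (1, 2, 0, 0).
    digits = Decimal(str(abs(value))).as_tuple().digits
    # Values such as 1e20 have the one-digit coefficient (1,); the second digit is 0.
    second = digits[1] if len(digits) > 1 else 0
    return digits[0], second


def digit_counts(values):
    """Tally first digits (index 0 = digit 1) and second digits (index 0 = digit 0)."""
    first_counts, second_counts = [0] * 9, [0] * 10
    for v in values:
        first, second = first_two_digits(v)
        first_counts[first - 1] += 1
        second_counts[second] += 1
    return first_counts, second_counts


def mean_absolute_deviation(counts, expected):
    """Mean of |observed proportion - expected proportion| across the digits."""
    n = sum(counts)
    return sum(abs(c / n - p) for c, p in zip(counts, expected)) / len(expected)


def chi_squared(counts, expected):
    """Raw Pearson chi-squared against Benford expected counts (no normalisation)."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, expected))
```

In use, `digit_counts(qualifying_values(pooled))` produces the two tallies, which then feed `mean_absolute_deviation(first_counts, FIRST_EXPECTED)` and a `chi_squared` call per digit.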

How it works

Layer 1 (deterministic, articles only): pooled values at least ten in magnitude have their first and second significant digits extracted. The first-digit MAD sets the base score through Nigrini's conformity bands, and chi-squared statistics quantify each digit's departure. When the second digit also deviates significantly, an increment is added to the score.

A mantissa arc test (as implemented in Cinelli's benford.analysis R package) maps each value's log10 mantissa m to the point (cos 2πm, sin 2πm) on the unit circle, takes L2 as the squared length of the mean of those points, and computes the Rayleigh tail probability exp(-n × L2) under circular uniformity. A significantly non-uniform mantissa, together with a first-digit MAD that is already non-conforming, adds a small increment.

Samples with at least thirty but fewer than one hundred values are scored more cautiously, and the score is capped at 5.0. Metadata records the first- and second-digit deviation measures, a formal first-digit chi-squared goodness-of-fit p-value (diagnostic only, since the MAD rather than the chi-squared drives the score), the mantissa arc statistic and its Rayleigh p-value, and the number of values analysed. (Skip precedence: if there is no statistical context or no IPD, the indicator returns the neutral no-data skip before the article-type check is applied.)
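The mantissa arc test described above fits in a few lines. This is a hedged re-implementation for illustration, not code taken from benford.analysis or from the toolkit, and `mantissa_arc_test` is a hypothetical name. Mantissas spread evenly around the circle give L2 near 0 and a p-value near 1, while values clustered on one mantissa (for example, many copies of the same number) give L2 near 1 and a vanishing p-value.

```python
import math


def mantissa_arc_test(values):
    """Return (L2, p_value) for the mantissa arc test.

    Each nonzero value's log10 mantissa m (log10|x| mod 1) is placed on the
    unit circle at (cos 2*pi*m, sin 2*pi*m). L2 is the squared length of the
    mean of those points, and p = exp(-n * L2) is the Rayleigh-style tail
    probability under a uniform (Benford-consistent) mantissa.
    """
    mantissas = [
        math.log10(abs(v)) % 1.0 for v in values if v != 0 and math.isfinite(v)
    ]
    n = len(mantissas)
    if n == 0:
        raise ValueError("need at least one nonzero finite value")
    mean_cos = sum(math.cos(2 * math.pi * m) for m in mantissas) / n
    mean_sin = sum(math.sin(2 * math.pi * m) for m in mantissas) / n
    l2 = mean_cos ** 2 + mean_sin ** 2
    return l2, math.exp(-n * l2)


if __name__ == "__main__":
    # Evenly spaced on a log scale: mantissas 0.00, 0.01, ..., 0.99.
    spread = [10 ** (1 + k / 100) for k in range(100)]
    # Fifty copies of one value: every mantissa equals log10(2).
    clustered = [20.0] * 50
    print(mantissa_arc_test(spread))
    print(mantissa_arc_test(clustered))
```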

Why this matters

Benford's Law is the canonical fingerprint of naturally generated numbers spanning several orders of magnitude, documented by Benford across many diverse data sets. Diekmann sharpened both its forensic power and its limits, showing that fabricated scientific data can deviate from Benford but that a first-digit test alone may miss it, so the test must be applied carefully and examining digits beyond the first strengthens it. Nigrini turned conformity into an audit instrument with the mean-absolute-deviation bands used here. Applying the test to raw individual-patient data, and to two digits at once, is more demanding than a first-digit check on summary numbers: a fabricator might tune leading digits to look Benford-like, but reproducing the joint first- and second-digit structure across a whole patient-level dataset is much harder, so a simultaneous failure of both digits is a particularly credible signal.

Score thresholds

0-1
First and second digits conform to Benford's Law.
2-3
A clear first-digit departure from Benford.
4-5
First and second digits both depart from Benford, a strong fabrication signal.
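One way the pieces described under How it works could map onto these bands is sketched below. Only Nigrini's first-digit MAD band limits (0.006, 0.012 and 0.015) and his second-digit nonconformity threshold (0.012) come from his published guidance, with band boundaries treated as inclusive by assumption. Every point value, the 0.05 cut-off for the mantissa p-value and the reduced-confidence factor are hypothetical placeholders chosen so the outputs land in the documented ranges; they are not the toolkit's actual parameters.

```python
# Nigrini's first-digit MAD bands (upper limits); boundaries treated as inclusive.
FIRST_DIGIT_BANDS = [
    (0.006, "close conformity"),
    (0.012, "acceptable conformity"),
    (0.015, "marginally acceptable conformity"),
]
SECOND_DIGIT_NONCONFORMITY = 0.012  # Nigrini's second-digit nonconformity threshold

# Hypothetical scoring parameters, NOT the toolkit's published values.
BASE_SCORE = {
    "close conformity": 0.0,
    "acceptable conformity": 1.0,
    "marginally acceptable conformity": 2.0,
    "nonconformity": 3.0,
}
SECOND_DIGIT_INCREMENT = 2.0
MANTISSA_INCREMENT = 0.5
MANTISSA_ALPHA = 0.05
REDUCED_CONFIDENCE_FACTOR = 0.8
SCORE_CAP = 5.0


def first_digit_band(mad):
    """Classify a first-digit MAD into Nigrini's conformity bands."""
    for upper, label in FIRST_DIGIT_BANDS:
        if mad <= upper:
            return label
    return "nonconformity"


def benford_ipd_score(first_mad, second_mad, mantissa_p, n):
    """Hypothetical 0 to 5 score; returns None when fewer than 30 values qualify."""
    if n < 30:
        return None
    band = first_digit_band(first_mad)
    score = BASE_SCORE[band]
    # Joint failure: add the increment only when the first digit already departs.
    if score >= 2.0 and second_mad > SECOND_DIGIT_NONCONFORMITY:
        score += SECOND_DIGIT_INCREMENT
    # Small mantissa increment only on top of a non-conforming first-digit MAD.
    if band == "nonconformity" and mantissa_p < MANTISSA_ALPHA:
        score += MANTISSA_INCREMENT
    # 30 to 99 values: score more cautiously.
    if n < 100:
        score *= REDUCED_CONFIDENCE_FACTOR
    return min(score, SCORE_CAP)
```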

Limitations

Benford's Law applies only to data spanning a wide multiplicative range. Pooling individual-patient values across columns mixes scales and widens the range, which is what makes the test applicable, but a dataset dominated by one narrow-range variable may fail to conform even when genuine, and the indicator does not separately verify the magnitude span. It needs at least thirty, and ideally one hundred, values and keeps only values of magnitude ten or more, so small-valued variables contribute nothing. The second-digit test is weaker and noisier in small samples. Nigrini's conformity bands are directional guides rather than formal significance tests. The indicator runs only on documents classified as articles. The first-digit Benford test on numbers reported in the text is S9, and the terminal-digit test on the same IPD is D34.

References

  1. Benford F. (1938). The law of anomalous numbers. Proceedings of the American Philosophical Society 78(4):551-572
  2. Diekmann A. (2007). Not the First Digit! Using Benford's Law to Detect Fraudulent Scientific Data. Journal of Applied Statistics 34(3):321-329
  3. Nigrini MJ. (2012). Benford's Law: Applications for Forensic Accounting, Auditing, and Fraud Detection. Wiley, Hoboken NJ
  4. Hill TP. (1995). A Statistical Derivation of the Significant-Digit Law. Statistical Science 10(4):354-363
  5. Crone G, Green CD. (2025). Tools of the data detective: A review of statistical methods to detect data and result anomalies in psychology. Theory & Psychology 35(3):359-380
  6. Bordewijk EM, Li W, van Eekelen R, Wang R, Showell M, Mol BW, van Wely M. (2021). Methods to assess research misconduct in health-related research: A scoping review. Journal of Clinical Epidemiology 136:189-202
  7. Wilkinson J, Heal C, Antoniou GA, et al. (2024). A survey of experts to identify methods to detect problematic studies: stage 1 of the INveStigating ProblEmatic Clinical Trials in Systematic Reviews project. Journal of Clinical Epidemiology 175:111512
  8. Carlisle JB. (2017). Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals. Anaesthesia 72(8):944-952
  9. Cinelli C. (2018). benford.analysis: Benford Analysis for Data Validation and Forensic Analytics. R package version 0.1.5