ResAIKit
Research Integrity Toolkit
D34 | Statistical analysis | Fabrication Extended | Layer 1 (Deterministic)

Terminal Digit Preference/Avoidance

Tests whether the terminal digits of numeric values are approximately uniformly distributed; humans tend to prefer certain terminal digits (0, 5), while LLMs may avoid them.

Technical description

When a quantity is measured precisely, its final digit is essentially random and the ten digits 0 to 9 appear about equally often. People and generators that invent numbers, by contrast, lean toward round endings such as 0 and 5 or systematically avoid certain digits. D34 is a domain-agnostic screen across the numeric columns of the individual-patient data (IPD); it complements D18, which uses domain knowledge of where heaping is expected. It runs on full articles with at least twenty rows and excludes categorical-looking columns (fewer than ten distinct values). For each remaining column it takes the last digit of each value's rounded magnitude, requires at least fifty values so that each digit cell's expected count is at least five, and tests the last-digit distribution against uniform with a chi-squared test. When the test is significant, heaping at 0 or 5 is counted only if it is also significant by an exact binomial upper-tail test. The mean absolute deviation of the ten digit proportions from one-tenth is reported as a scale-free effect size.
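
The per-column test described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the toolkit's own code: the function names are hypothetical, Python's built-in `round` (which rounds halves to even) stands in for whatever rounding the indicator uses, and the chi-squared p-value uses the standard closed form for odd degrees of freedom (here nine) so that only the standard library is needed.

```python
import math


def terminal_digits(values):
    """Last digit of each value's rounded magnitude; None entries are skipped."""
    return [round(abs(v)) % 10 for v in values if v is not None]


def chi2_sf_df9(stat):
    """Upper-tail p-value of a chi-squared statistic with 9 degrees of freedom.

    Closed form for odd degrees of freedom (Abramowitz and Stegun 26.4.4).
    """
    x = math.sqrt(stat)
    pdf = math.exp(-stat / 2) / math.sqrt(2 * math.pi)  # standard normal density at x
    series = x + x**3 / 3 + x**5 / 15 + x**7 / 105
    return math.erfc(x / math.sqrt(2)) + 2 * pdf * series


def uniformity_test(values, min_values=50):
    """Chi-squared test of the terminal digits against uniform, plus effect size.

    Returns None when fewer than `min_values` digits are available.
    """
    digits = terminal_digits(values)
    n = len(digits)
    if n < min_values:
        return None
    counts = [digits.count(d) for d in range(10)]
    expected = n / 10
    stat = sum((c - expected) ** 2 / expected for c in counts)
    # Mean absolute deviation of the ten digit proportions from one-tenth.
    mad = sum(abs(c / n - 0.1) for c in counts) / 10
    return {
        "n": n,
        "counts": counts,
        "chi2": stat,
        "p_value": chi2_sf_df9(stat),
        "mad": mad,
    }


# Hypothetical column: 60 multiples of ten plus 40 evenly spread values.
heaped = [10 * i for i in range(60)] + list(range(40))
result = uniformity_test(heaped)
print(result["counts"], round(result["chi2"], 1), round(result["mad"], 3))
```

In this made-up column the digit 0 accounts for 64 of 100 values, so the chi-squared statistic is large and the mean absolute deviation is well above zero; a perfectly even column would give a statistic of 0 and a p-value of 1.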

How it works

Layer 1 (deterministic): a column qualifies when it is numeric, has at least ten distinct values (so it is not an ordinal or binary scale), and has at least fifty non-null values. The last digit is the rounded absolute value modulo ten. A chi-squared goodness-of-fit test compares the ten observed digit counts against equal expected counts; if its p-value is at or above 0.05, the column is not flagged. When the test is significant, the column is flagged if (a) digit 0 or 5 reaches twenty percent and also exceeds the uniform one-in-ten rate by an exact binomial upper-tail test, (b) any other digit exceeds twenty percent, or (c) any digit falls below two percent. The proportion of flagged columns sets the score: 0.0 below twenty percent, then 1.0, 2.0, 3.0, and 4.0 at the twenty, thirty-five, fifty, and seventy percent thresholds. A further 0.5 is added when the same dominant digit appears across at least three columns, and the total is capped at 5.0. The mean absolute deviation of the terminal-digit proportions from one-tenth, taken across the flagged columns, is recorded as a scale-free effect size.
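
The flagging rules and score bands can be sketched as follows. This is an illustrative reading of the description, not the toolkit's implementation. The function names are hypothetical, the binomial test is assumed to use the same 0.05 level as the chi-squared test, the percentage thresholds are assumed to be inclusive, and the number of columns sharing a dominant digit is assumed to be supplied by the caller.

```python
from math import comb


def binom_upper_tail(k, n, p=0.1):
    """Exact P(X >= k) for X ~ Binomial(n, p); adequate for modest n."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def column_flagged(counts, chi2_p, alpha=0.05):
    """Apply the flag rules to the ten terminal-digit counts of one column."""
    if chi2_p >= alpha:
        return False  # not significantly non-uniform: never flagged
    n = sum(counts)
    shares = [c / n for c in counts]
    # (a) heaping at 0 or 5 must also pass the exact binomial upper-tail test
    for d in (0, 5):
        if shares[d] >= 0.20 and binom_upper_tail(counts[d], n) < alpha:
            return True
    # (b) any other digit above twenty percent
    if any(s > 0.20 for d, s in enumerate(shares) if d not in (0, 5)):
        return True
    # (c) any digit below two percent (avoidance)
    return any(s < 0.02 for s in shares)


def d34_score(n_flagged, n_tested, shared_dominant_columns=0):
    """Map the share of flagged columns to the score, with the 0.5 bonus."""
    if n_tested == 0:
        return 0.0
    share = n_flagged / n_tested
    score = 0.0
    for threshold, value in ((0.20, 1.0), (0.35, 2.0), (0.50, 3.0), (0.70, 4.0)):
        if share >= threshold:
            score = value
    if shared_dominant_columns >= 3:
        score += 0.5
    return min(score, 5.0)


# Hypothetical column with 64 of 100 values ending in 0.
print(column_flagged([64, 4, 4, 4, 4, 4, 4, 4, 4, 4], chi2_p=1e-9))
print(d34_score(n_flagged=7, n_tested=10, shared_dominant_columns=3))
```

The direct binomial sum is fine for columns of a few hundred values; for much larger columns a library routine such as `scipy.stats.binomtest` with `alternative="greater"` would be the sturdier choice.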

Why this matters

The distribution of final digits is a classic forensic probe. Genuine high-precision measurement yields an essentially uniform last digit, so departures point to rounding, coarse instruments, or invention. Experiments also show that people cannot generate uniform digits and instead lean toward preferred values. The coordinated form, in which one digit dominates the final place across several variables at once, is especially diagnostic, because the terminal digits of independent real measurements would not share a bias.

Score thresholds

0-1: Last digits are close to uniform, as precise measurement produces
2-3: A substantial share of columns show digit preference or avoidance
4-5: Most columns are non-uniform, or one digit dominates across several columns, consistent with fabrication

Limitations

The indicator examines the last digit of the rounded magnitude, so for decimal-recorded variables it tests the last digit of the rounded integer value rather than the final recorded place. Genuinely coarse instruments that report to the nearest five or ten will legitimately concentrate on 0 and 5. Categorical and ordinal columns are excluded by the distinct-value rule, but a naturally narrow-range continuous variable can still be misjudged. The fifty-value requirement, needed for a valid chi-squared test, means that smaller datasets or columns are not assessed. The patterns and bands are heuristic, and a flag prompts inspection rather than proving fabrication. Domain-informed heaping is covered by indicator D18 and the machine-generation digit fingerprint by indicator D25; D34 focuses on the agnostic uniform-last-digit test across the IPD.
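
The first limitation is easy to see with a small comparison. The snippet below is only an illustration: `final_recorded_digit` is a hypothetical alternative that reads the last digit as written, it is not part of D34, and the example values are made up.

```python
def rounded_magnitude_digit(value):
    """Digit examined by the rule described here: rounded magnitude modulo ten."""
    return round(abs(value)) % 10


def final_recorded_digit(text):
    """Hypothetical alternative: the last digit as written in the source table."""
    digits = [ch for ch in text if ch.isdigit()]
    return int(digits[-1])


# Made-up decimal-recorded values, kept as strings to preserve what was written.
for recorded in ["12.3", "45.7", "7.25"]:
    tested = rounded_magnitude_digit(float(recorded))
    written = final_recorded_digit(recorded)
    print(f"{recorded}: tested digit {tested}, final recorded digit {written}")
```

For a value recorded as 12.3, the digit tested is 2, while the final recorded digit is 3, so a preference expressed in the decimal place would not be visible to the test as described.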

References

  1. Preece DA. (1981). Distributions of final digits in data. Journal of the Royal Statistical Society Series D (The Statistician)
  2. Mosimann JE, Wiseman CV, Edelman RE. (1995). Data fabrication: can people generate random digits? Accountability in Research
  3. Cochran WG. (1954). Some methods for strengthening the common chi-squared tests. Biometrics
  4. Mosimann JE, Dahlberg JE, Davidian NM, Krueger JW. (2002). Terminal digits and the examination of questioned data. Accountability in Research 9(2):75-92
  5. Nigrini MJ. (2012). Benford's Law: Applications for Forensic Accounting, Auditing, and Fraud Detection. Wiley, Hoboken NJ
  6. Minhas J, Baird G, Appleby D, et al. (2022). Terminal Digit Preference in Pulmonary Hypertension Endpoints. American Journal of Respiratory and Critical Care Medicine 205(12):1482-1485
  7. Bordewijk EM, Li W, van Eekelen R, Wang R, Showell M, Mol BW, van Wely M. (2021). Methods to assess research misconduct in health-related research: A scoping review. Journal of Clinical Epidemiology 136:189-202
  8. Carlisle JB. (2021). False individual patient data and zombie randomised controlled trials submitted to Anaesthesia. Anaesthesia 76(4):472-479
  9. Parker L, Boughton S, Lawrence R, Bero L. (2022). Experts identified warning signs of fraudulent research: a qualitative study to inform a screening tool. Journal of Clinical Epidemiology 151:1-17
  10. Wilkinson J, Heal C, Antoniou GA, et al. (2024). A survey of experts to identify methods to detect problematic studies: stage 1 of the INveStigating ProblEmatic Clinical Trials in Systematic Reviews project. Journal of Clinical Epidemiology 175:111512
  11. Crone G, Green CD. (2025). Tools of the data detective: A review of statistical methods to detect data and result anomalies in psychology. Theory & Psychology 35(3):359-380