ResAIKit
Research Integrity Toolkit
D34 | Statistical analysis | Fabrication Extended | Layer 1 (Deterministic)

Terminal Digit Preference/Avoidance

Looks at the last digit of the numbers in each column. When a quantity is measured precisely, its final digit is essentially random and the ten possibilities 0 through 9 appear about equally often. People and generators that invent numbers instead lean toward round endings like 0 and 5, or systematically avoid certain digits, and when the same digit dominates the last place across several columns the coordination is a strong sign of fabrication. The indicator tests each numeric column's last-digit distribution against the uniform expectation. It works on the individual-patient data (IPD).

Technical description

D34 is a deterministic, domain-agnostic screen for terminal-digit preference or avoidance across the numeric columns of individual-patient data (IPD). It complements D18, which uses domain knowledge of where heaping is expected. It runs only on full articles and on datasets of at least twenty rows. It skips categorical-looking columns, meaning those with fewer than ten distinct values, and for each remaining numeric column takes the last digit of each value's rounded magnitude. A column needs at least fifty values so that the expected count in each of the ten digit cells is at least five, the conventional threshold for a valid chi-squared goodness-of-fit test. The indicator compares the observed last-digit counts against the uniform expectation with a chi-squared test, and when that test is significant it looks for three specific patterns of fabrication: over-representation of 0 or 5 that is also significantly above the uniform one-in-ten rate by an exact binomial test, any digit exceeding twenty percent, or any digit falling below two percent. For each flagged column it also reports the mean absolute deviation of the ten digit proportions from the uniform 0.10, a scale-free effect size in the spirit of Nigrini's digit-conformity measure. The score grows with the proportion of columns showing such a pattern, with a bonus when the same digit dominates across several columns.
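
The per-column screen described above can be sketched roughly as follows. This is a minimal illustration, not the toolkit's actual code: the function names, the returned dictionary, and the use of Python's built-in `round` (which rounds halves to even) are assumptions, and the chi-squared test is done by comparing the statistic with the 5% critical value for nine degrees of freedom (16.919) rather than computing a p-value. The twenty percent floor for digits 0 and 5 follows the rule as stated under "How it works".

```python
from collections import Counter
from math import exp, lgamma, log

CHI2_CRIT_DF9 = 16.919  # upper 5% critical value, chi-squared with 9 df


def binom_upper_tail(k, n, p=0.1):
    """Exact P(X >= k) for X ~ Binomial(n, p), summed in log space."""
    total = 0.0
    for i in range(k, n + 1):
        log_term = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                    + i * log(p) + (n - i) * log(1 - p))
        total += exp(log_term)
    return min(total, 1.0)


def check_column(values, alpha=0.05):
    """Screen one numeric column; return None if it does not qualify."""
    vals = [v for v in values if v is not None]
    if len(vals) < 50 or len(set(vals)) < 10:
        return None  # too few values, or categorical-looking
    # Last digit of the rounded magnitude (assumed rounding: built-in round).
    digits = [int(round(abs(v))) % 10 for v in vals]
    n = len(digits)
    counts = Counter(digits)
    obs = [counts.get(d, 0) for d in range(10)]
    expected = n / 10
    chi2 = sum((o - expected) ** 2 / expected for o in obs)
    props = [o / n for o in obs]
    mad = sum(abs(share - 0.10) for share in props) / 10
    patterns = []
    if chi2 > CHI2_CRIT_DF9:  # equivalent to p < 0.05 on 9 df
        for d in range(10):
            share = props[d]
            if d in (0, 5):
                heaped = share >= 0.20 and binom_upper_tail(obs[d], n) < alpha
            else:
                heaped = share > 0.20
            if heaped or share < 0.02:
                patterns.append(d)
    return {
        "chi2": chi2,
        "mad": mad,
        "flagged": bool(patterns),
        "pattern_digits": patterns,
        "dominant_digit": max(range(10), key=lambda d: obs[d]),
    }
```

For example, `check_column(list(range(100, 200)))` sees each last digit exactly ten times and is not flagged, whereas a column in which 64 of 100 values end in 0 gives a chi-squared statistic of 324 and is flagged with dominant digit 0.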

How it works

A column qualifies when it is numeric, has at least ten distinct values (so it is not an ordinal or binary scale), and has at least fifty non-null values. The last digit is the rounded absolute value modulo ten. A chi-squared goodness-of-fit test compares the ten observed digit counts against equal expected counts; if its p-value is at or above 0.05 the column is not flagged. When the test is significant, the column is flagged if any of three conditions holds: digit 0 or digit 5 reaches twenty percent and is also significantly above the uniform one-in-ten rate by an exact binomial upper-tail test; any other digit exceeds twenty percent; or any digit falls below two percent. The column's dominant digit is recorded. The proportion of flagged columns among those tested sets the score: below twenty percent gives 0.0, and reaching the twenty, thirty-five, fifty, and seventy percent thresholds gives 1.0, 2.0, 3.0, and 4.0 respectively. When the same dominant digit appears across at least three columns, a further 0.5 is added and a coordinated-preference finding is emitted; the total is capped at 5.0. The metadata records the tested and flagged counts, the proportion, the dominant digit, the flagged columns, and the mean terminal-digit deviation across the flagged columns.
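
The scoring step can be sketched as below. The function name `d34_score` and its input format (one `(flagged, dominant_digit)` pair per tested column) are hypothetical, and the sketch assumes that the coordinated-preference bonus counts the dominant digit among flagged columns only, which the description does not state explicitly.

```python
from collections import Counter


def d34_score(column_results):
    """Score from a list of (flagged, dominant_digit) pairs, one per tested column.

    Returns (score, coordinated), where coordinated marks the shared-digit finding.
    """
    tested = len(column_results)
    if tested == 0:
        return 0.0, False
    flagged_digits = [digit for flagged, digit in column_results if flagged]
    proportion = len(flagged_digits) / tested

    # Proportion bands: thresholds are treated as inclusive lower bounds.
    if proportion >= 0.70:
        score = 4.0
    elif proportion >= 0.50:
        score = 3.0
    elif proportion >= 0.35:
        score = 2.0
    elif proportion >= 0.20:
        score = 1.0
    else:
        score = 0.0

    # Assumption: the bonus looks only at flagged columns.
    coordinated = (
        bool(flagged_digits)
        and max(Counter(flagged_digits).values()) >= 3
    )
    if coordinated:
        score += 0.5
    return min(score, 5.0), coordinated
```

With four tested columns of which three are flagged and all three share dominant digit 0, the proportion is 0.75, so the sketch returns `(4.5, True)`.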

Score thresholds

Score     Meaning
0 to 1    Last digits are close to uniform, as precise measurement produces.
2 to 3    A substantial share of columns show digit preference or avoidance.
4 to 5    Most columns are non-uniform, or one digit dominates across several columns, consistent with fabrication.

Why this matters

The distribution of final digits is a classic forensic probe. Preece surveyed how final digits behave in real data and showed that genuine high-precision measurement yields an essentially uniform last digit, so departures from uniformity mark rounding, coarse instruments, or invention [1]. Mosimann and colleagues demonstrated experimentally that people asked to generate numbers cannot produce uniform digits and lean toward preferred values [2]. That is exactly the preference this indicator detects, and it is a recognised reason to examine questioned data for non-random digit behaviour. The coordinated form, one digit dominating the final place across several variables at once, is especially diagnostic, because independent real measurements would not share a terminal-digit bias. Grounding the test in a valid chi-squared comparison matters: Cochran established that the chi-squared goodness-of-fit approximation is trustworthy only when the expected count in each cell is at least about five [3]. This is why the indicator requires at least fifty values across the ten digit cells before testing, so that a flag reflects a real departure rather than the noise of a sparse table. Mosimann and colleagues later set out terminal-digit analysis as a concrete tool for examining questioned datasets [4], and Nigrini's mean absolute deviation supplies a scale-free companion to the significance test, summarising how far the digit profile sits from uniform rather than only whether the gap is detectable [5].
The same terminal-digit logic has been applied directly to clinical endpoints, where preference for round values tracked disease severity [6], and the check now sits among the data-integrity screens that recent work has assembled into practical instruments: catalogues of misconduct-detection methods [7], audits of false or zombie patient-level datasets [8], expert-derived warning-sign checklists [9], the INSPECT-SR trustworthiness tool [10], and reviews of the statistical data-detective toolkit [11].

Limitations

The indicator examines the last digit of the rounded magnitude, so for variables recorded with decimals it tests the units digit of the rounded value rather than the final recorded place. Genuinely coarse instruments that report to the nearest five or ten will legitimately concentrate on 0 and 5. Categorical and ordinal columns are excluded by the distinct-value rule, but a continuous variable with a naturally narrow range can still be misjudged. The fifty-value requirement, needed for a valid chi-squared test, means smaller datasets or columns are not assessed. The patterns and proportion bands are heuristic, and a flag is a prompt to inspect the raw values rather than proof of fabrication. Expected heaping informed by domain knowledge is covered by indicator D18, and the machine-generation digit fingerprint by indicator D25, so D34 focuses on the agnostic uniform-last-digit test across the IPD.
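
The first two limitations can be illustrated with a small sketch. The helper `final_recorded_digit` is hypothetical and is not part of D34; it is shown only to contrast the digit the indicator tests with the last digit as actually recorded. The sample values are made up for illustration.

```python
def units_digit(value):
    """Digit tested by D34: last digit of the rounded magnitude."""
    return int(round(abs(value))) % 10


def final_recorded_digit(text):
    """Hypothetical alternative: last digit of the value as written."""
    return int(text.strip()[-1])


recorded = ["12.34", "7.81", "103.06"]  # illustrative values only
print([units_digit(float(t)) for t in recorded])     # [2, 8, 3]
print([final_recorded_digit(t) for t in recorded])   # [4, 1, 6]

# A coarse instrument reporting to the nearest five lands only on 0 and 5,
# which looks like heaping even though nothing was invented.
coarse = list(range(110, 150, 5))
print(sorted({units_digit(v) for v in coarse}))      # [0, 5]
```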

Theoretical background

D34 rests on the principle that the trailing digit of a precisely measured quantity carries no information and is therefore uniform. If a measurement has resolution finer than its last reported place, the value below that place is effectively a fractional remainder spread evenly over the interval, so rounding to the reported precision leaves each of the ten final digits equally likely. Real data departs from this only when the instrument is coarse, when values are rounded by hand, or when they are invented, and each of these leaves a recognisable footprint: rounding pulls mass onto 0 and 5, coarse scales onto a few digits, and human or machine invention onto idiosyncratic favourites or away from disfavoured digits. The chi-squared statistic measures the total squared deviation of the observed digit counts from the uniform expectation, and its reference distribution is the chi-squared on nine degrees of freedom, an approximation whose accuracy depends on the expected cell counts being large enough; with ten cells this requires a sample of at least fifty for the minimum expected count to reach five, which the indicator enforces. Aggregating across columns and rewarding a shared dominant digit converts many weak per-column signals into a strong dataset-level one, because the terminal digits of independent real variables are independent, so a common bias across columns is improbable under genuine measurement and characteristic of a single fabricating hand or process. Because a significant chi-squared result reports only that some departure exists, the indicator pairs it with the mean absolute deviation of the digit proportions from one-tenth, an effect size that separates a large, fabrication-scale distortion from a small but detectable one in a large sample, and it requires the round-number heaping at 0 and 5 to clear an exact binomial upper-tail test before counting it as preference rather than ordinary sampling fluctuation.
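
The quantities named above can be written out explicitly. The notation is introduced here for illustration and does not come from the indicator's own documentation: $n$ is the number of values in a column, $O_d$ the observed count of last digit $d$, and $X$ a binomial count under the uniform hypothesis.

```latex
\begin{align*}
\chi^2 &= \sum_{d=0}^{9} \frac{\left(O_d - n/10\right)^2}{n/10},
  \qquad \chi^2 \sim \chi^2_{9} \text{ approximately, under uniform last digits} \\[4pt]
\frac{n}{10} \ge 5 &\iff n \ge 50
  \qquad \text{(minimum expected cell count)} \\[4pt]
\mathrm{MAD} &= \frac{1}{10} \sum_{d=0}^{9} \left| \frac{O_d}{n} - 0.10 \right| \\[4pt]
P\left(X \ge O_d\right) &= \sum_{k=O_d}^{n} \binom{n}{k} (0.1)^{k} (0.9)^{n-k},
  \qquad X \sim \mathrm{Binomial}(n,\, 0.1), \quad d \in \{0, 5\}
\end{align*}
```

As a worked case, a column of $n = 100$ values with 64 ending in 0 and four ending in each other digit gives $\chi^2 = (54^2 + 9 \cdot 6^2)/10 = 324$ and $\mathrm{MAD} = (0.54 + 9 \cdot 0.06)/10 = 0.108$.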

References

  1. Preece DA. Distributions of final digits in data. Journal of the Royal Statistical Society Series D (The Statistician). 1981;30(1):31-60. DOI: 10.2307/2987702
  2. Mosimann JE, Wiseman CV, Edelman RE. Data fabrication: can people generate random digits? Accountability in Research. 1995;4(1):31-55. DOI: 10.1080/08989629508573866
  3. Cochran WG. Some methods for strengthening the common chi-squared tests. Biometrics. 1954;10(4):417-451. DOI: 10.2307/3001616
  4. Mosimann JE, Dahlberg JE, Davidian NM, Krueger JW. Terminal digits and the examination of questioned data. Accountability in Research. 2002;9(2):75-92. DOI: 10.1080/08989620212969
  5. Nigrini MJ. Benford's Law: Applications for Forensic Accounting, Auditing, and Fraud Detection. Hoboken, NJ: Wiley; 2012. DOI: 10.1002/9781119203094
  6. Minhas J, Baird G, Appleby D, et al. Terminal digit preference in pulmonary hypertension endpoints. American Journal of Respiratory and Critical Care Medicine. 2022;205(12):1482-1485. DOI: 10.1164/rccm.202108-2015LE
  7. Bordewijk EM, Li W, van Eekelen R, et al. Methods to assess research misconduct in health-related research: a scoping review. Journal of Clinical Epidemiology. 2021;136:189-202. DOI: 10.1016/j.jclinepi.2021.05.012
  8. Carlisle JB. False individual patient data and zombie randomised controlled trials submitted to Anaesthesia. Anaesthesia. 2021;76(4):472-479. DOI: 10.1111/anae.15263
  9. Parker L, Boughton S, Lawrence R, Bero L. Experts identified warning signs of fraudulent research: a qualitative study to inform a screening tool. Journal of Clinical Epidemiology. 2022;151:1-17. DOI: 10.1016/j.jclinepi.2022.07.006
  10. Wilkinson J, Heal C, Antoniou GA, et al. A survey of experts to identify methods to detect problematic studies: stage 1 of the INveStigating ProblEmatic Clinical Trials in Systematic Reviews project. Journal of Clinical Epidemiology. 2024;175:111512. DOI: 10.1016/j.jclinepi.2024.111512
  11. Crone G, Green CD. Tools of the data detective: a review of statistical methods to detect data and result anomalies in psychology. Theory & Psychology. 2025;35(3):359-380. DOI: 10.1177/09593543241311861