ResAIKit
Research Integrity Toolkit
S8 · Statistical analysis · Statistical Consistency · Layer 1 (Deterministic)

Terminal Digit (Stats)

Looks at the last significant digit of every measurement reported in an article and tests whether those digits are spread evenly across 0 through 9, as genuine measured data should be. People who invent numbers tend to favour certain digits, especially 0 and 5, so a lopsided distribution of last digits is a sign that the data may have been made up or heavily rounded by hand. The indicator runs a chi-squared uniformity test and separately measures how often the last digit is a 0 or a 5. Before doing so, it filters out numbers whose last digit is not free to vary, such as years, p-values, and percentages. It works on the reported numbers alone, and only on articles with enough numeric data.

Technical description

S8 is a deterministic forensic test on the terminal digits of the numbers extracted from the article. The terminal digit of a genuine measurement is the least predictable part of the value, so across measured data the last digits should be near-uniform over 0 to 9. Fabricated or hand-massaged data instead show digit preference, most often for 0 and 5. Because the extractor pools every standalone number in the text, S8 first removes the classes of numbers whose terminal digit is not free to vary:

- exact zeros (a structural zero carries no chosen terminal digit);
- four-digit integers from 1900 to 2100 (treated as years);
- sub-1 values (p-values, alpha levels, proportions, and correlations, whose terminal digits are legitimately non-uniform);
- percentages (a number immediately followed by a percent sign).

It then requires at least thirty remaining values and runs a Pearson chi-squared goodness-of-fit test of the ten last-digit counts against the uniform expectation. It also computes the proportion of terminal digits equal to 0 or 5. The test runs only on documents classified as articles. With no statistical context it returns the neutral no-data result, and on a non-article it returns a neutral score.
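The exclusion rules can be sketched in Python as below. This is a minimal illustration, not the indicator's implementation. Three things are assumptions made for the example:

- the regular-expression number pattern;
- the hypothetical function name `candidate_measurements`;
- the choice to judge negative values by their magnitude.

```python
import re
from typing import List

# Assumed number pattern (hypothetical): optional sign, digits, optional decimal part.
# The lookbehind skips digits glued to a preceding letter, digit, underscore, or dot,
# so fragments such as "S8" are not treated as measurements.
NUMBER = re.compile(r"(?<![\w.])[-+]?\d+(?:\.\d+)?")


def candidate_measurements(text: str) -> List[str]:
    """Return the numeric tokens that survive the S8-style exclusions (illustrative)."""
    kept = []
    for match in NUMBER.finditer(text):
        token = match.group()
        value = float(token)
        if value == 0:
            continue  # exact zero: a structural zero carries no chosen terminal digit
        if "." not in token and len(token) == 4 and 1900 <= value <= 2100:
            continue  # four-digit integer in the year range
        if abs(value) < 1:
            continue  # sub-1 (by magnitude, an assumption): p-values, proportions, correlations
        if text[match.end():match.end() + 1] == "%":
            continue  # percentage: the next character is a percent sign
        kept.append(token)
    return kept


text = ("Mean weight was 72.4 kg (SD 8.15) in 2019; 45% responded, "
        "p = 0.03, with 0 dropouts and a mean height of 170 cm.")
print(candidate_measurements(text))
```

In this example the year 2019, the percentage 45, the p-value 0.03, and the zero are dropped. Only 72.4, 8.15, and 170 remain as candidate measurements.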

How it works

The reported numbers are first filtered down to plausible measurements by removing exact zeros, four-digit years, sub-1 probabilities and proportions, and percentages. When the character positions of the numbers in the text are available, percentages are detected from context: the filter checks whether the character immediately after the number is a percent sign. Without positions, the value-based exclusions still apply. The last significant digit of each surviving value is taken after stripping trailing zeros from the fractional part only, so 3.45 gives 5, 12.0 gives 2, and 100 gives 0.

With at least N = 30 values, the ten observed digit counts O_0 ... O_9 are compared against the equal expected count E = N / 10. The comparison uses Pearson's statistic, chi2 = sum over i = 0..9 of (O_i - E)^2 / E, with nine degrees of freedom, which yields a uniformity p-value. In parallel, preference for 0 and 5 is measured as the fraction of terminal digits equal to 0 or 5, which is 0.20 under uniformity.
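The sketch below shows, in Python, the digit extraction and the uniformity test. It is a minimal sketch that assumes the numbers arrive as the strings reported in the text, and all function names are hypothetical. The standard library has no chi-squared distribution function, so the p-value uses the closed-form upper tail that holds for an odd number of degrees of freedom (nine here). In practice a statistics library would normally supply it.

```python
import math
from collections import Counter
from typing import Iterable, Optional, Tuple

MIN_VALUES = 30  # minimum number of values stated in the text


def terminal_digit(token: str) -> int:
    """Last significant digit of a reported number (hypothetical helper)."""
    s = token.strip().lstrip("+-")
    if "." in s:
        # Strip trailing zeros from the fractional part only: "12.0" -> "12", "3.450" -> "3.45".
        s = s.rstrip("0").rstrip(".")
    # Integer trailing zeros are kept, so "100" ends in 0.
    return int(s[-1])


def chi2_sf_odd_df(x: float, df: int) -> float:
    """P(X >= x) for a chi-squared variable with an odd number of degrees of freedom."""
    if df < 1 or df % 2 == 0:
        raise ValueError("this closed form needs an odd number of degrees of freedom")
    if x <= 0:
        return 1.0
    tail = math.erfc(math.sqrt(x / 2.0))
    # Series: sum over j = 1 .. (df - 1) / 2 of x**(j - 1) / (1 * 3 * ... * (2j - 1)).
    term, series = 1.0, 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        series += term
    return tail + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0) * series


def terminal_digit_test(tokens: Iterable[str]) -> Optional[Tuple[int, float, float, float]]:
    """Return (N, chi2, p_value, prop_0_or_5), or None when fewer than MIN_VALUES remain."""
    digits = [terminal_digit(t) for t in tokens]
    n = len(digits)
    if n < MIN_VALUES:
        return None  # too few values: the neutral no-data case
    counts = Counter(digits)
    expected = n / 10  # E = N / 10 for each of the ten digits
    chi2 = sum((counts[d] - expected) ** 2 / expected for d in range(10))
    p_value = chi2_sf_odd_df(chi2, df=9)
    prop_0_or_5 = (counts[0] + counts[5]) / n
    return n, chi2, p_value, prop_0_or_5
```

Two inputs show the extremes:

- Thirty values spread evenly over the ten digits (three of each) give chi2 = 0, p = 1, and a 0-or-5 proportion of 0.2.
- The forty multiples of five from 5 to 200 end in 0 and 5 twenty times each. With E = 4, this gives chi2 = 2 × 16²/4 + 8 × 4²/4 = 160, a vanishingly small p-value, and a 0-or-5 proportion of 1.0.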

The score follows the p-value, with a bonus for the specific 0-and-5 signature: p > 0.05 scores 0.0; 0.01 < p <= 0.05 scores 2.0 (mild non-uniformity); p <= 0.01 scores 3.5 (strong non-uniformity), rising to 4.5 when the 0-or-5 proportion exceeds 0.30. A non-zero score produces a finding (severity error at 3.5 and above, otherwise warning) reporting the chi-squared value, the p-value, and the 0-or-5 proportion, anchored to an example number ending in 0 or 5. The metadata records the count analysed, the chi-squared value, the p-value, and the 0-or-5 proportion.
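The scoring rules above map directly onto two small functions. The sketch is illustrative only: the names `s8_score` and `finding_severity` are hypothetical, and so is the use of `None` to mean "no finding".

```python
from typing import Optional


def s8_score(p_value: float, prop_0_or_5: float) -> float:
    """Map the uniformity p-value and the 0-or-5 share to a score (hypothetical helper)."""
    if p_value > 0.05:
        return 0.0  # consistent with uniform terminal digits
    if p_value > 0.01:
        return 2.0  # mild non-uniformity: 0.01 < p <= 0.05
    # Strong non-uniformity (p <= 0.01); the 0-and-5 signature raises the score.
    return 4.5 if prop_0_or_5 > 0.30 else 3.5


def finding_severity(score: float) -> Optional[str]:
    """None means no finding; otherwise 'error' at 3.5 and above, else 'warning'."""
    if score == 0:
        return None
    return "error" if score >= 3.5 else "warning"
```

As described, the 0-or-5 bonus applies only within the strong band. A high 0-or-5 share with p above 0.01 does not raise the score.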

Score thresholds

Score Meaning
0 Terminal digits are consistent with a uniform distribution (uniformity p-value above 0.05).
2 Mild non-uniformity, with a uniformity p-value above 0.01 and at or below 0.05.
3.5 Strong non-uniformity, with a uniformity p-value at or below 0.01.
4.5 Strong non-uniformity together with a marked preference for the digits 0 and 5 (more than 30% of terminal digits).

Why this matters

The terminal digit is the part of a measurement that should behave like a lottery draw, so its distribution is one of the oldest and most robust tests for invented data. Mosimann and colleagues showed experimentally that people asked to write random digits cannot do so, systematically favouring some digits, and argued that suspect data should be screened for this non-randomness [1]; their later work formalised the examination of terminal digits in questioned datasets [2]. The phenomenon is real and measurable in practice: a chi-squared analysis of phase III pulmonary-hypertension trial endpoints found a clear terminal-digit preference for 0 and 5, with the potential to distort the assessment of treatment effects [3]. The same reasoning sits in the modern data-anomaly toolkit [4] and underpins the forensic re-analysis of clinical trials, where digit non-randomness is one signal among several [5]. Because the terminal digit is nearly free of legitimate scientific structure, a clear preference points to a human hand rather than an instrument, and separating mere non-uniformity from the specific 0-and-5 preference keeps the most diagnostic signature visible. Digit-distribution screening sits in the broader research-integrity toolkit, catalogued in scoping reviews [6] and embedded in validated trial-integrity instruments and trustworthiness checklists [7,8].

Limitations

The test needs enough data. It requires at least thirty numbers and is most reliable with fifty or more, because only then does each digit's expected count N / 10 reach five, the usual floor below which the chi-squared approximation weakens. It also assumes that the surviving pool consists of genuine measurements. The filter removes exact zeros, four-digit years, sub-1 probabilities and proportions, and percentages. Other non-measurement values can remain, such as counts, sample sizes, degrees of freedom, and test statistics, and they can create non-uniformity that is not fabrication. The result is therefore a screening signal rather than proof.

Excluding all sub-1 values also drops the occasional genuine measurement below one, a deliberate trade-off that favours fewer false positives. Heaping on 0 and 5 can likewise arise from honest coarse rounding or from an instrument or observer with a known digit bias. A high score therefore flags a pattern to investigate rather than establishing misconduct. The test runs only on documents classified as articles. The Benford first-digit test on the same numbers is indicator S9, and the terminal-digit test on individual-patient data is D34.

Theoretical background

For a quantity measured to a fixed precision, the last retained digit is determined by sources of variation far smaller than the measurement step, so over many independent measurements it is effectively uniform on 0 to 9; this is the null hypothesis the chi-squared test evaluates. Departures take two broad forms. The first is general non-uniformity, any digit over-represented, which the chi-squared statistic detects without regard to which digit. The second is the specific human signature of preferring round values, an excess of 0 and 5, which the separate 0-or-5 proportion isolates, because people rounding or inventing numbers gravitate to those digits, a tendency demonstrated both in controlled random-digit experiments and in real clinical measurement. The exclusions sharpen the null by removing numbers whose terminal digits are not free: a structural zero has no chosen digit; a calendar year is drawn from a narrow recent range; a p-value, alpha, proportion, or correlation lives below one and clusters on conventional values; and a percentage is typically rounded to a whole number or a half. Pooling any of these with measurements would manufacture apparent preference. The chi-squared approximation assumes expected counts that are not too small, which is why the test requires a floor on the sample and is read more cautiously near it. The companion first-digit law, tested by S9, looks at the opposite end of the number and is governed by Benford's distribution rather than uniformity.
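A small simulation illustrates both the null hypothesis and the round-number signature. It is a sketch under assumed parameters: true values drawn from a normal distribution with mean 120 and standard deviation 15, 5,000 draws, and a fixed random seed. When the values are recorded to the nearest whole unit, the terminal digit is close to uniform, so 0 or 5 appears about 20% of the time. When the same values are recorded to the nearest 5 units, as a person favouring round numbers (or a coarse protocol) would, every terminal digit is 0 or 5.

```python
import random
from collections import Counter


def last_digit_of_int(x: int) -> int:
    """Units digit of an integer measurement."""
    return abs(x) % 10


def prop_0_or_5(digits) -> float:
    """Share of terminal digits equal to 0 or 5 (0.20 under uniformity)."""
    counts = Counter(digits)
    return (counts[0] + counts[5]) / len(digits)


rng = random.Random(2024)  # fixed seed so the sketch is reproducible
true_values = [rng.gauss(120.0, 15.0) for _ in range(5000)]

# Honest recording to the nearest whole unit: the units digit is close to uniform.
honest = [last_digit_of_int(round(v)) for v in true_values]
# The same values recorded to the nearest 5 units: only 0 and 5 can occur.
heaped = [last_digit_of_int(5 * round(v / 5)) for v in true_values]

print(f"honest: share of 0 or 5 = {prop_0_or_5(honest):.3f}")
print(f"heaped: share of 0 or 5 = {prop_0_or_5(heaped):.3f}")
```

The honest digits are near-uniform here because the spread of the true values (15 units) is wide relative to the 10-unit cycle of the last digit. If the values varied by only a few units, the last digit would cluster and the uniform null would not hold.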

References

  1. Mosimann JE, Wiseman CV, Edelman RE. Data fabrication: Can people generate random digits? Accountability in Research. 1995;4(1):31-55. DOI: 10.1080/08989629508573866
  2. Mosimann JE, Dahlberg JE, Davidian NM, Krueger JW. Terminal digits and the examination of questioned data. Accountability in Research. 2002;9(2):75-92. DOI: 10.1080/08989620212969
  3. Minhas J, Baird G, Appleby D, et al. Terminal Digit Preference in Pulmonary Hypertension Endpoints. American Journal of Respiratory and Critical Care Medicine. 2022;205(12):1482-1485. DOI: 10.1164/rccm.202108-2015LE
  4. Crone G, Green CD. Tools of the data detective: A review of statistical methods to detect data and result anomalies in psychology. Theory & Psychology. 2025;35(3):359-380. DOI: 10.1177/09593543241311861
  5. Carlisle JB. Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals. Anaesthesia. 2017;72(8):944-952. DOI: 10.1111/anae.13938
  6. Bordewijk EM, Li W, van Eekelen R, Wang R, Showell M, Mol BW, van Wely M. Methods to assess research misconduct in health-related research: A scoping review. Journal of Clinical Epidemiology. 2021;136:189-202. DOI: 10.1016/j.jclinepi.2021.05.012
  7. Hunter KE, Aberoumand M, Libesman S, et al. The Individual Participant Data Integrity Tool for assessing the integrity of randomised trials. Research Synthesis Methods. 2024;15(6):917-939. DOI: 10.1002/jrsm.1738
  8. Wilkinson J, Heal C, Antoniou GA, et al. A survey of experts to identify methods to detect problematic studies: stage 1 of the INveStigating ProblEmatic Clinical Trials in Systematic Reviews project. Journal of Clinical Epidemiology. 2024;175:111512. DOI: 10.1016/j.jclinepi.2024.111512