ResAIKit
Research Integrity Toolkit
G11-img | Image forensics | Chart Analysis | Layer 1 (Deterministic)

Terminal Digit

Tests whether the last digits of numbers in a chart are distributed evenly, as expected in real data. Fabricated numbers often overuse digits like 0 and 5 as final digits.

Technical description

Extracts the numbers from the chart by OCR and removes axis tick labels: any number left of the detected y-axis line or below the x-axis line (with a 12% margin as fallback), whose terminal digit is 0 or 5 because of the tick spacing. On the remaining plotted values, the terminal (last) digit is tested for uniformity. Integers and one-decimal values contribute their last integer digit; values with more decimals are skipped. A chi-squared test against a uniform expectation contributes 0.0 (p > 0.10), 2.0 (0.01 < p <= 0.10), or 4.0 (p <= 0.01), and a combined 0-and-5 proportion above 0.35 (uniform expectation 0.20) adds 1.0. The test requires at least thirty terminal digits after axis labels are removed, and the score is capped at 5.0.
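The pre-processing described above can be sketched as follows. This is a minimal illustration, not the toolkit's code: the `OcrNumber` type, the function names, and the pixel coordinate convention (y grows downward) are all assumptions. The sketch also assumes that the 12% margin applies to the image width and height when an axis line is not detected, and it reads "last integer digit" as the last digit of the integer part, which is only one possible reading.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OcrNumber:
    """Hypothetical OCR token: the text read plus the centre of its box."""
    text: str
    x: float  # horizontal centre in pixels
    y: float  # vertical centre in pixels, 0 at the top of the image


def is_axis_label(tok: OcrNumber, width: float, height: float,
                  y_axis_x: Optional[float] = None,
                  x_axis_y: Optional[float] = None,
                  margin: float = 0.12) -> bool:
    """True if the token lies left of the y-axis or below the x-axis."""
    # Assumption: without a detected axis line, fall back to a 12% margin.
    left_edge = y_axis_x if y_axis_x is not None else margin * width
    bottom_edge = x_axis_y if x_axis_y is not None else (1.0 - margin) * height
    return tok.x < left_edge or tok.y > bottom_edge


_NUMBER = re.compile(r"-?(\d+)(?:\.(\d+))?")


def terminal_digit(text: str) -> Optional[int]:
    """Last digit of the integer part, or None if the value is skipped."""
    match = _NUMBER.fullmatch(text.strip())
    if match is None:
        return None  # not a plain number
    integer_part, decimals = match.group(1), match.group(2)
    if decimals is not None and len(decimals) > 1:
        return None  # more than one decimal place: skipped
    return int(integer_part[-1])


def plotted_terminal_digits(tokens: List[OcrNumber], width: float, height: float,
                            y_axis_x: Optional[float] = None,
                            x_axis_y: Optional[float] = None) -> List[int]:
    """Drop axis labels by position, then collect terminal digits."""
    digits = []
    for tok in tokens:
        if is_axis_label(tok, width, height, y_axis_x, x_axis_y):
            continue
        digit = terminal_digit(tok.text)
        if digit is not None:
            digits.append(digit)
    return digits
```

With a y-axis detected at x = 60 and an x-axis at y = 350, a token "50" at (20, 100) is dropped as a tick label, while "42" at (200, 100) contributes the digit 2.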

How it works

Layer 1 (deterministic). OCR-reads all numbers, removes axis tick labels by position, and on the remaining plotted values takes each terminal digit, tests the ten counts against a uniform distribution with a chi-squared test, and measures the 0-and-5 proportion. Sums the contributions, caps at 5.0, and reports the per-digit distribution, the chi-squared p-value, and the 0-and-5 proportion.
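The scoring step can be sketched with the standard library alone. The function names and the shape of the returned report are illustrative assumptions, not the toolkit's interface. The p-value uses the closed-form upper tail of the chi-squared distribution for an odd number of degrees of freedom, written out here for the nine degrees of freedom of a ten-category test; it is not a general-purpose implementation.

```python
import math
from collections import Counter


def chi2_sf_df9(x: float) -> float:
    """Upper-tail probability of a chi-squared variable with 9 degrees of freedom."""
    if x <= 0:
        return 1.0
    c = math.sqrt(x)
    # Closed form for odd degrees of freedom, expanded for df = 9.
    series = c + c**3 / 3 + c**5 / 15 + c**7 / 105
    normal_pdf = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    return math.erfc(c / math.sqrt(2)) + 2 * normal_pdf * series


def score_terminal_digits(digits, min_count=30):
    """Return a report dict, or None when there are too few digits to test."""
    n = len(digits)
    if n < min_count:
        return None
    counts = Counter(digits)
    expected = n / 10  # uniform expectation per digit
    chi2 = sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))
    p_value = chi2_sf_df9(chi2)

    if p_value <= 0.01:
        score = 4.0
    elif p_value <= 0.10:
        score = 2.0
    else:
        score = 0.0

    zero_five = (counts.get(0, 0) + counts.get(5, 0)) / n
    if zero_five > 0.35:  # uniform expectation is 0.20
        score += 1.0

    return {
        "distribution": {d: counts.get(d, 0) / n for d in range(10)},
        "chi2": chi2,
        "p_value": p_value,
        "zero_five_proportion": zero_five,
        "score": min(score, 5.0),
    }
```

Where SciPy is available, `scipy.stats.chisquare` applied to the ten observed counts would be a natural replacement for the hand-written tail function.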

Why this matters

In genuinely measured data the terminal digit carries no information and is approximately uniformly distributed; numbers that people invent or record by hand are not. Controlled experiments show that people cannot generate uniform digits, which makes the terminal digits of fabricated data a detectable anomaly. A surplus of 0 and 5 is the well-documented terminal-digit preference that comes from manual rounding. The test has been used on questioned scientific data, election returns, and clinical-trial baselines. G11 applies it to the numbers a chart plots, removing the axis scale first so that only measured data is tested.

Score thresholds

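0-1
No significant departure from a uniform terminal-digit distribution (p > 0.10), as expected for measured data; a score of 1 means an excess of 0 and 5 on its own
2-3
A moderately non-uniform terminal-digit distribution (0.01 < p <= 0.10); a score of 3 adds an excess of 0 and 5
4-5
A strongly non-uniform distribution (p <= 0.01); at 5 it is reinforced by an excess of 0 and 5, consistent with manual fabrication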
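Because the score is the sum of two components (the chi-squared contribution and the 0-and-5 bonus), each value from 0 to 5 corresponds to exactly one combination. The short sketch below enumerates them; the dictionary and function names are illustrative only.

```python
from itertools import product

# Contributions as stated in the technical description.
CHI2_POINTS = {"p > 0.10": 0.0, "0.01 < p <= 0.10": 2.0, "p <= 0.01": 4.0}
ZERO_FIVE_POINTS = {"0-and-5 <= 0.35": 0.0, "0-and-5 > 0.35": 1.0}


def attainable_scores():
    """List every (score, chi-squared band, 0-and-5 band) combination."""
    rows = []
    for (chi_label, chi_pts), (zf_label, zf_pts) in product(
            CHI2_POINTS.items(), ZERO_FIVE_POINTS.items()):
        rows.append((min(chi_pts + zf_pts, 5.0), chi_label, zf_label))
    return sorted(rows)


for score, chi_label, zf_label in attainable_scores():
    print(f"{score:.0f}: {chi_label}, {zf_label}")
```

The odd scores (1, 3, 5) are the ones that include the 0-and-5 bonus; a score of 1 arises when the chi-squared test is not significant but the 0-and-5 proportion exceeds 0.35.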

Limitations

The test needs at least thirty terminal digits read by OCR from the plotted data after axis labels are removed, which most ordinary bar and line charts do not provide. It assumes the terminal digit is inconsequential, which fails for deliberately rounded or coarsely measured data where an honest excess of 0 and 5 is expected, so an excess of 0 and 5 is a review cue rather than proof. Values with more than one decimal place are skipped. The axis-label split is purely positional, so a data label printed in the axis margin may be discarded and a tick label drawn inside the plot area may be kept. First-digit conformity is screened by the Benford indicator and mean-and-SD plausibility by the GRIM indicator, so G11 stays on the terminal-digit test.
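The rounding caveat is easy to see with a small example. The values below are hypothetical honest readings recorded to the nearest 5 units, made up for illustration; the function name is likewise an assumption.

```python
def zero_five_proportion(values):
    """Share of integer values whose last digit is 0 or 5."""
    last_digits = [abs(v) % 10 for v in values]
    return sum(d in (0, 5) for d in last_digits) / len(last_digits)


# Hypothetical true values and the same values recorded to the nearest 5.
true_values = [118, 122, 127, 131, 134, 139, 143, 146, 152, 157]
recorded = [5 * round(v / 5) for v in true_values]

print(recorded)
print(zero_five_proportion(true_values))  # no value ends in 0 or 5
print(zero_five_proportion(recorded))     # every value ends in 0 or 5
print(len(recorded) >= 30)                # ten values: below the minimum
```

Nothing here is fabricated in the sense the indicator targets, yet the recorded series sits far above the 0.35 threshold. A chart with only ten plotted values would also fall short of the thirty-digit minimum, so the test would not run on it at all.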

References

  1. Mosimann JE, Wiseman CV, Edelman RE. (1995). Data fabrication: Can people generate random digits? Accountability in Research 4(1):31-55
  2. Mosimann JE, Dahlberg JE, Davidian NM, Krueger JW. (2002). Terminal digits and the examination of questioned data. Accountability in Research 9(2):75-92
  3. Beber B, Scacco A. (2012). What the Numbers Say: A Digit-Based Test for Election Fraud. Political Analysis 20(2):211-234
  4. Al-Marzouki S, Evans S, Marshall T, Roberts I. (2005). Are these data real? Statistical methods for the detection of data fabrication in clinical trials. BMJ 331(7511):267-270