ResAIKit
Research Integrity Toolkit
T6 | Image forensics | Table Analysis | Layer 1 (Deterministic)

Terminal Digit (Table)

Checks whether the last digits of a table's numbers are evenly spread. In genuine measured data the final significant digit is essentially random, so each digit appears about a tenth of the time; people inventing numbers favour round values ending in 0 or 5. Exact-zero cells are excluded so legitimate structural zeros do not create a false pattern.

Technical description

Gathers all numeric cell values, drops exact zeros, and takes the last significant digit of each (ignoring trailing decimal zeros). It then applies a chi-square test of uniformity over the digits 0-9 and computes the proportion of digits that are 0 or 5 (20 percent expected under uniformity). Scoring: a chi-square p below 0.01 together with a 0/5 preference above 30 percent scores 4.5; a significant chi-square (p below 0.01) alone scores 3.5; a marginal chi-square (p 0.01-0.05) or a slight preference (25-30 percent) scores 2.0; otherwise the score is 0. At least 30 non-zero values are required.
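The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the toolkit's actual code: the function names (`last_significant_digit`, `chi_square_sf_df9`, `terminal_digit_stats`) are hypothetical, cells are assumed to arrive as strings, and "trailing decimal zeros" is read as zeros after the decimal point only (so 1.50 ends in 5 but 150 ends in 0). The p-value uses the closed-form chi-square tail for 9 degrees of freedom so that the sketch needs only the standard library.

```python
from collections import Counter
from decimal import Decimal, InvalidOperation
import math


def last_significant_digit(cell: str):
    """Return the last significant digit of a numeric cell, or None to skip it."""
    try:
        value = Decimal(cell.strip())
    except InvalidOperation:
        return None  # not a number
    if not value.is_finite() or value == 0:
        return None  # exact zeros (and NaN/inf) are excluded
    _, digits, exponent = value.as_tuple()
    digits = list(digits)
    # Assumption: strip only zeros after the decimal point (1.50 -> 1.5),
    # so an integer such as 150 keeps its final 0.
    while exponent < 0 and digits[-1] == 0:
        digits.pop()
        exponent += 1
    return digits[-1]


def chi_square_sf_df9(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 9 degrees of freedom."""
    poly = 1 + x / 3 + x**2 / 15 + x**3 / 105
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * poly)


def terminal_digit_stats(cells):
    """Chi-square uniformity test and 0/5 share over the cells' last digits."""
    digits = [d for d in map(last_significant_digit, cells) if d is not None]
    n = len(digits)
    if n < 30:
        return None  # too few non-zero values; the check is skipped
    counts = Counter(digits)
    expected = n / 10  # uniform expectation for each digit 0-9
    chi2 = sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))
    pref_0_5 = (counts.get(0, 0) + counts.get(5, 0)) / n
    return {"n": n, "chi2": chi2,
            "p_value": chi_square_sf_df9(chi2), "pref_0_5": pref_0_5}
```

For example, `terminal_digit_stats(["10", "15"] * 20)` sees 40 values that all end in 0 or 5, giving a 0/5 share of 1.0 and a chi-square statistic of 160. In practice a statistics library routine such as `scipy.stats.chisquare` would normally replace the hand-written tail function.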

How it works

Layer 1 (deterministic): collects numeric values, removes exact zeros, extracts each value's last significant digit, runs a chi-square uniformity test, and measures the 0/5 preference. A significant non-uniformity and an excess 0/5 preference combine to set the score (up to 4.5); a marginal result scores 2.0. Findings describe the non-uniformity and any digit preference.
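The way the two signals combine into a score can be sketched as a small tier function. This is an illustrative reading of the scoring rule, not the toolkit's code: the name `terminal_digit_score` and its parameters are hypothetical, the handling of exact boundary values is an assumption, and so is the treatment of a strong 0/5 preference (above 30 percent) without a significant chi-square, which the description does not cover and which is scored here as 2.0.

```python
from typing import Optional


def terminal_digit_score(p_value: float, pref_0_5: float,
                         n_values: int) -> Optional[float]:
    """Map the chi-square p-value and 0/5 share to the T6 score tiers."""
    if n_values < 30:
        return None  # fewer than 30 non-zero values: check is skipped
    significant = p_value < 0.01
    if significant and pref_0_5 > 0.30:
        return 4.5  # significant non-uniformity plus strong 0/5 preference
    if significant:
        return 3.5  # significant chi-square alone
    # Assumption: any 0/5 share of 25 percent or more counts as at least
    # "slight" when the chi-square is not significant.
    if p_value < 0.05 or pref_0_5 >= 0.25:
        return 2.0  # marginal chi-square or slight preference
    return 0.0
```

Returning `None` rather than 0 for small tables keeps "skipped" distinguishable from "checked and clean", which is a design choice of this sketch.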

Why this matters

Terminal-digit uniformity is one of the oldest tests of data authenticity because people are poor at imitating it: experiments show fabricated numbers fall into characteristic digit preferences, especially round values ending in 0 or 5. The last digit of a real measurement is noise at the limit of precision and is therefore approximately uniform across magnitudes and units, so a non-uniform or 0/5-heaped distribution flags numbers that may have been chosen rather than measured. Excluding structural zeros keeps the test honest on sparse count tables.

Score thresholds

0-1: Terminal digits are uniformly distributed, as expected of measured data
2-3: A marginal non-uniformity or a slight preference for round digits
4-5: Strongly non-uniform terminal digits or a strong preference for 0 and 5, consistent with fabricated or heaped data

Limitations

Terminal-digit analysis assumes the recorded precision is fine enough that the last digit is noise. Coarsely rounded data, or values reported to few significant figures, have non-random last digits even when genuine, so rounding can mimic fabrication. The chi-square test needs enough values, so small tables, or tables left sparse after zeros are excluded, are skipped. The check depends on OCR, so misread digits can distort the digit distribution. Instruments that report to the nearest 5 or 10, and unit conversions, produce legitimate digit preference. The chart-read version is indicator G11, and first-digit Benford analysis is a separate screen; T6 stays on the last digits of table numbers.
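The rounding caveat is easy to reproduce. The sketch below is illustrative only: the helper `share_0_5` is hypothetical, and the integer readings 101-200 are a stand-in for genuine measurements with evenly spread last digits. Rounding the same readings to the nearest 5, as a coarse instrument might, pushes every terminal digit to 0 or 5 even though nothing was fabricated.

```python
def share_0_5(values):
    """Fraction of integer values whose last digit is 0 or 5."""
    return sum(1 for v in values if abs(v) % 10 in (0, 5)) / len(values)


# Stand-in for genuine readings: last digits are evenly spread.
readings = list(range(101, 201))

# The same readings as reported by an instrument with a resolution of 5.
rounded = [5 * round(v / 5) for v in readings]

raw_share = share_0_5(readings)      # 20 of 100 values end in 0 or 5
rounded_share = share_0_5(rounded)   # every value now ends in 0 or 5
```

A table built from the rounded column would show the strongest possible 0/5 preference, which is why the recorded precision should be checked before a high T6 score is interpreted.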

References

  1. Mosimann JE, Wiseman CV, Edelman RE. (1995). Data fabrication: Can people generate random digits? Accountability in Research 4(1):31-55
  2. Preece DA. (1981). Distributions of Final Digits in Data. The Statistician 30(1):31-60
  3. Carlisle JB. (2017). Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals. Anaesthesia 72(8):944-952