Terminal Digit (Table)
Checks whether the last digits of the numbers in a table are evenly spread. In genuine measured data the final significant digit is essentially random, so each of the ten digits should appear about a tenth of the time. People inventing numbers rarely produce uniform last digits: they favour round values ending in zero or five, or fall into other patterns. The indicator collects the terminal digits of the table's numbers, tests them for uniformity, and measures the preference for zero and five. Exact-zero cells are excluded so that legitimate structural zeros do not create a false pattern. It works on the reported numbers alone.
Technical description
T6 is a deterministic, generator-agnostic screen for fabricated or heaped numbers based on terminal-digit analysis. The last significant digit of a genuine measurement is determined by noise at the limit of precision and is therefore close to uniform over the digits zero through nine, a fact long used to detect data that were invented rather than measured. T6 extracts all numeric values from the table, removes exact zeros, takes the last significant digit of each, and applies two tests: a chi-square test of uniformity over the ten digits, and the proportion of digits that are zero or five, which under uniformity is twenty percent. A significantly non-uniform distribution or an excess preference for zero and five raises the score. At least thirty values are required after excluding zeros, since the chi-square test is unreliable on few observations.
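The two tests can be written compactly. The notation is introduced here for illustration only and does not come from the indicator itself: let $n$ be the number of values kept after removing exact zeros, and $O_d$ the number of those values whose terminal digit is $d$.

$$
X^2 = \sum_{d=0}^{9} \frac{\left(O_d - n/10\right)^2}{n/10},
\qquad
\hat{p}_{0,5} = \frac{O_0 + O_5}{n}
$$

Under uniformity $X^2$ is approximately chi-square distributed with $10 - 1 = 9$ degrees of freedom, and the expected value of $\hat{p}_{0,5}$ is $2/10 = 0.20$, the twenty percent quoted above.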
How it works
All numeric cell values are gathered, and exact-zero values are dropped: a structural zero, such as an absent count or an empty category, is not a chosen terminal digit, and a table with many legitimate zeros would otherwise spike the count of digit zero and inflate the zero-five preference, producing a false flag. If fewer than thirty values remain, the indicator returns a zero score and records that there were too few values.
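The filtering step might look like the following minimal Python sketch. It rests on assumptions that are not in the description above: cells are taken to arrive as strings, the helper names `collect_values` and `has_enough_values` are hypothetical, and the handling of non-numeric cells is a guess, not the indicator's actual parser.

```python
import math

MIN_VALUES = 30  # minimum count after zero removal, as stated in the text


def collect_values(cells):
    """Keep the text of numeric, non-zero cells (hypothetical helper)."""
    kept = []
    for cell in cells:
        text = str(cell).strip()
        try:
            value = float(text)
        except ValueError:
            continue  # not a number: headers, labels, blanks
        if not math.isfinite(value):
            continue  # "nan" and "inf" parse as floats but are not data
        if value == 0:
            continue  # structural zero: dropped before any digit is read
        kept.append(text)  # keep the text so the reported precision survives
    return kept


def has_enough_values(values):
    """True when enough values remain for the chi-square test."""
    return len(values) >= MIN_VALUES


cells = ["Group", "12.5", "0", "0.0", "n/a", "7", ""]
values = collect_values(cells)
print(values)                     # ['12.5', '7']
print(has_enough_values(values))  # False
```

Keeping the cell text instead of the parsed float is a design choice of this sketch: a float would lose the distinction between 3.4 and 3.40, which matters when the last reported digit is read.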
For each remaining value the last significant digit is extracted, ignoring trailing zeros after the decimal point so that, for example, 3.40 contributes the digit 4. A chi-square test compares the observed counts of digits zero through nine against the uniform expectation of one tenth of the values per digit, returning a statistic and a p-value. Separately, the fraction of terminal digits equal to zero or five is computed. The score combines the two. A chi-square p-value below 0.01 together with a zero-five preference above thirty percent scores 4.5. A chi-square p-value below 0.01 without that preference scores 3.5. A marginal chi-square, with p between 0.01 and 0.05, or a slight preference between twenty-five and thirty percent, scores 2.0. Otherwise the score is 0. Findings describe a non-uniform distribution and a digit preference where present. The metadata records the value count, the chi-square statistic and p-value, the zero-five preference, and the two flag states.
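The extraction, test, and scoring steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the indicator's code: values are plain decimal strings with exact zeros already removed and at least thirty of them present, all function names are hypothetical, and the p-value uses a closed form for nine degrees of freedom so the sketch needs only the standard library (`scipy.stats.chisquare` would be the usual alternative). The description does not say how a preference above thirty percent is scored when the chi-square test is not significant, nor whether thresholds are inclusive; the sketch places that case in the 2.0 tier and uses strict inequalities.

```python
import math
from collections import Counter


def last_significant_digit(text):
    """Last digit of a decimal string, ignoring trailing zeros after the point."""
    s = text.strip().lstrip("+-")
    if "." in s:
        s = s.rstrip("0").rstrip(".")  # "3.40" -> "3.4", "3.00" -> "3"
    return int(s[-1])


def chi2_sf_df9(x):
    """Upper-tail probability of a chi-square variable with 9 degrees of freedom."""
    if x <= 0:
        return 1.0
    c = math.sqrt(x)
    series = c + c**3 / 3 + c**5 / 15 + c**7 / 105
    tail = math.erfc(c / math.sqrt(2))
    return tail + math.sqrt(2 / math.pi) * math.exp(-x / 2) * series


def terminal_digit_score(values):
    """Score a list of non-zero numeric strings (caller enforces the minimum count)."""
    digits = [last_significant_digit(v) for v in values]
    n = len(digits)
    counts = Counter(digits)  # missing digits count as 0
    expected = n / 10
    stat = sum((counts[d] - expected) ** 2 / expected for d in range(10))
    p_value = chi2_sf_df9(stat)
    pref = (counts[0] + counts[5]) / n

    if p_value < 0.01 and pref > 0.30:
        score = 4.5
    elif p_value < 0.01:
        score = 3.5
    elif p_value < 0.05 or pref > 0.25:  # assumption: strong preference alone lands here
        score = 2.0
    else:
        score = 0.0
    return {"n": n, "chi2": stat, "p_value": p_value, "zero_five": pref, "score": score}


uniform = [str(100 + i) for i in range(100)]  # each last digit appears ten times
heaped = ["120"] * 25 + ["85"] * 15 + ["11", "22", "33", "44", "66", "77", "88", "99", "12", "23"]

print(last_significant_digit("3.40"))          # 4
print(terminal_digit_score(uniform)["score"])  # 0.0
print(terminal_digit_score(heaped)["score"])   # 4.5
```

One consequence of the trailing-zero rule as implemented here is that a decimal such as 3.40 never contributes a zero; in this sketch the digit zero comes only from values whose integer part ends in zero, such as 120 or 10.0.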
Score thresholds
| Score | Meaning |
|---|---|
| 0 to 1 | Terminal digits are uniformly distributed, as expected of measured data. |
| 2 to 3 | A marginal non-uniformity or a slight preference for round digits. |
| 4 to 5 | Strongly non-uniform terminal digits, or a strong preference for zero and five. Consistent with fabricated or heaped data. |
Why this matters
The uniformity of terminal digits is one of the oldest quantitative tests of data authenticity, because people are demonstrably bad at imitating it. Mosimann and colleagues showed experimentally that when people fabricate numbers they cannot generate uniform terminal digits, falling instead into characteristic preferences, which makes the last digit a sensitive fingerprint of invention [1]. The statistical behaviour of final digits in real data, and the conditions under which they are and are not uniform, were characterised in detail by Preece, who set out when departures are meaningful and when they merely reflect the measurement process [2]. The cue is now a standard part of statistical screening for research misconduct, applied alongside the granularity and randomization tests to flag tables whose numbers were not produced by measurement [3]. Excluding structural zeros keeps the test honest, because a sparse table of counts is full of legitimate zeros that say nothing about digit preference. By reading both uniformity and the specific pull toward round numbers, T6 captures the two ways fabricated digits most often betray themselves.
Limitations
Terminal-digit analysis assumes the recorded precision is fine enough that the last digit is noise; for coarsely rounded data, or values reported to few significant figures, the last digit is not random even when the data are genuine, so rounding conventions can mimic fabrication. The chi-square test needs enough values, so small tables are skipped, and after excluding zeros a sparse table may fall below the threshold. The test depends on optical character recognition, and a systematic misreading of a digit shifts the distribution. Legitimate digit preference also arises from instruments that report to the nearest five or ten, and from unit conversions, so a flag is suggestive rather than conclusive. The same terminal-digit test applied to values read from charts is indicator G11, and the broader Benford first-digit analysis is a separate screen; T6 itself covers only the last digits of table numbers.
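The rounding caveat is easy to reproduce. The short Python sketch below simulates genuine quantities recorded by a hypothetical instrument that reports to the nearest five; the mean, spread, and sample size are arbitrary illustrative choices.

```python
import random

rng = random.Random(1)

# Genuine quantities, recorded to the nearest 5 by a hypothetical instrument.
true_values = [rng.gauss(120, 15) for _ in range(200)]
recorded = [5 * round(v / 5) for v in true_values]

terminal = [abs(r) % 10 for r in recorded]
zero_five = sum(d in (0, 5) for d in terminal) / len(terminal)
print(zero_five)  # 1.0: every recorded value ends in 0 or 5
```

Nothing here was invented by a person, yet the zero-five preference is total, which is why a flag needs to be read against what is known about how the values were recorded.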
Theoretical background
T6 rests on the statistics of measurement precision. A measured quantity is the sum of a true value and noise, and when the recording precision is at or below the scale of that noise, the least significant recorded digit is effectively a uniform random variable over zero to nine, independent of the leading digits. This is why authentic data, across wildly different magnitudes and units, share a flat terminal-digit distribution. Human fabrication breaks the property in two ways. First, people reach for round numbers, oversampling zero and five, a heaping effect well documented in self-reported and invented data. Second, even when trying to be random, people produce detectable serial and frequency biases that a uniformity test exposes. A structural zero is a third, benign source of digit-zero excess that has nothing to do with either, which is why it is removed before testing. Reading the terminal digits against the uniform null therefore distinguishes numbers that were measured from numbers that were chosen.
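A toy simulation in Python illustrates the contrast between measured and chosen digits. Every parameter is an assumption made for the illustration: three fixed true values, Gaussian noise ten times larger than the recording step of 0.1, and a deliberately crude model of a fabricator who picks a round ending half the time. It is not a model of how real fabricated data behave.

```python
import random

rng = random.Random(0)
N = 20_000


def tenths_digit(x):
    """Digit in the first decimal place after rounding x to one decimal."""
    return round(x * 10) % 10


def share(digits, wanted):
    """Fraction of digits that fall in the set `wanted`."""
    return sum(d in wanted for d in digits) / len(digits)


# Measured: a few fixed true values plus noise (sd 1.0) much larger than the step (0.1).
true_values = [12.0, 47.0, 350.0]
measured_digits = [
    tenths_digit(rng.choice(true_values) + rng.gauss(0, 1.0)) for _ in range(N)
]


# Chosen: toy fabricator who picks a 0 or 5 ending half the time, any digit otherwise.
def chosen_digit():
    if rng.random() < 0.5:
        return rng.choice([0, 5])
    return rng.randrange(10)


chosen_digits = [chosen_digit() for _ in range(N)]

for d in range(10):
    print(d, round(share(measured_digits, {d}), 3), round(share(chosen_digits, {d}), 3))

print("zero-five, measured:", round(share(measured_digits, {0, 5}), 3))  # close to 0.20
print("zero-five, chosen:  ", round(share(chosen_digits, {0, 5}), 3))    # close to 0.60
```

The measured column stays near one tenth per digit even though the true values never change, because the noise alone spreads the last recorded digit evenly. The chosen column shows the heaping on zero and five that the zero-five preference is designed to catch.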
References
- [1] Mosimann JE, Wiseman CV, Edelman RE. Data fabrication: Can people generate random digits? Accountability in Research. 1995;4(1):31-55. DOI: 10.1080/08989629508573866
- [2] Preece DA. Distributions of Final Digits in Data. The Statistician. 1981;30(1):31-60. DOI: 10.2307/2987702
- [3] Carlisle JB. Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals. Anaesthesia. 2017;72(8):944-952. DOI: 10.1111/anae.13938