ResAIKit
Research Integrity Toolkit
D18 · Statistical analysis · Fabrication Detection · Layer 1 (Deterministic)

Natural Heaping Absent

Checks whether the rounding habits of a dataset match how the variables were actually obtained. People reporting their own age, weight, or pain score round to convenient numbers, so the last digits pile up on 0 and 5, a pattern called heaping. Instruments, by contrast, record whatever they measure, so their last digits are spread evenly. Data that has it backwards (self-reported fields with suspiciously even digits, or instrument fields that heap on 0 and 5) points to fabrication or manual rounding. The indicator matches each column to a dictionary of expected heaping behaviour and flags the mismatches. It works on individual-patient data.

Technical description

D18 is a deterministic screen that compares the last-digit distribution of each variable against the heaping behaviour expected from how the variable is collected. It loads a dictionary of variables tagged either as self-reported, where digit heaping at 0 and 5 is expected, or as instrument-measured, where a uniform last-digit distribution is expected. Each numeric column is matched to the dictionary using whole-word token matching, so that an alias such as age matches a column named age or patient age but not average. For each matched column with enough values, it extracts the last digit of every value, runs a chi-squared test against a uniform distribution, and computes the proportion of values on the heaping digits. A self-reported variable is suspicious when heaping is absent: either the heaping proportion falls below thirty percent with a significant departure from uniform, or it is very low even without significance. An instrument variable is suspicious when heaping is present: more than twenty-five percent of values fall on 0 or 5 with significant non-uniformity. The proportion of suspicious matched columns sets the score, with a bonus when both directions of mismatch occur.
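The whole-word matching described above can be sketched as follows. This is a minimal illustration, not the toolkit's code: the dictionary entries, the `tokenize` and `match_column` names, and the restriction to single-token aliases are all assumptions made for the example.

```python
import re

# Hypothetical dictionary: alias -> expected source. The real dictionary's
# contents and storage format are not given in the text.
HEAPING_DICTIONARY = {
    "age": "self_reported",
    "weight": "self_reported",
    "pain": "self_reported",
    "creatinine": "instrument",
    "hemoglobin": "instrument",
}


def tokenize(column_name):
    """Split a column name into lowercase tokens on any non-alphanumeric character."""
    return re.findall(r"[a-z0-9]+", column_name.lower())


def match_column(column_name, dictionary=HEAPING_DICTIONARY):
    """Return (alias, source) if an alias equals a whole token, else None.

    Comparing whole tokens rather than substrings is what keeps "age"
    from matching inside "average".
    """
    tokens = set(tokenize(column_name))
    for alias, source in dictionary.items():
        if alias in tokens:
            return alias, source
    return None


print(match_column("patient age"))   # matched via the token "age"
print(match_column("average"))       # no whole-word match
```

Under these assumptions a name such as `Patient_Age` also matches, because the underscore separates tokens, whereas a camel-case name such as `patientAge` would not unless the tokenizer were extended.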

How it works

For each matched column with at least twenty values, the last digits are obtained by rounding each value to the nearest integer and taking the result modulo ten. The chi-squared test compares the ten digit counts against the uniform expectation. For a self-reported variable, the column is flagged when the combined proportion on the expected heaping digits is below thirty percent and the distribution is significantly non-uniform, or when that proportion is below fifteen percent in an adequate sample, since machine-generated self-reports lack the human rounding bias. For an instrument variable, the column is flagged when the proportion on 0 and 5 exceeds twenty-five percent with significant non-uniformity, indicating manual rounding of values an instrument would have recorded precisely. For each self-reported variable the indicator also computes Whipple's index: the share of values ending in 0 or 5, divided by the one-fifth expected under no digit preference and scaled to 100. An index near 100 therefore confirms the absence of heaping, while 500 marks total concentration on those digits; the index is reported alongside the per-column verdict [4]. The score rises with the proportion of suspicious columns through bands at fifteen, thirty, fifty, and seventy percent, and a half point is added when both self-reported and instrument columns are flagged, with the total capped at 5.0. The metadata records the matched and suspicious counts, the split of suspicious columns into self-reported and instrument variables, the proportion, the per-column details, and the Whipple index of each self-reported column.
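The per-column steps above can be sketched in plain Python. This is an illustrative reading of the description, not the toolkit's implementation. The function names, the 0.05 significance level (via the chi-squared critical value for 9 degrees of freedom), the handling of negative values, and treating "an adequate sample" as the same twenty-value minimum are all assumptions.

```python
from collections import Counter

MIN_VALUES = 20             # minimum sample size stated in the text
CHI2_CRIT_DF9_05 = 16.919   # critical value, 9 d.f., alpha = 0.05 (assumed level)
HEAPING_DIGITS = (0, 5)


def last_digits(values):
    """Round to the nearest integer, then take the value modulo ten.

    Python's round() resolves exact .5 ties to the even integer; the text
    does not say how ties are handled. abs() for negatives is an assumption.
    """
    return [abs(int(round(v))) % 10 for v in values]


def chi_squared_uniform(digits):
    """Pearson chi-squared statistic of the ten digit counts against uniform."""
    expected = len(digits) / 10
    counts = Counter(digits)
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))


def screen_column(values, source):
    """Return a per-column verdict, or None when the sample is too small."""
    if len(values) < MIN_VALUES:
        return None
    digits = last_digits(values)
    n = len(digits)
    n_heaped = sum(d in HEAPING_DIGITS for d in digits)
    proportion = n_heaped / n
    chi2 = chi_squared_uniform(digits)
    significant = chi2 > CHI2_CRIT_DF9_05

    if source == "self_reported":
        # Heaping expected: flag when it is missing.
        suspicious = (proportion < 0.30 and significant) or proportion < 0.15
        whipple = 500 * n_heaped / n   # 100 * share / (1/5)
    elif source == "instrument":
        # Uniform digits expected: flag when 0 and 5 are over-represented.
        suspicious = proportion > 0.25 and significant
        whipple = None                 # reported for self-reported columns only
    else:
        raise ValueError(f"unknown source: {source!r}")

    return {
        "suspicious": suspicious,
        "heaping_proportion": proportion,
        "chi2": chi2,
        "whipple_index": whipple,
    }


# A self-reported column in which no value ends in 0 or 5.
ages = [21, 32, 43, 54, 61, 72, 83, 94] * 3
print(screen_column(ages, "self_reported"))
```

Read literally, the two self-report conditions leave one case unflagged: a column whose digits are almost exactly even has a heaping proportion near 0.20 and a non-significant chi-squared statistic, so neither condition fires in this sketch. Whether the actual indicator treats that case differently is not stated in the text.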

Score thresholds

Score  | Meaning
0      | Heaping is present where expected and absent where not.
1 to 2 | A minority of variables show the wrong heaping behaviour.
3 to 5 | Many variables heap or fail to heap against expectation; the top of the range is reached when both directions occur.
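The banded scoring can be illustrated with a short sketch. The band boundaries (fifteen, thirty, fifty, and seventy percent), the half-point bonus, the 5.0 cap, and the two-matched-column minimum come from this entry. The points assigned to each band, the inclusive boundaries, and the function name are hypothetical placeholders, chosen only so that the results agree with the table above.

```python
# (lower bound of band, points) pairs, highest band first.
# The thresholds are from the text; the point values are HYPOTHETICAL.
BANDS = ((0.70, 4.5), (0.50, 3.0), (0.30, 2.0), (0.15, 1.0))

BOTH_DIRECTIONS_BONUS = 0.5   # stated in the text
MAX_SCORE = 5.0               # stated in the text
MIN_MATCHED_COLUMNS = 2       # stated under Limitations


def d18_score(n_matched, n_suspicious_self, n_suspicious_instrument, bands=BANDS):
    """Map the share of suspicious matched columns to a 0 to 5 score.

    Returns None when too few columns were matched to score at all.
    """
    if n_matched < MIN_MATCHED_COLUMNS:
        return None
    proportion = (n_suspicious_self + n_suspicious_instrument) / n_matched

    base = 0.0
    for lower_bound, points in bands:
        if proportion >= lower_bound:   # inclusive boundary is an assumption
            base = points
            break

    both_directions = n_suspicious_self > 0 and n_suspicious_instrument > 0
    bonus = BOTH_DIRECTIONS_BONUS if both_directions else 0.0
    return min(MAX_SCORE, base + bonus)


print(d18_score(10, 0, 0))   # no suspicious columns
print(d18_score(10, 2, 0))   # 20 percent suspicious, one direction only
print(d18_score(10, 4, 4))   # 80 percent suspicious, both directions
```

With these placeholder points the maximum of 5.0 is reached only when the top band and the both-directions bonus coincide, which matches the last table row; the indicator's real point values may differ.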

Why this matters

How the last digits of a variable are distributed is a fingerprint of how the value was obtained, and it is direction-specific. Mosimann and colleagues showed that people cannot produce uniform digits and that the terminal digits of questioned data reveal their origin [1], and the same line of work demonstrated that humans reporting quantities round systematically, producing the heaping at 0 and 5 that genuine self-reported data shows [2]. A language model generating a self-reported field defaults to uniform digits and so omits this human signature, while a fabricator hand-entering instrument data tends to round and so adds heaping where a real instrument would not. Taloni and colleagues documented that model-fabricated clinical data does not respect these conventions [3]. Because the expected behaviour differs by data source, D18 does not treat heaping as good or bad in itself but checks it against what each variable should show, which makes both an unexpectedly uniform self-report and an unexpectedly heaped instrument reading informative. Age and digit heaping is a long-established data-quality measure, quantified by indices such as Whipple's that read the degree of rounding in reported numbers [4], and recent forensic re-analyses, scoping reviews, and trustworthiness instruments place digit-preference checks among the standard screens for fabricated and machine-generated data [5, 6, 7, 8].

Limitations

The check can only assess variables that are present in its heaping dictionary, are tagged with the correct source, and have a column name it can match, so an unrecognised variable is skipped and a mis-tagged one is misjudged. It needs at least twenty rows per column and at least two matched columns. The last-digit extraction rounds to the nearest integer, so a genuinely decimal self-reported value loses its sub-integer structure, and the test is most meaningful for integer-scale self-reports such as age. Real instrument data can show mild heaping for legitimate reasons, such as a device that rounds internally, and real self-reports can lack heaping when values are small or collected precisely, so a flag is a screening signal rather than proof. The thresholds (thirty percent for expected heaping, twenty-five percent for unexpected heaping, and the significance level) are heuristic. The general terminal-digit uniformity test is covered by indicator S8 for reported numbers and by indicator D34 for individual-patient data, so D18 focuses on heaping relative to the expected source behaviour.

Theoretical background

D18 rests on the source-dependence of the last-digit distribution. The least significant digit of a measured quantity is, for an instrument that records to its full precision, effectively uniform over 0 to 9, because it reflects fine variation the device captures faithfully. Human reporting introduces a different process: when people estimate or recall a quantity they round to cognitively convenient anchors, overwhelmingly multiples of five and ten, so self-reported variables develop a characteristic excess on the digits 0 and 5 known as heaping. These two processes leave opposite last-digit signatures, and each is expected for a particular kind of variable, so the diagnostic is not the presence or absence of heaping alone but its agreement with the variable's source. Fabrication disturbs this in two complementary ways: synthetic generation imposes uniformity everywhere, erasing the heaping that self-reports should show, while manual transcription imposes rounding, adding heaping to instrument fields that should be uniform. The chi-squared test against the uniform distribution detects departures from evenness, and the heaping-digit proportion localises whether any such departure is the human 0-and-5 pattern, allowing the indicator to decide, for each variable, whether the observed digit behaviour matches the behaviour its origin predicts. Whipple's index places the strength of any 0-and-5 heaping on the standard demographic scale on which it has long been measured, where 100 denotes an even spread and higher values quantify the degree of rounding, so the indicator reports not only whether a self-reported variable heaps as expected but how strongly it does [4].
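The three quantities discussed above can be written compactly. The symbols are notation introduced here for illustration, not taken from the toolkit: O_d is the number of values whose last digit is d, N is the number of values in the column, h is the heaping proportion, and W is Whipple's index as this entry defines it.

```latex
\chi^2 = \sum_{d=0}^{9} \frac{\left(O_d - N/10\right)^2}{N/10},
\qquad
h = \frac{O_0 + O_5}{N},
\qquad
W = 100 \cdot \frac{h}{1/5} = 500\,h .
```

As a worked example, a column of N = 200 values with 52 ending in 0 or 5 has h = 52/200 = 0.26 and W = 130. Perfectly even digits give h = 0.20 and W = 100, and a column in which every value ends in 0 or 5 gives h = 1 and W = 500, the two reference points quoted in this entry.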

References

  1. Mosimann JE, Wiseman CV, Edelman RE. Data fabrication: Can people generate random digits? Accountability in Research. 1995;4(1):31-55. DOI: 10.1080/08989629508573866
  2. Mosimann JE, Dahlberg JE, Davidian NM, Krueger JW. Terminal digits and the examination of questioned data. Accountability in Research. 2002;9(2):75-92. DOI: 10.1080/08989620212969
  3. Taloni A, Scorcia V, Giannaccare G. Large Language Model Advanced Data Analysis Abuse to Create a Fake Data Set in Medical Research. JAMA Ophthalmology. 2023;141(12):1174-1175. DOI: 10.1001/jamaophthalmol.2023.5162
  4. A'Hearn B, Baten J, Crayen D. Quantifying Quantitative Literacy: Age Heaping and the History of Human Capital. The Journal of Economic History. 2009;69(3):783-808. https://www.cambridge.org/core/journals/journal-of-economic-history/article/abs/quantifying-quantitative-literacy-age-heaping-and-the-history-of-human-capital/57B3C54D7B2EF7D11CC70D60F1F4B3C6
  5. Crone G, Green CD. Tools of the data detective: A review of statistical methods to detect data and result anomalies in psychology. Theory & Psychology. 2025;35(3):359-380. DOI: 10.1177/09593543241311861
  6. Carlisle JB. Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals. Anaesthesia. 2017;72(8):944-952. DOI: 10.1111/anae.13938
  7. Bordewijk EM, Li W, van Eekelen R, Wang R, Showell M, Mol BW, van Wely M. Methods to assess research misconduct in health-related research: A scoping review. Journal of Clinical Epidemiology. 2021;136:189-202. DOI: 10.1016/j.jclinepi.2021.05.012
  8. Wilkinson J, Heal C, Antoniou GA, et al. A survey of experts to identify methods to detect problematic studies: stage 1 of the INveStigating ProblEmatic Clinical Trials in Systematic Reviews project. Journal of Clinical Epidemiology. 2024;175:111512. DOI: 10.1016/j.jclinepi.2024.111512