ResAIKit
Research Integrity Toolkit
S13 | Statistical analysis | Statistical Consistency | Layer 1 (Deterministic)

Suspicious Rounding

Looks for rounding and precision patterns in reported statistics that are unusual in genuinely computed results: p-values reported exactly on the thresholds 0.05 or 0.01, standard deviations identical across several groups, and reported means whose number of decimal places varies widely. Each is a weak individual clue, but together they point to numbers entered or copied by hand rather than produced by analysis software.

Technical description

A deterministic screen for three rounding and precision irregularities, each contributing one flag; the flag count sets the score.
(1) Round p-values: counts p-values equal to exactly 0.05 or 0.01 and flags when two or more occur, since a computed p-value almost never lands precisely on a threshold.
(2) Identical SDs: groups (mean, SD, n) triplets by source and flags any source where three or more share an identical standard deviation, a known fingerprint of copied or invented summary statistics.
(3) Precision inconsistency: counts the significant decimal places of each reported mean from a fixed-point representation (so very small or large means are not misread through scientific notation) and, with at least four triplets, flags a coefficient of variation of the decimal counts above 0.5.
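The precision check is the least obvious of the three, so here is a minimal Python sketch of how it might look. The function names (`decimal_places`, `precision_check`) are hypothetical, and two details are assumptions not stated above: trailing zeros are treated as non-significant, and the coefficient of variation uses the population standard deviation.

```python
from decimal import Decimal
from statistics import mean, pstdev


def decimal_places(value):
    """Count decimal places from a fixed-point rendering of the value.

    Formatting through Decimal with "f" avoids scientific notation,
    so 1.5e-07 is read as 0.00000015 rather than as "1.5e-07".
    Assumption: trailing zeros are not counted as significant.
    """
    fixed = format(Decimal(str(value)), "f")
    if "." not in fixed:
        return 0
    return len(fixed.split(".", 1)[1].rstrip("0"))


def precision_check(means, min_triplets=4, cv_threshold=0.5):
    """Return (cv, flagged) for a list of reported means.

    cv is None when there are fewer than min_triplets values.
    Assumption: population SD is used for the coefficient of variation.
    """
    counts = [decimal_places(m) for m in means]
    if len(counts) < min_triplets:
        return None, False
    avg = mean(counts)
    if avg == 0:  # all means are whole numbers: no variation to measure
        return 0.0, False
    cv = pstdev(counts) / avg
    return cv, cv > cv_threshold


# Means reported to one decimal place throughout: not flagged.
consistent = precision_check([12.5, 13.1, 11.8, 12.9])

# Decimal counts of 1, 5, 0 and 8: flagged.
mixed = precision_check([12.5, 3.14159, 7, 1.5e-07])
```

In this sketch the second call yields a coefficient of variation of roughly 0.91, well above the 0.5 threshold, while the first yields 0.0.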

How it works

Layer 1 (deterministic): the round-p-value check counts exact 0.05 or 0.01 values and flags at two or more; the identical-SD check tallies standard deviations within each source and flags any value occurring three or more times; the precision check counts significant decimals of each mean from a fixed-point string and, with at least four triplets, flags a coefficient of variation above 0.5. The score is set by the flag count: zero flags give 0.0, one gives 2.0, two give 3.0, and three give 4.5; the score is capped at 5.0. Round-p and identical-SD findings are warnings; the precision finding is informational. Metadata records round_p_count, identical_sd_groups, precision_issues, and precision_cv.
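The round-p-value check, the identical-SD check, and the flag-to-score mapping can be sketched in Python as follows. This is an illustration under stated assumptions, not the toolkit's actual code: the function names, the `(source, mean, sd, n)` tuple layout, and the shape of the returned metadata are hypothetical, and the precision flag is passed in as a boolean rather than computed here.

```python
from collections import Counter, defaultdict

ROUND_P_VALUES = {0.05, 0.01}
SCORE_BY_FLAG_COUNT = {0: 0.0, 1: 2.0, 2: 3.0, 3: 4.5}
MAX_SCORE = 5.0


def count_round_p(p_values):
    """Count p-values that sit exactly on 0.05 or 0.01."""
    return sum(1 for p in p_values if p in ROUND_P_VALUES)


def identical_sd_groups(triplets, min_repeats=3):
    """Map each source to the SD values it repeats min_repeats+ times.

    triplets: iterable of (source, mean, sd, n); layout is an assumption.
    Exact equality is used, as in the description above.
    """
    tallies = defaultdict(Counter)
    for source, _mean, sd, _n in triplets:
        tallies[source][sd] += 1
    groups = {}
    for source, tally in tallies.items():
        repeated = [sd for sd, k in tally.items() if k >= min_repeats]
        if repeated:
            groups[source] = repeated
    return groups


def score(p_values, triplets, precision_flag=False):
    """Return (score, metadata) from the three flags."""
    round_p_count = count_round_p(p_values)
    sd_groups = identical_sd_groups(triplets)
    flags = [round_p_count >= 2, bool(sd_groups), bool(precision_flag)]
    value = min(SCORE_BY_FLAG_COUNT[sum(flags)], MAX_SCORE)
    metadata = {
        "round_p_count": round_p_count,
        "identical_sd_groups": sd_groups,
    }
    return value, metadata


# Hypothetical extracted values, for illustration only.
p_values = [0.05, 0.01, 0.032]
triplets = [
    ("Table 1", 4.2, 1.1, 30),
    ("Table 1", 5.0, 1.1, 30),
    ("Table 1", 3.7, 1.1, 28),
    ("Table 2", 9.9, 2.4, 15),
]
result, meta = score(p_values, triplets)
```

With these inputs two flags fire (two exact-threshold p-values, and an SD of 1.1 repeated three times in "Table 1"), so `result` is 3.0. Because the listed mapping peaks at 4.5, the 5.0 cap never binds in this sketch.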

Why this matters

The fine texture of how numbers are rounded carries information about how they were produced. García-Berthou and Alcaraz found that a substantial fraction of results reported in leading journals are internally incongruent, much of it arising from rounding and transcription. The granularity of reported statistics underlies the GRIM family of consistency tests, which exploit the fact that means and SDs computed from a given sample size can take only certain values at a given precision. Identical summary statistics across groups are a recognised fabrication signal in forensic re-analysis of trials. None of the three patterns proves misconduct on its own, which is why each is treated as a weak flag and their concurrence as the meaningful signal: each reflects a way in which hand-entered or copied numbers differ from analysis-pipeline output.

Score thresholds

0
None of the three rounding or precision patterns is present.
2
One pattern: exact-threshold p-values, identical standard deviations, or inconsistent precision.
3
Two of the three patterns occur together.
4-5
All three patterns occur, a combination unusual in genuinely computed results.

Limitations

Each sub-check is a weak heuristic. A p-value reported as exactly 0.05 or 0.01 is often the result of legitimate rounding, or a bound (below 0.05) reported as a point value, and the indicator cannot distinguish these cases. Identical standard deviations can occur legitimately when variables share a scale or when rounding collapses nearby values; the check uses exact equality. The precision check is the weakest: reporting different variables to different natural precisions is normal, so a high coefficient of variation of decimal places has many benign causes. All checks depend on accurate extraction. The thresholds (two round p-values, three identical SDs, and a coefficient of variation of 0.5) are directional. Related checks are covered by other indicators: GRIM and GRIMMER granularity tests are S3 and S4, identical-statistic detection on individual-patient data is D8, and value duplication is S14.
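As a small illustration of the benign case for identical SDs, the Python sketch below shows three distinct standard deviations collapsing to the same value once reported to one decimal place. The raw values and the `report_one_decimal` helper are hypothetical, and round-half-up is an assumed reporting convention.

```python
from decimal import Decimal, ROUND_HALF_UP


def report_one_decimal(value):
    """Round a value to one decimal place, as a paper might report it."""
    return Decimal(value).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# Three genuinely different SDs (hypothetical values).
raw_sds = ["1.16", "1.21", "1.24"]
reported = [report_one_decimal(sd) for sd in raw_sds]

# After rounding, all three are exactly equal, so an exact-equality
# check would count them as three identical SDs.
all_identical = len(set(reported)) == 1
```

Here every reported value is 1.2, so a source containing these three groups would trip the identical-SD flag even though nothing was copied.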

References

  1. García-Berthou E, Alcaraz C. (2004). Incongruence between test statistics and P values in medical papers. BMC Medical Research Methodology 4:13
  2. Brown NJL, Heathers JAJ. (2017). The GRIM Test: A Simple Technique Detects Numerous Anomalies in the Reporting of Results in Psychology. Social Psychological and Personality Science 8(4):363-369
  3. Simonsohn U. (2013). Just post it: the lesson from two cases of fabricated data detected by statistics alone. Psychological Science 24(10):1875-1888
  4. Carlisle JB. (2017). Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals. Anaesthesia 72(8):944-952
  5. Crone G, Green CD. (2025). Tools of the data detective: A review of statistical methods to detect data and result anomalies in psychology. Theory & Psychology 35(3):359-380
  6. Bordewijk EM, Li W, van Eekelen R, Wang R, Showell M, Mol BW, van Wely M. (2021). Methods to assess research misconduct in health-related research: A scoping review. Journal of Clinical Epidemiology 136:189-202
  7. Hunter KE, Aberoumand M, Libesman S, et al. (2024). The Individual Participant Data Integrity Tool for assessing the integrity of randomised trials. Research Synthesis Methods 15(6):917-939
  8. Wilkinson J, Heal C, Antoniou GA, et al. (2024). A survey of experts to identify methods to detect problematic studies: stage 1 of the INveStigating ProblEmatic Clinical Trials in Systematic Reviews project. Journal of Clinical Epidemiology 175:111512