S13 | Statistical analysis | Statistical Consistency | Layer 1 (Deterministic)

Suspicious Rounding

Looks for rounding and precision patterns in an article's reported statistics that are unusual in genuinely computed results. It runs three checks: p-values reported as exactly the conventional thresholds of 0.05 or 0.01, standard deviations that are identical across several groups, and reported means whose number of decimal places varies widely. Each pattern is a weak clue on its own, but together they point to numbers entered or copied by hand rather than produced by analysis software. It works on the reported numbers alone.

Technical description

S13 is a deterministic screen for three rounding and precision irregularities, each contributing one flag, with the flag count setting the score. (1) Round p-values: it counts p-values equal to exactly 0.05 or 0.01 and flags when two or more occur, since a computed p-value almost never lands precisely on a threshold. (2) Identical standard deviations: it groups the (mean, standard deviation, sample size) triplets by source and flags any source in which three or more triplets share an identical standard deviation, a known fingerprint of copied or invented summary statistics. (3) Precision inconsistency: it counts the decimal places of each reported mean, ignoring trailing zeros, from a fixed-point representation, so that very small or very large means are not misread through scientific notation; when at least four triplets are available, it flags a coefficient of variation of those counts above 0.5. None of the three proves misconduct alone; their concurrence is the meaningful signal.

How it works

Three sub-checks run on the statistical context.

Round p-values. It counts the reported p-values whose value is exactly 0.05 or 0.01 (round_p_count) and raises a warning when that count is two or more.
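
The round-p rule fits in a few lines of Python. This is a minimal sketch, assuming the p-values have already been extracted and parsed to floats; the names `count_round_p` and `round_p_flag` are hypothetical and are not the indicator's own code.

```python
from typing import Iterable, Tuple

# Conventional thresholds; a computed p-value should almost never equal one exactly.
THRESHOLDS = (0.05, 0.01)


def count_round_p(p_values: Iterable[float], thresholds=THRESHOLDS) -> int:
    """Count p-values reported as exactly one of the conventional thresholds."""
    # Exact float equality: "0.050" and "0.05" both parse to 0.05 and both count.
    return sum(1 for p in p_values if p in thresholds)


def round_p_flag(p_values: Iterable[float], min_count: int = 2) -> Tuple[bool, int]:
    """Return (warning raised, round_p_count) under the two-or-more rule."""
    count = count_round_p(p_values)
    return count >= min_count, count


print(round_p_flag([0.05, 0.032, 0.01, 0.2]))  # (True, 2)
print(round_p_flag([0.049, 0.05]))             # (False, 1)
```

A bound such as "p < 0.05" that extraction turns into the point value 0.05 is counted here too, which is the ambiguity described under Limitations.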

Identical standard deviations. It collects the standard deviations of the (mean, SD, n) triplets within each source and, for any standard-deviation value occurring three or more times in a source, increments identical_sd_groups; a non-zero count raises a warning.
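
A sketch of the per-source grouping follows. It assumes each triplet carries a source label (for example, the table it came from); the `Triplet` record and `count_identical_sd_groups` function are hypothetical illustrations, not the indicator's data model.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Triplet(NamedTuple):
    """Hypothetical record for one reported (mean, SD, n) triplet."""
    source: str  # e.g. a table label; how sources are defined is assumed
    mean: float
    sd: float
    n: int


def count_identical_sd_groups(triplets: Iterable[Triplet], min_repeats: int = 3) -> int:
    """Count SD values that occur at least min_repeats times within a single source."""
    sds_by_source: Dict[str, List[float]] = defaultdict(list)
    for t in triplets:
        sds_by_source[t.source].append(t.sd)

    groups = 0
    for sds in sds_by_source.values():
        # Counter compares the reported SD values with exact equality.
        for occurrences in Counter(sds).values():
            if occurrences >= min_repeats:
                groups += 1
    return groups


rows = [
    Triplet("Table 2", 10.2, 1.5, 20),
    Triplet("Table 2", 11.0, 1.5, 20),
    Triplet("Table 2", 9.8, 1.5, 21),
    Triplet("Table 3", 5.0, 1.5, 30),
    Triplet("Table 3", 5.5, 2.1, 30),
]
print(count_identical_sd_groups(rows))  # 1: only Table 2 repeats 1.5 three times
```

Because counts are kept per source, an SD that appears twice in one table and once in another does not reach the threshold.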

Precision inconsistency. With at least four triplets, it counts the decimal places of each mean via _count_decimal_places, which formats the value in fixed point (f"{value:.10f}") and strips trailing zeros, then computes the coefficient of variation of those counts (precision_cv); a value above 0.5 raises an informational flag.
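
The precision check can be sketched as below. The decimal-place counter follows the fixed-point-and-strip approach described above, but `count_decimal_places` and `precision_flag` are hypothetical re-creations, not the indicator's `_count_decimal_places`. Two details are assumptions because the description does not specify them: the coefficient of variation uses the population standard deviation, and an all-integer set of means (mean count of zero) is treated as uniform precision.

```python
import statistics
from typing import Optional, Sequence, Tuple


def count_decimal_places(value: float) -> int:
    """Decimal places in fixed-point form, ignoring trailing zeros."""
    # Fixed point avoids scientific notation: 1e-05 becomes "0.0000100000".
    text = f"{value:.10f}".rstrip("0")
    if text.endswith("."):
        return 0  # integer-valued, e.g. 10.0 -> "10."
    return len(text.split(".")[1])


def precision_flag(
    means: Sequence[float], min_triplets: int = 4, cv_threshold: float = 0.5
) -> Tuple[bool, Optional[float]]:
    """Return (info flag raised, precision_cv); the CV is None with too few means."""
    if len(means) < min_triplets:
        return False, None
    counts = [count_decimal_places(m) for m in means]
    average = statistics.fmean(counts)
    if average == 0:
        return False, 0.0  # assumption: all-integer means count as uniform precision
    # Population standard deviation is an assumption; the description does not say.
    cv = statistics.pstdev(counts) / average
    return cv > cv_threshold, cv


print(precision_flag([12.3, 4.25, 7.0, 0.125]))   # decimal counts 1, 2, 0, 3 -> flagged
print(precision_flag([12.31, 4.25, 7.02, 0.13]))  # all two decimals -> (False, 0.0)
```

Because this sketch works on parsed floats, a mean printed as 7.50 is counted as having one decimal place; recovering printed trailing zeros would require the original text.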

The score is set by the number of flags: 0 gives 0.0, one gives 2.0, two gives 3.0, three gives 4.5, capped at 5.0. The round-p and identical-SD findings are warnings and the precision finding is informational. The metadata records round_p_count, identical_sd_groups, precision_issues, and precision_cv.
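
The flag-to-score mapping and the warning/informational split can be written as a small lookup. This is a sketch only: the finding labels (`round_p_values`, `identical_sds`, `precision_inconsistency`) are hypothetical, and the function takes the three sub-check outcomes as plain booleans.

```python
from typing import List, Tuple

# Score by number of raised flags, following the stated mapping.
SCORE_BY_FLAGS = {0: 0.0, 1: 2.0, 2: 3.0, 3: 4.5}
SCORE_CAP = 5.0


def rounding_score(
    round_p: bool, identical_sd: bool, precision: bool
) -> Tuple[float, List[Tuple[str, str]]]:
    """Combine the three sub-check outcomes into (score, [(finding, severity), ...])."""
    findings: List[Tuple[str, str]] = []
    if round_p:
        findings.append(("round_p_values", "warning"))
    if identical_sd:
        findings.append(("identical_sds", "warning"))
    if precision:
        findings.append(("precision_inconsistency", "info"))
    score = min(SCORE_BY_FLAGS[len(findings)], SCORE_CAP)
    return score, findings


print(rounding_score(True, False, True))
# (3.0, [('round_p_values', 'warning'), ('precision_inconsistency', 'info')])
```

With only three sub-checks the mapping tops out at 4.5, so the 5.0 cap never binds in this sketch; it is kept only to mirror the description.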

Score thresholds

Score	Meaning
0	None of the three rounding or precision patterns is present.
2	One pattern: exact-threshold p-values, identical standard deviations, or inconsistent precision.
3	Two of the three patterns occur together.
4 to 5	All three patterns occur, a combination unusual in genuinely computed results.

Why this matters

The fine texture of how numbers are rounded carries information about how they were produced. García-Berthou and Alcaraz found that a substantial fraction of reported results in leading journals are internally incongruent, much of it attributable to rounding and transcription errors [1]. The granularity of reported statistics underlies the GRIM family of consistency tests, which exploit the fact that, for integer-valued data such as Likert responses, real means and standard deviations from a sample of a given size can only take certain values at a given precision [2]. Identical and suspiciously similar summary statistics are a recognised fabrication signal: cases of invented data have been exposed precisely because reported means and standard deviations were too alike to be real [3], and forensic re-analysis of trials treats repeated summary statistics across groups as a red flag [4]. These checks sit within the broader toolkit for screening reported descriptive statistics [5]. None of the three patterns proves misconduct alone; each reflects a way in which hand-entered or copied numbers differ from analysis-pipeline output, which is why they are treated as weak flags whose concurrence is the meaningful signal. Rounding and precision irregularities are catalogued among the data-integrity checks in recent scoping reviews [6] and in trustworthiness instruments and checklists [7,8].

Limitations

Each sub-check is a weak heuristic. A p-value reported as exactly 0.05 or 0.01 is often legitimate rounding, or a bound (p < 0.05) reported as a point value, and the indicator cannot distinguish these cases. Identical standard deviations can occur legitimately when variables share a scale or when rounding collapses nearby values, and the check relies on exact equality. The precision check is the weakest: reporting different variables to different natural precisions is normal, so a high coefficient of variation of decimal places has many benign causes. All checks depend on accurate extraction of the p-values and triplets. The thresholds (two round p-values, three identical standard deviations, and a coefficient of variation of 0.5) are indicative, not definitive. Related checks are covered by other indicators: the GRIM and GRIMMER granularity tests are S3 and S4, identical-statistic detection on individual-patient data is D8, and value duplication is S14.

Theoretical background

Numbers produced by an analysis pipeline and numbers entered or copied by hand differ in their fine structure. Software reports p-values to a consistent precision and rarely to a value that is exactly a conventional threshold, so a cluster of p-values sitting precisely on 0.05 or 0.01 suggests they were written to the threshold rather than computed. Standard deviations are continuous quantities that depend on every observation, so two groups producing the identical standard deviation to the reported precision is improbable, and three or more doing so within one table is a classic sign that a column was copied or invented; the same logic powers the granularity tests, which ask which means and standard deviations are even attainable for a given sample size. Decimal precision, finally, tends to be uniform within a genuine results table because it inherits the analysis software's formatting, so a wide spread in the number of reported decimals can indicate values assembled from different sources. Each signal is individually weak, with common innocent explanations, so the indicator treats them as additive flags and reserves its higher scores for their co-occurrence, which is far less easily explained away.
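
To make the granularity idea concrete, here is a minimal GRIM-style consistency check in the spirit of Brown and Heathers [2]. It is a sketch under stated assumptions: the underlying data are integers (for example, single-item Likert responses), the mean is supplied as the string it was printed as so that its reported precision is preserved, and ties round half up. The function name `grim_consistent` is hypothetical, and this is not the S3 implementation.

```python
import math
from decimal import ROUND_HALF_UP, Decimal


def grim_consistent(mean_text: str, n: int) -> bool:
    """True if some integer total k gives k / n that rounds to the reported mean."""
    reported = Decimal(mean_text)
    decimals = -reported.as_tuple().exponent  # digits after the point, as printed
    quantum = Decimal(1).scaleb(-decimals)    # e.g. 0.01 for two decimals
    centre = reported * n                     # the total implied by the mean
    # Only integer totals next to the implied total can round to the reported mean.
    for k in range(math.floor(centre) - 1, math.ceil(centre) + 2):
        candidate = (Decimal(k) / n).quantize(quantum, rounding=ROUND_HALF_UP)
        if candidate == reported:
            return True
    return False


print(grim_consistent("5.19", 28))  # False: 145/28 -> 5.18 and 146/28 -> 5.21
print(grim_consistent("5.18", 28))  # True
```

The test only has power when n is small relative to the reported precision: with two decimals and 100 or more observations, every mean is attainable.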

References

[1] García-Berthou E, Alcaraz C. Incongruence between test statistics and P values in medical papers. BMC Medical Research Methodology. 2004;4:13. https://doi.org/10.1186/1471-2288-4-13
[2] Brown NJL, Heathers JAJ. The GRIM Test: A Simple Technique Detects Numerous Anomalies in the Reporting of Results in Psychology. Social Psychological and Personality Science. 2017;8(4):363-369. https://doi.org/10.1177/1948550616673876
[3] Simonsohn U. Just post it: the lesson from two cases of fabricated data detected by statistics alone. Psychological Science. 2013;24(10):1875-1888. https://doi.org/10.1177/0956797613480366
[4] Carlisle JB. Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals. Anaesthesia. 2017;72(8):944-952. https://doi.org/10.1111/anae.13938
[5] Crone G, Green CD. Tools of the data detective: A review of statistical methods to detect data and result anomalies in psychology. Theory & Psychology. 2025;35(3):359-380. https://doi.org/10.1177/09593543241311861
[6] Bordewijk EM, Li W, van Eekelen R, Wang R, Showell M, Mol BW, van Wely M. Methods to assess research misconduct in health-related research: A scoping review. Journal of Clinical Epidemiology. 2021;136:189-202. https://doi.org/10.1016/j.jclinepi.2021.05.012
[7] Hunter KE, Aberoumand M, Libesman S, et al. The Individual Participant Data Integrity Tool for assessing the integrity of randomised trials. Research Synthesis Methods. 2024;15(6):917-939. https://doi.org/10.1002/jrsm.1738
[8] Wilkinson J, Heal C, Antoniou GA, et al. A survey of experts to identify methods to detect problematic studies: stage 1 of the INveStigating ProblEmatic Clinical Trials in Systematic Reviews project. Journal of Clinical Epidemiology. 2024;175:111512. https://doi.org/10.1016/j.jclinepi.2024.111512