S15 | Statistical analysis | Statistical Consistency | Layer 1 (Deterministic)

No Imperfections

Flags datasets that look too clean to be real. Genuine data collection produces missing values, occasional extreme observations, and dropout, so a study whose tables have no gaps at all, whose numeric columns contain no outliers, and whose text claims perfect follow-up or zero attrition is unusual enough to warrant a second look. The indicator runs three checks for these signs of suspicious perfection and combines them. It works on the parsed tables and the article text.

Technical description

S15 is a deterministic screen for the absence of the imperfections that characterise real-world data. It combines three sub-checks, and the pattern of their activation sets the score. The first looks for zero missing values: if the tables together contain more than fifty cells and not one is empty or holds a missing-data marker, the data are flagged as suspiciously complete. The recognised missing markers include blanks, not-applicable variants, the placeholder symbols journals use for an empty cell (em dash, en dash, hyphen, middle dot, bullet, and ellipsis), and textual markers such as not-detected and not-reported, so that a table which does mark its gaps is not mistaken for a complete one. The second looks for zero outliers: in each numeric column with at least ten parsed values, it checks whether every value lies within the robust band of the median plus or minus three times the median absolute deviation (MAD), with the MAD scaled by 1.4826, following Leys and colleagues [4, 5]. This robust rule replaces a mean and standard deviation band, because an outlier inflates the very mean and standard deviation used to detect it and so can conceal itself (masking). The third scans the text for explicit perfection claims such as no dropout, one hundred percent follow-up, no missing data, zero attrition, complete data, and no adverse events. The score rises as more of these signs co-occur.
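Written out, the robust band used by the zero-outlier sub-check can be stated as follows. The symbols $x_1,\dots,x_n$ (the parsed values of one column, with $n \ge 10$), $\tilde{x}$ (their median), and $\hat{\sigma}$ (the scaled MAD) are notation introduced here for illustration and do not come from the original description.

```latex
\tilde{x} = \operatorname{median}(x_1,\dots,x_n), \qquad
\mathrm{MAD} = \operatorname{median}_{i}\,\lvert x_i - \tilde{x}\rvert, \qquad
\hat{\sigma} = 1.4826 \cdot \mathrm{MAD}

\text{column has no outlier} \iff
\lvert x_i - \tilde{x}\rvert \le 3\,\hat{\sigma} \quad \text{for all } i = 1,\dots,n
```

Whether the boundary case of a value lying exactly on the band counts as inside or outside is not stated in the description; the non-strict inequality above is an assumption.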

How it works

The missing-data check tallies every table cell, classifies a cell as missing when its trimmed, lower-cased text matches a known missing marker, and flags zero-missing only when the total cell count exceeds fifty and the missing count is zero. The outlier check parses each column's numeric cells, and for any column with at least ten values computes the median and the median absolute deviation (MAD), scales the MAD by 1.4826, and tests whether all values fall within three scaled MADs of the median. A column whose scaled MAD is zero, meaning more than half of its values are identical, is ignored as uninformative. If at least one column was checked and none contained an outlier, the zero-outlier sign is set. The text check matches a fixed list of perfection phrases case-insensitively.
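The three sub-checks described above can be sketched roughly as follows. This is a minimal illustration, not the indicator's actual code: the function names, the representation of a table as a rectangular list of rows of strings, and the short marker and phrase lists are all assumptions made for the example.

```python
import math
from statistics import median

# Illustrative subsets only (assumption); the real lists are longer.
MISSING_MARKERS = {"", "na", "n/a", "nd", "nr", "-", "\u2013", "\u2014",
                   "\u00b7", "\u2022", "\u2026", "...",
                   "not detected", "not reported"}
PERFECTION_PHRASES = ["no dropout", "100% follow-up", "no missing data",
                      "zero attrition", "complete data", "no adverse events"]


def zero_missing(tables, min_cells=50):
    """True when there are more than min_cells cells and none is missing."""
    cells = [cell for table in tables for row in table for cell in row]
    missing = sum(1 for c in cells if c.strip().lower() in MISSING_MARKERS)
    return len(cells) > min_cells and missing == 0


def parse_number(cell):
    """Return a finite float, or None when the cell is not numeric."""
    try:
        value = float(cell.strip().replace(",", ""))
    except ValueError:
        return None
    return value if math.isfinite(value) else None


def column_has_outlier(values, k=3.0, scale=1.4826):
    """True/False for an informative column, None when the scaled MAD is zero."""
    med = median(values)
    scaled_mad = scale * median([abs(v - med) for v in values])
    if scaled_mad == 0:
        return None  # uninformative column, ignored
    return any(abs(v - med) > k * scaled_mad for v in values)


def zero_outliers(tables, min_values=10):
    """True when at least one column was checked and none had an outlier."""
    checked = 0
    for table in tables:
        for column in zip(*table):  # assumes rectangular tables
            values = [v for v in map(parse_number, column) if v is not None]
            if len(values) < min_values:
                continue
            result = column_has_outlier(values)
            if result is None:
                continue
            if result:
                return False
            checked += 1
    return checked > 0


def perfection_claims(text):
    """Return the perfection phrases found in the text, case-insensitively."""
    lowered = text.lower()
    return [p for p in PERFECTION_PHRASES if p in lowered]
```

Returning `None` for a zero-MAD column keeps "not checked" distinct from "checked and clean", which matters because the zero-outlier sign requires at least one column to have actually been checked.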

The three signs combine into the score as follows: zero missing together with zero outliers and a text claim scores 5.0; zero missing with zero outliers scores 4.0; zero missing alone scores 2.5; a text claim alone or a zero-outlier finding alone scores 1.5; and the absence of all signs scores 0. The missing-data finding carries warning severity, while the outlier and text findings are informational. The metadata records each of the three signs and the matched text phrases.
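The tiers above can be read as a simple cascade, sketched below. The function name `s15_score` is hypothetical. The description does not say how zero missing combined with a text claim (without zero outliers), or zero outliers combined with a text claim (without zero missing), is scored; the sketch assumes each falls through to the nearest stated tier, and those two branches are marked as assumptions.

```python
def s15_score(zero_missing: bool, zero_outliers: bool, text_claim: bool) -> float:
    """Combine the three S15 signs into a score (illustrative cascade)."""
    if zero_missing and zero_outliers and text_claim:
        return 5.0
    if zero_missing and zero_outliers:
        return 4.0
    if zero_missing:
        # Stated for zero missing alone; applying it to zero missing plus a
        # text claim is an assumption.
        return 2.5
    if zero_outliers or text_claim:
        # Stated for either weak sign alone; applying it to both weak signs
        # together is an assumption.
        return 1.5
    return 0.0
```

For example, `s15_score(True, True, False)` gives 4.0 and `s15_score(False, False, True)` gives 1.5, matching the tiers listed above.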

Score thresholds

Score	Meaning
0	The data show the normal imperfections of real collection.
1 to 2	A single weak sign: a perfection claim in the text, or numeric columns with no outliers.
2 to 3	Tables with more than fifty cells and no missing values at all.
4 to 5	Several signs together: complete data, no outliers, and explicit perfection claims.

Why this matters

Real data are messy, and the absence of that mess is itself informative. Simonsohn showed that fabrication can be exposed by statistics alone when reported results are too clean or too similar to have come from genuine sampling, identifying two cases from the implausibly low variability of their summary statistics [1]. Carlisle's forensic re-analyses likewise treat improbably consistent and complete data as an integrity signal across large bodies of trials [2]. The classic biostatistical account of fraud detection lists the lack of normal data imperfections, including suspiciously complete follow-up and absent outliers, among the patterns that distinguish invented from genuine datasets [3]. The reason these signs matter is that a fabricator focused on producing a clean, publishable result rarely thinks to add the missing values, extreme observations, and dropout that real studies accumulate, so their absence, especially in combination, is a coherent fingerprint. Because each sign individually has innocent explanations, S15 treats them as weak flags and reserves its highest scores for their co-occurrence. Recent forensic and methodological work reinforces this: robust statistics are now the standard tool for deciding whether apparent outliers are real [4, 5, 6], tests for implausibly balanced or consistent baseline data formalise the too-clean signal [7], and current reviews and trustworthiness instruments catalogue absent imperfections among routine integrity checks [8, 9, 10, 11].

Limitations

Each sub-check is a heuristic with benign explanations, so a high score is a prompt to inspect rather than evidence of misconduct. The zero-outlier sub-check is the weakest: for a small sample, having no value beyond the robust three-MAD band is the normal expectation rather than a surprise, because an extreme observation at that distance is rare, so the sign is only meaningful in larger columns and should be read accordingly. The robust median and MAD band, unlike a mean and standard deviation band, is not itself distorted by the very outlier it seeks, so the decision that outliers are absent is more trustworthy [4, 5, 6]. Genuinely complete data exist, particularly in small, well-managed, or registry-based studies, so zero missing values is not proof of anything. The text-phrase list is fixed and literal, so it both misses paraphrased claims and can match a legitimately reported fact, such as a small safety study that truly observed no adverse events. The checks depend on tables being parsed correctly and on the missing-marker list being complete. The thresholds of fifty cells, ten values, and three scaled MADs are rules of thumb rather than calibrated cut-offs. Outlier and distributional checks on individual-patient data are handled in the D series, so S15 stays on these surface signs of suspicious cleanliness in the article's tables and text.

Theoretical background

S15 rests on the statistical expectation that real measurement processes leave traces of imperfection. Missing data arise from the practical realities of collection (non-response, equipment failure, loss to follow-up), so over a large table the probability that not one of many cells is missing is low unless the data were curated or invented. This is why the missing-data check requires a substantial cell count before treating completeness as notable, and why correctly recognising the symbols that denote a gap is essential to avoid mistaking a marked-up table for a complete one. Outliers, in turn, are expected from heavy-tailed real distributions and from the occasional genuine extreme, but their expected frequency depends on sample size: a Gaussian variable places only about a quarter of one percent of its mass beyond three standard deviations, so a column of a few dozen values will usually contain no such value even when the data are entirely real. This is the key caveat that keeps the outlier sign weak, and it is why S15 never relies on the absence of outliers alone for a high score. S15 measures the distance from the centre with the robust median and median absolute deviation rather than the mean and standard deviation, because for roughly Gaussian data the MAD scaled by 1.4826 estimates the same standard deviation while resisting the distortion a single extreme value would otherwise cause, so the band stays anchored to the bulk of the data [4, 5, 6]. Explicit textual claims of perfection add a third, independent channel: they reflect the author's framing rather than the data's structure, and their value lies in concurrence with the structural signs. Scoring the three in tiers, with the highest scores reserved for their co-occurrence, reflects that any one sign is common but their joint appearance is the coherent signature of data presented as flawless.
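The sample-size caveat can be made concrete with a small calculation. The sketch below assumes an idealised case: independent Gaussian values with known centre and spread, so it ignores the extra variability that comes from estimating the median and MAD from the same column. The function names are hypothetical.

```python
import math


def tail_mass(k: float = 3.0) -> float:
    """Two-sided Gaussian probability of falling more than k SDs from the mean."""
    return math.erfc(k / math.sqrt(2.0))


def prob_no_outlier(n: int, k: float = 3.0) -> float:
    """Probability that none of n independent Gaussian values exceeds k SDs."""
    return (1.0 - tail_mass(k)) ** n


if __name__ == "__main__":
    for n in (10, 30, 100, 1000):
        print(n, round(prob_no_outlier(n), 3))
```

Under these assumptions the chance of seeing no value beyond three standard deviations is roughly 0.92 for a column of 30 values and roughly 0.76 for 100, and it only becomes small (below 0.07) at around 1000 values, which is why an outlier-free column of a few dozen values carries little evidential weight.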

References

[1] Simonsohn U. Just Post It: The Lesson From Two Cases of Fabricated Data Detected by Statistics Alone. Psychological Science. 2013;24(10):1875-1888. DOI: 10.1177/0956797613480366
[2] Carlisle JB. Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals. Anaesthesia. 2017;72(8):944-952. DOI: 10.1111/anae.13938
[3] Buyse M, George SL, Evans S, et al. The role of biostatistics in the prevention, detection and treatment of fraud in clinical trials. Statistics in Medicine. 1999;18(24):3435-3451. https://pubmed.ncbi.nlm.nih.gov/10611617/
[4] Leys C, Ley C, Klein O, Bernard P, Licata L. Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median. Journal of Experimental Social Psychology. 2013;49(4):764-766. DOI: 10.1016/j.jesp.2013.03.013
[5] Leys C, Delacre M, Mora YL, Lakens D, Ley C. How to Classify, Detect, and Manage Univariate and Multivariate Outliers, With Emphasis on Pre-Registration. International Review of Social Psychology. 2019;32(1):5. DOI: 10.5334/irsp.289
[6] Rousseeuw PJ, Hubert M. Anomaly detection by robust statistics. WIREs Data Mining and Knowledge Discovery. 2018;8(2):e1236. DOI: 10.1002/widm.1236
[7] Proschan MA, Shaw PA. Diagnosing fraudulent baseline data in clinical trials. PLoS ONE. 2020;15(10):e0239121. DOI: 10.1371/journal.pone.0239121
[8] Carlisle JB. False individual patient data and zombie randomised controlled trials submitted to Anaesthesia. Anaesthesia. 2021;76(4):472-479. DOI: 10.1111/anae.15263
[9] Bordewijk EM, Li W, van Eekelen R, Wang R, Showell M, Mol BW, van Wely M. Methods to assess research misconduct in health-related research: A scoping review. Journal of Clinical Epidemiology. 2021;136:189-202. DOI: 10.1016/j.jclinepi.2021.05.012
[10] Wilkinson J, Heal C, Antoniou GA, et al. A survey of experts to identify methods to detect problematic studies: stage 1 of the INveStigating ProblEmatic Clinical Trials in Systematic Reviews project. Journal of Clinical Epidemiology. 2024;175:111512. DOI: 10.1016/j.jclinepi.2024.111512
[11] Crone G, Green CD. Tools of the data detective: A review of statistical methods to detect data and result anomalies in psychology. Theory & Psychology. 2025;35(3):359-380. DOI: 10.1177/09593543241311861