SPRITE Test (Stats)
Tests whether a reported mean and standard deviation on a bounded scale could have come from any real dataset of the stated size. For data confined between a known minimum and maximum, such as a 1-to-7 Likert item, the standard deviation cannot exceed a mathematical ceiling set by the mean, the scale limits, and the sample size. A standard deviation above that ceiling cannot occur in any dataset, and one for which no matching sample can be built is unlikely to be genuine. Either result points to fabricated or mis-reported statistics.
Technical description
Applies SPRITE (Sample Parameter Reconstruction via Iterative TEchniques) to each (mean, SD, n) triplet from the statistical context. It reads or infers a scale minimum a and maximum b, then runs a Monte Carlo search for a sample of n integers, all in [a, b], whose mean and SD match the reported pair within rounding. Two exact necessary conditions run alongside the search. First, the reported mean must lie within [a, b], because the mean of bounded values is itself bounded. Second, the population variance of values in [a, b] with mean m cannot exceed (m-a)(b-m) (the Bhatia-Davis bound, which tightens Popoviciu's (b-a)^2/4), so the sample SD cannot exceed sqrt((m-a)(b-m) * n/(n-1)). A mean outside the scale, or an SD exceeding the ceiling by more than 0.05, is rejected analytically without needing the search to fail. The check is shared with the table-image indicator T4. Each impossible triplet is a flag; the flag count sets the score (0.0, 4.0, or 4.5).
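The two exact conditions are simple enough to sketch directly. The Python below is a minimal illustration, not the indicator's actual code: the function names sd_ceiling and analytic_check are hypothetical, and the 0.05 tolerance is the one stated in the description above. The ceiling follows from averaging (x-a)(b-x) >= 0 over the sample, which gives a population variance of at most (m-a)(b-m); the factor n/(n-1) converts that to the sample variance.

```python
import math


def sd_ceiling(mean, a, b, n):
    """Largest sample SD (n-1 denominator) that n values in [a, b]
    with the given mean can have."""
    if n <= 1:
        raise ValueError("a sample SD needs n >= 2")
    return math.sqrt((mean - a) * (b - mean) * n / (n - 1))


def analytic_check(mean, sd, n, a, b, tol=0.05):
    """Return a reason string if the pair is analytically impossible,
    otherwise None. `tol` is the slack allowed above the ceiling."""
    if not (a <= mean <= b):
        return "mean outside scale"
    if sd > sd_ceiling(mean, a, b, n) + tol:
        return "SD above ceiling"
    return None
```

For example, on a 1-to-5 scale with n = 30, a reported mean of 4.8 allows a sample SD of at most about 0.89, so a reported SD of 1.5 fails the ceiling check without any search being run.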
How it works
Layer 2 (stochastic reconstruction with an exact analytic pre-check). The scale is taken from context or inferred from the text (default 1 to 5), and triplets with n <= 1 are skipped. For each remaining triplet the shared sprite_check searches for a matching bounded integer sample. In addition, the reported mean is rejected if it falls outside [a, b], and, when the mean is inside the scale, the sample SD is rejected if it exceeds sqrt((m-a)(b-m) * n/(n-1)) by more than 0.05. A pair that fails the search or either exact condition is flagged (severity error). The flag count maps to the score: zero failures give 0.0, one gives 4.0, and two or more give 4.5. Each failure names the mean, SD, sample size, and scale; metadata records the counts and the detected scale.
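The steps above can be sketched end to end as follows. This is a simplified stand-in, assuming integer scale limits and a mean and SD reported to the same number of decimals (dp). The names sprite_search, impossible, and score are hypothetical and do not reproduce the shared sprite_check. The search is a basic random pair-adjustment walk rather than the published SPRITE procedure: it keeps the sum fixed and nudges two values apart or together to move the SD toward the target.

```python
import math
import random
import statistics


def sprite_search(mean, sd, n, a, b, dp=2, max_iter=20000, seed=0):
    """Look for n integers in [a, b] whose mean and sample SD match the
    reported pair within rounding. Returns a sorted sample, or None if
    nothing was found within the iteration budget."""
    if n < 2:
        raise ValueError("need n >= 2")
    rng = random.Random(seed)
    half = 0.5 * 10 ** -dp + 1e-9  # half-width of the rounding interval
    # Integer totals whose mean rounds to the reported mean.
    totals = [t for t in range(n * a, n * b + 1) if abs(t / n - mean) <= half]
    if not totals:
        return None
    # Simplification: only the first candidate total is tried.
    base, extra = divmod(totals[0], n)
    xs = [base + 1] * extra + [base] * (n - extra)  # flattest start
    for _ in range(max_iter):
        cur = statistics.stdev(xs)
        if abs(cur - sd) <= half:
            return sorted(xs)
        i, j = rng.sample(range(n), 2)
        lo, hi = (i, j) if xs[i] <= xs[j] else (j, i)
        if cur < sd:
            # Push the pair apart: SD rises, sum is unchanged.
            if xs[lo] > a and xs[hi] < b:
                xs[lo] -= 1
                xs[hi] += 1
        elif xs[hi] - xs[lo] >= 2:
            # Pull the pair together: SD falls, sum is unchanged.
            xs[hi] -= 1
            xs[lo] += 1
    return None


def impossible(mean, sd, n, a, b, tol=0.05, **search_kwargs):
    """True if the triplet fails an exact condition or the search."""
    if not (a <= mean <= b):
        return True
    if sd > math.sqrt((mean - a) * (b - mean) * n / (n - 1)) + tol:
        return True
    return sprite_search(mean, sd, n, a, b, **search_kwargs) is None


def score(triplets, a=1, b=5, **search_kwargs):
    """Map flagged (mean, sd, n) triplets to a score of 0.0, 4.0 or 4.5.
    Triplets with n <= 1 are skipped."""
    flags = [t for t in triplets
             if t[2] > 1 and impossible(*t, a, b, **search_kwargs)]
    return (0.0 if not flags else 4.0 if len(flags) == 1 else 4.5), flags
```

With these definitions, score([(3.0, 1.41, 2), (4.8, 1.5, 30)]) flags only the second triplet, which exceeds the SD ceiling on the default 1-to-5 scale, and returns a score of 4.0. Because the search is randomised and budget-limited, a None from sprite_search on its own means "not found", not "proven impossible".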
Why this matters
SPRITE turns a reported mean and standard deviation into a constructive question: can any real sample produce them on this scale? When the answer is provably no, the result is decisive. Heathers and colleagues used SPRITE to expose impossible and implausible distributions behind published means in several high-profile cases. The analytic variance bound makes one whole class of impossibility immediate and certain, independent of any reconstruction. Forensic re-analysis of trials treats such impossible summary statistics, alongside GRIM and GRIMMER failures, as a primary fabrication signal.
Score thresholds
- 0-1: Every tested pair is reconstructible by some bounded integer dataset of its reported size.
- 2-3: One or two mean and standard deviation pairs cannot arise from any dataset on the stated scale.
- 4-5: Three or more impossible pairs, consistent with fabricated or mis-reported descriptive statistics.
Limitations
Applies to bounded, typically integer, scales, so it needs the minimum and maximum to be known or reliably inferable; a wrong or defaulted scale (1 to 5 when none is detected) weakens both the reconstruction and the bound. The Monte Carlo search is stochastic, so a difficult but possible pair can occasionally go unreconstructed within the iteration budget. A reconstruction-only flag is therefore a screening signal, whereas the mean-in-range and variance-ceiling checks are exact and certain when they fire. The ceiling assumes that the reported SD is the sample (n-1) standard deviation and that the mean lies inside the scale. The test needs all three of mean, standard deviation, and sample size. The thresholds are directional. The mean-only test is S3, the standard-deviation test is S4, and the table-image version is T4.
References
- Heathers JAJ, Anaya J, van der Zee T, Brown NJL. (2018). Recovering data from summary statistics: Sample Parameter Reconstruction via Iterative TEchniques (SPRITE). PeerJ Preprints 6:e26968v1
- Wilner S, Wood S, Simons DJ. (2019). Complete recovery of values in Diophantine systems (CORVIDS). Behavior Research Methods 51(4):1766-1781
- Bhatia R, Davis C. (2000). A Better Bound on the Variance. The American Mathematical Monthly 107(4):353-357
- Crone G, Green CD. (2025). Tools of the data detective: A review of statistical methods to detect data and result anomalies in psychology. Theory & Psychology 35(3):359-380
- Carlisle JB. (2017). Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals. Anaesthesia 72(8):944-952
- Bordewijk EM, Li W, van Eekelen R, Wang R, Showell M, Mol BW, van Wely M. (2021). Methods to assess research misconduct in health-related research: A scoping review. Journal of Clinical Epidemiology 136:189-202
- Hunter KE, Aberoumand M, Libesman S, et al. (2024). The Individual Participant Data Integrity Tool for assessing the integrity of randomised trials. Research Synthesis Methods 15(6):917-939
- Wilkinson J, Heal C, Antoniou GA, et al. (2024). A survey of experts to identify methods to detect problematic studies: stage 1 of the INveStigating ProblEmatic Clinical Trials in Systematic Reviews project. Journal of Clinical Epidemiology 175:111512