T4 | Image forensics | Table Analysis | Layer 2 (Contextual)

SPRITE Test (Table)

Asks whether a reported mean and standard deviation could have come from real bounded data. For a measure on a known scale, such as a one-to-five Likert item, SPRITE searches by simulation for an integer sample that reproduces the reported statistics; if none can be found, the statistics are flagged as impossible. The indicator also applies an exact mathematical bound on how large a standard deviation can be for a given mean on a bounded scale, which immediately rejects impossible spreads near the ends of the scale. It works on the reported numbers and the scale alone.

Technical description

T4 is the table-image application of SPRITE, the Sample Parameter Reconstruction via Iterative TEchniques of Heathers and colleagues. Where GRIM and GRIMMER test the mean and the standard deviation separately, SPRITE asks the joint question: does any integer sample on the measurement scale have exactly this mean and this standard deviation? It extracts triplets of mean, standard deviation, and sample size from the table, detects the scale from headers or context with a default Likert range, and for each triplet attempts to construct a matching integer distribution by iterative simulation. A triplet for which no valid distribution is found is flagged. Because the simulation cannot prove impossibility within a finite number of attempts, T4 adds an exact analytic check: the variance of any distribution bounded on a scale is capped by a function of its mean, so a standard deviation above that cap is impossible regardless of simulation. As a Layer 2 indicator its primary test is a simulation rather than a closed form, and it skips triplets with a sample size of one or a non-positive standard deviation.

How it works

The table grid is extracted by OCR, and the triplets of mean, standard deviation, and sample size are read from it. The scale minimum and maximum are taken from an explicit context override, or detected from header and cell text patterns such as "1-5" or "1 to 7", defaulting to a one-to-seven Likert scale.
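The scale-detection order described above (override, then text patterns, then default) can be sketched as follows. This is a minimal illustration, not the indicator's actual code: the names detect_scale, DEFAULT_SCALE, and the regular expression are assumptions, and a real detector would need guards that this sketch lacks (for example, a year range such as "2019-2020" would be misread as a scale).

```python
import re

# Fallback taken from the text: a one-to-seven Likert scale.
DEFAULT_SCALE = (1, 7)

# Assumed pattern: two integers joined by a hyphen, an en dash, or the word "to",
# e.g. "1-5", "1 \u2013 7", "1 to 7". The real detector may accept other forms.
_SCALE_RE = re.compile(r"\b(\d+)\s*(?:-|\u2013|to)\s*(\d+)\b", re.IGNORECASE)


def detect_scale(texts, override=None):
    """Return (scale_min, scale_max) from an override, the first match, or the default."""
    if override is not None:
        return override
    for text in texts:  # header and cell strings read by OCR
        match = _SCALE_RE.search(text)
        if match:
            low, high = int(match.group(1)), int(match.group(2))
            if low < high:  # ignore reversed or degenerate ranges
                return (low, high)
    return DEFAULT_SCALE
```

For instance, detect_scale(["Satisfaction (1-5)", "M", "SD"]) yields (1, 5), while a table with no recognisable pattern falls back to (1, 7). That fallback is the case the Limitations section warns about, since a defaulted scale can change the verdict.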

For each eligible triplet the SPRITE simulation draws integer samples on the scale, nudges them toward the reported mean by adjusting individual values within the scale bounds, and checks whether the resulting standard deviation matches the reported one within a small tolerance. If a match is found the triplet is possible. Independently, an analytic maximum-variance bound is applied: by the inequality of Popoviciu, sharpened by Bhatia and Davis, a distribution bounded on the interval from scale_min to scale_max with mean m has variance at most

(m - scale_min)(scale_max - m),

so the reported sample standard deviation, computed with the n - 1 denominator, cannot exceed the square root of that quantity multiplied by n/(n - 1), that is, sqrt((m - scale_min)(scale_max - m) * n / (n - 1)). A reported standard deviation above this cap is impossible. The bound detects this immediately even near the ends of the scale, where the maximum spread is far smaller than at the midpoint and where a simulation-only fallback would otherwise give the reported value the benefit of the doubt. A triplet fails if either the simulation finds no match or the analytic cap is exceeded.
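The two checks can be sketched together as below. This is an illustrative sketch under stated assumptions, not T4's implementation: all function names, the tolerance of 0.005 (which assumes statistics reported to two decimals), the restart and step budgets, and the fixed seed are hypothetical. The step that steers the standard deviation by moving pairs of values is borrowed from the published SPRITE approach; the text above only says that samples are nudged to the mean and then checked.

```python
import math
import random


def max_sample_sd(mean, n, scale_min, scale_max):
    """Analytic cap on the sample SD (n - 1 denominator) for a mean on a bounded scale."""
    pop_var_cap = (mean - scale_min) * (scale_max - mean)
    return math.sqrt(max(pop_var_cap, 0.0) * n / (n - 1))


def sample_sd(values):
    """Sample standard deviation with the n - 1 denominator."""
    n = len(values)
    m = sum(values) / n
    return math.sqrt(sum((v - m) ** 2 for v in values) / (n - 1))


def sprite_find(mean, sd, n, scale_min, scale_max,
                tol=0.005, restarts=20, steps=2000, seed=0):
    """Search for an integer sample matching mean and SD; return it sorted, or None."""
    rng = random.Random(seed)
    # Integer totals whose mean agrees with the reported mean within the tolerance.
    totals = [t for t in range(n * scale_min, n * scale_max + 1)
              if abs(t / n - mean) <= tol]
    if not totals:
        return None
    for _ in range(restarts):
        target = rng.choice(totals)
        values = [rng.randint(scale_min, scale_max) for _ in range(n)]
        # Nudge single values, within the scale bounds, until the total matches.
        while sum(values) != target:
            step = 1 if sum(values) < target else -1
            i = rng.randrange(n)
            if scale_min <= values[i] + step <= scale_max:
                values[i] += step
        # Steer the SD with mean-preserving moves on random pairs of values.
        for _ in range(steps):
            current = sample_sd(values)
            if abs(current - sd) <= tol:
                return sorted(values)
            i, j = rng.sample(range(n), 2)
            lo, hi = (i, j) if values[i] <= values[j] else (j, i)
            if current < sd:
                # Too narrow: push the pair apart if both stay on the scale.
                if values[lo] > scale_min and values[hi] < scale_max:
                    values[lo] -= 1
                    values[hi] += 1
            elif values[hi] - values[lo] >= 2:
                # Too wide: pull the pair together.
                values[lo] += 1
                values[hi] -= 1
    return None


def triplet_fails(mean, sd, n, scale_min, scale_max, tol=0.005):
    """True if the cap is exceeded or no match is found. Assumes n > 1 and sd > 0."""
    if sd - tol > max_sample_sd(mean, n, scale_min, scale_max):
        return True  # impossible with certainty
    return sprite_find(mean, sd, n, scale_min, scale_max, tol=tol) is None
```

As a worked example near the edge of the scale: for a mean of 1.2 with n = 20 on a one-to-five scale, the cap is sqrt(0.2 * 3.8 * 20/19) = sqrt(0.8), about 0.894, so a reported standard deviation of 1.5 is rejected by the analytic check alone. The sketch compares against the cap at the reported mean; a more careful version would also allow for rounding of the mean, which matters most near the scale ends.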

The score is 0 when no triplet fails, 4.0 for a single failure, and 4.5 for two or more. Each failure becomes a finding naming the mean, standard deviation, sample size, and scale. The metadata records the triplets found, tested, and skipped, the failure count, and the detected scale.
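The scoring and bookkeeping rules can be sketched as follows. The function name summarize, the dictionary keys, and the shape of the findings are hypothetical; only the score values, the skip conditions, and the list of recorded quantities come from the text. The failure test is passed in as a callable so the sketch does not depend on any particular implementation of the checks.

```python
def summarize(triplets, scale, fails):
    """Score a table from its (mean, sd, n) triplets.

    scale is (scale_min, scale_max); fails(mean, sd, n, scale_min, scale_max)
    returns True when a triplet is judged impossible.
    """
    # Eligibility rule from the text: skip n of one and non-positive SDs.
    tested = [t for t in triplets if t[2] > 1 and t[1] > 0]
    failures = [t for t in tested if fails(*t, *scale)]

    if not failures:
        score = 0.0
    elif len(failures) == 1:
        score = 4.0
    else:
        score = 4.5

    # One finding per failure, naming mean, SD, sample size, and scale.
    findings = [
        {"mean": m, "sd": s, "n": n, "scale": scale} for (m, s, n) in failures
    ]
    metadata = {
        "triplets_found": len(triplets),
        "triplets_tested": len(tested),
        "triplets_skipped": len(triplets) - len(tested),
        "failure_count": len(failures),
        "scale": scale,
    }
    return score, findings, metadata
```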

Score thresholds

Score	Meaning
0 to 1	Every tested mean and standard deviation is reproducible by a real bounded integer sample.
2 to 3	Reserved; SPRITE failures are scored at the higher band.
4 to 5	One or more mean and standard deviation pairs cannot arise from any integer sample on the scale. Consistent with fabricated statistics.

Why this matters

SPRITE was designed precisely to catch fabricated summary statistics that pass simpler tests: Heathers and colleagues showed that reconstructing candidate samples from a reported mean, standard deviation, sample size, and scale exposes combinations that are mathematically impossible or wildly implausible, and the technique has been used to unravel several high-profile cases of data fabrication [1]. It is the joint, distribution-level member of the granularity family that includes GRIM and GRIMMER, catching pairs of statistics that are each individually plausible but cannot coexist in any real bounded sample [3]. The analytic bound that T4 adds rests on a classical result in probability: the variance of a bounded random variable is limited by the product of the distances from its mean to the two ends of its range. This is the Bhatia-Davis bound, which sharpens Popoviciu's inequality and holds for every distribution, integer or not [2]. Together the simulation and the bound let T4 reject impossible dispersion both in the interior of the scale, where simulation is informative, and at its edges, where the closed-form cap is decisive. For bounded instruments like Likert scales, an impossible standard deviation is strong evidence that the numbers were not computed from data.

Limitations

SPRITE is only meaningful for data on a known bounded integer scale, so the scale must be detected correctly; a wrong or defaulted scale changes the verdict, and a continuous or unbounded measure is outside its scope. The simulation is stochastic and bounded in effort, so it can fail to find a valid distribution that exists: a match proves that the pair is possible, a failed search is strong but not conclusive evidence, and only the analytic cap proves impossibility with certainty. The test depends on optical character recognition for the statistics and the scale. Subscale means, reverse-coded items, and means aggregated over unequal groups can violate the simple bounded-scale assumption and fail legitimately. As a Layer 2 simulation it is slower and less deterministic than the Layer 1 granularity tests. The same reconstruction applied to values read from charts is indicator G12, and the separate mean and standard deviation granularity tests are indicators T2 and T3, so T4 stays on the joint feasibility of the pair on a bounded scale.

Theoretical background

T4 rests on the geometry of bounded samples. A sample of n values confined to an interval lives in a bounded region of n-dimensional space, and fixing its mean restricts it to a slice of that region; the standard deviation is proportional to the distance from the slice's centre, so it ranges only up to a finite maximum determined by the mean and the interval. That maximum is approached by placing the mass at the two endpoints, which gives the variance bound (m - a)(b - m) for an interval from a to b, the sharp inequality behind the analytic cap. SPRITE explores the interior of the same region by simulation, looking for an integer point that matches both the mean and the standard deviation; finding one proves possibility constructively, while the analytic cap proves impossibility whenever the reported spread lies outside the feasible region. A genuine dataset, computed from real bounded observations, always lies inside the region; a fabricated pair of statistics, chosen without regard to the geometry, frequently lies outside it, especially when the mean is near an endpoint and the reported standard deviation is large.
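The variance bound follows from a short argument, sketched here for a random variable X confined to the interval from a to b with mean m. The symbols X and the expectation notation are introduced for this illustration only.

```latex
% On [a, b] the product (X - a)(b - X) is never negative, so its expectation is not either:
\[
0 \le \mathbb{E}\bigl[(X - a)(b - X)\bigr] = (a + b)\,m - ab - \mathbb{E}[X^2]
\quad\Longrightarrow\quad
\mathbb{E}[X^2] \le (a + b)\,m - ab .
\]
% Subtracting m^2 gives the cap on the (population) variance:
\[
\operatorname{Var}(X) = \mathbb{E}[X^2] - m^2 \le (a + b)\,m - ab - m^2 = (m - a)(b - m) .
\]
% For a sample of size n with the n - 1 denominator, s^2 = \tfrac{n}{n-1} \times (population variance), so
\[
s \le \sqrt{\frac{n}{n - 1}\,(m - a)(b - m)} .
\]
```

Equality in the first step requires (X - a)(b - X) to be zero almost surely, which means all the mass sits at a and b. Maximising (m - a)(b - m) over m gives (b - a)^2 / 4 at the midpoint, which is Popoviciu's inequality; near either endpoint the cap shrinks toward zero, which is why large reported spreads there are rejected.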

References

[1] Heathers JAJ, Anaya J, van der Zee T, Brown NJL. Recovering data from summary statistics: Sample Parameter Reconstruction via Iterative TEchniques (SPRITE). PeerJ Preprints. 2018;6:e26968v1. DOI: 10.7287/peerj.preprints.26968v1
[2] Bhatia R, Davis C. A better bound on the variance. The American Mathematical Monthly. 2000;107(4):353-357. DOI: 10.1080/00029890.2000.12005203
[3] Brown NJL, Heathers JAJ. The GRIM Test: A Simple Technique Detects Numerous Anomalies in the Reporting of Results in Psychology. Social Psychological and Personality Science. 2017;8(4):363-369. DOI: 10.1177/1948550616673876