ResAIKit
Research Integrity Toolkit
T3 · Image forensics · Table Analysis · Layer 1 (Deterministic)

GRIMMER Test (Table)

Extends GRIM from means to standard deviations: when data are whole numbers the sum of their squares is an integer, so only certain standard deviations are reachable for a given mean and sample size. A reported SD that no integer sum of squares can produce is impossible and points to fabrication or a reporting error.
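The constraint follows from the usual sample-variance identity (n - 1 denominator). It is written here in LaTeX with T for the integer total and Q for the sum of squares, the same symbols the technical description uses:

```latex
s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - \frac{T^2}{n}\right),
\qquad T = \sum_{i=1}^{n} x_i
\quad\Longrightarrow\quad
Q = \sum_{i=1}^{n} x_i^2 = s^2 (n-1) + \frac{T^2}{n}
```

As a worked example, take n = 5 and T = 15 (mean 3.0). Q = 50 gives s = sqrt(5/4) ≈ 1.118 and Q = 51 gives s = sqrt(6/4) ≈ 1.225, so a reported SD of 1.20 falls between two reachable values and cannot come from whole-number data of that size and total.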

Technical description

Extracts the table grid by OCR, checks that it holds statistical data, and finds mean/SD/N triplets. For each triplet with SD s and sample size n, it takes the integer total T = round(mean*n) and computes the implied sum of squares Q = s^2*(n-1) + T^2/n. For whole-number data the true sum of squares is a non-negative integer, but Q is computed from a rounded SD, so the test checks the two integers nearest Q: for each candidate integer k it reconstructs the variance (k - T^2/n)/(n-1) and its SD, and the triplet passes if either reconstructed SD, rounded to the reported precision, matches the reported SD. Triplets with SD <= 0 or n <= 1 are skipped.
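The per-triplet check can be sketched in a few lines. This is a minimal illustration, not the toolkit's code: the function names `grimmer_passes` and `round_half_up` are hypothetical, the reported precision is passed in as `decimals`, and reported values are assumed to be rounded half-up (away from zero). T uses Python's built-in `round`, mirroring the round(mean*n) in the description.

```python
import math
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional


def round_half_up(x: float, decimals: int) -> Decimal:
    """Round x to `decimals` places, halves away from zero (assumed reporting convention)."""
    quantum = Decimal(1).scaleb(-decimals)  # decimals=2 -> 0.01
    return Decimal(repr(x)).quantize(quantum, rounding=ROUND_HALF_UP)


def grimmer_passes(mean: float, sd: float, n: int, decimals: int) -> Optional[bool]:
    """Return None for skipped triplets, True if the SD is reachable, else False."""
    if sd <= 0 or n <= 1:
        return None  # skipped, as in the description
    total = round(mean * n)            # integer total T
    base = total ** 2 / n              # T^2 / n
    q = sd ** 2 * (n - 1) + base       # implied sum of squares Q
    reported = round_half_up(sd, decimals)
    for k in {math.floor(q), math.ceil(q)}:  # the two integers nearest Q
        variance = (k - base) / (n - 1)
        if variance < 0:
            continue                   # k is below the smallest possible sum of squares
        if round_half_up(math.sqrt(variance), decimals) == reported:
            return True
    return False


print(grimmer_passes(3.0, 1.58, 5, 2))  # True:  data 1..5 give SD 1.581
print(grimmer_passes(3.0, 1.20, 5, 2))  # False: nearest reachable SDs are 1.118 and 1.225
```

Using `Decimal` for the final comparison avoids binary floating-point artifacts when a reconstructed SD sits close to a rounding boundary.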

How it works

Layer 1 (deterministic): reads the table, confirms that it holds statistical data, finds mean/SD/N triplets, and flags every SD that no integer sum of squares can reproduce for its mean and sample size. The number of failures sets the score: no failures scores 0, one failure 2.0, and two or more 4.0. Each failure report names the mean, SD, and sample size.
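The scoring rule maps the failure count to a fixed score. A minimal sketch, assuming failures arrive as (mean, sd, n) tuples; the name `grimmer_score` and the message wording are hypothetical, not the toolkit's actual output format.

```python
from typing import List, Tuple

# (mean, sd, n) for each triplet that failed the GRIMMER check
Failure = Tuple[float, float, int]


def grimmer_score(failures: List[Failure]) -> Tuple[float, List[str]]:
    """Map the failure count to 0 / 2.0 / 4.0 and name each failing triplet."""
    count = len(failures)
    if count == 0:
        score = 0.0
    elif count == 1:
        score = 2.0
    else:
        score = 4.0  # two or more failures
    messages = [
        f"SD {sd} is not reproducible for mean {mean} with N = {n}"
        for mean, sd, n in failures
    ]
    return score, messages


score, messages = grimmer_score([(3.0, 1.2, 5)])
print(score)     # 2.0
print(messages)  # ['SD 1.2 is not reproducible for mean 3.0 with N = 5']
```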

Why this matters

GRIMMER extends the granularity argument behind GRIM from the mean to the variability: because the sum of squares of integers is itself an integer, only a discrete set of standard deviations is reachable for a given mean and sample size, which adds a constraint on top of the one GRIM places on the mean. A standard deviation that fails GRIMMER while its mean passes GRIM is a specific signal that the dispersion was not computed from the same integer data as the mean. The test needs no raw data, only the reported mean, SD, and sample size.
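To make the discreteness concrete, the sketch below lists every SD, at a given number of decimals, that some integer sum of squares yields for a fixed mean and sample size. The function name `candidate_sds` and the `max_sd` cutoff are hypothetical, and rounding is assumed to be half-up. Because it only requires the sum of squares to be an integer, the list is an upper bound on what real integer datasets can produce.

```python
import math
from decimal import Decimal, ROUND_HALF_UP
from typing import List


def candidate_sds(mean: float, n: int, decimals: int, max_sd: float) -> List[Decimal]:
    """Every SD up to max_sd, rounded half-up to `decimals`, that an integer
    sum of squares yields for this mean and n (an upper bound on what real
    integer datasets can produce)."""
    if n <= 1:
        raise ValueError("n must be at least 2")
    total = round(mean * n)              # integer total T
    base = total ** 2 / n                # lower bound on the sum of squares (all values equal)
    quantum = Decimal(1).scaleb(-decimals)
    found = set()
    k = math.ceil(base)                  # first integer sum of squares to try
    while True:
        sd = math.sqrt((k - base) / (n - 1))
        if sd > max_sd:
            break
        found.add(Decimal(repr(sd)).quantize(quantum, rounding=ROUND_HALF_UP))
        k += 1
    return sorted(found)


print([str(v) for v in candidate_sds(3.0, 5, 2, 1.6)])
# ['0.00', '0.50', '0.71', '0.87', '1.00', '1.12', '1.22', '1.32', '1.41', '1.50', '1.58']
```

For mean 3.0 and N = 5, only 11 of the 161 two-decimal values from 0.00 to 1.60 appear; 1.20, for example, is not among them.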

Score thresholds

0-1: Every tested standard deviation is consistent with an integer dataset of its reported size.
2-3: One standard deviation cannot be produced by any integer dataset of the stated size.
4-5: Two or more impossible standard deviations, consistent with fabricated or misreported descriptive statistics.

Limitations

GRIMMER applies only when the underlying data are whole numbers, so it is meaningful only on integer-valued measures such as Likert items and counts. The reported precision drives the test, and because the extractor stores parsed numeric values, an SD printed with trailing zeros (for example 2.50, stored as 2.5) loses that precision and is tested more leniently than its printed form. The test depends on OCR, so a misread digit can create or hide a failure. An SD computed with a population denominator (n instead of n - 1) or over unequal groups can fail legitimately. The statistical-data gate skips mostly-text tables. The mean-only version of this check is indicator T2, and the version applied to values read from charts is indicator G12.
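The trailing-zero effect can be reproduced directly. This sketch assumes precision is inferred by counting decimals in the value's text; the helpers `decimals_in`, `to_places`, and `passes_at` are hypothetical, and `passes_at` re-implements the check from the technical description in compact form so the block runs on its own.

```python
import math
from decimal import Decimal, ROUND_HALF_UP


def decimals_in(text: str) -> int:
    """Number of digits after the decimal point in a numeric string."""
    return len(text.split(".")[1]) if "." in text else 0


def to_places(x: float, places: int) -> Decimal:
    """Round half-up to `places` decimals (assumed reporting convention)."""
    return Decimal(repr(x)).quantize(Decimal(1).scaleb(-places), rounding=ROUND_HALF_UP)


def passes_at(mean: float, sd: float, n: int, places: int) -> bool:
    """Compact GRIMMER-style check at a chosen precision."""
    total = round(mean * n)
    base = total ** 2 / n
    q = sd ** 2 * (n - 1) + base
    for k in (math.floor(q), math.ceil(q)):
        if k >= base and to_places(math.sqrt((k - base) / (n - 1)), places) == to_places(sd, places):
            return True
    return False


printed = "1.20"          # SD as printed in the table
parsed = float(printed)   # what a numeric extractor keeps: 1.2
print(decimals_in(printed), decimals_in(repr(parsed)))       # 2 1
print(passes_at(3.0, parsed, 5, decimals_in(printed)))       # False
print(passes_at(3.0, parsed, 5, decimals_in(repr(parsed))))  # True
```

At two decimals the reachable SDs round to 1.12 and 1.22, so "1.20" fails; at one decimal 1.2247 rounds to 1.2 and the same value passes.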

References

  1. Anaya J. (2016). The GRIMMER test: A method for testing the validity of reported measures of variability. PeerJ Preprints 4:e2400v1
  2. Brown NJL, Heathers JAJ. (2017). The GRIM Test: A Simple Technique Detects Numerous Anomalies in the Reporting of Results in Psychology. Social Psychological and Personality Science 8(4):363-369
  3. Nuijten MB, Hartgerink CHJ, van Assen MALM, Epskamp S, Wicherts JM. (2016). The prevalence of statistical reporting errors in psychology (1985-2013). Behavior Research Methods 48(4):1205-1226