ResAIKit
Research Integrity Toolkit
T2 · Image forensics · Table Analysis · Layer 1 (Deterministic)

GRIM Test (Table)

Applies the GRIM test to means in a table: when a mean averages whole-number data such as Likert items or counts, only certain values are mathematically reachable for a given sample size, so a mean no integer total can reproduce is impossible and points to fabrication or a reporting error.

Technical description

Extracts the table grid by OCR, proceeds only if the table holds statistical data, locates mean/SD/N triplets, and on integer-scale columns checks whether each mean m, reported to d decimals, is reachable: round(T/n, d) == m for some integer total T, tested at T = floor(m*n) and T = ceil(m*n). This reachability formulation is the exact GRIM criterion: it accepts 3.47 with n=15 (52/15 = 3.4667, which rounds to 3.47) and rejects 3.53 with n=10 (the nearest totals, 35 and 36, give 3.50 and 3.60). The number of decimals d is taken from the reported mean; an integer mean is trivially consistent.
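The reachability check described above can be sketched in a few lines of Python. This is a minimal illustration, not the toolkit's own code: the function name grim_consistent is hypothetical, the mean is passed as text so that trailing zeros ("3.50") still count as reported decimals, and accepting either rounding direction on exact ties is an assumption made here because the tie-breaking rule is not stated.

```python
from decimal import (Decimal, ROUND_CEILING, ROUND_FLOOR,
                     ROUND_HALF_DOWN, ROUND_HALF_UP)


def grim_consistent(mean_text: str, n: int) -> bool:
    """Return True if some integer total T gives round(T/n, d) == mean.

    mean_text is the mean exactly as printed (hypothetical input format),
    so the number of decimals d can be read from it.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    m = Decimal(mean_text)
    decimals = max(0, -m.as_tuple().exponent)
    if decimals == 0:
        return True  # an integer mean is trivially consistent
    quantum = Decimal(1).scaleb(-decimals)  # e.g. 0.01 for d = 2
    product = m * n
    # Only the two integer totals bracketing m*n need to be tested.
    candidates = {
        int(product.to_integral_value(rounding=ROUND_FLOOR)),
        int(product.to_integral_value(rounding=ROUND_CEILING)),
    }
    for total in candidates:
        exact = Decimal(total) / Decimal(n)
        # Assumption: a tie may have been rounded up or down by the authors.
        for mode in (ROUND_HALF_UP, ROUND_HALF_DOWN):
            if exact.quantize(quantum, rounding=mode) == m:
                return True
    return False


print(grim_consistent("3.47", 15))  # 52/15 rounds to 3.47
print(grim_consistent("3.53", 10))  # 35/10 and 36/10 give 3.50 and 3.60
```

Decimal arithmetic is used instead of floats so that products such as 3.47 * 15 are not distorted by binary rounding before floor and ceil are taken.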

How it works

Layer 1 (deterministic): reads the table, confirms it holds statistical data, finds mean/SD/N triplets, and on integer-scale columns flags any mean not reproducible by an integer total over its sample size. The number of flagged means sets the score: no failures scores 0, one failure 2.0, two failures 3.5, and three or more 4.5. Each failure report names the mean, the sample size, and their product.
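The mapping from failure count to score can be written out directly. The sketch below only restates the numbers given above; the function names grim_score and describe_failure, and the wording of the message, are hypothetical.

```python
from decimal import Decimal


def grim_score(failure_count: int) -> float:
    """Map the number of GRIM failures to the indicator score stated above."""
    if failure_count < 0:
        raise ValueError("failure count cannot be negative")
    fixed = {0: 0.0, 1: 2.0, 2: 3.5}
    return fixed.get(failure_count, 4.5)  # three or more failures


def describe_failure(mean_text: str, n: int) -> str:
    """Build a message naming the mean, sample size, and their product.

    The message format is illustrative only.
    """
    product = Decimal(mean_text) * n
    return f"mean {mean_text} with n={n}: product {product} has no matching integer total"


print(grim_score(2))
print(describe_failure("3.53", 10))
```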

Why this matters

GRIM is the simplest and one of the most powerful numerical-consistency tests in forensic metascience: a mean of n whole-number observations equals an integer total divided by n, so for a given n only values spaced 1/n apart can occur, and the gaps widen as n shrinks. A reported mean that no integer dataset of the stated size can produce is not a rounding artefact but evidence the number was invented or the sample size misstated. The test needs no raw data, only the mean and N.
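To make the granularity argument concrete, the following sketch lists every mean that n whole-number observations can produce within a range of the scale. The function name reachable_means and the half-up rounding rule are assumptions made for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP


def reachable_means(n: int, low: int, high: int, decimals: int = 2) -> list:
    """List the rounded means between low and high that n integers can yield.

    Every mean is an integer total divided by n, so it is enough to walk
    the totals from low*n to high*n. Half-up rounding is an assumption.
    """
    quantum = Decimal(1).scaleb(-decimals)
    values = {
        (Decimal(total) / Decimal(n)).quantize(quantum, rounding=ROUND_HALF_UP)
        for total in range(low * n, high * n + 1)
    }
    return sorted(values)


grid_10 = reachable_means(10, 3, 4)
grid_15 = reachable_means(15, 3, 4)
print(len(grid_10), len(grid_15))        # 11 16
print(Decimal("3.53") in grid_10)        # False
print(Decimal("3.47") in grid_15)        # True
```

With n=10 only 11 of the 101 two-decimal values between 3.00 and 4.00 are reachable, which is why small samples give the test most of its power.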

Score thresholds

0-1: Every tested mean is reproducible by an integer dataset of its reported size
2-3: One or two means cannot be produced by any integer dataset of the stated size
4-5: Three or more impossible means, consistent with fabricated or mis-reported descriptive statistics

Limitations

GRIM applies only when the underlying data are whole numbers, so the indicator restricts itself to columns it identifies as integer-scale and is conservative, skipping non-integer-valued mean columns. It depends on OCR, so a misread digit or sample size can create or mask a failure, and the reported number of decimals must be read correctly because the granularity depends on it. A mean aggregated over unequal groups, or a trimmed or weighted mean, can fail legitimately. The statistical-data gate skips mostly-text tables. The same test on chart-read values is indicator G12 and the extension to standard deviations is indicator T3.

References

  1. Brown NJL, Heathers JAJ. (2017). The GRIM Test: A Simple Technique Detects Numerous Anomalies in the Reporting of Results in Psychology. Social Psychological and Personality Science 8(4):363-369
  2. Nuijten MB, Hartgerink CHJ, van Assen MALM, Epskamp S, Wicherts JM. (2016). The prevalence of statistical reporting errors in psychology (1985-2013). Behavior Research Methods 48(4):1205-1226
  3. Carlisle JB. (2017). Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals. Anaesthesia 72(8):944-952