Instrument-Specific
Checks reported values against the fixed scales of the measurement instruments they come from. A Likert item runs 1 to 5, a visual analogue scale 0 to 10, and the Mini-Mental State Examination 0 to 30. On such an instrument a score outside the range is impossible, a single observation on an integer-only instrument reported with several decimals is suspect, and the mean of an integer-scale instrument must satisfy the GRIM granularity test. The indicator recognises the instrument from a column header or a statistic's label and applies these scale-specific checks.
Technical description
A deterministic, dictionary-driven validator of reported values against named instrument scales. It loads an instrument-scales dictionary that maps instruments and their aliases to (min, max, type, precision), covering Likert items, the visual analogue scale, the Glasgow Coma Scale, and the Mini-Mental State Examination, and matches the instrument named in each table header and each (mean, SD, n) triplet label. It then applies four checks: (1) a value outside [min, max] is an out-of-scale error; (2) an individual value on an integer-only instrument with more than two decimal places is an excess-precision warning; (3) on an integer-scale instrument, a reported mean is run through the corrected shared GRIM reachability test, and a mean that no integer dataset of the stated size could produce is flagged as a GRIM failure; (4) a triplet whose mean passes GRIM has its reported SD checked with the shared GRIMMER test (Anaya 2016), which flags an SD that no integer dataset of that mean and size could yield. The most serious finding sets the score.
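As a rough illustration of the dictionary lookup described above, the sketch below maps a few instrument names and aliases to a (min, max, type, precision) record. All names here (`Scale`, `SCALES`, `ALIASES`, `lookup`) are hypothetical, and treating the visual analogue scale as continuous with one decimal is an assumption; the Glasgow Coma Scale bounds of 3 to 15 are the standard ones. The indicator's actual dictionary and matching logic may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Scale:
    minimum: float
    maximum: float
    kind: str       # "integer" or "continuous" (assumed labels)
    precision: int  # decimals expected on an individual value (assumed meaning)


# Hypothetical dictionary; bounds for Likert, VAS and MMSE follow the text.
SCALES = {
    "likert": Scale(1, 5, "integer", 0),
    "vas": Scale(0, 10, "continuous", 1),   # type and precision are assumptions
    "gcs": Scale(3, 15, "integer", 0),
    "mmse": Scale(0, 30, "integer", 0),
}

# Aliases resolve to a canonical key in SCALES.
ALIASES = {
    "visual analogue scale": "vas",
    "glasgow coma scale": "gcs",
    "mini-mental state examination": "mmse",
}


def lookup(label: str) -> Optional[Scale]:
    """Exact match after case-folding; returns None for unknown labels."""
    key = label.casefold()
    key = ALIASES.get(key, key)
    return SCALES.get(key)
```

With exact matching of this kind, `lookup("MMSE")` resolves but a label with extra qualifiers such as `"MMSE total score"` does not.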
How it works
Layer 1 (deterministic): each table header and triplet label is matched against the instrument dictionary. For a matched column, every numeric cell is compared against the instrument's range (an out-of-scale value is an error), and for an integer-type instrument a cell with more than two decimals is an excess-precision warning. For a matched triplet on an integer-type instrument with positive n, the mean is GRIM-tested via the corrected shared reachability test, which replaces an earlier over-strict local check that rejected valid means such as 3.47 with n=15. A triplet whose mean passes GRIM has its SD checked with the shared GRIMMER test, which flags an SD no integer dataset could reproduce. Score (highest wins): any GRIM or GRIMMER failure scores 4.5, else any out-of-scale value 4.0, else any precision issue 2.0, else 0. Severities: out-of-scale is error, GRIM failure is warning, precision is informational. Metadata records matched_instruments, out_of_scale, precision_issues, grim_failures, and grimmer_failures.
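The cell-level part of this layer can be sketched as follows. This is a minimal illustration, not the indicator's code: the function name `check_column` and its parameters are hypothetical, and it assumes cells arrive as strings so that the number of reported decimals is preserved.

```python
from decimal import Decimal, InvalidOperation


def check_column(cells, minimum, maximum, integer_only, max_decimals=2):
    """Return (out_of_scale, precision_issues) for one matched column.

    cells: raw cell strings, so reported decimals are not lost.
    """
    out_of_scale, precision_issues = [], []
    for cell in cells:
        try:
            value = Decimal(cell.strip())
        except InvalidOperation:
            continue  # skip non-numeric cells such as "n/a"
        if not value.is_finite():
            continue  # skip NaN / Infinity tokens
        # Range check against the instrument's bounds.
        if value < minimum or value > maximum:
            out_of_scale.append(cell)
        # Excess-precision check applies only to integer-type instruments.
        decimals = max(0, -value.as_tuple().exponent)
        if integer_only and decimals > max_decimals:
            precision_issues.append(cell)
    return out_of_scale, precision_issues
```

For an MMSE column, `check_column(["28", "34", "27.125", "n/a"], 0, 30, True)` would report `"34"` as out of scale and `"27.125"` as a precision issue.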
Why this matters
Measurement instruments impose exact constraints real data must respect, so a violation is decisive evidence of error or fabrication. Scale bounds are definitional: an MMSE score cannot exceed 30 because the instrument has only thirty points, so a reported 34 is impossible. The granularity constraint is equally hard for integer scales: Brown and Heathers (2017) showed that the mean of whole-number responses can only take specific values for a given sample size, and a mean outside that set cannot come from real data. Forensic re-analyses of trials treat both impossible scale values and granularity failures as primary fabrication signals. Tying each check to the specific instrument sharpens it, because the same number can be valid on one scale and impossible on another. Using the corrected reachability form of GRIM ensures valid means are not flagged, so a GRIM failure here reflects a genuine arithmetic impossibility for a simple mean of integer responses rather than an artefact of the check.
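The granularity argument can be made concrete with a small reachability check: a reported mean is consistent if some integer total, divided by n and rounded to the reported number of decimals, reproduces it. The sketch below is an illustration under stated assumptions, not the shared GRIM implementation the indicator uses. The name `grim_consistent` is hypothetical, the mean is passed as a string to keep its reported precision, and both half-up and half-even rounding are accepted because the rounding rule used by the original authors is usually unknown.

```python
import math
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def grim_consistent(mean: str, n: int) -> bool:
    """True if some integer sum over n responses rounds to the reported mean."""
    reported = Decimal(mean)
    decimals = max(0, -reported.as_tuple().exponent)
    quantum = Decimal(1).scaleb(-decimals)  # e.g. 0.01 for two decimals
    approx_total = reported * n
    # Only integer totals adjacent to mean * n can round back to the mean.
    low = math.floor(approx_total) - 1
    high = math.ceil(approx_total) + 1
    for total in range(low, high + 1):
        exact = Decimal(total) / Decimal(n)
        for mode in (ROUND_HALF_UP, ROUND_HALF_EVEN):
            if exact.quantize(quantum, rounding=mode) == reported:
                return True
    return False
```

With n = 15, a total of 52 gives 52/15 = 3.4667, which rounds to 3.47, so 3.47 is reachable. The neighbouring totals give 3.40, 3.53 and 3.60, so a reported 3.48 is not.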
Score thresholds
- 0: All recognised instrument values are within scale and granularity-consistent.
- 2: Excess decimal precision on an integer-scale instrument, without harder violations.
- 4: One or more values outside an instrument's valid range.
- 4-5: A reported mean that fails the GRIM test for its integer instrument, or a reported SD that fails the GRIMMER test.
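The highest-wins rule can be sketched as a simple cascade over the finding lists recorded in the metadata. The function name `indicator_score` is hypothetical, and the numeric values are the ones stated under How it works (4.5, 4.0, 2.0, 0).

```python
def indicator_score(out_of_scale, precision_issues, grim_failures, grimmer_failures):
    """Highest-wins score from the four finding lists (values from the text)."""
    if grim_failures or grimmer_failures:
        return 4.5  # any GRIM or GRIMMER failure
    if out_of_scale:
        return 4.0  # any value outside the instrument's range
    if precision_issues:
        return 2.0  # excess decimals on an integer-scale instrument
    return 0.0      # nothing flagged
```

Because the cascade returns on the first match, a paper with both an out-of-scale value and a GRIM failure scores 4.5, not a sum of the two.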
Limitations
Can only validate instruments that are in its scales dictionary and whose header or label matches exactly after case-folding, so an instrument that is named unusually, abbreviated unfamiliarly, or carries extra qualifiers may be missed. The GRIM check applies only to integer-scale instruments and needs the mean, sample size, and precision to be extracted correctly. A trimmed, weighted, or subgroup-aggregated mean can fail GRIM legitimately. The GRIMMER check additionally needs the reported SD and runs only on GRIM-consistent means. The excess-precision check uses a two-decimal threshold on individual values and does not apply to means, where decimals are expected. Out-of-scale checking assumes the dictionary bounds are the true limits, so a non-standard or modified version of an instrument could be misjudged. The thresholds and highest-wins scoring are directional rather than calibrated. The general physiological-range version is S18, the table-image instrument check is T14, and the standalone GRIM test on text means is S3.
References
- Folstein MF, Folstein SE, McHugh PR. (1975). Mini-mental state. A practical method for grading the cognitive state of patients for the clinician. Journal of Psychiatric Research 12(3):189-198
- Brown NJL, Heathers JAJ. (2017). The GRIM Test: A Simple Technique Detects Numerous Anomalies in the Reporting of Results in Psychology. Social Psychological and Personality Science 8(4):363-369
- Carlisle JB. (2017). Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals. Anaesthesia 72(8):944-952
- Anaya J. (2016). The GRIMMER test: A method for testing the validity of reported measures of variability. PeerJ Preprints 4:e2400v1
- van der Zee T, Anaya J, Brown NJL. (2017). Statistical heartburn: an attempt to digest four pizza publications from the Cornell Food and Brand Lab. BMC Nutrition 3:54
- Carlisle JB. (2021). False individual patient data and zombie randomised controlled trials submitted to Anaesthesia. Anaesthesia 76(4):472-479
- Bordewijk EM, Li W, van Eekelen R, Wang R, Showell M, Mol BW, van Wely M. (2021). Methods to assess research misconduct in health-related research: A scoping review. Journal of Clinical Epidemiology 136:189-202
- Wilkinson J, Heal C, Antoniou GA, et al. (2024). A survey of experts to identify methods to detect problematic studies: stage 1 of the INveStigating ProblEmatic Clinical Trials in Systematic Reviews project. Journal of Clinical Epidemiology 175:111512
- Crone G, Green CD. (2025). Tools of the data detective: A review of statistical methods to detect data and result anomalies in psychology. Theory & Psychology 35(3):359-380