ResAIKit
Research Integrity Toolkit
T14 | Image forensics | Table Analysis | Layer 1 (Deterministic)

Instrument-Specific (Table)

Checks whether values match the measurement instrument they were collected with. A score reported on a one-to-five Likert scale cannot be six, a Glasgow Coma Scale value cannot be below three, and a single integer-scale response cannot carry several decimal places. The indicator matches each column header exactly to a dictionary of named instruments and their scales, then checks the values against the scale bounds, the precision the instrument allows, and, for integer instruments, the granularity that a reported mean must obey. It works on the reported values and the column names alone.

Technical description

T14 is a deterministic, generator-agnostic check against a dictionary of measurement instruments, each defined by a minimum, a maximum, a type (integer or continuous), and a list of aliases. Where the implausible-values indicator T11 uses general physiological ranges, T14 uses the exact constraints of named instruments such as Likert scales, the Glasgow Coma Scale, and visual analogue scales. It extracts the table grid by OCR, matches each header exactly to an instrument, and applies three checks to matched columns. First, the values must lie within the instrument's scale. Second, an integer instrument's values must not carry excessive decimal precision. Third, when an integer instrument is present, the reported means must pass the GRIM granularity test, which is applied only to means that fall within the instrument's scale. The score depends on which types of flag are raised.
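A minimal sketch of such a dictionary and the exact-match rule. The names `Instrument`, `INSTRUMENTS`, and `match_header`, the three entries, and their aliases are hypothetical. Trimming whitespace and folding case before comparison is also an assumption; the description states only that matching is by exact equality with a key or alias.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instrument:
    key: str                       # canonical name, stored lower-case
    minimum: float                 # lowest value the scale can produce
    maximum: float                 # highest value the scale can produce
    kind: str                      # "integer" or "continuous"
    aliases: tuple[str, ...] = ()  # alternative header spellings, lower-case


# Hypothetical entries for illustration only.
INSTRUMENTS = [
    Instrument("likert_5", 1, 5, "integer", ("likert", "likert (1-5)")),
    Instrument("gcs", 3, 15, "integer", ("glasgow coma scale",)),
    Instrument("vas_100", 0, 100, "continuous", ("vas", "visual analogue scale")),
]


def match_header(header: str) -> Optional[Instrument]:
    """Return the instrument whose key or alias equals the header, else None."""
    name = header.strip().casefold()  # assumption: trim and case-fold first
    for instrument in INSTRUMENTS:
        if name == instrument.key or name in instrument.aliases:
            return instrument
    return None  # no partial or substring matching is attempted
```

Under this rule a header such as "GCS" matches, while "GCS at admission" does not, which illustrates the coverage that strict matching gives up.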

How it works

The instrument dictionary is loaded, and each column header is matched to an instrument only by exact equality with the instrument key or one of its aliases. This deliberately strict rule avoids matching an unrelated column to an instrument through partial overlap. For each matched column, every data value is checked against the instrument's scale bounds, and a value below the minimum or above the maximum is flagged as out of scale at critical severity. For integer-type instruments, each value's decimal places are counted, and a value with more than two is flagged as excess precision at informational severity, since an integer-scale measurement should not carry fine fractional detail.
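The two per-value checks can be sketched as follows. The function names, the flag tuples, and the severity labels `"critical"` and `"info"` are hypothetical, and counting decimals through `repr` of the parsed float is one possible approach, not necessarily the one T14 uses. Values are assumed to be finite numbers.

```python
from decimal import Decimal


def decimal_places(value: float) -> int:
    """Count the decimal places of a parsed number, ignoring trailing zeros."""
    exponent = Decimal(repr(float(value))).normalize().as_tuple().exponent
    return max(0, -exponent)


def check_column(values, minimum, maximum, is_integer, max_decimals=2):
    """Return (row, flag, severity) tuples for one matched column."""
    flags = []
    for row, value in enumerate(values):
        # Bounds check: applies to integer and continuous instruments alike.
        if value < minimum or value > maximum:
            flags.append((row, "out_of_scale", "critical"))
        # Precision check: only for integer-type instruments.
        if is_integer and decimal_places(value) > max_decimals:
            flags.append((row, "excess_precision", "info"))
    return flags
```

For a one-to-five Likert column holding `[1, 3, 6, 2.125]`, this sketch flags the third value as out of scale and the fourth as excess precision. A value such as 2.5 passes the precision check because the threshold is more than two decimals.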

When at least one integer instrument is matched, the GRIM test is applied to the table's mean, standard deviation, and sample-size triplets, because the means then pertain to integer-scale data. To keep the test sound, GRIM is applied only to triplets whose mean lies within the matched integer instrument's scale; a mean outside that scale belongs to a different variable and is not tested under the integer-instrument assumption. A GRIM failure is flagged as a warning. The score is 4.5 if any GRIM failure occurs, 4.0 if there is no GRIM failure but a value is out of scale, 2.0 if only excess precision is found, and 0 otherwise. The metadata records the matched instruments, the flag count and details, and the GRIM-failure count.
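A sketch of the GRIM step and the scoring rule, under stated assumptions. The function names and arguments are hypothetical, the number of reported decimals is passed in (defaulting to two) rather than inferred, and the scores are read as a strict precedence: GRIM failure first, then out-of-scale values, then excess precision.

```python
import math


def grim_consistent(mean: float, n: int, decimals: int = 2) -> bool:
    """True if some integer total divided by n rounds to the reported mean."""
    half_unit = 0.5 * 10 ** -decimals
    target = mean * n
    # Only the integer totals adjacent to mean * n can reproduce the mean.
    for total in (math.floor(target), math.ceil(target)):
        if abs(total / n - mean) <= half_unit + 1e-9:  # small float tolerance
            return True
    return False


def score_t14(triplets, scale_min, scale_max, out_of_scale, excess_precision):
    """Return (score, grim_failures) for (mean, sd, n) triplets and flag counts."""
    grim_failures = 0
    for mean, _sd, n in triplets:
        # In-scale restriction: skip means that belong to another variable.
        if not (scale_min <= mean <= scale_max):
            continue
        if not grim_consistent(mean, n):
            grim_failures += 1
    if grim_failures:
        score = 4.5
    elif out_of_scale:
        score = 4.0
    elif excess_precision:
        score = 2.0
    else:
        score = 0.0
    return score, grim_failures
```

With 25 respondents on a one-to-five scale, a mean of 3.48 is attainable (87/25) while 3.47 is not. A mean of 62.3 in the same table is skipped because it lies outside the scale.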

Score thresholds

Score     Meaning
0 to 1    Values conform to their instruments' scales, precision, and granularity.
2 to 3    Integer-instrument values carry excess decimal precision.
4 to 5    Values fall outside an instrument's scale, or a mean fails GRIM for an integer instrument. Consistent with errors or fabrication.

Why this matters

Measurement instruments impose exact, well-documented constraints, so a value that violates them is an unambiguous error or fabrication rather than a borderline judgement. Comparing values against the valid range and properties of the instrument that produced them is a core data-cleaning and integrity check: it is the instrument-aware special case of the range and edit checks prescribed by frameworks for detecting data abnormalities [2]. For integer-scale instruments the constraint goes further. Because responses are whole numbers, a reported mean is an integer total divided by the sample size and must satisfy the GRIM granularity test, the technique Brown and Heathers used to expose impossible means in psychology, where Likert scales are ubiquitous [1]. Restricting GRIM to means inside the instrument's scale keeps the test attached to the variable it describes. Out-of-scale values and impossible means on standardized instruments are exactly the kind of concrete, checkable violations that forensic re-analysis of trials and surveys relies on, because they require no modelling assumptions to interpret [3]. By tying each check to the specific instrument named in the header, T14 turns documented measurement constraints into a direct test of validity.

Limitations

The check applies only to instruments in its dictionary and only when a header names one exactly, so an unrecognised instrument, an unusual abbreviation, or a header that combines an instrument name with other words is missed; this exact-match rule trades coverage for precision. The precision check assumes the column holds instrument values rather than summary statistics, so a column of means legitimately carrying decimals could be flagged if it is matched as an instrument column. Decimal counting works on parsed numeric values, so trailing zeros in the written value are lost before counting. The GRIM step infers the granularity from the reported mean and applies it under the integer-instrument assumption, which the in-scale restriction supports but cannot guarantee. The score thresholds are indicative rather than exact cut-offs. General physiological plausibility is covered by the broader indicator T11, and the granularity tests on means and standard deviations by indicators T2 and T3, so T14 is confined to instrument-specific validity.
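The trailing-zero limitation is easy to reproduce. The sketch below contrasts counting decimals on the text as written with counting them after parsing to a float; both helper names are hypothetical, and neither is claimed to be T14's own routine.

```python
from decimal import Decimal


def decimals_from_text(text: str) -> int:
    """Decimal places as written, trailing zeros included."""
    return max(0, -Decimal(text.strip()).as_tuple().exponent)


def decimals_from_float(value: float) -> int:
    """Decimal places after numeric parsing; trailing zeros are gone."""
    exponent = Decimal(repr(float(value))).normalize().as_tuple().exponent
    return max(0, -exponent)


written = "4.500"
as_written = decimals_from_text(written)           # three places on the page
after_parse = decimals_from_float(float(written))  # one place once parsed
```

A cell printed as 4.500 would exceed a two-decimal threshold if counted as text, but counts as a single decimal once parsed, so it is not flagged.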

Theoretical background

T14 rests on the fact that an instrument defines the set of values it can produce. A bounded scale admits only values in its interval, and an integer scale admits only whole numbers, so the feasible set of a single measurement is known exactly from the instrument's specification. Aggregates inherit derived constraints: the mean of integer responses is a rational number that can be written as an integer total over the sample size, which is why GRIM applies, and the spread is similarly constrained. Fabrication and transcription errors violate these constraints because they treat the value as a free number rather than as an output of a specific measuring process, producing scores beyond the scale, fractional values where only integers are possible, or means that no integer dataset on the scale could yield. The exactness of the matching matters because each constraint is meaningful only for the instrument that defines it: applying a Likert scale's bounds to a continuous measure, or its integer granularity to a different variable, would be a category error, which is why T14 matches headers strictly and restricts the granularity test to means that fall within the instrument's own scale. Reading values against their instrument's feasible set turns measurement specifications into a validity test.
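The constraint on means can be written compactly. The notation is introduced here for illustration and is not taken from the source: $x_i$ are the $n$ integer responses on a scale from $a$ to $b$, $S$ is their total, and $m$ is the mean as reported to $d$ decimal places.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{S}{n},
\qquad S \in \mathbb{Z}, \quad na \le S \le nb .

\text{A reported mean } m \text{ is GRIM-consistent if and only if }
\exists\, S \in \mathbb{Z} : \left| \frac{S}{n} - m \right| \le \tfrac{1}{2}\cdot 10^{-d} .
```

For example, with $n = 25$ and $d = 2$ the attainable means are multiples of $1/25 = 0.04$. A reported 3.48 equals $87/25$ and passes. A reported 3.47 lies between $86/25 = 3.44$ and $87/25 = 3.48$, neither of which is within 0.005 of it, so it fails.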

References

  1. Brown NJL, Heathers JAJ. The GRIM Test: A Simple Technique Detects Numerous Anomalies in the Reporting of Results in Psychology. Social Psychological and Personality Science. 2017;8(4):363-369. DOI: 10.1177/1948550616673876
  2. Van den Broeck J, Cunningham SA, Eeckels R, Herbst K. Data cleaning: detecting, diagnosing, and editing data abnormalities. PLoS Medicine. 2005;2(10):e267. DOI: 10.1371/journal.pmed.0020267
  3. Carlisle JB. Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals. Anaesthesia. 2017;72(8):944-952. DOI: 10.1111/anae.13938