ResAIKit
Research Integrity Toolkit
T11 · Image forensics · Table Analysis · Layer 1 (Deterministic)

Implausible Values (Table)

Checks whether the numbers in a table fall within physically possible ranges. A reported heart rate of five hundred, a blood pressure of ten, or a negative weight cannot occur in a real subject, so values outside known physiological limits point to a transcription error or fabrication. The indicator matches each column header to a dictionary of measurements and their plausible ranges, using whole-word matching so that a short alias does not latch onto an unrelated column, and grades each value by how far it falls outside its range. It works on the reported numbers and the column names alone.

Technical description

T11 is a deterministic, generator-agnostic range check against a dictionary of physiological and clinical limits. Each variable in the dictionary, such as systolic blood pressure or heart rate, carries a minimum and maximum that bound the biologically possible, together with a list of aliases. T11 extracts the table grid by OCR, matches each column header to a dictionary entry by whole-word matching, and classifies every value in a matched column by its deviation from the allowed range. A value modestly outside the range is a warning, a value far outside is an error, and a negative value for a quantity that must be positive is critical. The flag counts set the score. The matching is deliberately conservative: it requires the variable name or an alias to appear as a complete word in the header. A very short alias such as K, the symbol for potassium, therefore does not match an unrelated column such as a week index, whose values would otherwise be tested against the wrong range.
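The dictionary described above can be pictured as a list of small records. The following is a minimal sketch, not the toolkit's actual data: the class name `RangeEntry`, its fields, and every numeric limit are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RangeEntry:
    """One dictionary variable: its name, plausible bounds, and aliases."""
    name: str
    minimum: float
    maximum: float
    aliases: Tuple[str, ...] = ()
    must_be_positive: bool = True  # a negative value would be graded critical


# Hypothetical entries; the limits are placeholders, not T11's real ranges.
DICTIONARY = [
    RangeEntry("heart rate", 30.0, 220.0, ("hr", "pulse")),
    RangeEntry("systolic blood pressure", 50.0, 260.0, ("sbp", "systolic bp")),
    RangeEntry("potassium", 1.5, 9.0, ("k",)),
]


def candidate_names(entry: RangeEntry) -> Tuple[str, ...]:
    """All strings that may identify this variable in a column header."""
    return (entry.name, *entry.aliases)
```

Keeping the aliases next to the bounds means a header match immediately yields the range to test against, which is the pairing the description relies on.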

How it works

The image must be at least 50 pixels on a side and yield a table with data rows. The physiological dictionary is loaded, and each column header is matched against the dictionary keys and aliases. Matching tokenizes the header into words and accepts a candidate only when the candidate's own token sequence appears as a contiguous run of whole tokens in the header. The comparison is case-insensitive and ignores punctuation, but it does not accept partial-word overlaps. This whole-word rule is the safeguard against the substring false positives that short aliases otherwise produce.
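A minimal sketch of the whole-word rule follows. The function names, the regular expression used for tokenizing, and the example aliases are assumptions, not the toolkit's code. A candidate matches only when its tokens appear as a contiguous run of whole header tokens.

```python
import re
from typing import Dict, List, Optional, Sequence


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric words (assumed rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_run(header_tokens: List[str], candidate_tokens: List[str]) -> bool:
    """True when candidate_tokens occur as a contiguous run in header_tokens."""
    n = len(candidate_tokens)
    if n == 0:
        return False
    return any(
        header_tokens[i:i + n] == candidate_tokens
        for i in range(len(header_tokens) - n + 1)
    )


def match_header(header: str,
                 aliases_by_variable: Dict[str, Sequence[str]]) -> Optional[str]:
    """Return the first variable whose name or alias is a whole-word match."""
    header_tokens = tokenize(header)
    for variable, aliases in aliases_by_variable.items():
        for candidate in (variable, *aliases):
            if contains_run(header_tokens, tokenize(candidate)):
                return variable
    return None


# Hypothetical aliases for illustration.
ALIASES = {"potassium": ["k"], "heart rate": ["hr", "pulse"]}
```

With these aliases, `match_header("K (mmol/L)", ALIASES)` returns `"potassium"`, while `match_header("Week", ALIASES)` returns `None` because `k` is not a whole token of the header. A run-together header such as `"HeartRate"` also returns `None`, since it tokenizes to a single word.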

For each matched column, every data value is graded. A negative value for a mandatory-positive measurement is critical, since such a quantity cannot be below zero. A value within the range passes. A value below half the minimum or above twice the maximum is an error, well beyond any plausible measurement. A value outside the range but within those wide bounds is a warning. The score follows the counts: no out-of-range values score 0, a warning alone scores 1.0, a single error or critical scores 3.0, and two or more errors or criticals score 4.5. Each flagged value becomes a finding naming the column, the matched variable, the value, and the range. The metadata records the number of matched columns and the counts of warnings, errors, and criticals.
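The grading and scoring rules above can be sketched as follows. The function names and the example range of 30 to 220 are hypothetical. The sketch also assumes a positive minimum, and that any number of warnings with no error or critical still scores 1.0, which the text does not state explicitly.

```python
from typing import Iterable


def grade(value: float, minimum: float, maximum: float,
          must_be_positive: bool = True) -> str:
    """Classify one value against its plausible range."""
    if must_be_positive and value < 0:
        return "critical"          # negative where the quantity cannot be
    if minimum <= value <= maximum:
        return "pass"
    if value < minimum / 2 or value > maximum * 2:
        return "error"             # far beyond any plausible measurement
    return "warning"               # outside the range, within the wide bounds


def score(grades: Iterable[str]) -> float:
    """Turn the flag counts into the indicator score."""
    grades = list(grades)
    severe = sum(g in ("error", "critical") for g in grades)
    if severe >= 2:
        return 4.5
    if severe == 1:
        return 3.0
    if "warning" in grades:
        return 1.0                 # assumption: warnings only, any count
    return 0.0


# Hypothetical heart-rate column checked against an assumed range of 30 to 220.
column = [72, 250, 500, -5]
grades = [grade(v, 30, 220) for v in column]
```

For this column the grades are pass, warning, error, and critical, so `score(grades)` is 4.5 because two values are error or critical.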

Score thresholds

Score     Meaning
0 to 1    Values lie within plausible ranges, or only slightly outside.
2 to 3    One value is well outside its physiological range, or negative where it cannot be.
4 to 5    Several impossible values. Consistent with transcription errors or fabricated measurements.

Why this matters

Range and plausibility checks are the first line of defence in data quality and integrity work, because an impossible value is unambiguous evidence that something is wrong, whether a typo, a unit error, or invention. The standard data-cleaning framework of Van den Broeck and colleagues makes such checks, comparing each value against predefined physiological or logical limits, the core of the detecting and diagnosing stages of cleaning a dataset [1]. In the integrity context, impossible values are among the simplest fabrication signals, sitting alongside the distributional and granularity tests that forensic re-analysis of trials applies to catch invented data [2], and statistical detective work routinely starts by checking that reported values could exist at all [3]. The discipline T11 adds is in the matching: a careless dictionary lookup that fired on any header containing the letters of a short alias would flag legitimate columns against the wrong limits, so requiring a whole-word match keeps the screen precise. By grading severity, T11 separates a value just over a clinical threshold from one that no living subject could produce.

Limitations

The screen can only check variables in its dictionary, so an unlisted measurement is not assessed. The plausible ranges are necessarily generous to avoid flagging genuine extremes, which means a value just outside a clinical reference interval but still biologically possible is treated leniently. Matching depends on optical character recognition of the header and on the header naming the variable in a recognised way; an unusual abbreviation or a non-English header is missed. The ranges are population-level and do not account for units, so a value reported in different units than the dictionary assumes can be wrongly flagged or wrongly cleared. The whole-word rule prevents short-alias false matches but can miss a header that runs words together without separators. The severity thresholds, half the minimum and twice the maximum, are rough guides rather than exact cut-offs. Internal arithmetic consistency and granularity are handled by separate indicators, so T11 is limited to whether each individual value is physically possible.

Theoretical background

T11 rests on the existence of hard physical and biological bounds. Every physiological quantity is constrained by physics and anatomy: a body temperature cannot fall outside the limits compatible with life, a count cannot be negative, a concentration cannot exceed saturation. These bounds are not statistical tendencies but absolute limits, so a value beyond them did not come from a measurement of a real subject. The check is therefore a membership test against a known feasible set. It is graded by distance because the further outside the set a value lies, the less any rounding or unit slip can explain it, and a negative value for an inherently non-negative quantity is the cleanest impossibility of all. The matching step is where the test earns its reliability: the bound is meaningful only when applied to the variable it describes, so the header must genuinely name that variable. That is why whole-word matching, rather than loose substring overlap, is essential to avoid checking a value against an irrelevant limit. Reading values against their feasible sets turns domain knowledge of measurement limits into a direct test of whether the data could exist.

References

  1. Van den Broeck J, Cunningham SA, Eeckels R, Herbst K. Data cleaning: detecting, diagnosing, and editing data abnormalities. PLoS Medicine. 2005;2(10):e267. DOI: 10.1371/journal.pmed.0020267
  2. Carlisle JB. Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals. Anaesthesia. 2017;72(8):944-952. DOI: 10.1111/anae.13938
  3. Simonsohn U. Just Post It: The Lesson From Two Cases of Fabricated Data Detected by Statistics Alone. Psychological Science. 2013;24(10):1875-1888. DOI: 10.1177/0956797613480366