Implausible Values (Table)
Checks whether the numbers in a table fall within physically possible ranges. A heart rate of 500, a blood pressure of 10, or a negative weight cannot occur in a real subject. The indicator matches each column header to a dictionary of measurements and their plausible ranges, using whole-word matching so a short alias does not latch onto an unrelated column.
Technical description
Loads a dictionary of physiological and clinical limits (each variable with a min, a max, and aliases), extracts the table grid by OCR, and matches each header to a dictionary entry by whole-word matching: the variable name or an alias must appear as a contiguous run of word tokens in the header. This prevents short aliases (such as the symbol for potassium) from matching unrelated columns like a week index. Each value in a matched column is then graded. A negative value for a mandatory-positive quantity is critical; a value below half the minimum or above twice the maximum is an error; any other out-of-range value is a warning. The score is 0 when nothing is flagged, 1.0 when there are only warnings, 3.0 for one error or critical, and 4.5 for two or more.
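The whole-word rule can be illustrated with a minimal Python sketch. The dictionary entries, their ranges, and the names `LIMITS`, `tokens`, and `match_header` are hypothetical; the indicator's actual dictionary and tokenizer are not specified here, and the sketch assumes that word tokens are runs of letters and digits.

```python
import re

# Hypothetical dictionary entries for illustration only; the real
# dictionary and its ranges are not given in this document.
LIMITS = {
    "potassium": {"min": 2.0, "max": 8.0, "aliases": ["k"]},
    "heart rate": {"min": 30.0, "max": 220.0, "aliases": ["hr", "pulse"]},
}


def tokens(text):
    """Split text into lowercase word tokens (assumed: letter/digit runs)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_run(haystack, needle):
    """True if `needle` occurs as a contiguous run of tokens in `haystack`."""
    n = len(needle)
    if n == 0:
        return False
    return any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))


def match_header(header, limits=LIMITS):
    """Return the first variable whose name or alias is a whole-word match."""
    header_tokens = tokens(header)
    for name, entry in limits.items():
        for candidate in [name, *entry["aliases"]]:
            if contains_run(header_tokens, tokens(candidate)):
                return name
    return None


print(match_header("K (mmol/L)"))        # potassium
print(match_header("Week"))              # None: "k" is not a whole token
print(match_header("Heart rate (bpm)"))  # heart rate
```

Because matching works on whole tokens, a header such as "Week" never matches the alias "k", whereas a plain substring test would. The same property means a run-together header such as "Heartrate" goes unmatched in this sketch.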
How it works
Layer 1 (deterministic): matches each column header to the physiological dictionary by whole-word matching, then grades every value in matched columns by its deviation from the allowed range, with negative values for positive-only quantities flagged as critical. The flag counts set the score: no flags score 0, warnings alone score 1.0, one error or critical scores 3.0, and two or more score 4.5. Each flag reports the column, the matched variable, the offending value, and the allowed range.
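The grading and scoring rules can be sketched in Python as follows. This is an illustration under stated assumptions, not the indicator's implementation: the function names `grade` and `score` and the heart-rate range used in the examples are hypothetical, "two or more" is read as two or more error or critical flags, and the half-minimum test assumes a positive minimum.

```python
def grade(value, lo, hi, positive_only=True):
    """Grade one value against an allowed range [lo, hi].

    Returns "critical", "error", "warning", or None when in range.
    """
    if positive_only and value < 0:
        return "critical"  # negative where the quantity must be positive
    if value < lo / 2 or value > hi * 2:
        return "error"     # below half the minimum or above twice the maximum
    if value < lo or value > hi:
        return "warning"   # out of range, but not grossly so
    return None


def score(grades):
    """Map a list of grades to the indicator score.

    Assumption: "two or more" counts error and critical flags together.
    """
    severe = sum(g in ("error", "critical") for g in grades)
    if severe >= 2:
        return 4.5
    if severe == 1:
        return 3.0
    if "warning" in grades:
        return 1.0
    return 0.0


# Hypothetical heart-rate range of 30 to 220 beats per minute.
values = [72, 25, 500, -70]
grades = [grade(v, 30, 220) for v in values]
print(grades)         # [None, 'warning', 'error', 'critical']
print(score(grades))  # 4.5
```

Checking the sign before the range keeps a negative value from being reported as a mere range error, which matches the ordering of the rules above.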
Why this matters
Range and plausibility checks are the first line of defence in data quality and integrity work, because an impossible value is unambiguous evidence of a typo, a unit error, or invention. Comparing each value against predefined physiological or logical limits is the core of detecting and diagnosing data abnormalities, and impossible values are among the simplest fabrication signals. The whole-word matching keeps the screen precise: a careless lookup firing on any header containing a short alias's letters would flag legitimate columns against the wrong limits.
Score thresholds
- 0-1: Values lie within plausible ranges, or only slightly outside
- 2-3: One value is well outside its physiological range, or negative where it cannot be
- 4-5: Several impossible values, consistent with transcription errors or fabricated measurements
Limitations
The screen checks only the variables in its dictionary, so an unlisted measurement is not assessed. Ranges are deliberately generous to avoid flagging genuine extremes, so a value just outside a clinical reference interval but still biologically possible is treated leniently. Matching depends on OCR of the header and on a recognised name; an unusual abbreviation or a non-English header is missed. Ranges are population-level and unit-blind, so a value reported in different units than assumed can be wrongly flagged or wrongly cleared. The whole-word rule prevents short-alias false matches but can miss a header that runs words together. Arithmetic consistency and granularity are handled by separate indicators; T11 is confined to whether each value is physically possible.
References
- Van den Broeck J, Cunningham SA, Eeckels R, Herbst K. (2005). Data cleaning: detecting, diagnosing, and editing data abnormalities. PLoS Medicine 2(10):e267
- Carlisle JB. (2017). Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals. Anaesthesia 72(8):944-952
- Simonsohn U. (2013). Just Post It: The Lesson From Two Cases of Fabricated Data Detected by Statistics Alone. Psychological Science 24(10):1875-1888