S18 | Statistical analysis | Statistical Consistency | Layer 1 (Deterministic)

Implausible Values

Checks reported clinical and physiological values against known plausible ranges. A heart rate of 300, a blood pressure of 1000, or a negative age cannot occur in a living patient, so a value outside the accepted range for its variable is either an error or invented. The indicator recognises the variable from a table column header or a reported statistic's label, looks up its physiological range, and flags values that fall outside it, marking impossible values such as negatives or extreme outliers more severely.

Technical description

A deterministic, dictionary-driven screen for biologically implausible values. It loads a physiological-ranges dictionary mapping clinical variables and their aliases to a plausible (min, max) pair, and matches the variable named in each table column header and each (mean, SD, n) triplet label against it. Real headers carry units, so the name is first normalised by removing a trailing parenthetical unit (such as '(years)' or '(mmHg)') and a trailing comma-qualifier (such as ', kg'), and is then looked up exactly. This recognises headers like 'Age (years)' or 'Weight, kg' without matching a dictionary term as a substring of an unrelated header. For each matched variable, every numeric cell in the column (or the reported mean) is classified. A value outside the range is a warning. A value is an error if it is negative for a non-negative variable, is extreme (below half the minimum or above twice the maximum), or lies beyond a hard definitional bound (a percentage above 100, or a Glasgow Coma Scale outside 3 to 15).
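The normalise-then-exact-lookup step can be sketched as follows. This is a minimal illustration, not the indicator's actual code: the names RANGES, ALIASES, normalise_header, and lookup_range are hypothetical, the range values are placeholders rather than the real dictionary entries, and case-insensitive matching is an assumption.

```python
import re

# Hypothetical excerpt of a physiological-ranges dictionary.
# The (min, max) values are placeholders for illustration only.
RANGES = {
    "age": (0.0, 120.0),
    "weight": (0.5, 300.0),
    "heart rate": (20.0, 250.0),
}
# Hypothetical alias table mapping alternative names to dictionary keys.
ALIASES = {"hr": "heart rate", "body weight": "weight"}

_PAREN_UNIT = re.compile(r"\s*\([^()]*\)\s*$")  # trailing "(years)", "(mmHg)"
_COMMA_QUAL = re.compile(r"\s*,[^,]*$")         # trailing ", kg"


def normalise_header(header: str) -> str:
    """Strip a trailing parenthetical unit, then a trailing comma-qualifier."""
    name = header.strip().lower()  # lower-casing is an assumption
    name = _PAREN_UNIT.sub("", name)
    name = _COMMA_QUAL.sub("", name)
    return " ".join(name.split())  # collapse internal whitespace


def lookup_range(header: str):
    """Return (min, max) for an exactly matched variable, else None."""
    name = normalise_header(header)
    name = ALIASES.get(name, name)
    return RANGES.get(name)  # exact key match only, never a substring match


print(lookup_range("Age (years)"))   # (0.0, 120.0)
print(lookup_range("Weight, kg"))    # (0.5, 300.0)
print(lookup_range("Average age"))   # None: "age" is only a substring here
```

Because the lookup is an exact dictionary hit on the normalised name, a header such as 'Average age' is skipped rather than matched on the substring 'age'. That is the conservative trade-off the description refers to.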

How it works

Layer 1 (deterministic): each table header and triplet label is normalised (parenthetical units and a trailing comma-qualifier are stripped) and looked up exactly in the range dictionary. For a matched variable, numeric values are parsed (a trailing percent sign is tolerated) and classified: a value within range produces nothing; a value outside the range but not extreme produces a warning; a negative value for a non-negative variable, a value below half the minimum or above twice the maximum, or a value beyond a hard definitional bound (a percentage ceiling of 100, the Glasgow Coma Scale 3 to 15) produces an error. Score: 4.5 for two or more errors, 3.0 for exactly one error, 1.0 for warnings only, and 0 for none, capped at 5.0. Metadata records matched_columns, warnings, and errors.
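The classification rules above can be sketched as a small function. This is a hedged illustration under stated assumptions: parse_number and classify are hypothetical names, a variable is treated as non-negative when its minimum is at least 0, and the optional hard_lo and hard_hi parameters stand in for however the indicator stores its hard definitional bounds. The ranges in the usage lines are placeholders.

```python
import math
from typing import Optional


def parse_number(cell: str) -> Optional[float]:
    """Parse a numeric cell, tolerating a trailing percent sign."""
    text = cell.strip()
    if text.endswith("%"):
        text = text[:-1].strip()
    try:
        value = float(text)
    except ValueError:
        return None
    return value if math.isfinite(value) else None  # reject nan and inf


def classify(value: float, lo: float, hi: float,
             hard_lo: Optional[float] = None,
             hard_hi: Optional[float] = None) -> str:
    """Return "ok", "warning", or "error" for one value."""
    if lo <= value <= hi:
        return "ok"
    # Assumption: a variable is non-negative when its minimum is >= 0.
    if value < 0 and lo >= 0:
        return "error"
    # Heuristic extremes: below half the minimum or above twice the maximum.
    if value < lo / 2 or value > hi * 2:
        return "error"
    # Hard definitional bounds, e.g. percentage <= 100 or GCS within 3..15.
    if hard_lo is not None and value < hard_lo:
        return "error"
    if hard_hi is not None and value > hard_hi:
        return "error"
    return "warning"


print(classify(72, 30, 220))                        # ok
print(classify(300, 30, 220))                       # warning
print(classify(1000, 30, 220))                      # error
print(classify(150, 0, 100, hard_hi=100))           # error
print(classify(2, 3, 15, hard_lo=3, hard_hi=15))    # error
```

The last two calls show why the hard bounds are needed: 150 percent is below twice the maximum and a Glasgow Coma Scale of 2 is above half the minimum, so the heuristic thresholds alone would rate both as warnings.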

Why this matters

Plausibility limits are among the firmest constraints in clinical data: physiology and instruments cannot produce a heart rate of several hundred or a negative blood pressure, so a value beyond these bounds is unambiguously wrong. Carlisle's forensic re-analyses treat physiologically impossible and implausible values as direct evidence of error or fabrication, and reviews of clinical-trial fraud list out-of-range and impossible values among the standard checks. Al-Marzouki and colleagues likewise examined whether reported values fell within credible bounds. Range-checking is powerful because it needs no comparison group or model: a single number is judged against what is biologically possible, so an impossible value is decisive on its own. Distinguishing out-of-range warnings from impossible errors (negatives, extreme outliers) keeps the most serious violations visible.

Score thresholds

0: All recognised variables hold values within their plausible ranges.
1: One or more values outside the plausible range but not impossible.
3: One impossible value: a negative for a non-negative variable, an extreme outlier, or a value beyond a hard definitional bound.
4-5: Two or more impossible values (scored 4.5).
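As a rough sketch, the mapping from error and warning counts to a score can be written as below. The function name score_implausible_values is hypothetical; the numeric values are those given under How it works (4.5, 3.0, 1.0, 0, capped at 5.0).

```python
def score_implausible_values(n_errors: int, n_warnings: int) -> float:
    """Map counts of errors and warnings to the indicator score."""
    if n_errors >= 2:
        raw = 4.5
    elif n_errors == 1:
        raw = 3.0
    elif n_warnings > 0:
        raw = 1.0
    else:
        raw = 0.0
    return min(raw, 5.0)  # cap stated in the description


print(score_implausible_values(0, 0))  # 0.0
print(score_implausible_values(0, 3))  # 1.0
print(score_implausible_values(1, 5))  # 3.0
print(score_implausible_values(2, 0))  # 4.5
```

Errors dominate warnings: a single error scores 3.0 however many warnings accompany it, and the warning count only matters when there are no errors.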

Limitations

Can only check variables in its physiological-ranges dictionary whose header or label it can match, so an unrecognised name, unusual abbreviation, or non-English label is skipped. Matching strips unit decorations and then does an exact lookup, which is conservative (it avoids false matches) but misses headers phrased differently from the dictionary. The ranges are population-level plausibility bounds, so a value can be in range yet wrong for its context, or out of range yet genuine in a real extreme case. For that reason most out-of-range values are warnings, and errors are reserved for negatives, extreme outliers, and values beyond a hard definitional bound. It checks individual cells and reported means, not dispersion, so an implausible standard deviation is not caught here. The extreme-value thresholds (half the minimum, twice the maximum) are heuristic, though the hard definitional bounds it also applies (a percentage ceiling of 100, the Glasgow Coma Scale 3 to 15) are exact. The table-image version is T11, and instrument-specific range and scale checks are S19.