ResAIKit
Research Integrity Toolkit
S18 | Statistical analysis | Statistical Consistency | Layer 1 (Deterministic)

Implausible Values

Checks reported clinical and physiological values against known plausible ranges. A heart rate of 300, a blood pressure of 1000, or a negative age cannot occur in a living patient, so a value outside the accepted range for its variable deserves scrutiny, and an impossible one is either an error or invented. The indicator recognises the variable from a table column header or a reported statistic's label, looks up its physiological range, and flags values that fall outside it, marking impossible values such as negatives or extreme outliers more severely. It works on the reported numbers alone.

Technical description

S18 is a deterministic, dictionary-driven screen for biologically implausible values. It loads a physiological-ranges dictionary that maps clinical variables and their aliases to a plausible minimum and maximum, for example a heart-rate range or a blood-pressure range. It then matches the variable named in each table column header, and in each reported triplet of mean, standard deviation, and sample size, against that dictionary. Real headers often carry units, so the name is first normalised by removing a trailing parenthetical unit such as "(years)" or "(mmHg)" and a trailing comma-qualifier such as ", kg", and is then looked up exactly. This recognises headers like "Age (years)" or "Weight, kg" without matching a dictionary term as a substring of an unrelated header. For each matched variable, every numeric cell in the column, or the reported mean, is classified. A value outside the range is a warning. An error is raised for a negative value of a non-negative variable, for an extreme value below half the minimum or above twice the maximum, or for a value beyond a hard physiological or definitional bound (a percentage above 100, or a Glasgow Coma Scale score outside 3 to 15). The score rises with the number and severity of the flags.
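The normalise-then-exact-lookup step can be sketched as follows. This is a minimal illustration rather than the toolkit's actual code: the names `RANGES`, `ALIASES`, `normalise_header`, and `lookup_range` are hypothetical, and the numeric bounds are placeholders chosen only for the example.

```python
import re

# Hypothetical excerpt of a ranges dictionary. The bounds below are
# illustrative placeholders, not the values S18 actually uses.
RANGES = {
    "age": (0.0, 120.0),
    "weight": (0.5, 400.0),
    "heart rate": (20.0, 250.0),
}

# Hypothetical aliases that map alternative names onto a dictionary key.
ALIASES = {"hr": "heart rate", "body weight": "weight"}

_PAREN_UNIT = re.compile(r"\s*\([^()]*\)\s*$")  # trailing "(years)", "(bpm)"
_COMMA_QUAL = re.compile(r"\s*,[^,]*$")         # trailing ", kg"


def normalise_header(header: str) -> str:
    """Lower-case the header and strip trailing unit decorations."""
    name = header.strip().lower()
    name = _PAREN_UNIT.sub("", name)
    name = _COMMA_QUAL.sub("", name)
    return " ".join(name.split())  # collapse internal whitespace


def lookup_range(header: str):
    """Return (minimum, maximum) for a recognised header, else None."""
    name = normalise_header(header)
    name = ALIASES.get(name, name)
    return RANGES.get(name)  # exact match only, never a substring search


print(lookup_range("Age (years)"))
print(lookup_range("Weight, kg"))
print(lookup_range("Percentage"))
```

Because the lookup is exact, "Percentage" and "Dosage" return no range even though both contain the letters "age". The cost is that a header phrased differently from every dictionary entry and alias is skipped.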

How it works

Each table column header and each triplet label is normalised and looked up in the range dictionary. When a variable matches, its numeric values are parsed, with a trailing percent sign tolerated, and each is classified against the variable's range. The classification has three outcomes: a value within the range produces nothing, a value outside the range but not extreme produces a warning, and a value that is negative for a non-negative variable, below half the minimum or above twice the maximum, or beyond a hard definitional bound such as a percentage ceiling of 100 or the Glasgow Coma Scale range of 3 to 15, produces an error. Each finding names the variable, the value, the source location, and the expected range.
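A minimal sketch of the three-outcome classification, assuming each variable carries a plausible minimum and maximum and, optionally, a hard bound. The function names, parameter names, and example ranges are hypothetical and chosen for illustration; only the rules themselves (negative values, half the minimum, twice the maximum, hard bounds) come from the description above.

```python
from typing import Optional


def parse_number(cell: str) -> Optional[float]:
    """Parse a table cell as a number, tolerating a trailing percent sign."""
    text = cell.strip()
    if text.endswith("%"):
        text = text[:-1].strip()
    try:
        return float(text)
    except ValueError:
        return None  # not numeric, so nothing to classify


def classify(
    value: float,
    lo: float,
    hi: float,
    non_negative: bool = True,
    hard_lo: Optional[float] = None,
    hard_hi: Optional[float] = None,
) -> Optional[str]:
    """Return None (in range), "warning", or "error"."""
    if non_negative and value < 0:
        return "error"  # negative value for a non-negative variable
    if hard_lo is not None and value < hard_lo:
        return "error"  # below a hard definitional bound
    if hard_hi is not None and value > hard_hi:
        return "error"  # above a hard definitional bound
    if value < lo / 2 or value > hi * 2:
        return "error"  # extreme: below half the minimum or above twice the maximum
    if value < lo or value > hi:
        return "warning"  # outside the range but not extreme
    return None


# Illustrative heart-rate range of 20 to 250 (a placeholder, not S18's value).
for cell in ["72", "260", "600", "-5"]:
    print(cell, classify(parse_number(cell), 20, 250))

# A percentage with a hard ceiling of 100.
print("104%", classify(parse_number("104%"), 0, 100, hard_hi=100))
```

Checking the error conditions before the warning condition matters: every extreme value is also outside the range, so testing the range first would downgrade errors to warnings.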

The score is set by the most serious findings: two or more errors score 4.5, exactly one error scores 3.0, warnings only score 1.0, and no flags score 0, with the total capped at 5.0. The metadata records the matched variable names and the counts of warnings and errors.
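The scoring rule is small enough to write out directly. The sketch below assumes findings arrive as (variable, severity) pairs; the function names and the metadata keys are hypothetical, while the score values are the ones stated above.

```python
from collections import Counter
from typing import Dict, List, Tuple


def score_from_counts(n_errors: int, n_warnings: int) -> float:
    """Map error and warning counts to the S18 score described in the text."""
    if n_errors >= 2:
        score = 4.5
    elif n_errors == 1:
        score = 3.0
    elif n_warnings > 0:
        score = 1.0
    else:
        score = 0.0
    return min(score, 5.0)  # overall cap stated in the text


def summarise(
    matched_variables: List[str],
    findings: List[Tuple[str, str]],
) -> Dict[str, object]:
    """Bundle the score with metadata; the key names here are illustrative."""
    counts = Counter(severity for _, severity in findings)
    return {
        "score": score_from_counts(counts["error"], counts["warning"]),
        "matched_variables": sorted(set(matched_variables)),
        "n_errors": counts["error"],
        "n_warnings": counts["warning"],
    }


result = summarise(
    ["age", "heart rate"],
    [("heart rate", "error"), ("age", "warning")],
)
print(result)
```

Under this rule the worst severity sets the score: once a single error is present, additional warnings do not raise it, and only a second error does.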

Score thresholds

Score    Meaning
0        All recognised variables hold values within their plausible ranges.
1        One or more values outside the plausible range but not impossible.
3        One impossible value: a negative for a non-negative variable, an extreme outlier, or a value beyond a hard definitional bound.
4 to 5   Two or more impossible values.

Why this matters

Plausibility limits are among the firmest constraints in clinical data: human physiology and measurement instruments cannot produce a heart rate of several hundred beats per minute or a negative blood pressure, so a value beyond these bounds is unambiguously wrong. Forensic re-analyses of trials by Carlisle treat physiologically impossible and implausible values as direct evidence of error or fabrication [1], and broader reviews of data fraud in clinical research list out-of-range and impossible values among the standard checks for invented data [2]. Al-Marzouki and colleagues, in demonstrating statistical detection of fabrication, similarly examined whether reported values fell within credible bounds [3]. Range-checking is powerful because it needs no comparison group or model: a single number is judged against what is biologically possible, so an impossible value is decisive on its own, and a pattern of out-of-range values points to careless transcription or fabricated data. Distinguishing warnings for merely out-of-range values from errors for impossible ones (negatives, extreme outliers, and values beyond a hard definitional bound) keeps the most serious violations visible. The plausibility dimension of established data-quality frameworks formalises exactly this kind of range and hard-limit check [4, 5], and recent forensic re-analyses, scoping reviews, and trustworthiness instruments treat impossible values as a routine integrity screen [6, 7, 8, 9].

Limitations

The indicator can only check variables that appear in its physiological-ranges dictionary and whose header or label it can match, so an unrecognised variable name, an unusual abbreviation, or a non-English label is silently skipped. Matching relies on normalising away unit decorations and then an exact lookup, which is conservative and avoids false matches but will miss a header phrased differently from the dictionary entries. The ranges are population-level plausibility bounds, so a value can be within range yet wrong for a specific context, or outside range yet genuine in an extreme but real case, which is why most out-of-range values are flagged as warnings rather than errors. The check applies to a column's individual cells and to reported means, not to dispersion, so an implausible standard deviation is not caught here. The thresholds for extreme values, half the minimum and twice the maximum, are heuristic, although the hard definitional bounds it also applies (a percentage ceiling of 100, the Glasgow Coma Scale range of 3 to 15) are exact. The table-image version of this range check is indicator T11, and instrument-specific range and scale checks are indicator S19, so S18 stays on physiological plausibility of the values in the reported tables and statistics.

Theoretical background

S18 rests on the existence of hard physical and biological bounds on measured quantities. Every clinical variable has a domain fixed by physiology and by the measuring instrument: a count cannot be negative, an age cannot exceed the human lifespan, a blood pressure cannot be zero in a living patient, and a percentage cannot exceed one hundred. These bounds are not statistical estimates but constraints on what the world can produce, so a reported value outside them is impossible rather than merely unlikely, which is what makes the check decisive where probabilistic tests only suggest. The dictionary encodes a generous version of each bound, wide enough that genuine extremes fall inside it, so that a violation is meaningful. The two-tier severity reflects two kinds of violation: a value just outside the generous range may be a real outlier or a transcription slip, and is flagged as a warning, whereas a value that is negative where negativity is impossible, or that exceeds the bound by a large factor, cannot be reconciled with any real measurement and is flagged as an error. The exact-match-after-normalisation strategy reflects a deliberate trade-off: by stripping only unit decorations and never matching a term inside a larger word, the indicator accepts some missed variables in exchange for avoiding the false matches that a looser substring search would create. Accordingly, S18 routes a value beyond a hard definitional bound, such as an oxygen saturation above 100 percent or a Glasgow Coma Scale outside 3 to 15, directly to an error, consistent with the hard-limit plausibility checks of standard data-quality frameworks [4, 5].

References

  1. Carlisle JB. Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals. Anaesthesia. 2017;72(8):944-952. DOI: 10.1111/anae.13938
  2. George SL, Buyse M. Data fraud in clinical trials. Clinical Investigation. 2015;5(2):161-173. DOI: 10.4155/cli.14.116
  3. Al-Marzouki S, Evans S, Marshall T, Roberts I. Are these data real? Statistical methods for the detection of data fabrication in clinical trials. BMJ. 2005;331(7511):267-270. DOI: 10.1136/bmj.331.7511.267
  4. Kahn MG, Callahan TJ, Barnard J, et al. A Harmonized Data Quality Assessment Terminology and Framework for the Secondary Use of Electronic Health Record Data. eGEMs. 2016;4(1):1244. DOI: 10.13063/2327-9214.1244
  5. Weiskopf NG, Weng C. Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research. Journal of the American Medical Informatics Association. 2013;20(1):144-151. DOI: 10.1136/amiajnl-2011-000681
  6. Carlisle JB. False individual patient data and zombie randomised controlled trials submitted to Anaesthesia. Anaesthesia. 2021;76(4):472-479. DOI: 10.1111/anae.15263
  7. Bordewijk EM, Li W, van Eekelen R, Wang R, Showell M, Mol BW, van Wely M. Methods to assess research misconduct in health-related research: A scoping review. Journal of Clinical Epidemiology. 2021;136:189-202. DOI: 10.1016/j.jclinepi.2021.05.012
  8. Wilkinson J, Heal C, Antoniou GA, et al. A survey of experts to identify methods to detect problematic studies: stage 1 of the INveStigating ProblEmatic Clinical Trials in Systematic Reviews project. Journal of Clinical Epidemiology. 2024;175:111512. DOI: 10.1016/j.jclinepi.2024.111512
  9. Crone G, Green CD. Tools of the data detective: A review of statistical methods to detect data and result anomalies in psychology. Theory & Psychology. 2025;35(3):359-380. DOI: 10.1177/09593543241311861