ResAIKit
Research Integrity Toolkit
D5 | Statistical analysis | Fabrication Detection | Layer 2 (Contextual)

Longitudinal Impossibility

Follows each participant across their study visits and checks that the changes from one visit to the next are biologically possible and naturally variable. A weight that jumps thirty kilograms in a month, a continuous lab value that repeats to the decimal across three visits, a trajectory that is implausibly smooth, or measurements with almost no within-person variation are all signs of fabricated or carried-forward longitudinal data. The indicator runs four such checks on each numeric variable: two subject by subject, and two averaged across subjects.

Technical description

A contextual screen for fabricated longitudinal individual-patient data. It locates a subject-identifier column and a time/visit column, sorts by subject and time, and examines each numeric variable along each subject's visits. Four checks: impossible jumps (visit-to-visit absolute change exceeding a per-variable maximum from a biological-change-thresholds dictionary); copy-forward (three or more identical consecutive values within a subject on a CONTINUOUS variable, with continuity judged across the whole column so a copied run of whole numbers is not missed); autocorrelation (mean lag-one autocorrelation across subjects above 0.95, indicating artificially smooth trajectories); and variability ratio (mean within-subject SD over between-subject SD below 0.1, meaning each subject barely moves). The jump and copy-forward counts plus the variability flag set the score.
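The two per-subject checks can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the toolkit's own code: the function names, the column names, and the single `MAX_CHANGE` entry are hypothetical, and the continuity rule (any non-integer value anywhere in the column) is one plausible reading of "continuity judged across the whole column".

```python
from itertools import groupby

# Illustrative limit only; the toolkit's real thresholds dictionary is not given in the text.
MAX_CHANGE = {"weight_kg": 10.0}

def trajectories(rows, subject_col, time_col, value_col):
    """Sort rows by subject and time, then collect each subject's values in visit order."""
    ordered = sorted(rows, key=lambda r: (r[subject_col], r[time_col]))
    return {
        subject: [r[value_col] for r in visits]
        for subject, visits in groupby(ordered, key=lambda r: r[subject_col])
    }

def is_continuous(column):
    """Assumed rule: a variable is continuous if any value in the whole column is non-integer."""
    return any(float(v) != int(v) for v in column)

def count_impossible_jumps(values, max_change):
    """Count visit-to-visit absolute changes larger than the allowed maximum."""
    return sum(1 for a, b in zip(values, values[1:]) if abs(b - a) > max_change)

def count_copy_forward_runs(values, min_run=3):
    """Count runs of at least min_run identical consecutive values."""
    return sum(1 for _, run in groupby(values) if len(list(run)) >= min_run)

def screen_variable(name, by_subject):
    """Return (impossible_jumps, copy_forwards) summed over all subjects."""
    column = [v for visits in by_subject.values() for v in visits]
    jumps = 0
    if name in MAX_CHANGE:  # jump check needs a threshold for this variable
        jumps = sum(count_impossible_jumps(v, MAX_CHANGE[name]) for v in by_subject.values())
    copies = 0
    if is_continuous(column):  # copy-forward check is skipped for integer-scale variables
        copies = sum(count_copy_forward_runs(v) for v in by_subject.values())
    return jumps, copies

# Rows deliberately out of visit order to show the sort step.
rows = [
    {"subject": "S01", "visit": 3, "weight_kg": 101.5},
    {"subject": "S01", "visit": 1, "weight_kg": 70.2},
    {"subject": "S01", "visit": 2, "weight_kg": 70.8},
    {"subject": "S01", "visit": 4, "weight_kg": 71.0},
    {"subject": "S02", "visit": 1, "weight_kg": 82.0},
    {"subject": "S02", "visit": 2, "weight_kg": 82.0},
    {"subject": "S02", "visit": 3, "weight_kg": 82.0},
    {"subject": "S02", "visit": 4, "weight_kg": 81.4},
]
by_subject = trajectories(rows, "subject", "visit", "weight_kg")
result = screen_variable("weight_kg", by_subject)
print(result)
```

In this toy data S01 contributes two impossible jumps (up to 101.5 and back down), and S02 contributes one copy-forward run. The run consists of whole numbers (82.0), but it is still checked because the column as a whole contains decimals.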

How it works

Layer 2 (contextual): after sorting by subject and time, each numeric variable (excluding the subject and time columns) is analysed. Per subject, visit-to-visit absolute differences above the variable's threshold are counted as impossible jumps; on continuous variables, a run of three or more identical consecutive values is counted as a copy-forward. Across subjects, a mean lag-one autocorrelation above 0.95 is flagged, as is a ratio of mean within-subject SD to pooled between-subject SD below 0.1. Score: three or more jumps add 2.5 (one or two add 1.5); three or more copy-forwards add 2.5 (one or two add 1.0); low variability adds 1.0; the total is capped at 5.0. The autocorrelation flag is reported but does not add to the score. Alongside the variability ratio, the indicator reports the intraclass correlation (between-subject variance over total variance), which approaches one as the within-subject spread vanishes. Metadata records subjects_checked, impossible_jumps, copy_forwards, low_variability_ratio, max_autocorr (the highest mean lag-one autocorrelation across variables), and max_intraclass_correlation.
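The scoring arithmetic described above can be written out as a small function. This is a sketch of the stated rule only; the function name `longitudinal_score` is hypothetical, and the parameter names simply mirror the metadata fields. Autocorrelation is left out because the description lists no score contribution for it.

```python
def longitudinal_score(impossible_jumps, copy_forwards, low_variability_ratio):
    """Combine the jump count, copy-forward count, and variability flag into a 0-5 score."""
    score = 0.0
    if impossible_jumps >= 3:
        score += 2.5
    elif impossible_jumps >= 1:
        score += 1.5
    if copy_forwards >= 3:
        score += 2.5
    elif copy_forwards >= 1:
        score += 1.0
    if low_variability_ratio:
        score += 1.0
    return min(score, 5.0)  # cap at 5.0

print(longitudinal_score(0, 0, False))  # 0.0
print(longitudinal_score(2, 1, False))  # 1.5 + 1.0 = 2.5
print(longitudinal_score(3, 3, True))   # 2.5 + 2.5 + 1.0 = 6.0, capped to 5.0
```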

Why this matters

Longitudinal individual-patient data is unusually revealing because it constrains not just each value but the path between values, and fabricators struggle to make those paths realistic. Carlisle found that access to raw longitudinal data dramatically increased detection of false data and zombie trials, because impossible changes and carried-forward values become visible only at the per-subject level. His earlier re-analyses established impossible and improbable values as integrity signals, and the classic biostatistical account of fraud lists carried-forward and implausibly smooth records among the markers. The four checks target distinct fabrication shortcuts: inventing a value ignores physiology (impossible jumps), copying a value produces exact repeats, generating a smooth trend produces excessive autocorrelation, and assigning near-constant trajectories collapses within-subject variability. The score rewards their accumulation.

Score thresholds

0: Visit-to-visit changes are plausible and naturally variable.
2-3: A few impossible jumps or copy-forward runs, or suspiciously low within-subject variability.
4-5: Systematic impossible changes or carried-forward values across subjects.

Limitations

Requires individual-patient data with both a recognisable subject identifier and a time/visit column, so cross-sectional or summary-only data is out of scope. The impossible-jump check works only for variables in the biological-change-thresholds dictionary. The copy-forward check applies only to continuous variables, since integer-scale measures (a stage, a count) legitimately repeat across visits, so a fabricated integer series is not caught here. Genuine clinical reality can produce some of the same patterns: a stable patient on treatment shows low within-subject variability, and a slowly drifting biomarker is highly autocorrelated, so a flag is a prompt for inspection, not proof of fabrication. The thresholds (0.95 autocorrelation, 0.1 variability ratio, dictionary change limits) are directional: they indicate where to look rather than deliver a verdict. Cross-sectional duplication and distributional checks are covered by other D-series indicators.
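The false-positive risk is easy to reproduce with invented numbers. The sketch below is illustrative only: the toy data are made up, `lag_one_autocorr` assumes a Pearson correlation between each value and its successor, and `variability_ratio` assumes the between-subject SD is the SD of the subject means. The text does not specify either definition, so the toolkit may compute them differently.

```python
from statistics import mean, stdev

def lag_one_autocorr(values):
    """Pearson correlation between each value and its successor (an assumed definition)."""
    x, y = values[:-1], values[1:]
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # undefined for a constant series
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / (sxx * syy) ** 0.5

def mean_autocorr(by_subject):
    """Average the lag-one autocorrelation over subjects where it is defined."""
    rs = [r for r in map(lag_one_autocorr, by_subject.values()) if r is not None]
    return mean(rs)

def variability_ratio(by_subject):
    """Mean within-subject SD divided by the SD of subject means (an assumed definition)."""
    within = mean(stdev(v) for v in by_subject.values())
    between = stdev([mean(v) for v in by_subject.values()])
    return within / between

# A slowly drifting biomarker: plausible, yet very smooth.
drifting = {
    "S01": [5.0, 5.5, 6.1, 6.6, 7.0, 7.6],
    "S02": [4.2, 4.8, 5.3, 5.9, 6.3, 6.9],
}
# Stable patients at different baselines: plausible, yet barely moving.
stable = {
    "S01": [120, 121, 120, 119],
    "S02": [140, 141, 139, 140],
    "S03": [160, 159, 161, 160],
}

drift_autocorr = mean_autocorr(drifting)
stable_ratio = variability_ratio(stable)
print(drift_autocorr > 0.95)  # True: the smoothness flag fires
print(stable_ratio < 0.1)     # True: the low-variability flag fires
```

Both toy datasets are clinically unremarkable, yet both cross the stated thresholds, which is why a flag calls for inspecting the underlying records.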

References

  1. Carlisle JB. (2021). False individual patient data and zombie randomised controlled trials submitted to Anaesthesia. Anaesthesia 76(4):472-479
  2. Carlisle JB. (2017). Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals. Anaesthesia 72(8):944-952
  3. Buyse M, George SL, Evans S, et al. (1999). The role of biostatistics in the prevention, detection and treatment of fraud in clinical trials. Statistics in Medicine 18(24):3435-3451
  4. Al-Marzouki S, Evans S, Marshall T, Roberts I. (2005). Are these data real? Statistical methods for the detection of data fabrication in clinical trials. BMJ 331(7511):267-270
  5. George SL, Buyse M. (2015). Data fraud in clinical trials. Clinical Investigation 5(2):161-173
  6. Bordewijk EM, Li W, van Eekelen R, Wang R, Showell M, Mol BW, van Wely M. (2021). Methods to assess research misconduct in health-related research: A scoping review. Journal of Clinical Epidemiology 136:189-202
  7. Wilkinson J, Heal C, Antoniou GA, et al. (2024). A survey of experts to identify methods to detect problematic studies: stage 1 of the INveStigating ProblEmatic Clinical Trials in Systematic Reviews project. Journal of Clinical Epidemiology 175:111512
  8. Crone G, Green CD. (2025). Tools of the data detective: A review of statistical methods to detect data and result anomalies in psychology. Theory & Psychology 35(3):359-380
  9. Shrout PE, Fleiss JL. (1979). Intraclass correlations: uses in assessing rater reliability. Psychological Bulletin 86(2):420-428