ResAIKit
Research Integrity Toolkit
D5 | Statistical analysis | Fabrication Detection | Layer 2 (Contextual)

Longitudinal Impossibility

Follows each participant across their study visits and checks that the changes from one visit to the next are biologically possible and naturally variable. A weight that jumps thirty kilograms in a month, a continuous lab value that repeats to the decimal across three visits, a trajectory that is implausibly smooth, or measurements with almost no within-person variation are all signs of fabricated or carried-forward longitudinal data. The indicator runs four such checks per variable, subject by subject. It operates on individual-patient data and requires both a subject identifier and a time column.

Technical description

D5 is a contextual screen for fabricated longitudinal individual-patient data. It locates a subject-identifier column and a time or visit column, sorts the data by subject and time, and examines each numeric variable along each subject's sequence of visits. Four checks run. The impossible-jump check compares each visit-to-visit absolute change against a per-variable maximum drawn from a biological-change-thresholds dictionary, flagging changes that exceed what physiology allows between visits. The copy-forward check looks for three or more identical consecutive values within a subject on a continuous variable, where continuity is judged across the whole column so that a copied run of whole numbers is not missed; such exact repeats suggest a value carried forward rather than measured. The autocorrelation check computes the lag-one autocorrelation of each subject's series and flags a variable whose mean autocorrelation across subjects exceeds 0.95, indicating artificially smooth trajectories. The variability-ratio check compares the average within-subject standard deviation to the between-subject standard deviation and flags a ratio below 0.1, meaning each subject barely moves relative to how much subjects differ. The counts of jumps and copy-forwards, plus the variability flag, set the score.
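The first two checks can be sketched in a few lines of plain Python. This is a minimal illustration, not the toolkit's own code: the function names, the row layout, and the `MAX_CHANGE` entry are assumptions made for the example, and the 10 kg limit is a placeholder rather than a value taken from the real biological-change-thresholds dictionary.

```python
from collections import defaultdict

# Hypothetical threshold dictionary: largest plausible absolute change between
# adjacent visits. The 10.0 is a placeholder, not a value from the source.
MAX_CHANGE = {"weight_kg": 10.0}

def group_by_subject(rows, subject_key, time_key, variable):
    """Return {subject: [values ordered by time]}, skipping missing values."""
    series = defaultdict(list)
    for row in sorted(rows, key=lambda r: (r[subject_key], r[time_key])):
        if row.get(variable) is not None:
            series[row[subject_key]].append(row[variable])
    return dict(series)

def impossible_jumps(series, max_change):
    """List (subject, visit_index, change) for adjacent changes above max_change."""
    hits = []
    for subject, values in series.items():
        for i in range(1, len(values)):
            change = abs(values[i] - values[i - 1])
            if change > max_change:
                hits.append((subject, i, change))
    return hits

def is_continuous(series):
    """Column-level test: any non-integer value anywhere marks the variable continuous."""
    return any(not float(v).is_integer() for vals in series.values() for v in vals)

def copy_forward_runs(series, min_run=3):
    """List (subject, value, run_length) for runs of min_run or more identical values."""
    hits = []
    for subject, values in series.items():
        run = 1
        for i in range(1, len(values) + 1):
            if i < len(values) and values[i] == values[i - 1]:
                run += 1
            else:
                if run >= min_run:
                    hits.append((subject, values[i - 1], run))
                run = 1
    return hits

# Invented example data: subject A gains about 30 kg between visits,
# subject B repeats the same value at all three visits.
rows = [
    {"id": "A", "visit": 1, "weight_kg": 70.2},
    {"id": "A", "visit": 2, "weight_kg": 101.0},
    {"id": "A", "visit": 3, "weight_kg": 100.5},
    {"id": "B", "visit": 1, "weight_kg": 82.0},
    {"id": "B", "visit": 2, "weight_kg": 82.0},
    {"id": "B", "visit": 3, "weight_kg": 82.0},
]
series = group_by_subject(rows, "id", "visit", "weight_kg")
jumps = impossible_jumps(series, MAX_CHANGE["weight_kg"])
repeats = copy_forward_runs(series) if is_continuous(series) else []
```

Subject B's values are all whole numbers, yet the run is still examined because subject A's 70.2 makes the column continuous. That is the column-level continuity rule described above.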

How it works

After sorting by subject and time, each numeric variable other than the subject and time columns is analysed. For each subject with at least two values, visit-to-visit absolute differences are compared against the variable's threshold when one is known, and each exceedance is counted as an impossible jump. For each continuous variable, defined by the presence of non-integer values anywhere in the column, a run of three or more identical consecutive values within a subject is counted as a copy-forward. The lag-one autocorrelation is computed per subject and averaged across subjects, and a variable whose mean exceeds 0.95 is flagged. The variability ratio, the mean within-subject standard deviation divided by the pooled between-subject standard deviation, is flagged when it falls below 0.1. The same within-and-between decomposition is also reported as the intraclass correlation, the between-subject variance as a fraction of the total variance, which approaches one precisely when the within-subject spread vanishes relative to the between-subject spread [9].
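The autocorrelation and variability checks might look like the sketch below. Several details are assumptions, because the text does not fix them: the lag-one autocorrelation is taken as the Pearson correlation between a series and its one-visit lag, the between-subject standard deviation is taken as the standard deviation of the subject means, and the intraclass correlation uses a simple variance-components form rather than a specific ANOVA estimator from [9]. All function names are hypothetical.

```python
from statistics import mean, stdev, variance

def pearson(x, y):
    """Pearson correlation of two equal-length lists; None if either is constant."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / (sxx * syy) ** 0.5

def lag1_autocorrelation(values):
    """Correlation between a series and itself shifted by one visit (assumed estimator)."""
    if len(values) < 3:
        return None
    return pearson(values[:-1], values[1:])

def mean_lag1_autocorrelation(series):
    """Average the per-subject lag-one autocorrelations that are defined."""
    rs = [lag1_autocorrelation(v) for v in series.values()]
    rs = [r for r in rs if r is not None]
    return mean(rs) if rs else None

def variability_summary(series):
    """Return (within/between SD ratio, intraclass correlation), or (None, None)."""
    usable = [v for v in series.values() if len(v) >= 2]
    if len(usable) < 2:
        return None, None
    within_sd = mean([stdev(v) for v in usable])
    within_var = mean([variance(v) for v in usable])
    between_sd = stdev([mean(v) for v in usable])  # assumption: SD of subject means
    if between_sd == 0:
        return None, None
    ratio = within_sd / between_sd
    icc = between_sd ** 2 / (between_sd ** 2 + within_var)
    return ratio, icc

# Invented data: perfectly linear trajectories, and near-constant subjects.
smooth = {"A": [1.0, 2.0, 3.0, 4.0], "B": [10.0, 12.0, 14.0, 16.0]}
flat = {"A": [70.0, 70.1, 70.0], "B": [90.0, 90.1, 90.0], "C": [55.0, 55.1, 55.0]}

too_smooth = mean_lag1_autocorrelation(smooth) > 0.95  # 0.95 threshold from the text
ratio, icc = variability_summary(flat)
low_variability = ratio < 0.1                          # 0.1 threshold from the text
```

Both flags fire on these invented series. The choice of estimator matters in practice: with only a handful of visits per subject, other autocorrelation estimators can give very different values for the same data.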

The score accumulates: three or more impossible jumps add 2.5 and one or two add 1.5; three or more copy-forwards add 2.5 and one or two add 1.0; a low variability ratio adds 1.0; the total is capped at 5.0. Each anomaly produces a finding naming the subject and variable. The metadata records the number of subjects checked, the impossible-jump and copy-forward counts, whether the low-variability ratio fired, the highest mean lag-one autocorrelation across variables, and the highest intraclass correlation across variables.
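The scoring rule above translates directly into a small function. The name `d5_score` and its signature are illustrative assumptions; the increments and the cap are the ones stated in the text. The autocorrelation flag is not an input because the text lists it as reported in the metadata, not as a score component.

```python
def d5_score(n_jumps, n_copy_forwards, low_variability):
    """Accumulate the D5 score from the anomaly counts, capped at 5.0."""
    score = 0.0
    if n_jumps >= 3:
        score += 2.5
    elif n_jumps >= 1:
        score += 1.5
    if n_copy_forwards >= 3:
        score += 2.5
    elif n_copy_forwards >= 1:
        score += 1.0
    if low_variability:
        score += 1.0
    return min(score, 5.0)

# Two impossible jumps, one copy-forward run, low-variability flag set:
example = d5_score(2, 1, True)  # 1.5 + 1.0 + 1.0 = 3.5
```

Three or more of both anomaly types already reach the 5.0 cap, so the variability flag changes the total only when at least one of the counts is below three.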

Score thresholds

Score     Meaning
0         Visit-to-visit changes are plausible and naturally variable.
2 to 3    A few impossible jumps or copy-forward runs, or suspiciously low within-subject variability.
4 to 5    Systematic impossible changes or carried-forward values across subjects.

Why this matters

Longitudinal individual-patient data is unusually revealing because it constrains not just each value but the path between values, and fabricators struggle to make those paths realistic. Carlisle, examining trials submitted with and without individual-patient data, found that access to the raw longitudinal data dramatically increased the detection of false data and zombie trials, because impossible changes and carried-forward values become visible only at the per-subject level [1]. His earlier large-scale re-analyses established impossible and improbable values as integrity signals across the literature [2], and the classic biostatistical account of fraud lists carried-forward and implausibly smooth longitudinal records among the patterns that distinguish invented from genuine data [3]. The four checks target distinct fabrication shortcuts: inventing a value without regard to physiology produces impossible jumps, copying a previous value produces exact repeats, generating a smooth trend produces excessive autocorrelation, and assigning each subject a near-constant trajectory produces vanishing within-subject variability. Because each can occur innocently in isolation, the score rewards their accumulation. Recent forensic re-analyses, scoping reviews, and trustworthiness instruments place within-subject and carried-forward longitudinal checks among the standard screens for fabricated patient-level data [4, 5, 6, 7, 8].

Limitations

The check requires individual-patient data with both a recognisable subject identifier and a time or visit column, so cross-sectional or summary-only data is outside its scope. The impossible-jump check works only for variables present in the biological-change-thresholds dictionary, so an unrecognised variable is not range-checked. The copy-forward check applies only to continuous variables, since integer-scale measures such as a stage or a count legitimately repeat across visits, which means a fabricated integer series is not caught here. Genuine clinical reality can produce some of these patterns: a stable patient on treatment may show low within-subject variability, and a slowly drifting biomarker can be highly autocorrelated, so a flag is a prompt to inspect rather than proof. The thresholds (a 0.95 autocorrelation, a 0.1 variability ratio, and the dictionary change limits) are directional guides rather than calibrated cut-offs. Cross-sectional duplication and distributional checks are handled by other D-series indicators, so D5 is confined to within-subject longitudinal trajectories.

Theoretical background

D5 rests on the structure that repeated measurement of the same individual imposes. A real biological trajectory is constrained on three timescales at once: the magnitude of change between adjacent visits is bounded by physiology, the sequence of values carries genuine measurement noise so that consecutive readings rarely coincide exactly and the lag-one autocorrelation stays well below one, and the spread of a person's own values over time is comparable in order of magnitude to the spread between people. Fabrication breaks these in characteristic ways. Inventing each visit independently of the last violates the change bound, producing jumps no organism could make. Carrying a value forward, the path of least effort for a fabricator filling a longitudinal table, produces exact repeats that a noisy continuous measurement essentially never yields, which is why the check is confined to continuous variables and why continuity must be judged at the column level rather than from one subject's possibly-integer run. Generating a smooth synthetic curve drives the lag-one autocorrelation toward one, and assigning each subject a fixed level with negligible drift collapses the within-subject standard deviation relative to the between-subject spread. That collapse is precisely a near-unity intraclass correlation, the share of total variance lying between subjects rather than within them, so reporting the intraclass correlation expresses the variability-ratio check on the standard reliability scale where a value approaching one marks subjects that are each almost constant [9]. Each check reads one of these constraints, and because longitudinal fabrication usually leaves more than one trace, the indicator sums the evidence rather than relying on any single signal.
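The link between the variability ratio and the intraclass correlation can be made explicit. The derivation below is a sketch under a simplifying assumption: both quantities are built from the same two components, a within-subject standard deviation \sigma_w and a between-subject standard deviation \sigma_b. These symbols are introduced here for illustration and do not appear in the indicator's own description.

```latex
\[
r = \frac{\sigma_w}{\sigma_b},
\qquad
\mathrm{ICC} = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_w^2} = \frac{1}{1 + r^2},
\qquad
r < 0.1 \;\Longrightarrow\; \mathrm{ICC} > \frac{1}{1 + 0.01} \approx 0.990 .
\]
```

Under that assumption, a variability ratio below the 0.1 threshold corresponds to an intraclass correlation above roughly 0.99. The correspondence is approximate in practice, because the ratio averages within-subject standard deviations while a variance-based intraclass correlation averages variances.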

References

  1. Carlisle JB. False individual patient data and zombie randomised controlled trials submitted to Anaesthesia. Anaesthesia. 2021;76(4):472-479. DOI: 10.1111/anae.15263
  2. Carlisle JB. Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals. Anaesthesia. 2017;72(8):944-952. DOI: 10.1111/anae.13938
  3. Buyse M, George SL, Evans S, et al. The role of biostatistics in the prevention, detection and treatment of fraud in clinical trials. Statistics in Medicine. 1999;18(24):3435-3451. https://pubmed.ncbi.nlm.nih.gov/10611617/
  4. Al-Marzouki S, Evans S, Marshall T, Roberts I. Are these data real? Statistical methods for the detection of data fabrication in clinical trials. BMJ. 2005;331(7511):267-270. DOI: 10.1136/bmj.331.7511.267
  5. George SL, Buyse M. Data fraud in clinical trials. Clinical Investigation. 2015;5(2):161-173. DOI: 10.4155/cli.14.116
  6. Bordewijk EM, Li W, van Eekelen R, Wang R, Showell M, Mol BW, van Wely M. Methods to assess research misconduct in health-related research: A scoping review. Journal of Clinical Epidemiology. 2021;136:189-202. DOI: 10.1016/j.jclinepi.2021.05.012
  7. Wilkinson J, Heal C, Antoniou GA, et al. A survey of experts to identify methods to detect problematic studies: stage 1 of the INveStigating ProblEmatic Clinical Trials in Systematic Reviews project. Journal of Clinical Epidemiology. 2024;175:111512. DOI: 10.1016/j.jclinepi.2024.111512
  8. Crone G, Green CD. Tools of the data detective: A review of statistical methods to detect data and result anomalies in psychology. Theory & Psychology. 2025;35(3):359-380. DOI: 10.1177/09593543241311861
  9. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychological Bulletin. 1979;86(2):420-428. DOI: 10.1037/0033-2909.86.2.420