ResAIKit
Research Integrity Toolkit
R11 | Statistical analysis | Methodological Coherence | Layer 2 (Contextual)

Missing Data

Checks how the paper says it dealt with missing data and grades the approach. Almost every study has some missing values, and the method chosen matters: multiple imputation with a sensitivity analysis is strong, complete-case analysis is acceptable when little is missing, and carrying the last value forward is weak. A paper that names no method at all leaves a gap. The indicator detects the handling method, whether a sensitivity analysis is reported, and the stated missing-data percentage, and scores the combination. It reads the article text.

Technical description

R11 is a contextual check of the quality of the missing-data handling described in the text. It searches for four things: the named methods (multiple imputation or MICE; last observation carried forward; complete-case, listwise, or pairwise deletion; and generic imputation); a sensitivity analysis; the missingness-mechanism assumptions (missing at random and its completely-at-random and not-at-random variants); and a stated percentage of missing data. The acronyms for the missingness assumptions are matched case-sensitively so that the month abbreviation Mar or the ordinary word mar is not mistaken for the MAR assumption. The score grades the approach from best to worst: multiple imputation with a sensitivity analysis is best; multiple imputation alone is good; complete-case analysis with a small missing percentage is acceptable; complete-case analysis with an unknown or high percentage, and last observation carried forward without a sensitivity analysis, are weak; and no method at all is worst. Independently of this grade, the indicator scores the completeness of the reporting against the TARMOS framework, counting how many of its four elements (the amount missing, the method, the assumed mechanism, and a sensitivity analysis) are present and flagging a named method whose account omits some of them.
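The detection step described above can be sketched roughly as follows. This is an illustrative sketch, not the indicator's actual code: the pattern strings, the dictionary keys, the `detect` and `missing_percentages` names, and the 80-character window for associating a percentage with the word "missing" are all assumptions. The one behaviour taken directly from the description is that the mechanism acronyms are matched case-sensitively.

```python
import re

# Hypothetical patterns for illustration; the indicator's real expressions are not given here.
METHODS = {
    "multiple_imputation": re.compile(r"(?i:\bmultiple imputation\b)|\bMICE\b"),
    "locf": re.compile(r"(?i:\blast observation carried forward\b)|\bLOCF\b"),
    "complete_case": re.compile(r"(?i)\b(?:complete[- ]case|listwise deletion|pairwise deletion)\b"),
    "generic_imputation": re.compile(r"(?i)\bimput(?:ed|ation)\b"),
}
SENSITIVITY = re.compile(r"(?i)\bsensitivity analys[ie]s\b")
# Acronyms are case-sensitive, so "Mar" (the month) and "mar" (the word) do not match;
# only the spelled-out phrases are matched without regard to case.
MECHANISM = re.compile(
    r"\b(?:MCAR|MNAR|MAR)\b|(?i:\bmissing (?:completely |not )?at random\b)"
)
PERCENT = re.compile(r"(\d+(?:\.\d+)?)\s*%")


def missing_percentages(text, window=80):
    """Collect percentages that appear within `window` characters of the word 'missing'."""
    anchors = [m.start() for m in re.finditer(r"(?i)\bmissing", text)]
    return [
        float(m.group(1))
        for m in PERCENT.finditer(text)
        if any(abs(m.start() - a) <= window for a in anchors)
    ]


def detect(text):
    """Return the signals the description lists: methods, sensitivity, mechanism, percentages."""
    pcts = missing_percentages(text)
    return {
        "methods": {name for name, pat in METHODS.items() if pat.search(text)},
        "sensitivity": bool(SENSITIVITY.search(text)),
        "mechanism": bool(MECHANISM.search(text)),
        "max_missing_pct": max(pcts) if pcts else None,
        "n_percentages": len(pcts),
    }
```

Note that in this sketch a text naming multiple imputation also matches the generic imputation pattern, so any grading built on it has to pick the strongest method found rather than the first.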

How it works

The text is matched against the method patterns and the sensitivity-analysis, assumption, and missing-percentage patterns. Multiple imputation scores 0.0 with a sensitivity analysis and 1.0 without. Complete-case analysis scores 1.5 when the largest stated missing percentage is below five and 2.5 otherwise (five or above, or not stated), with a finding. Last observation carried forward scores 2.0 with a sensitivity analysis and 3.0 without, with a finding in the latter case. A generic imputation or a missingness-assumption mention, with no stronger method, scores 2.0. No mention of missing-data handling at all scores 3.5, with a finding. The score is capped at 5.0. The metadata records the primary method found; whether a sensitivity analysis was present; the largest stated missing percentage (the worst-affected variable governs the adequacy of a complete-case analysis) and how many percentages were reported; and whether a missingness mechanism was stated. The indicator also counts the TARMOS reporting elements present (amount, method, mechanism, sensitivity analysis). When a method is named, the grade is not best practice, and the account is incomplete, an informational finding lists the missing elements; the metadata records the count and the per-element status.
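The grading rules above can be written out as a small function. This is a minimal sketch under stated assumptions, not the indicator's implementation: the function name, the method labels, the finding strings, and the return shape are hypothetical. It also assumes that complete-case analysis is checked before last observation carried forward when both are named, that the complete-case finding attaches to the 2.5 case, that "not best practice" means any score above 0.0, and that generic imputation counts as a named method for the TARMOS check.

```python
def score_r11(methods, sensitivity, max_pct, mechanism):
    """Grade missing-data handling; a higher score is worse.

    methods:     set of detected method labels (hypothetical names)
    sensitivity: True if a sensitivity analysis was found
    max_pct:     largest stated missing percentage, or None if none was stated
    mechanism:   True if a missingness mechanism was stated
    """
    findings = []
    if "multiple_imputation" in methods:
        primary = "multiple_imputation"
        score = 0.0 if sensitivity else 1.0
    elif "complete_case" in methods:  # precedence over LOCF is an assumption
        primary = "complete_case"
        if max_pct is not None and max_pct < 5:
            score = 1.5
        else:
            score = 2.5
            findings.append("complete-case analysis with high or unstated missingness")
    elif "locf" in methods:
        primary = "locf"
        if sensitivity:
            score = 2.0
        else:
            score = 3.0
            findings.append("LOCF without a sensitivity analysis")
    elif "generic_imputation" in methods or mechanism:
        primary = "generic_imputation" if "generic_imputation" in methods else None
        score = 2.0
    else:
        primary = None
        score = 3.5
        findings.append("no missing-data handling described")
    score = min(score, 5.0)  # cap stated in the text

    # TARMOS completeness: four reporting elements, counted independently of the grade.
    tarmos = {
        "amount": max_pct is not None,
        "method": primary is not None,
        "mechanism": mechanism,
        "sensitivity": sensitivity,
    }
    absent = [name for name, present in tarmos.items() if not present]
    if primary is not None and score > 0.0 and absent:
        findings.append("info: TARMOS elements not reported: " + ", ".join(absent))

    return {
        "score": score,
        "primary_method": primary,
        "tarmos": tarmos,
        "tarmos_count": sum(tarmos.values()),
        "findings": findings,
    }
```

For example, `score_r11({"complete_case"}, False, 3.0, False)` yields a score of 1.5 with two of the four TARMOS elements present and one informational finding naming the mechanism and the sensitivity analysis as unreported.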

Score thresholds

Score       Meaning
0 to 1.5    A principled method: multiple imputation, or complete-case analysis with little missing data.
2 to 3      A weaker approach: complete-case with high or unstated missingness, or last-observation-carried-forward.
3.5 to 5    No missing-data handling method is described.
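As a rough illustration, the bands above can be turned into a lookup. The function name and the returned labels are hypothetical. Because the listed bands leave gaps (for instance between 1.5 and 2), this sketch assumes each band extends up to its stated upper bound, which is harmless as long as the indicator only emits the discrete scores described on this page.

```python
def score_band(score):
    """Map an R11 score to the band it falls in (higher is worse)."""
    if not 0.0 <= score <= 5.0:
        raise ValueError("R11 scores lie between 0 and 5")
    if score <= 1.5:
        return "principled method"
    if score <= 3.0:
        return "weaker approach"
    return "no method described"
```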

Why this matters

Almost all studies have missing data, and how it is handled can change the conclusions, so the method and its reporting are part of a study's validity. Sterne and colleagues set out the appropriate use and reporting of multiple imputation, explaining why it is preferred to ad hoc methods and what must be reported for it to be credible, which is why the indicator treats multiple imputation, especially with a sensitivity analysis, as the strongest signal [1]. Little and Rubin established the framework of missingness mechanisms (missing completely at random, at random, and not at random) on which the validity of any handling method depends, so a paper that states its assumption is acknowledging the basis of its approach [2]. Lachin showed that last observation carried forward rests on an implausible assumption that a participant's value stays fixed after dropout and can bias results in either direction, which is why the indicator penalises it when it is used without a sensitivity analysis to bound its effect [3]. A study that describes no handling at all leaves the reader unable to judge whether missing data distorted the result; this is the gap that the worst (highest) score marks. Jakobsen and colleagues gave a practical decision guide for when and how to apply multiple imputation in trials, and the indicator rewards those choices [4]. The TARMOS framework of Lee and colleagues set out what the treatment and reporting of missing data should cover, against which a bare or absent description falls short [5]. Statistical-reporting checklists list missing-data handling among the items reviewers should confirm [6], and research-integrity screening, through expert-derived warning signs [7], the INSPECT-SR instrument [8], and reviews of the data-detective toolkit [9], treats unexplained or implausible missing-data handling as a quality signal.

Limitations

Detection is keyword-based, so a method described in unconventional terms is missed and the study may be scored as having no handling. Conversely, a method named only in passing, or in the background rather than as the study's approach, is credited. The missing percentage taken for scoring is the largest of those stated, since the worst-affected variable governs the adequacy of a complete-case analysis, but a complete-case analysis with genuinely low yet unstated missingness is still scored as if the rate were high. The grading reflects general preferences rather than the specifics of a given study, where a simpler method can be entirely appropriate, so a score is guidance rather than a verdict. The indicator reads the text and does not confirm that the described method was actually applied or applied correctly. The structure of missingness within the individual-patient data is covered by indicator D29, so R11 focuses on the description and adequacy of the missing-data handling in the report.

Theoretical background

R11 rests on the principle that missing data is not neutral: discarding or filling it embeds an assumption about why it is missing, and the credibility of the analysis depends on that assumption holding. Under data missing completely at random, the observed cases are a random subsample and complete-case analysis is unbiased though inefficient; under the weaker missing-at-random condition, the missingness depends only on observed variables and multiple imputation, which draws plausible values from their conditional distribution and propagates the resulting uncertainty across several completed datasets, recovers valid estimates and standard errors; under missing not at random, no method is safe without modelling the mechanism, and a sensitivity analysis that varies the assumption is the honest response. This hierarchy is why the indicator ranks multiple imputation with a sensitivity analysis highest and an undocumented approach lowest. Last observation carried forward sits low because it imputes a single deterministic value that assumes no change after dropout, understating variability and biasing the estimate, so its use is acceptable only when a sensitivity analysis shows the conclusions are robust to it. Matching the assumption acronyms case-sensitively is a small but necessary safeguard, since treating the calendar month Mar as the MAR assumption would credit a study for reasoning it never did. The TARMOS completeness count operationalises the modern reporting standard that an account of missing data is judged not only by the method chosen but by whether the amount, the assumed mechanism, the method, and a sensitivity analysis are all stated, each being needed for a reader to assess whether the handling could have biased the result.
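The way multiple imputation propagates uncertainty across completed datasets is usually expressed through Rubin's rules: the pooled estimate is the mean of the per-dataset estimates, and the total variance adds the between-imputation variance, inflated by 1 + 1/m, to the average within-imputation variance. The sketch below illustrates that standard calculation. The function name and the example numbers are invented for illustration and are not part of R11, which reads the text and does not pool estimates itself.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m per-imputation estimates and their squared standard errors.

    Returns (pooled_estimate, total_variance) following Rubin's rules.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled = mean(estimates)        # average of the m point estimates
    within = mean(variances)        # average sampling variance within each dataset
    between = variance(estimates)   # sample variance across datasets (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Invented numbers: three completed datasets with the same within-imputation variance.
estimate, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

In this example the total variance exceeds the within-imputation variance of 0.5 because the estimates differ across the completed datasets. A single deterministic fill such as last observation carried forward has no between-imputation term, which is one way to see why it understates variability.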

References

  1. Sterne JAC, White IR, Carlin JB, et al. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ. 2009;338:b2393. DOI: 10.1136/bmj.b2393
  2. Little RJA, Rubin DB. Statistical Analysis with Missing Data. 3rd ed. Hoboken, NJ: John Wiley and Sons; 2019. DOI: 10.1002/9781119482260
  3. Lachin JM. Fallacies of last observation carried forward analyses. Clinical Trials. 2016;13(2):161-168. DOI: 10.1177/1740774515602688
  4. Jakobsen JC, Gluud C, Wetterslev J, Winkel P. When and how should multiple imputation be used for handling missing data in randomised clinical trials: a practical guide with flowcharts. BMC Medical Research Methodology. 2017;17(1):162. DOI: 10.1186/s12874-017-0442-1
  5. Lee KJ, Tilling KM, Cornish RP, et al. Framework for the treatment and reporting of missing data in observational studies: the TARMOS framework. Journal of Clinical Epidemiology. 2021;134:79-88. DOI: 10.1016/j.jclinepi.2021.01.008
  6. Mansournia MA, Collins GS, Nielsen RO, et al. CHecklist for statistical Assessment of Medical Papers: the CHAMP statement. British Journal of Sports Medicine. 2021;55(18):1002-1003. DOI: 10.1136/bjsports-2020-103651
  7. Parker L, Boughton S, Lawrence R, Bero L. Experts identified warning signs of fraudulent research: a qualitative study to inform a screening tool. Journal of Clinical Epidemiology. 2022;151:1-17. DOI: 10.1016/j.jclinepi.2022.07.006
  8. Wilkinson J, Heal C, Antoniou GA, et al. A survey of experts to identify methods to detect problematic studies: stage 1 of the INveStigating ProblEmatic Clinical Trials in Systematic Reviews project. Journal of Clinical Epidemiology. 2024;175:111512. DOI: 10.1016/j.jclinepi.2024.111512
  9. Crone G, Green CD. Tools of the data detective: a review of statistical methods to detect data and result anomalies in psychology. Theory & Psychology. 2025;35(3):359-380. DOI: 10.1177/09593543241311861