ResAIKit
Research Integrity Toolkit
R4 · Statistical analysis · Methodological Coherence · Layer 2 (Contextual)

Protocol Fidelity

Checks that what a paper says about its design agrees with what it reports. If a paper claims an intention-to-treat analysis, the number analysed should be close to the number enrolled; if it claims a double-blind design, the results should not report unblinding without explanation. The indicator extracts the analysis-population, blinding, and randomisation claims and cross-checks them against the reported counts and against contradictory statements. It reads the full article text and, where they have been segmented, its sections.

Technical description

R4 is a contextual cross-check between the design claims a paper makes and the evidence elsewhere in its text. It detects an intention-to-treat (ITT) claim, a per-protocol mention, the blinding level (single-, double-, or triple-blind, or open-label), and the randomisation ratio, matching compound terms whether they are joined by a hyphen, a space, or one of the typographic dashes that typeset documents use. When ITT is claimed, it extracts the enrolled or randomised count and the analysed or included count and computes their ratio: an ITT analysis should include essentially all randomised participants, so an analysed count well below the enrolled count contradicts the claim. When a double- or triple-blind design is claimed, it scans the results for mentions of unblinding or unmasking, which contradict the claim of an intact blind unless they are explained. It also detects a modified intention-to-treat (mITT) claim, which permits post-randomisation exclusions and has been associated with larger, potentially biased effect estimates, and flags it as a weaker variant of strict intention-to-treat. Each contradiction yields a finding and raises the score.
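
The dash-tolerant claim detection described above can be sketched in a few regular expressions. This is a minimal illustration under stated assumptions, not the toolkit's code: the character class D, every pattern, the function name extract_claims, and the rule that the first blinding term found sets the level are all hypothetical.

```python
import re

# Hypothetical dash class (an assumption, not the toolkit's published pattern):
# ASCII hyphen-minus, any whitespace (including the no-break space), U+2010 hyphen,
# U+2011 non-breaking hyphen, U+2012 figure dash, U+2013 en dash,
# U+2014 em dash, and U+2212 minus sign.
D = r"[-\s\u2010\u2011\u2012\u2013\u2014\u2212]"

# Illustrative claim patterns built on the dash class.
ITT = re.compile(rf"\bintent(?:ion)?{D}to{D}treat\b|\bITT\b", re.IGNORECASE)
MITT = re.compile(rf"\bmodified{D}intent(?:ion)?{D}to{D}treat\b", re.IGNORECASE)
MITT_ABBR = re.compile(r"\bmITT\b")  # abbreviation matched case-sensitively
PER_PROTOCOL = re.compile(rf"\bper{D}protocol\b", re.IGNORECASE)
BLINDING = re.compile(
    rf"\b(single|double|triple){D}blind(?:ed)?\b|\bopen{D}label\b",
    re.IGNORECASE,
)
RATIO = re.compile(r"\b(\d+):(\d+)\b")  # first colon-separated integer pair


def extract_claims(text: str) -> dict:
    """Return the protocol claims found in `text` (hypothetical helper)."""
    level = None
    blind = BLINDING.search(text)  # assumption: the first blinding term wins
    if blind:
        # Group 1 holds single/double/triple; it is None for an open-label match.
        level = f"{blind.group(1).lower()}-blind" if blind.group(1) else "open-label"
    ratio = RATIO.search(text)
    return {
        "itt": bool(ITT.search(text)),
        "mitt": bool(MITT.search(text) or MITT_ABBR.search(text)),
        "per_protocol": bool(PER_PROTOCOL.search(text)),
        "blinding": level,
        "ratio": f"{ratio.group(1)}:{ratio.group(2)}" if ratio else None,
    }


# The hyphen here is U+2011 and the dashes are U+2013, yet both terms are found.
claims = extract_claims(
    "A double\u2011blind, 2:1 randomised trial analysed by intention\u2013to\u2013treat."
)
print(claims)
# {'itt': True, 'mitt': False, 'per_protocol': False, 'blinding': 'double-blind', 'ratio': '2:1'}
```

Folding the dash variants into one character class keeps each term's pattern short, at the cost of also accepting an em dash that no author would intend inside a compound term.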

How it works

The protocol claims are extracted by regular expression: the ITT term in its full and abbreviated forms, the per-protocol term, the blinding terms, and a numeric randomisation ratio such as 1:1 or 2:1. The hyphenated terms match any of the common dash characters, so a term typeset with an en dash or a non-breaking hyphen is still recognised. If no protocol information is found, the indicator returns zero. When ITT is claimed, the first enrolled or randomised count and the first analysed or included count are read; if both are positive and the analysed-to-enrolled ratio falls below 0.85, a warning finding is added and the score is set to at least 4.0. When the blinding is double- or triple-blind, the results section (or the full text, if no results section has been segmented) is searched for unblinding or unmasking, and a match adds a warning and sets the score to at least 2.0. A modified intention-to-treat (mITT) claim is detected separately and, when present, also adds a warning and raises the score to at least 2.0. The final score is the maximum over the individual checks, is at least 2.0 whenever any flag is raised, and is capped at 5.0. The metadata records the ITT claim, whether the analysis is a modified rather than a strict ITT, the per-protocol mention, the normalised blinding level, the randomisation ratio, and the flag count. When an ITT claim is present and both counts are found, it also records the enrolled and analysed counts and their ratio (the ITT fidelity ratio) as a diagnostic, even when the ratio stays above the 0.85 threshold.
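
The checks and the score aggregation can be sketched as follows, assuming the claims have already been extracted and are passed in as plain arguments. The count patterns, the Result container, and the names first_count and check_fidelity are hypothetical; only the 0.85 threshold, the 4.0 and 2.0 floors, the 5.0 cap, and the fallback from the results section to the full text come from the description above.

```python
import re
from dataclasses import dataclass, field

# Hypothetical count patterns: a number followed, within three words, by the keyword.
# The lookbehind keeps "2:1 randomised" from being read as an enrolled count of 1.
ENROLLED = re.compile(r"(?<![\d:.])(\d[\d,]*)\s+(?:\w+\s+){0,3}?(?:enrolled|randomi[sz]ed)", re.I)
ANALYSED = re.compile(r"(?<![\d:.])(\d[\d,]*)\s+(?:\w+\s+){0,3}?(?:analy[sz]ed|included)", re.I)
UNBLINDING = re.compile(r"\bun(?:blind|mask)(?:ed|ing)?\b", re.I)

ITT_THRESHOLD = 0.85  # below this ratio, the ITT claim is treated as contradicted


@dataclass
class Result:
    score: float = 0.0
    findings: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


def first_count(pattern: re.Pattern, text: str):
    """Return the first count matched by `pattern`, or None."""
    m = pattern.search(text)
    return int(m.group(1).replace(",", "")) if m else None


def check_fidelity(text, results_text, itt, mitt, per_protocol, blinding, ratio):
    res = Result()
    if not (itt or mitt or per_protocol or blinding or ratio):
        return res  # no protocol information found: score stays 0
    res.metadata.update(itt=itt, modified_itt=mitt, per_protocol=per_protocol,
                        blinding=blinding, ratio=ratio)
    scores = [0.0]

    if itt:
        enrolled = first_count(ENROLLED, text)
        analysed = first_count(ANALYSED, text)
        if enrolled and analysed:  # both found and positive
            fidelity = analysed / enrolled
            # Recorded as a diagnostic even when the ratio passes the threshold.
            res.metadata.update(enrolled=enrolled, analysed=analysed,
                                itt_fidelity_ratio=round(fidelity, 3))
            if fidelity < ITT_THRESHOLD:
                res.findings.append(("warning", f"ITT claimed but {analysed}/{enrolled} analysed"))
                scores.append(4.0)

    if blinding in ("double-blind", "triple-blind"):
        # Search the results section, or the full text if none was segmented.
        if UNBLINDING.search(results_text or text):
            res.findings.append(("warning", "unblinding reported under a double/triple-blind claim"))
            scores.append(2.0)

    if mitt:
        res.findings.append(("warning", "modified ITT permits post-randomisation exclusions"))
        scores.append(2.0)

    score = max(scores)
    if res.findings:
        score = max(score, 2.0)  # any flag raises the score to at least 2.0
    res.score = min(score, 5.0)  # cap at 5.0
    res.metadata["flag_count"] = len(res.findings)
    return res


text = ("In this double-blind trial, 240 patients were randomised. Analysis was by "
        "intention to treat; 180 patients were analysed.")
r = check_fidelity(text, None, itt=True, mitt=False, per_protocol=False,
                   blinding="double-blind", ratio=None)
print(r.score, r.metadata["itt_fidelity_ratio"])  # 4.0 0.75
```

Keeping each check's contribution in a list and taking the maximum mirrors the "maximum over the individual checks" rule, so adding a further check would not change how the others combine.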

Score thresholds

Score | Meaning
0 | Protocol claims are consistent with the reported counts and statements, or none were found.
2 | A minor inconsistency, such as a blinding claim contradicted by an unblinding mention, or a modified rather than strict intention-to-treat analysis.
4 to 5 | An intention-to-treat claim contradicted by an analysed count well below the enrolled count.

Why this matters

A study's stated design and its reported conduct are supposed to agree; where they do not, the discrepancy is both a quality flaw and a possible sign of after-the-fact change. The CONSORT statement requires authors to specify the analysis population, the blinding, and the randomisation, and to reconcile the numbers analysed with those randomised, precisely so that a reader can verify this fidelity [1]. Intention-to-treat is the claim most often made loosely: Hollis and Campbell, surveying major journals, found that the term was used with conflicting meanings and that many trials claiming it had in fact excluded participants, so a stated ITT analysis whose analysed count falls well short of enrolment is a documented and common inconsistency [2]. The more general phenomenon, a published report that departs from the declared protocol, was demonstrated empirically by Chan and colleagues, who showed that what trials report frequently diverges from what their protocols specified [3]. Checking the analysis-population and blinding claims against the rest of the paper mechanises the most checkable parts of this fidelity, and an unexplained unblinding under a double-blind claim, or a shrunken ITT denominator, is exactly the kind of contradiction that warrants a closer look. The CHAMP checklist for statistical assessment lists the analysis population and the blinding among the items a reviewer should confirm [4], and recent research-integrity screening treats such design-conduct contradictions as trustworthiness signals: expert-derived warning signs [5], audits of fabricated trials [6], the expert survey behind the INSPECT-SR project [7], reviews of the data-detective toolkit [8], and catalogues of misconduct-detection methods [9] all examine whether a paper's declared protocol matches what it reports.

In a meta-epidemiological study, Abraha and colleagues found that trials deviating from strict intention-to-treat, including those using a modified ITT that excludes participants after randomisation, reported systematically larger treatment effects; this is why a modified ITT claim is flagged here [10].

Limitations

The checks rest on pattern extraction, so claims phrased unusually are missed. The counts read for the ITT ratio are the first enrolled and analysed numbers found, which can be the wrong ones when a paper reports several populations, for example when a per-protocol subset is mistaken for the ITT analysed count. Legitimate, pre-planned unblinding, such as an independent committee's interim look, is flagged the same as an improper one, so a blinding finding is a prompt to read the explanation rather than a verdict. The randomisation ratio is captured as the first colon-separated integer pair, which can pick up a non-ratio such as a clock time; it is recorded but not scored. Because the indicator reads text, it does not consult the participant-flow diagram, where the definitive counts often reside. Drift in the bare sample size across sections is covered by indicator R2, and the appropriateness of the chosen test by indicator R1, so R4 focuses on the consistency of the analysis-population and blinding claims with the rest of the report.
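
The clock-time pitfall is easy to reproduce with a first-match pattern of the kind described. The pattern and the helper name first_ratio below are assumptions for illustration, and the methods sentence is invented.

```python
import re

# Assumed pattern: the first colon-separated integer pair, as the text describes.
RATIO = re.compile(r"\b(\d+):(\d+)\b")


def first_ratio(text: str):
    """Return the first 'a:b' integer pair in `text`, or None (hypothetical helper)."""
    m = RATIO.search(text)
    return f"{m.group(1)}:{m.group(2)}" if m else None


methods = "Visits began at 08:30. Participants were randomised 2:1 to drug or placebo."
print(first_ratio(methods))  # 08:30 (a clock time), not the 2:1 allocation
```

Because the ratio is recorded but not scored, a mis-read like this affects only the metadata, not the indicator's score.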

Theoretical background

R4 rests on the logical entailments that design claims carry. Intention-to-treat is defined by the rule that every randomised participant is analysed in the group to which they were assigned, regardless of adherence, so the analysed population is, by construction, the randomised population minus only unavoidable losses; a large gap between them is therefore not a stylistic choice but a contradiction of the claim, which is why the ratio of analysed to enrolled is a direct test of the assertion. Blinding entails that group identity is concealed from the relevant parties through the conduct and, ideally, the analysis of the trial, so an unblinding event reported without justification breaks the entailment, and the indicator treats double- and triple-blind claims, where concealment is strongest, as the ones whose contradiction is most informative. Because these claims are recovered from prose, recognition is the limiting step, and broadening the dash class addresses a concrete failure mode in which a term correctly written by the authors is missed only because the typesetter rendered its hyphen as a different glyph, so that the fidelity check is not defeated by typography. The indicator deliberately scores conservatively, raising a serious score only for the ITT-count contradiction whose arithmetic is unambiguous and a milder score for the blinding contradiction whose interpretation depends on context. A modified intention-to-treat claim is treated as a third, intermediate signal: it is not the arithmetic impossibility of the ITT-count contradiction, but because excluding randomised participants reintroduces the selection that intention-to-treat exists to prevent, an unqualified modified ITT is a substantive weakening of the analysis-population claim.
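
The ITT entailment reduces to a single ratio, written out below with hypothetical counts; only the 0.85 threshold comes from the description above. Here $n_{\mathrm{randomised}}$ stands for whichever enrolled or randomised count the indicator reads first.

```latex
\begin{aligned}
r_{\mathrm{ITT}} &= \frac{n_{\mathrm{analysed}}}{n_{\mathrm{randomised}}},
  \qquad \text{flagged when } r_{\mathrm{ITT}} < 0.85 \\
\frac{180}{240} &= 0.75 < 0.85 \;\Rightarrow\; \text{contradicts the ITT claim} \\
\frac{228}{240} &= 0.95 \ge 0.85 \;\Rightarrow\; \text{consistent with the ITT claim} \\
0.85 \times 240 &= 204 \;\Rightarrow\; \text{with 240 randomised, flagged whenever } n_{\mathrm{analysed}} < 204
\end{aligned}
```

In other words, the threshold tolerates losses of up to 15% of the randomised population before an ITT claim is treated as contradicted.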

References

  1. Schulz KF, Altman DG, Moher D; CONSORT Group. CONSORT 2010 Statement: updated guidelines for reporting parallel group randomised trials. BMJ. 2010;340:c332. DOI: 10.1136/bmj.c332
  2. Hollis S, Campbell F. What is meant by intention to treat analysis? Survey of published randomised controlled trials. BMJ. 1999;319(7211):670-674. DOI: 10.1136/bmj.319.7211.670
  3. Chan AW, Hróbjartsson A, Haahr MT, Gøtzsche PC, Altman DG. Empirical evidence for selective reporting of outcomes in randomized trials: comparison of protocols to published articles. JAMA. 2004;291(20):2457-2465. DOI: 10.1001/jama.291.20.2457
  4. Mansournia MA, Collins GS, Nielsen RO, et al. CHecklist for statistical Assessment of Medical Papers: the CHAMP statement. British Journal of Sports Medicine. 2021;55(18):1002-1003. DOI: 10.1136/bjsports-2020-103651
  5. Parker L, Boughton S, Lawrence R, Bero L. Experts identified warning signs of fraudulent research: a qualitative study to inform a screening tool. Journal of Clinical Epidemiology. 2022;151:1-17. DOI: 10.1016/j.jclinepi.2022.07.006
  6. Carlisle JB. False individual patient data and zombie randomised controlled trials submitted to Anaesthesia. Anaesthesia. 2021;76(4):472-479. DOI: 10.1111/anae.15263
  7. Wilkinson J, Heal C, Antoniou GA, et al. A survey of experts to identify methods to detect problematic studies: stage 1 of the INveStigating ProblEmatic Clinical Trials in Systematic Reviews project. Journal of Clinical Epidemiology. 2024;175:111512. DOI: 10.1016/j.jclinepi.2024.111512
  8. Crone G, Green CD. Tools of the data detective: a review of statistical methods to detect data and result anomalies in psychology. Theory & Psychology. 2025;35(3):359-380. DOI: 10.1177/09593543241311861
  9. Bordewijk EM, Li W, van Eekelen R, et al. Methods to assess research misconduct in health-related research: a scoping review. Journal of Clinical Epidemiology. 2021;136:189-202. DOI: 10.1016/j.jclinepi.2021.05.012
  10. Abraha I, Cherubini A, Cozzolino F, et al. Deviation from intention to treat analysis in randomised trials and treatment effect estimates: meta-epidemiological study. BMJ. 2015;350:h2445. DOI: 10.1136/bmj.h2445