R4 | Statistical analysis | Methodological Coherence | Layer 2 (Contextual)

Protocol Fidelity

Checks whether the statistical analysis described in the methods section matches what was actually reported in the results, detecting undisclosed analysis changes.

Technical description

R4 cross-checks the design claims a paper makes against evidence elsewhere in the text. It detects an intention-to-treat (ITT) claim, a per-protocol mention, the blinding level (single-, double-, triple-blind, or open-label), and the randomisation ratio. Compound terms are matched whether they are joined by a hyphen, a space, or one of the typographic dash characters that typeset documents use. When ITT is claimed, it extracts the enrolled/randomised count and the analysed/included count and computes their ratio, since an ITT analysis should include essentially all randomised participants. When a double- or triple-blind design is claimed, it scans the results for mentions of unblinding or unmasking, which contradict an intact blind unless explained. It also detects a modified intention-to-treat (mITT) claim, flagging it as a weaker variant that permits post-randomisation exclusions and is linked to larger, potentially biased effects (Abraha 2015). Each contradiction yields a finding and raises the score.
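The claim detection described above can be sketched roughly as follows. This is a minimal illustration, not the indicator's actual code: the pattern names, the exact set of dash characters, and the `detect_design_claims` helper are all assumptions made for the example.

```python
import re

# Assumed set of dash-like characters: hyphen-minus, hyphen (U+2010),
# non-breaking hyphen (U+2011), figure dash (U+2012), en dash (U+2013),
# em dash (U+2014), minus sign (U+2212).
DASH = "\\-\u2010\u2011\u2012\u2013\u2014\u2212"
SEP = f"[{DASH}\\s]+"  # one or more dashes or whitespace between word parts

# Hypothetical patterns; the real indicator's expressions are not given in the text.
ITT_RE = re.compile(rf"\bintention{SEP}to{SEP}treat\b|\bITT\b", re.IGNORECASE)
MITT_RE = re.compile(rf"\bmodified\s+intention{SEP}to{SEP}treat\b|\bmITT\b", re.IGNORECASE)
PP_RE = re.compile(rf"\bper{SEP}protocol\b", re.IGNORECASE)
BLIND_RE = re.compile(rf"\b(single|double|triple){SEP}(?:blind|mask)", re.IGNORECASE)
OPEN_RE = re.compile(rf"\bopen{SEP}label\b", re.IGNORECASE)
RATIO_RE = re.compile(r"\b(\d+):(\d+)\b")


def detect_design_claims(text: str) -> dict:
    """Return the design claims found in text (illustrative helper)."""
    blind = BLIND_RE.search(text)
    if blind:
        blinding = blind.group(1).lower()
    elif OPEN_RE.search(text):
        blinding = "open-label"
    else:
        blinding = None
    ratio = RATIO_RE.search(text)  # first colon-separated integer pair
    return {
        "itt": bool(ITT_RE.search(text)),
        "mitt": bool(MITT_RE.search(text)),
        "per_protocol": bool(PP_RE.search(text)),
        "blinding": blinding,
        "ratio": (int(ratio.group(1)), int(ratio.group(2))) if ratio else None,
    }
```

Building the separator once and interpolating it into each pattern keeps the dash handling in a single place, so adding another dash character does not require touching every expression.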

How it works

Layer 2 (contextual): protocol claims are extracted by regex (ITT in full and abbreviated form, per-protocol, blinding terms, and a numeric randomisation ratio such as 1:1). The hyphenated terms match any common dash character, so a term typeset with an en dash or non-breaking hyphen is still recognised. If no protocol information is found, the indicator returns zero. When ITT is claimed, the first enrolled/randomised count and the first analysed/included count are read. If both are found, their ratio (the ITT fidelity ratio) is recorded in the metadata as a diagnostic alongside the two counts; if both are positive and the analysed-to-enrolled ratio is below 0.85, a warning is added and the score is set to at least 4.0. For a double- or triple-blind claim, the results section (or the full text if the document is unsegmented) is searched for unblinding or unmasking; a match adds a warning and sets the score to at least 2.0. A modified intention-to-treat (mITT) claim is detected separately: it adds a warning that raises the score to at least 2.0, and the metadata records whether the analysis is modified rather than strict ITT. The final score is the maximum across checks, with any warning raising it to at least 2.0, capped at 5.0.
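The scoring rules above can be sketched as a small function. This is an illustration under stated assumptions rather than the indicator's implementation: the function name, argument names, metadata keys, and the unblinding pattern are hypothetical, and the counts are assumed to have been extracted already. The text does not say what lifts a score from 4.0 towards 5.0, so the sketch applies only the stated floors and the cap.

```python
import re
from typing import Optional

# Hypothetical pattern for unblinding/unmasking mentions.
UNBLIND_RE = re.compile(r"\bun(?:blind|mask)(?:ed|ing)\b", re.IGNORECASE)
ITT_RATIO_THRESHOLD = 0.85  # analysed/enrolled ratio below this triggers a warning


def score_protocol_fidelity(
    itt: bool,
    mitt: bool,
    blinding: Optional[str],
    enrolled: Optional[int],
    analysed: Optional[int],
    results_text: str,
):
    """Return (score, warnings, metadata) for already-extracted claims."""
    score = 0.0
    warnings = []
    metadata = {"modified_itt": mitt}

    # ITT check: compare the analysed count with the enrolled count.
    have_counts = enrolled is not None and analysed is not None
    if itt and have_counts and enrolled > 0 and analysed > 0:
        ratio = analysed / enrolled
        metadata.update(enrolled=enrolled, analysed=analysed, itt_fidelity_ratio=ratio)
        if ratio < ITT_RATIO_THRESHOLD:
            warnings.append("ITT claimed but analysed count is well below enrolled count")
            score = max(score, 4.0)

    # Blinding check: a double- or triple-blind claim versus unblinding in the results.
    if blinding in ("double", "triple") and UNBLIND_RE.search(results_text):
        warnings.append("blinded design claimed but unblinding is mentioned")
        score = max(score, 2.0)

    # Modified ITT is flagged as a weaker variant.
    if mitt:
        warnings.append("modified ITT permits post-randomisation exclusions")
        score = max(score, 2.0)

    # Any warning raises the score to at least 2.0; the result is capped at 5.0.
    if warnings:
        score = max(score, 2.0)
    return min(score, 5.0), warnings, metadata
```

For example, a paper that claims ITT with 200 randomised and 150 analysed participants has a fidelity ratio of 0.75, which is below the threshold, so the sketch returns a score of 4.0 whether or not the blinding check also fires.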

Why this matters

A study's stated design and its reported conduct should agree; where they do not, the discrepancy is a quality flaw and a possible sign of after-the-fact change. CONSORT requires specifying the analysis population, blinding, and randomisation, and reconciling the numbers analysed with those randomised. Intention-to-treat is the claim most often made loosely: surveys have found the term used with conflicting meanings, and many trials that claim ITT have excluded participants. A stated ITT whose analysed count falls short of enrolment is therefore a documented inconsistency. More generally, published analyses frequently diverge from the declared protocol.

Score thresholds

0: Protocol claims are consistent with the reported counts and statements, or none were found
2: A minor inconsistency, such as a blinding claim contradicted by an unblinding mention
4-5: An intention-to-treat claim contradicted by an analysed count well below the enrolled count

Limitations

The checks rest on pattern extraction, so unusually phrased claims are missed. The ITT counts read are the first enrolled and analysed numbers found, which can be wrong when several populations are reported (for example, mistaking a per-protocol subset for the ITT analysed count). Legitimate pre-planned unblinding, such as an independent committee's interim look, is flagged in the same way as improper unblinding, so a blinding finding should prompt the reader to check the paper's explanation. The randomisation ratio is taken from the first colon-separated integer pair, which can capture a clock time; it is recorded but not scored. The indicator reads the text, not the participant-flow diagram where the definitive counts reside. Sample-size drift is covered by indicator R2 and test appropriateness by indicator R1; R4 focuses on the consistency of the analysis-population and blinding claims with the rest of the report.
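The clock-time pitfall is easy to reproduce. The snippet below is a hedged illustration only: the pattern and the `first_ratio` helper are assumptions standing in for whatever expression the indicator actually uses to find the first colon-separated integer pair.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: any two integers separated by a colon.
RATIO_RE = re.compile(r"\b(\d+):(\d+)\b")


def first_ratio(text: str) -> Optional[Tuple[int, int]]:
    """Return the first colon-separated integer pair in text, if any."""
    match = RATIO_RE.search(text)
    return (int(match.group(1)), int(match.group(2))) if match else None


# The dosing time appears before the true allocation ratio, so it is captured.
sample = "Doses were given at 8:30 each morning; participants were randomised 2:1."
print(first_ratio(sample))  # (8, 30)
```

Because the ratio is recorded but not scored, a mis-captured pair of this kind affects only the metadata, not the indicator's score.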