R3 | Statistical analysis | Methodological Coherence | Layer 2 (Contextual)

Variable Consistency

Checks whether outcome variables are named consistently across the paper, catching cases where the same measurement is called different things in different sections.

Technical description

R3 checks agreement between the variables an article says it measured and the variables it reports analysing. From the Methods it extracts candidate variable terms using verb cues (measured, assessed, evaluated, recorded, monitored, collected) and noun cues (outcome, variable, primary or secondary endpoint), capturing the rest of the sentence and splitting it into comma- and conjunction-separated items. From the Results it takes the sentence subjects that precede result verbs (showed, demonstrated, increased, decreased, improved, differed). Each term is normalised by lowercasing and removing stopwords and study-noise words, and the two sets are compared with a tolerant word-level matcher. Methods-only terms are classed as unreported (possible selective reporting), Results-only terms as undeclared (possible data dredging), and a declared primary endpoint absent from the Results is flagged separately. Undeclared variables openly labelled post-hoc or exploratory are treated as disclosed (the COMPare distinction) and excluded from the count used for scoring.
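As a rough illustration of the Methods-side extraction described above, the sketch below captures the text after a verb cue, splits it on commas and the word "and", and normalises each item. The regular expressions, the stopword list, and the names `normalise` and `methods_terms` are assumptions made for this example, not the indicator's actual code; only the verb cues are handled.

```python
import re

# Verb cues listed in the description; the pattern itself is an assumption.
CUE = re.compile(
    r"\b(?:measured|assessed|evaluated|recorded|monitored|collected)\b([^.]*)",
    re.IGNORECASE,
)
# Split captured text on commas and the standalone word "and".
SPLIT = re.compile(r",|\band\b", re.IGNORECASE)

# Illustrative lists only; the real stopword and noise lists are not given.
STOPWORDS = {"the", "a", "an", "of", "in", "at", "for", "to", "we", "was", "were"}
NOISE = {"patients", "subjects", "participants"}


def normalise(term):
    """Lowercase a term and drop stopwords and study-noise words."""
    words = re.findall(r"[a-z0-9]+", term.lower())
    return " ".join(w for w in words if w not in STOPWORDS and w not in NOISE)


def methods_terms(text):
    """Collect normalised candidate variable terms from Methods text."""
    terms = set()
    for match in CUE.finditer(text):
        for piece in SPLIT.split(match.group(1)):
            term = normalise(piece)
            if term:  # skip items that normalise to nothing
                terms.add(term)
    return terms


print(methods_terms("We measured systolic blood pressure, heart rate and body weight."))
```

A passive sentence such as "Blood pressure was measured at baseline." yields the fragment "baseline" under this sketch, which shows how cue-based extraction can pick up text that is not a variable. Ending the capture at the first full stop is also a simplification, since a decimal number would cut it short.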

How it works

Layer 2 (contextual): Methods verb patterns capture the text after each cue up to the end of the sentence and split it on commas and the word "and". Results subjects are the capitalised span before a result verb, anchored to a sentence boundary. Terms are lowercased, split into words, and stripped of stopwords and study-noise words such as patients, subjects, and participants. Two terms match when they are identical, when one's words are a subset of the other's, or when at least half of the shorter term's stemmed words overlap. Containment is tested at word level rather than on raw substrings, so an incidental overlap such as "age" inside "average" does not count as a match.

Unreported and undeclared terms raise warnings; a missing primary endpoint raises an error. The score is 0.0 for no discrepancies, 2.0 for one or two, and 4.0 for three or more, plus 1.0 for a missing primary endpoint, capped at 5.0. It returns zero when a section is missing or no terms can be extracted from it. The metadata also reports the discrepancy counts and the discrepancy rate, the share of distinct variables that do not appear in both sections.

The check also detects post-hoc, exploratory, unplanned, or not-pre-specified disclosure in the Results. When such disclosure is present, undeclared variables are counted as disclosed rather than silent and excluded from the count that sets the score; the disclosure flag and effective count are recorded.
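The three-part matching rule can be sketched as follows. This is a minimal illustration, not the indicator's code: the function names are hypothetical, and `stem` is a crude suffix-stripping stand-in, since the stemmer actually used is not specified.

```python
def stem(word):
    """Crude stand-in stemmer (assumption): strip a few common suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def terms_match(a, b):
    """Tolerant word-level match between two normalised terms."""
    words_a, words_b = set(a.split()), set(b.split())
    if not words_a or not words_b:
        return False
    # Rules 1 and 2: identical word sets, or one contained in the other.
    if words_a <= words_b or words_b <= words_a:
        return True
    # Rule 3: at least half of the shorter term's stemmed words overlap.
    stems_a = {stem(w) for w in words_a}
    stems_b = {stem(w) for w in words_b}
    shorter = stems_a if len(stems_a) <= len(stems_b) else stems_b
    return len(stems_a & stems_b) / len(shorter) >= 0.5


print(terms_match("blood pressure", "systolic blood pressure"))  # subset rule
print(terms_match("pain scores", "pain score change"))           # stem overlap
print(terms_match("age", "average"))                             # no word overlap
```

Because the comparison works on whole words, "age" and "average" share nothing and do not match, whereas a raw substring test would have paired them.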

Why this matters

Discrepancy between the variables a study plans and the variables it reports is the operational definition of outcome reporting bias. Comparing trial protocols with publications shows outcomes frequently added, omitted, or switched in ways that track statistical significance, so a planned variable missing from the results is more often suppressed than forgotten. The opposite discrepancy, a result for a variable never declared, reflects the undisclosed analytic flexibility that manufactures false positives. A declared primary endpoint absent from the results removes the study's own pre-specified success criterion and is the most serious case.

Score thresholds

0: The variables declared and reported correspond, or sections were unavailable
2: One or two variables differ between Methods and Results
4-5: Three or more variables differ, or a declared primary endpoint is missing from the Results
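
The scoring rule described under How it works can be written as a small function. This is a sketch under stated assumptions: the function name and parameters are hypothetical, and it assumes the count that sets the score is the sum of unreported and undeclared terms, with undeclared terms dropped when post-hoc or exploratory disclosure is detected.

```python
def r3_score(unreported, undeclared, primary_missing, disclosed=False):
    """Score from discrepancy counts (illustrative reading of the rule)."""
    # Disclosed undeclared variables are excluded from the effective count.
    effective = unreported + (0 if disclosed else undeclared)
    if effective == 0:
        base = 0.0
    elif effective <= 2:
        base = 2.0
    else:
        base = 4.0
    # A missing primary endpoint adds 1.0; the total is capped at 5.0.
    return min(base + (1.0 if primary_missing else 0.0), 5.0)


print(r3_score(unreported=1, undeclared=1, primary_missing=False))
print(r3_score(unreported=2, undeclared=1, primary_missing=True))
print(r3_score(unreported=0, undeclared=3, primary_missing=False, disclosed=True))
```

Under this reading, one or two discrepancies combined with a missing primary endpoint give 3.0, a value that falls between the bands listed above.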

Limitations

Extracting variables from prose is approximate: the cues miss variables described in other phrasings and can capture fragments that are not variables, so both sets carry noise. A flag is a prompt to read the sections, not a verdict. The tolerant matcher trades precision for recall: related but distinct variables may be merged, hiding a real discrepancy, and unrelated terms that share a stem may be matched wrongly. Both a Methods and a Results section are required, so unsegmented documents are skipped, and tables and figures, where a variable's results may appear, are not read. The check compares names, not the analyses behind them. Sample-size consistency is covered by indicator R2 and design-test fit by indicator R1; R3 covers only the correspondence of variables between Methods and Results.