R3 | Statistical analysis | Methodological Coherence | Layer 2 (Contextual)

Variable Consistency

Checks that the variables a paper says it will measure are the ones it actually reports, and vice versa. A variable described in the Methods but absent from the Results points to selective reporting, and a variable that appears in the Results but was never declared in the Methods points to data dredging or undeclared post-hoc analysis. A primary endpoint declared in the Methods that never surfaces in the Results is the most serious case. The indicator extracts variable mentions from each section and compares them. It reads the Methods and Results section text.

Technical description

R3 is a contextual check of agreement between the variables an article declares it will measure and the variables it reports analysing. It extracts candidate variable terms from the Methods section using verb cues that introduce measured quantities (measured, assessed, evaluated, recorded, monitored, collected) and noun cues that name an outcome, a variable, or a primary or secondary endpoint. For each cue it captures the remainder of the sentence and splits it into comma- and conjunction-separated items. It extracts variable terms from the Results section by taking sentence subjects that precede result verbs such as showed, demonstrated, increased, decreased, improved, or differed. Each term is normalised by lowercasing and removing stopwords and study-noise words, and the two sets are compared with a tolerant word-level matcher. Terms in the Methods with no match in the Results are unreported, terms in the Results with no match in the Methods are undeclared, and a declared primary endpoint absent from the Results is flagged separately. Following the COMPare distinction between silent and declared outcome switching, undeclared variables that the paper openly labels post-hoc or exploratory are treated as disclosed analyses and excluded from the discrepancy count used for scoring.
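The two extraction steps can be illustrated with a minimal sketch. It is an assumption-laden illustration rather than the indicator's actual code: the cue lists contain only the verbs named above, the noun cues (outcome, variable, endpoint) are left out for brevity, and the names `METHODS_RE`, `RESULTS_RE`, `extract_methods_terms`, and `extract_results_subjects` are hypothetical.

```python
import re

# Assumed cue lists: only the verbs named in the description above.
METHOD_CUES = r"(?:measured|assessed|evaluated|recorded|monitored|collected)"
RESULT_VERBS = r"(?:showed|demonstrated|increased|decreased|improved|differed)"

# Methods: capture everything after a cue verb up to the end of the sentence.
METHODS_RE = re.compile(rf"\b{METHOD_CUES}\s+([^.]+)", re.IGNORECASE)

# Results: a capitalised span at a sentence start, ending just before a result verb.
RESULTS_RE = re.compile(rf"(?:^|(?<=[.!?]\s))([A-Z][^.!?]*?)\s+{RESULT_VERBS}\b")


def extract_methods_terms(methods: str) -> list[str]:
    """Split each captured sentence remainder on commas and the word 'and'."""
    terms = []
    for match in METHODS_RE.finditer(methods):
        for part in re.split(r",|\band\b", match.group(1)):
            part = part.strip()
            if part:
                terms.append(part)
    return terms


def extract_results_subjects(results: str) -> list[str]:
    """Return the sentence subjects that precede a result verb."""
    return [subject.strip() for subject in RESULTS_RE.findall(results)]


methods = "We measured blood pressure, heart rate and body mass index."
results = "Blood pressure decreased in the treatment group. Heart rate showed no change."
print(extract_methods_terms(methods))
print(extract_results_subjects(results))
```

A cue-based pattern of this kind also captures fragments such as "at baseline" from a passive sentence like "Blood pressure was measured at baseline", which is one source of the noise discussed under Limitations.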

How it works

Methods verb patterns capture the text following each cue up to the sentence end, which is then split on commas and the word "and" into individual variable phrases. Results subjects are captured as the capitalised span preceding a result verb, anchored to a sentence boundary. Each phrase is normalised: it is lowercased and split into words, and stopwords and study-noise words such as patients, subjects, participants, and groups are removed. Two normalised terms match when they are identical, when the words of one are a subset of the words of the other, or when at least half of the shorter term's stemmed words overlap the other's. Word-level rather than raw-substring containment prevents an incidental character overlap, such as age inside average, from being read as the same variable.
Unreported and undeclared terms each yield warning findings, and a missing primary endpoint yields an error finding. The score is 0.0 for no discrepancy, 2.0 for one or two, and 4.0 for three or more, with a further 1.0 when the primary endpoint is missing, capped at 5.0. The indicator returns 0.0 when either section is missing or when no variable terms can be extracted.
The metadata lists the extracted terms, the unreported and undeclared sets, and the primary-endpoint status, together with the discrepancy counts and the discrepancy rate (the share of distinct variables that fail to appear in both sections). The indicator also detects whether the Results disclose post-hoc, exploratory, unplanned, or not-pre-specified analyses. When they do, undeclared variables are counted as disclosed rather than silent and are excluded from the discrepancy count that sets the score; the disclosure flag and the effective count are recorded in the metadata.
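The normalisation and matching rules can be sketched as follows. This is an illustration under stated assumptions, not the indicator's implementation: the stopword list is a small hypothetical sample, the noise words are the four named in the text, and `crude_stem` is a deliberately simple suffix stripper standing in for whatever stemmer the indicator actually uses.

```python
import re

# Assumed, abbreviated stopword list; the real list is not given in the text.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "at", "for", "with"}
# Study-noise words named in the description.
NOISE = {"patients", "subjects", "participants", "groups"}


def normalise(term: str) -> frozenset[str]:
    """Lowercase, split into words, and drop stopwords and noise words."""
    words = re.findall(r"[a-z0-9]+", term.lower())
    return frozenset(w for w in words if w not in STOPWORDS and w not in NOISE)


def crude_stem(word: str) -> str:
    """Hypothetical stand-in stemmer: strip one common suffix, keep >= 3 letters."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def terms_match(a: str, b: str) -> bool:
    """Identical, word-subset, or at least half of the shorter term's stems shared."""
    words_a, words_b = normalise(a), normalise(b)
    if not words_a or not words_b:
        return False
    if words_a <= words_b or words_b <= words_a:  # covers equality too
        return True
    stems_a = {crude_stem(w) for w in words_a}
    stems_b = {crude_stem(w) for w in words_b}
    shorter, longer = (stems_a, stems_b) if len(stems_a) <= len(stems_b) else (stems_b, stems_a)
    return 2 * len(shorter & longer) >= len(shorter)


print(terms_match("age", "average"))                          # whole words differ
print(terms_match("Blood pressure", "systolic blood pressure"))  # word subset
print(terms_match("pain intensity", "pain score"))             # half of the stems shared
```

The last call shows the tolerance at work: the two terms share one of two stems, so they are merged even though they may denote different measurements.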

Score thresholds

Score	Meaning
0	The variables declared and the variables reported correspond, or sections were unavailable.
2	One or two variables differ between Methods and Results.
4 to 5	Three or more variables differ, or a declared primary endpoint is missing from the Results.
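The scoring rule described under How it works can be written out as a small function. This is a sketch only: the name `r3_score` and its parameters are hypothetical, and it assumes the 1.0 increment for a missing primary endpoint is simply added to the base score before capping.

```python
def r3_score(effective_discrepancies: int, primary_missing: bool) -> float:
    """Map a discrepancy count and primary-endpoint status to a score in [0, 5]."""
    if effective_discrepancies == 0:
        base = 0.0
    elif effective_discrepancies <= 2:
        base = 2.0
    else:
        base = 4.0
    if primary_missing:
        base += 1.0  # further 1.0 when the declared primary endpoint is missing
    return min(base, 5.0)  # capped at 5.0


for count, missing in [(0, False), (2, False), (3, False), (3, True)]:
    print(count, missing, r3_score(count, missing))
```

Read literally, the rule gives 3.0 for one or two discrepancies combined with a missing primary endpoint, a value the table does not list as a separate row; the text does not say how the indicator treats that case.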

Why this matters

Discrepancy between the variables a study plans and the variables it reports is the operational definition of outcome reporting bias, one of the best-documented distortions in the literature. Chan and colleagues, comparing trial protocols with their publications, found that outcomes were frequently added, omitted, or switched between protocol and paper, and that the changes tracked statistical significance, so a primary endpoint declared but not reported is a recognised marker of biased reporting [1]. Dwan and colleagues, reviewing the empirical evidence, confirmed that statistically significant outcomes are more likely to be fully reported than non-significant ones, which means a variable measured but missing from the results is more often suppressed than forgotten [2]. The opposite discrepancy, a result for a variable never declared, reflects the analytic flexibility that Simmons and colleagues showed can manufacture significant findings: undisclosed measures and analyses are a principal route to false positives, so a result that has no counterpart in the methods signals an undeclared or post-hoc analysis [3]. R3 mechanises both halves of this correspondence and singles out the primary endpoint, whose disappearance is the most consequential form of the problem. The ORBIT programme of Kirkham and colleagues built a formal classification of missing-outcome reporting and showed it materially biases the evidence base [4], and the COMPare project of Goldacre and colleagues monitored trials in real time and found pervasive silent addition and omission of outcomes against pre-specification [5]. 
The same correspondence is now embedded in statistical-reporting checklists [6] and in research-integrity screening, where expert-derived warning signs [7], audits of fabricated trials [8], the INSPECT-SR instrument [9], reviews of the data-detective toolkit [10], and catalogues of misconduct-detection methods [11] treat a results section that does not match its declared variables as a marker worth examining.

Limitations

Extracting variables from prose is inherently approximate: the verb and subject cues miss variables introduced by other phrasings and can capture non-variable fragments, so both the unreported and undeclared sets carry noise and a flag is a prompt to read the two sections rather than a verdict. The tolerant matcher favours finding a match over rejecting one, so closely related but distinct variables, or unrelated terms that happen to share stems, may be merged and a genuine discrepancy missed. The check needs both a Methods and a Results section, so an unsegmented document is skipped, and it does not read tables or figures, where a variable's results may actually appear. It compares names, not the analyses behind them, so a variable reported in a different framing can be misjudged. Consistency of the reported sample size is covered by indicator R2, and the fit of the chosen tests to the design by indicator R1; R3 is confined to the correspondence of variables between Methods and Results.

Theoretical background

R3 operationalises the principle that a sound study fixes its variables in advance and reports them all, so the set declared in the Methods and the set reported in the Results should coincide. Departures partition into two directions with distinct meanings. A Methods-only variable indicates that a planned measurement was not reported, which under outcome reporting bias is disproportionately the measurement that failed to reach significance, so the omission systematically inflates the apparent success of the study. A Results-only variable indicates an analysis that was not pre-declared, which expands the researcher's degrees of freedom and the family of implicit comparisons, raising the chance that a reported significant finding is a false positive selected after the fact. The primary endpoint occupies a special place because it is the pre-specified basis on which the study is to be judged, so its absence from the results is not a minor omission but a removal of the study's own success criterion. Because the variables are recovered from free text, the comparison is necessarily fuzzy, and the design accepts that imprecision by matching at the level of words and stems rather than exact strings. The move from raw-substring to word-subset containment removes a class of spurious matches in which a short term is accidentally embedded in an unrelated longer word, which sharpens the boundary between a genuine correspondence and a coincidental one. The COMPare project distinguished silent outcome switching from openly declared post-hoc analyses, and the indicator honours that distinction: an added analysis the paper labels exploratory is transparent rather than a hidden degree of freedom, so it is recorded but excluded from the score.
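The silent-versus-declared distinction can be illustrated with a short sketch of disclosure detection and the resulting effective count. The regular expression, the function names, and the returned dictionary keys are all assumptions made for illustration; the text specifies only which kinds of label (post-hoc, exploratory, unplanned, not pre-specified) count as disclosure.

```python
import re

# Assumed pattern covering the four disclosure labels named in the text.
DISCLOSURE_RE = re.compile(
    r"\b(?:post[- ]?hoc|exploratory|unplanned|not\s+pre[- ]?specified)\b",
    re.IGNORECASE,
)


def discloses_posthoc(results_text: str) -> bool:
    """True when the Results openly label an analysis as post-hoc or similar."""
    return DISCLOSURE_RE.search(results_text) is not None


def effective_discrepancies(unreported: list[str], undeclared: list[str],
                            results_text: str) -> dict:
    """Count discrepancies, excluding undeclared terms when they are disclosed."""
    disclosed = discloses_posthoc(results_text)
    count = len(unreported) + (0 if disclosed else len(undeclared))
    return {"disclosed": disclosed, "effective_count": count}


silent = effective_discrepancies(["pain"], ["fatigue"], "Fatigue improved.")
declared = effective_discrepancies(
    ["pain"], ["fatigue"], "In an exploratory analysis, fatigue improved."
)
print(silent)
print(declared)
```

In this sketch a single disclosure label anywhere in the Results exempts every undeclared variable, which matches the section-level description given here; a stricter design could tie each label to the sentence that reports the variable.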

References

[1] Chan AW, Hróbjartsson A, Haahr MT, Gøtzsche PC, Altman DG. Empirical evidence for selective reporting of outcomes in randomized trials: comparison of protocols to published articles. JAMA. 2004;291(20):2457-2465. DOI: 10.1001/jama.291.20.2457
[2] Dwan K, Altman DG, Arnaiz JA, et al. Systematic review of the empirical evidence of study publication bias and outcome reporting bias. PLoS ONE. 2008;3(8):e3081. DOI: 10.1371/journal.pone.0003081
[3] Simmons JP, Nelson LD, Simonsohn U. False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science. 2011;22(11):1359-1366. DOI: 10.1177/0956797611417632
[4] Kirkham JJ, Dwan KM, Altman DG, et al. The impact of outcome reporting bias in randomised controlled trials on a cohort of systematic reviews. BMJ. 2010;340:c365. DOI: 10.1136/bmj.c365
[5] Goldacre B, Drysdale H, Dale A, et al. COMPare: a prospective cohort study correcting and monitoring 58 misreported trials in real time. Trials. 2019;20(1):118. DOI: 10.1186/s13063-019-3173-2
[6] Mansournia MA, Collins GS, Nielsen RO, et al. CHecklist for statistical Assessment of Medical Papers: the CHAMP statement. British Journal of Sports Medicine. 2021;55(18):1002-1003. DOI: 10.1136/bjsports-2020-103651
[7] Parker L, Boughton S, Lawrence R, Bero L. Experts identified warning signs of fraudulent research: a qualitative study to inform a screening tool. Journal of Clinical Epidemiology. 2022;151:1-17. DOI: 10.1016/j.jclinepi.2022.07.006
[8] Carlisle JB. False individual patient data and zombie randomised controlled trials submitted to Anaesthesia. Anaesthesia. 2021;76(4):472-479. DOI: 10.1111/anae.15263
[9] Wilkinson J, Heal C, Antoniou GA, et al. A survey of experts to identify methods to detect problematic studies: stage 1 of the INveStigating ProblEmatic Clinical Trials in Systematic Reviews project. Journal of Clinical Epidemiology. 2024;175:111512. DOI: 10.1016/j.jclinepi.2024.111512
[10] Crone G, Green CD. Tools of the data detective: a review of statistical methods to detect data and result anomalies in psychology. Theory & Psychology. 2025;35(3):359-380. DOI: 10.1177/09593543241311861
[11] Bordewijk EM, Li W, van Eekelen R, et al. Methods to assess research misconduct in health-related research: a scoping review. Journal of Clinical Epidemiology. 2021;136:189-202. DOI: 10.1016/j.jclinepi.2021.05.012