ResAIKit
Research Integrity Toolkit
R3 · Statistical analysis · Methodological Coherence · Layer 2 (Contextual)

Variable Consistency

Checks whether outcome variables are named consistently across the paper, catching cases where the same measurement is called different things in different sections.

Technical description

R3 checks whether the variables an article says it measured agree with the variables it reports analysing. From the Methods it extracts candidate variable terms using verb cues (measured, assessed, evaluated, recorded, monitored, collected) and phrasings that introduce an outcome, a variable, or a primary or secondary endpoint, capturing the rest of the sentence and splitting it into comma- and conjunction-separated items. From the Results it takes sentence subjects preceding result verbs (showed, demonstrated, increased, decreased, improved, differed). Each term is normalised by lowercasing and removing stopwords and study-noise words, and the two sets are compared with a tolerant word-level matcher. Terms found only in the Methods are flagged as unreported (possible selective reporting), terms found only in the Results as undeclared (possible data dredging), and a declared primary endpoint absent from the Results is flagged separately. Undeclared variables openly labelled post-hoc or exploratory are treated as disclosed, following the distinction drawn by COMPare, and are excluded from the count used for scoring.
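The extraction step described above can be sketched roughly as follows. This is a minimal illustration, not the toolkit's actual code: the function names (`split_sentences`, `methods_terms`, `results_subjects`), the naive sentence splitter, and the exact regular expressions are assumptions, and only the verb cues listed above are covered (the outcome and endpoint phrasings are left out).

```python
import re

# Cue lists taken from the description above; the real patterns may differ.
METHOD_CUES = r"(?:measured|assessed|evaluated|recorded|monitored|collected)"
RESULT_VERBS = r"(?:showed|demonstrated|increased|decreased|improved|differed)"


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def methods_terms(methods_text):
    """Capture the text after a cue verb up to the sentence end, then split it."""
    terms = []
    for sentence in split_sentences(methods_text):
        m = re.search(rf"\b{METHOD_CUES}\b\s+(.*)", sentence, flags=re.IGNORECASE)
        if not m:
            continue
        tail = m.group(1).rstrip(".!?")
        # Split on commas and on the standalone word "and".
        for item in re.split(r",|\band\b", tail):
            item = item.strip()
            if item:
                terms.append(item)
    return terms


def results_subjects(results_text):
    """Take the capitalised span at the start of a sentence, before a result verb."""
    subjects = []
    for sentence in split_sentences(results_text):
        m = re.match(rf"([A-Z][^.,;]*?)\s+{RESULT_VERBS}\b", sentence)
        if m:
            subjects.append(m.group(1).strip())
    return subjects


declared = methods_terms(
    "We measured systolic blood pressure, heart rate and body weight."
)
reported = results_subjects(
    "Systolic blood pressure decreased in the treatment group. "
    "Heart rate showed no change."
)
```

In this toy input, `declared` holds three terms and `reported` holds two, so body weight would surface as a Methods-only term once the sets are compared.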

How it works

Layer 2 (contextual): Methods verb patterns capture the text after each cue up to the end of the sentence and split it on commas and the word "and". Results subjects are the capitalised span before a result verb, anchored to a sentence boundary. Terms are lowercased, split into words, and stripped of stopwords and study-noise words such as patients, subjects, participants. Two terms match when they are identical, when one's words are a subset of the other's, or when at least half of the shorter term's stemmed words overlap. Matching on whole words rather than raw substrings stops an incidental overlap, such as "age" inside "average", from counting as a match. Unreported and undeclared terms raise warnings; a missing primary endpoint raises an error. The score is 0.0 for no discrepancies, 2.0 for one or two, and 4.0 for three or more, plus 1.0 for a missing primary endpoint, capped at 5.0. It is zero when a section is missing or no terms are extracted. The metadata also reports the discrepancy counts and the discrepancy rate, the share of distinct variables that do not appear in both sections. The check also looks for post-hoc, exploratory, unplanned, or not-pre-specified disclosure in the Results; when present, undeclared variables are counted as disclosed rather than silent and excluded from the count that sets the score, and the disclosure flag and effective count are recorded.
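A minimal sketch of the normalisation and tolerant matching rules, under stated assumptions: the stopword list is a small illustrative subset, `crude_stem` is a hypothetical stand-in for whatever stemmer the toolkit uses, and the function names are invented for this example.

```python
import re

# Assumed word lists; the text names only patients, subjects, participants.
STOPWORDS = {"the", "a", "an", "of", "in", "at", "for", "to", "with", "was", "were"}
STUDY_NOISE = {"patients", "subjects", "participants"}
DROP = STOPWORDS | STUDY_NOISE


def normalise(term):
    """Lowercase, split into words, and drop stopwords and study-noise words."""
    return [w for w in re.findall(r"[a-z0-9]+", term.lower()) if w not in DROP]


def crude_stem(word):
    # Hypothetical stand-in for a real stemmer: strip a few common suffixes.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def terms_match(a, b):
    """Apply the three rules: identical, word subset, or half-overlap of stems."""
    wa, wb = set(normalise(a)), set(normalise(b))
    if not wa or not wb:
        return False
    if wa == wb or wa <= wb or wb <= wa:
        return True
    sa = {crude_stem(w) for w in wa}
    sb = {crude_stem(w) for w in wb}
    shorter = min(sa, sb, key=len)
    # At least half of the shorter term's stemmed words must be shared.
    return len(sa & sb) >= len(shorter) / 2


subset_case = terms_match("Systolic blood pressure", "blood pressure")
stem_case = terms_match("pain scores", "pain score at rest")
substring_case = terms_match("age", "average heart rate")
merged_case = terms_match("body weight", "body mass index")
```

Because comparison is on whole words, "age" does not match "average heart rate". The last case shows the cost of tolerance: "body weight" and "body mass index" share one of two words and therefore match under the half-overlap rule, even though they are distinct variables.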

Why this matters

Discrepancy between the variables a study plans and the variables it reports is the operational definition of outcome reporting bias. Comparing trial protocols with publications shows outcomes frequently added, omitted, or switched in ways that track statistical significance, so a planned variable missing from the results is more often suppressed than forgotten. The opposite discrepancy, a result for a variable never declared, reflects the undisclosed analytic flexibility that manufactures false positives. A declared primary endpoint absent from the results removes the study's own pre-specified success criterion and is the most serious case.

Score thresholds

0: The variables declared and reported correspond, or sections were unavailable
2: One or two variables differ between Methods and Results
4-5: Three or more variables differ, or a declared primary endpoint is missing from the Results
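The scoring arithmetic and the disclosure check can be sketched as below. This is an illustration under assumptions, not the toolkit's implementation: the names `has_disclosure` and `r3_score`, the regular expression, and the metadata keys are hypothetical, and the discrepancy rate is assumed here to use the raw counts before disclosure is applied. The early return for missing sections is omitted.

```python
import re

# Assumed pattern for the disclosure wording listed in "How it works".
DISCLOSURE = re.compile(
    r"post[- ]?hoc|exploratory|unplanned|not\s+pre-?\s?specified",
    re.IGNORECASE,
)


def has_disclosure(results_text):
    """True when the Results openly label analyses as post-hoc or similar."""
    return DISCLOSURE.search(results_text) is not None


def r3_score(matched, unreported, undeclared, primary_missing, disclosed):
    """Map discrepancy counts to a 0.0-5.0 score plus metadata."""
    # Disclosed undeclared variables do not count toward the score.
    effective = unreported + (0 if disclosed else undeclared)
    if effective == 0:
        score = 0.0
    elif effective <= 2:
        score = 2.0
    else:
        score = 4.0
    if primary_missing:
        score += 1.0
    total = matched + unreported + undeclared
    # Assumption: the rate is computed from raw counts, ignoring disclosure.
    rate = (unreported + undeclared) / total if total else 0.0
    return {
        "score": min(score, 5.0),
        "effective_count": effective,
        "discrepancy_rate": rate,
        "disclosed": disclosed,
    }


one_gap = r3_score(3, 1, 0, primary_missing=False, disclosed=False)
worst = r3_score(2, 2, 2, primary_missing=True, disclosed=False)
disclosed_only = r3_score(
    4, 0, 3, primary_missing=False,
    disclosed=has_disclosure("In a post hoc analysis, fatigue improved."),
)
```

Here `one_gap` scores 2.0, `worst` reaches the 5.0 cap, and `disclosed_only` scores 0.0 because its three undeclared variables are labelled post hoc, although they still contribute to the discrepancy rate under the assumption above.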

Limitations

Extracting variables from prose is approximate: the cues miss variables introduced in other phrasings and can capture non-variable fragments, so both sets carry noise and a flag is a prompt to read the sections, not a verdict. The tolerant matcher trades precision for recall: related but distinct variables may merge, hiding a real discrepancy, and unrelated terms that share a stem may also be merged wrongly. Both a Methods and a Results section are required, so unsegmented documents are skipped. Tables and figures, where a variable's results may appear, are not read. The check compares names, not the analyses behind them. Sample-size consistency is indicator R2 and design-test fit is indicator R1; R3 covers only the correspondence of variables between Methods and Results.

References

  1. Chan AW, Hrobjartsson A, Haahr MT, Gotzsche PC, Altman DG. (2004). Empirical evidence for selective reporting of outcomes in randomized trials: comparison of protocols to published articles. JAMA
  2. Dwan K, Altman DG, Arnaiz JA, et al. (2008). Systematic review of the empirical evidence of study publication bias and outcome reporting bias. PLoS ONE
  3. Simmons JP, Nelson LD, Simonsohn U. (2011). False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science
  4. Kirkham JJ, Dwan KM, Altman DG, et al. (2010). The impact of outcome reporting bias in randomised controlled trials on a cohort of systematic reviews. BMJ 340:c365
  5. Goldacre B, Drysdale H, Dale A, et al. (2019). COMPare: a prospective cohort study correcting and monitoring 58 misreported trials in real time. Trials 20(1):118
  6. Mansournia MA, Collins GS, Nielsen RO, Nazemipour M, Jewell NP, Altman DG, Campbell MJ. (2021). CHecklist for statistical Assessment of Medical Papers: the CHAMP statement. British Journal of Sports Medicine 55(18):1002-1003
  7. Parker L, Boughton S, Lawrence R, Bero L. (2022). Experts identified warning signs of fraudulent research: a qualitative study to inform a screening tool. Journal of Clinical Epidemiology 151:1-17
  8. Carlisle JB. (2021). False individual patient data and zombie randomised controlled trials submitted to Anaesthesia. Anaesthesia 76(4):472-479
  9. Wilkinson J, Heal C, Antoniou GA, et al. (2024). A survey of experts to identify methods to detect problematic studies: stage 1 of the INveStigating ProblEmatic Clinical Trials in Systematic Reviews project. Journal of Clinical Epidemiology 175:111512
  10. Crone G, Green CD. (2025). Tools of the data detective: A review of statistical methods to detect data and result anomalies in psychology. Theory & Psychology 35(3):359-380
  11. Bordewijk EM, Li W, van Eekelen R, et al. (2021). Methods to assess research misconduct in health-related research: A scoping review. Journal of Clinical Epidemiology 136:189-202