Prespecification
Checks whether the hypothesis and primary endpoint are pre-specified in the Methods and detects signs of HARKing, such as many declared primary endpoints or subgroup analyses that appear only in the Results.
Technical description
R10 checks for pre-specification and for signs of HARKing (hypothesising after the results are known). It requires a Methods section and looks there for a hypothesis or primary-endpoint declaration; a trial- or pre-registration reference anywhere in the text is treated as an equivalent pre-specification signal. It counts primary-endpoint declarations in the Methods and treats more than three as a dilution of pre-specification into multiple primaries. It compares subgroup mentions between the Methods and the Results, flagging a subgroup analysis that appears in the Results but not in the Methods as possible subgroup fishing. It also notes post-hoc or exploratory analyses, which are acceptable when labelled but contribute a mild signal. From low to high, the score reflects a clearly pre-specified study, a declared exploratory extension, and the HARKing markers of many primaries or undeclared subgroups.
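The detection steps described above can be sketched in Python as a function that turns the Methods, Results, and full text into the five metadata fields the check records. This is a minimal illustration, not R10's implementation. The hypothesis cues are the ones the check lists. Several details are assumptions:

- case-insensitive substring matching;
- the regular expressions for registration references (NCT or ISRCTN followed by eight digits, clinicaltrials.gov, PROSPERO, or a form of "registered"/"registration");
- the subgroup and post-hoc patterns;
- searching the whole text for post-hoc mentions;
- the names `extract_metadata` and `PrespecMetadata`.

Section splitting is assumed to have been done already.

```python
import re
from dataclasses import dataclass

# Hypothesis cues as listed for R10; case-insensitive substring matching is an assumption.
HYPOTHESIS_CUES = (
    "hypothesis", "hypothesised", "primary endpoint",
    "primary outcome", "primary objective", "we aimed to",
)

# Assumed registration patterns: NCT or ISRCTN followed by eight digits,
# the clinicaltrials.gov domain, PROSPERO, or a stated (pre)registration.
REGISTRATION_RE = re.compile(
    r"\bNCT\d{8}\b|\bISRCTN\s?\d{8}\b|clinicaltrials\.gov|\bPROSPERO\b"
    r"|\b(?:pre-?)?regist(?:ered|ration)\b",
    re.IGNORECASE,
)
# Assumed keyword patterns for the remaining signals.
PRIMARY_RE = re.compile(r"\bprimary (?:endpoint|outcome)s?\b", re.IGNORECASE)
SUBGROUP_RE = re.compile(r"\bsub-?groups?\b", re.IGNORECASE)
POSTHOC_RE = re.compile(r"\bpost[- ]hoc\b|\bexploratory\b", re.IGNORECASE)


@dataclass
class PrespecMetadata:
    hypothesis_found: bool
    registration_found: bool
    methods_primary_count: int
    subgroup_fishing: bool
    posthoc_declared: bool


def extract_metadata(methods: str, results: str, full_text: str) -> PrespecMetadata:
    """Derive the R10 metadata fields from pre-split section texts (hypothetical helper)."""
    return PrespecMetadata(
        # Hypothesis cues are looked for in the Methods only.
        hypothesis_found=any(cue in methods.lower() for cue in HYPOTHESIS_CUES),
        # A registration reference counts wherever it appears.
        registration_found=REGISTRATION_RE.search(full_text) is not None,
        # Counted within the Methods, so restatements elsewhere do not inflate it.
        methods_primary_count=len(PRIMARY_RE.findall(methods)),
        # A subgroup mentioned in the Results but never in the Methods.
        subgroup_fishing=(SUBGROUP_RE.search(results) is not None
                          and SUBGROUP_RE.search(methods) is None),
        # Assumed scope: post-hoc or exploratory mentions anywhere in the text.
        posthoc_declared=POSTHOC_RE.search(full_text) is not None,
    )
```

Counting regex matches means a plural such as "primary endpoints were A, B and C" counts once. A real implementation would have to decide how such lists are split.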
How it works
Layer 2 (contextual): the Methods section is searched for hypothesis cues (hypothesis, hypothesised, primary endpoint, primary outcome, primary objective, we aimed to). If none is found and the text also contains no trial- or pre-registration reference (an NCT or ISRCTN identifier, clinicaltrials.gov, PROSPERO, or a stated registration), the score rises to at least 2.0. Primary-endpoint declarations are counted within the Methods rather than across the whole paper, so a single endpoint restated elsewhere is not mistaken for several; more than three raise the score to at least 4.0. A subgroup mention that appears in the Results but not in the Methods is flagged as subgroup fishing and raises the score to at least 4.0. A post-hoc or exploratory mention raises it to at least 2.0. The score is capped at 5.0. The metadata records:

- whether a hypothesis was found;
- whether a registration reference was found;
- the Methods primary-endpoint count;
- whether subgroup fishing was detected;
- whether a post-hoc or exploratory analysis was declared.
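Read as a set of floors, these rules can be combined by taking the maximum of every floor that applies and then applying the cap. The sketch below assumes that reading. The function name `r10_score` and its parameters are hypothetical and simply mirror the recorded metadata fields. The description does not say how a score above 4.0 is reached, so this version never exceeds 4.0, because it applies only the stated floors. The final `min` simply reflects the stated cap.

```python
def r10_score(hypothesis_found: bool, registration_found: bool,
              methods_primary_count: int, subgroup_fishing: bool,
              posthoc_declared: bool) -> float:
    """Combine the R10 floors by taking their maximum (assumed reading)."""
    floors = [0.0]
    # No hypothesis cue in the Methods and no registration reference anywhere.
    if not hypothesis_found and not registration_found:
        floors.append(2.0)
    # More than three primary-endpoint declarations in the Methods.
    if methods_primary_count > 3:
        floors.append(4.0)
    # Subgroup analysis in the Results that the Methods never mentions.
    if subgroup_fishing:
        floors.append(4.0)
    # Post-hoc or exploratory analysis, even when openly labelled.
    if posthoc_declared:
        floors.append(2.0)
    # Stated cap; with these floors alone the maximum reached is 4.0.
    return min(max(floors), 5.0)


print(r10_score(True, False, 1, False, True))  # 2.0
print(r10_score(True, False, 1, True, True))   # 4.0
```

A paper with a hypothesis in the Methods and a labelled exploratory analysis therefore scores 2.0. If it also reports a subgroup analysis that the Methods never mention, it scores 4.0.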
Why this matters
Pre-specification separates a confirmatory test from an exploratory search. Abandoning it undermines the meaning of a p-value, whose nominal error rate assumes the test was chosen before the data were seen. HARKing presents a hypothesis formed after seeing the data as though it had been stated in advance, turning an exploratory finding into a spurious confirmation. The outcomes emphasised in published trials frequently differ from those in their protocols in ways that favour significance, and the freedom to add unplanned subgroup analyses inflates the false-positive rate. A missing hypothesis, a proliferation of primary endpoints, and undeclared subgroup analyses are the observable traces of these practices.
Score thresholds
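- 0: A hypothesis or primary endpoint is pre-specified in the Methods (or a registration is cited), at most three primaries are declared, and no post-hoc, exploratory, or undeclared subgroup analysis is reported
- 2: No hypothesis or primary-endpoint declaration in the Methods and no registration reference, or an openly labelled post-hoc or exploratory analysis
- 4-5: More than three primary endpoints declared in the Methods, or a subgroup analysis that appears in the Results but not in the Methods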
Limitations
The check operates mainly on the Methods and Results text, so it depends on those sections being identified and on the relevant statements appearing in them. A hypothesis or pre-specified subgroup stated only in a protocol or supplement is missed, producing a false flag. Detection is keyword-based, so an unconventionally phrased hypothesis is not recognised, and subgroup and post-hoc mentions are matched literally, whatever their context. Counting primaries in the Methods reduces but does not remove ambiguity: co-primary endpoints can be legitimate when declared with multiplicity control, which the check does not assess. A genuinely pre-specified subgroup that is described only in the Results is also flagged. The indicator detects the markers of HARKing and outcome multiplication, not intent, so a flag should prompt comparison against the protocol rather than be taken as proof of misconduct.
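The literal-matching limitation is easy to reproduce. The sketch below applies a hypothetical version of the subgroup rule to a Results sentence that explicitly denies any subgroup analysis, and the rule still fires. The rule flags a subgroup keyword in the Results when the Methods contain none. The regular expression and the name `flags_subgroup_fishing` are assumptions.

```python
import re

# Hypothetical keyword pattern: "subgroup", "sub-group", or their plurals.
SUBGROUP_RE = re.compile(r"\bsub-?groups?\b", re.IGNORECASE)


def flags_subgroup_fishing(methods: str, results: str) -> bool:
    """Literal rule: a subgroup keyword in the Results and none in the Methods."""
    return bool(SUBGROUP_RE.search(results)) and not SUBGROUP_RE.search(methods)


methods = "The primary outcome was 30-day mortality."
# The sentence denies any subgroup analysis, but the keyword is still present.
results = "No subgroup analyses were performed."
print(flags_subgroup_fishing(methods, results))  # True

# Any subgroup mention in the Methods suppresses the flag.
methods_with_plan = methods + " A subgroup analysis by age was pre-specified."
print(flags_subgroup_fishing(methods_with_plan, results))  # False
```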
References
- Kerr NL. (1998). HARKing: hypothesizing after the results are known. Personality and Social Psychology Review 2(3):196-217
- Chan AW, Hróbjartsson A, Haahr MT, Gøtzsche PC, Altman DG. (2004). Empirical evidence for selective reporting of outcomes in randomized trials: comparison of protocols to published articles. JAMA 291(20):2457-2465
- Simmons JP, Nelson LD, Simonsohn U. (2011). False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science 22(11):1359-1366
- Hardwicke TE, Wagenmakers EJ. (2023). Reducing bias, increasing transparency and calibrating confidence with preregistration. Nature Human Behaviour 7(1):15-26
- Goldacre B, Drysdale H, Dale A, et al. (2019). COMPare: a prospective cohort study correcting and monitoring 58 misreported trials in real time. Trials 20(1):118
- Mansournia MA, Collins GS, Nielsen RO, Nazemipour M, Jewell NP, Altman DG, Campbell MJ. (2021). CHecklist for statistical Assessment of Medical Papers: the CHAMP statement. British Journal of Sports Medicine 55(18):1002-1003
- Parker L, Boughton S, Lawrence R, Bero L. (2022). Experts identified warning signs of fraudulent research: a qualitative study to inform a screening tool. Journal of Clinical Epidemiology 151:1-17
- Wilkinson J, Heal C, Antoniou GA, et al. (2024). A survey of experts to identify methods to detect problematic studies: stage 1 of the INveStigating ProblEmatic Clinical Trials in Systematic Reviews project. Journal of Clinical Epidemiology 175:111512
- Crone G, Green CD. (2025). Tools of the data detective: A review of statistical methods to detect data and result anomalies in psychology. Theory & Psychology 35(3):359-380