Power Calculation
Checks whether the paper includes a sample-size justification or power calculation and, if one is present, whether the calculation is consistent with the actual sample size used.
Technical description
R5 checks for the presence and plausibility of a sample-size justification. It searches the text, favouring the Methods section, for power-analysis cues (power analysis, sample size calculation, power calculation, a priori power). Two or more supportive cues (effect size, Cohen's d, Type I error) together with a power percentage are treated as equivalent evidence. When a calculation is present, R5 extracts the significance level alpha, the target power percentage, the effect size, and the calculated N, and checks them against convention: an alpha above the conventional 0.05 inflates the false-positive rate, and a target power below 80 percent falls short of the conventional target. When no calculation is found, it judges the study by the largest sample size found. Where regression is mentioned with a predictor count, it applies the rule of thumb that N should be at least ten times the number of predictors. It also classifies the sample-size justification according to the Lakens taxonomy (power analysis, accuracy, heuristic, resource constraint, or none).
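The detection rule described above can be sketched roughly as follows. This is a minimal illustration, not the indicator's actual code: the regex patterns, the names `PRIMARY_CUES`, `SUPPORTIVE_CUES`, and `POWER_PCT`, and the function `has_power_calculation` are all assumptions made for the example, and the preference for the Methods section is left out.

```python
import re

# Hypothetical cue patterns; the indicator's real regexes are not given in the text.
PRIMARY_CUES = re.compile(
    r"power analysis|sample[- ]size calculation|power calculation|a priori power",
    re.IGNORECASE,
)
SUPPORTIVE_CUES = [
    re.compile(r"effect size", re.IGNORECASE),
    re.compile(r"Cohen['’]s d\b", re.IGNORECASE),
    re.compile(r"type I error", re.IGNORECASE),
]
# A power percentage such as "80% power" or "power of 90 percent".
POWER_PCT = re.compile(
    r"\d{2}(?:\.\d+)?\s*(?:%|percent)\s+power"
    r"|power\s+of\s+\d{2}(?:\.\d+)?\s*(?:%|percent)",
    re.IGNORECASE,
)


def has_power_calculation(text: str) -> bool:
    """Return True on a primary cue, or on two or more supportive cues
    that appear together with a power percentage."""
    if PRIMARY_CUES.search(text):
        return True
    supportive = sum(1 for pattern in SUPPORTIVE_CUES if pattern.search(text))
    return supportive >= 2 and POWER_PCT.search(text) is not None
```

Under these assumed patterns, a sentence such as "Assuming an effect size of Cohen's d = 0.5, the study had 80% power" counts as a calculation even though it contains none of the primary cues.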
How it works
Layer 2 (contextual): power and supportive cues are matched by regex. If a calculation is found, alpha, target power, effect size, and N are extracted. A significance level above 0.05 adds 1.0 (a stricter level below 0.05 is conservative and not penalised), and a target power below 80 percent adds 1.0. An assumed effect size at or above a Cohen's d of 1.5, far above the large-effect benchmark of 0.8, adds 1.0, since an inflated effect size understates the required sample (the sample-size samba). If no calculation is found, the largest N in the text or triplets is taken: an N below 30 sets the score to 4.0, and an adequate or unknown N sets it to 2.0. In either branch, if regression is mentioned and the maximum N is below ten times the stated predictor count, 1.0 is added. The score is capped at 5.0. Metadata records whether a calculation was found, the alpha, power, and effect size, whether that effect size is implausibly large, the maximum N and whether it reaches 30, the predictor count, and the Lakens justification type (power analysis, accuracy, heuristic, resource constraint, or none).
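The scoring rules above can be written out as a short sketch. The `PowerFindings` container and the `score_r5` function are hypothetical names introduced for illustration. The sketch also assumes that a found calculation starts from a score of 0.0 and that the regression check is skipped when either N or the predictor count is unknown; neither detail is stated explicitly in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PowerFindings:
    """Hypothetical container for values the extraction step would supply."""
    calculation_found: bool
    alpha: Optional[float] = None          # significance level, e.g. 0.05
    target_power: Optional[float] = None   # in percent, e.g. 80.0
    effect_size_d: Optional[float] = None  # assumed Cohen's d
    max_n: Optional[int] = None            # largest N found, None if unknown
    regression_mentioned: bool = False
    n_predictors: Optional[int] = None


def score_r5(f: PowerFindings) -> float:
    score = 0.0  # assumed starting point when a calculation is found
    if f.calculation_found:
        if f.alpha is not None and f.alpha > 0.05:
            score += 1.0  # lenient significance level
        if f.target_power is not None and f.target_power < 80.0:
            score += 1.0  # target power below convention
        if f.effect_size_d is not None and f.effect_size_d >= 1.5:
            score += 1.0  # implausibly large assumed effect
    elif f.max_n is not None and f.max_n < 30:
        score = 4.0       # no calculation and a small sample
    else:
        score = 2.0       # no calculation, adequate or unknown N
    # Ten-per-predictor rule, applied in either branch.
    if (f.regression_mentioned and f.n_predictors is not None
            and f.max_n is not None and f.max_n < 10 * f.n_predictors):
        score += 1.0
    return min(score, 5.0)
```

For example, a paper with no calculation, a maximum N of 20, and a regression with five stated predictors would reach the cap of 5.0 under this sketch (4.0 for the small sample plus 1.0 for the predictor rule).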
Why this matters
Justifying the sample size is a basic expectation of sound design, and getting the justification right matters as much as having one. The conventional target of 80 percent power and the effect-size benchmarks a calculation depends on are well established, so a power well below convention signals a study planned to miss real effects. Low power both reduces the chance of detecting a true effect and lowers the probability that a significant finding is real while inflating its apparent size, so an unjustified small sample is a substantive reliability concern. Reporting standards make the justification mandatory, so its absence is a documented gap.
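The claim that low power lowers the probability that a significant finding is real can be illustrated with the positive predictive value formula used by Button et al. (2013): PPV = (power x R) / (power x R + alpha), where R is the pre-study odds that a tested effect is real. The function name and the example odds below are assumptions chosen for illustration; the indicator itself does not compute this quantity.

```python
def positive_predictive_value(power: float, alpha: float, prior_odds: float) -> float:
    """Probability that a significant result reflects a true effect.

    power and alpha are proportions (0.80, 0.05); prior_odds is the
    pre-study odds R that the tested effect is real.
    """
    true_positive_weight = power * prior_odds
    return true_positive_weight / (true_positive_weight + alpha)


# Illustrative values only: pre-study odds of 1 to 4 that the effect is real.
well_powered = positive_predictive_value(power=0.80, alpha=0.05, prior_odds=0.25)
under_powered = positive_predictive_value(power=0.20, alpha=0.05, prior_odds=0.25)
```

With these assumed inputs, the PPV is 0.80 at 80 percent power and falls to 0.50 at 20 percent power, so half of the significant findings from the under-powered design would be false positives.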
Score thresholds
- 0: A power analysis is present with sensible parameters
- 2-3: No sample-size justification with an adequate sample, or a present calculation with a questionable parameter
- 4-5: No justification with a small sample, or several parameter problems together
Limitations
Detection is keyword-based, so a justification phrased without the expected cues is missed and the study is wrongly treated as lacking one, while a passing mention of effect size can be read as a calculation. Extracted parameters are the first matches found, which can belong to a different analysis. The regression check relies on a stated predictor count and the largest N, neither of which is always the relevant figure. The thresholds (80 percent power, 0.05 alpha, ten observations per predictor) are conventions, so a defensible departure is still flagged; a finding is a prompt to read the justification. The indicator does not recompute the sample size from the parameters, so it detects a missing calculation or out-of-convention inputs, not a calculation that is internally wrong. Sample-size consistency across sections is covered by indicator R2; R5 focuses on the presence and plausibility of the power justification.
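For context, the recomputation that R5 does not perform could look like the sketch below. It assumes a two-sided, two-group comparison of means and uses the standard normal approximation, n per group = 2 x ((z for 1 - alpha/2 + z for power) / d)^2; the function name `n_per_group` is hypothetical, and an exact t-based calculation gives slightly larger values.

```python
import math
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group N for a two-sided, two-sample comparison of means.

    Normal approximation only; assumes equal group sizes and a
    standardised effect size d (Cohen's d).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the target power
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


n_medium = n_per_group(0.5)     # medium effect
n_inflated = n_per_group(1.5)   # effect at the 1.5 flagging threshold
```

Under this approximation, a medium effect of d = 0.5 needs about 63 participants per group, while an assumed d of 1.5 needs only about 7. This is why an inflated effect size makes a small sample look adequately powered.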
References
- Cohen J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Lawrence Erlbaum Associates
- Button KS, Ioannidis JPA, Mokrysz C, Nosek BA, Flint J, Robinson ESJ, Munafò MR. (2013). Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience 14(5):365-376
- Schulz KF, Altman DG, Moher D (CONSORT Group). (2010). CONSORT 2010 Statement: updated guidelines for reporting parallel group randomised trials. BMJ 340:c332
- Schulz KF, Grimes DA. (2005). Sample size calculations in randomised trials: mandatory and mystical. The Lancet 365(9467):1348-1353
- Lakens D. (2022). Sample size justification. Collabra: Psychology 8(1):33267
- Mansournia MA, Collins GS, Nielsen RO, Nazemipour M, Jewell NP, Altman DG, Campbell MJ. (2021). CHecklist for statistical Assessment of Medical Papers: the CHAMP statement. British Journal of Sports Medicine 55(18):1002-1003
- Parker L, Boughton S, Lawrence R, Bero L. (2022). Experts identified warning signs of fraudulent research: a qualitative study to inform a screening tool. Journal of Clinical Epidemiology 151:1-17
- Wilkinson J, Heal C, Antoniou GA, et al. (2024). A survey of experts to identify methods to detect problematic studies: stage 1 of the INveStigating ProblEmatic Clinical Trials in Systematic Reviews project. Journal of Clinical Epidemiology 175:111512
- Crone G, Green CD. (2025). Tools of the data detective: A review of statistical methods to detect data and result anomalies in psychology. Theory & Psychology 35(3):359-380