ResAIKit
Research Integrity Toolkit
R5 | Statistical analysis | Methodological Coherence | Layer 2 (Contextual)

Power Calculation

Checks whether the paper includes a sample-size justification or power calculation and, if one is present, whether its stated parameters (significance level, target power, assumed effect size) are within convention.

Technical description

R5 checks for the presence and plausibility of a sample-size justification. It searches the text, favouring the Methods section, for power-analysis cues (power analysis, sample size calculation, power calculation, a priori power). Two or more supportive cues (effect size, Cohen's d, Type I error) together with a power percentage are treated as equivalent evidence. When a calculation is present, R5 extracts the significance level (alpha), the target power percentage, the effect size, and the calculated N, and checks them against convention: an alpha above the conventional 0.05 inflates the false-positive rate, and a target power below 80 percent falls short of the usual target. When no calculation is present, it judges the study by the largest sample size found. Where regression is mentioned with a predictor count, it applies the rule of thumb that N should be at least ten times the number of predictors. It also classifies the sample-size justification according to the Lakens taxonomy (power analysis, accuracy, heuristic, resource constraint, or none).
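The cue-matching step can be illustrated with a minimal sketch. The patterns and the function name `has_power_calculation` below are assumptions made for illustration; the toolkit's actual regular expressions are not given in this entry and may differ.

```python
import re

# Hypothetical primary cues, taken from the phrases listed in the description.
PRIMARY_CUES = re.compile(
    r"power analysis|sample[- ]size calculation|power calculation|a priori power",
    re.IGNORECASE,
)

# Hypothetical supportive cues; each pattern counts at most once.
SUPPORTIVE_CUES = [
    re.compile(r"effect size", re.IGNORECASE),
    re.compile(r"cohen'?s d", re.IGNORECASE),
    re.compile(r"type I error", re.IGNORECASE),
]

# A power percentage such as "80% power" or "power of 90 percent" (assumed forms).
POWER_PERCENT = re.compile(
    r"\d{2,3}(?:\.\d+)?\s*(?:%|percent)\s+power"
    r"|power\s+of\s+\d{2,3}(?:\.\d+)?\s*(?:%|percent)",
    re.IGNORECASE,
)


def has_power_calculation(text: str) -> bool:
    """Return True if the text shows a primary cue, or at least two
    supportive cues together with a power percentage."""
    if PRIMARY_CUES.search(text):
        return True
    supportive = sum(1 for pattern in SUPPORTIVE_CUES if pattern.search(text))
    return supportive >= 2 and POWER_PERCENT.search(text) is not None
```

For example, "Assuming an effect size of Cohen's d = 0.5, we aimed for 80% power" would count as a calculation under this sketch, whereas "We report the effect size for each test" would not.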

How it works

Layer 2 (contextual): power and supportive cues are matched by regex. If a calculation is found, alpha, target power, effect size, and N are extracted. A significance level above 0.05 adds 1.0 (a stricter level below 0.05 is conservative and is not penalised); a target power below 80 percent adds 1.0. An assumed effect size at or above a Cohen's d of 1.5, far above the large-effect benchmark of 0.8, adds 1.0, since an inflated effect size understates the required sample (the "sample-size samba"). If no calculation is found, the largest N in the text or triplets is taken: an N below 30 sets the score to 4.0, while an adequate or unknown N sets it to 2.0. In either branch, if regression is mentioned and the maximum N is below ten times the stated predictor count, 1.0 is added. The total is capped at 5.0. Metadata records whether a calculation was found; the alpha, power, and effect size; whether that effect size is implausibly large; the maximum N and whether it reaches 30; the predictor count; and the Lakens justification type (power analysis, accuracy, heuristic, resource constraint, or none).
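The scoring rules above can be sketched as a small function. The `PowerFindings` container, its field names, and the treatment of missing values (an unextracted parameter adds nothing) are assumptions for illustration, not the toolkit's actual data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PowerFindings:
    """Hypothetical container for what was extracted from the paper."""
    calculation_found: bool
    alpha: Optional[float] = None
    power_percent: Optional[float] = None
    effect_size_d: Optional[float] = None
    max_n: Optional[int] = None
    n_predictors: Optional[int] = None
    regression_mentioned: bool = False


def r5_score(f: PowerFindings) -> float:
    """Apply the described penalties and cap the result at 5.0."""
    score = 0.0
    if f.calculation_found:
        if f.alpha is not None and f.alpha > 0.05:
            score += 1.0  # lenient significance level
        if f.power_percent is not None and f.power_percent < 80:
            score += 1.0  # target power below convention
        if f.effect_size_d is not None and f.effect_size_d >= 1.5:
            score += 1.0  # implausibly large assumed effect
    elif f.max_n is not None and f.max_n < 30:
        score = 4.0  # no justification and a small sample
    else:
        score = 2.0  # no justification, adequate or unknown N
    # Ten-per-predictor rule, applied in either branch.
    if (
        f.regression_mentioned
        and f.n_predictors is not None
        and f.max_n is not None
        and f.max_n < 10 * f.n_predictors
    ):
        score += 1.0
    return min(score, 5.0)
```

Under these assumptions, a calculation with alpha 0.10, 70 percent power, and d = 1.6 scores 3.0, and a paper with no calculation, N = 20, and a five-predictor regression scores 5.0.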

Why this matters

Justifying the sample size is a basic expectation of sound design, and getting the justification right matters as much as having one. The conventional target of 80 percent power and the effect-size benchmarks a calculation depends on are well established, so a power well below convention signals a study planned to miss real effects. Low power both reduces the chance of detecting a true effect and lowers the probability that a significant finding is real while inflating its apparent size, so an unjustified small sample is a substantive reliability concern. Reporting standards make the justification mandatory, so its absence is a documented gap.
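The link between low power and the probability that a significant finding is real can be made concrete with the standard positive predictive value formula, PPV = (power x R) / (power x R + alpha), where R is the pre-study odds that a tested effect is real. The sketch below is illustrative only: the value R = 0.25 is an assumption chosen for the example, not a figure from this entry.

```python
def positive_predictive_value(power: float, alpha: float, prior_odds: float) -> float:
    """Probability that a significant result reflects a true effect.

    power      -- probability of detecting a true effect (1 - beta)
    alpha      -- significance level (false-positive rate)
    prior_odds -- pre-study odds R that a tested effect is real (assumed)
    """
    true_positives = power * prior_odds
    return true_positives / (true_positives + alpha)


# Assumed pre-study odds of 1 real effect per 4 null effects (R = 0.25).
ppv_conventional = positive_predictive_value(power=0.80, alpha=0.05, prior_odds=0.25)
ppv_underpowered = positive_predictive_value(power=0.20, alpha=0.05, prior_odds=0.25)
print(f"PPV at 80% power: {ppv_conventional:.2f}")
print(f"PPV at 20% power: {ppv_underpowered:.2f}")
```

With these assumed inputs, dropping power from 80 to 20 percent lowers the share of significant findings that are real from about 0.80 to about 0.50.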

Score thresholds

0: A power analysis is present with sensible parameters
2-3: No sample-size justification with an adequate sample, or a present calculation with a questionable parameter
4-5: No justification with a small sample, or several parameter problems together

Limitations

Detection is keyword-based, so a justification phrased without the expected cues is missed and the study is wrongly treated as lacking one, while a passing mention of effect size can be misread as a calculation. Extracted parameters are the first matches found, which may belong to a different analysis. The regression check relies on a stated predictor count and the largest N, neither of which is always the relevant figure. The thresholds (80 percent power, 0.05 alpha, ten observations per predictor) are conventions, so a defensible departure is still flagged; a finding should prompt a reading of the justification itself. The indicator does not recompute the sample size from the parameters, so it detects a missing calculation or out-of-convention inputs, not a calculation that is internally wrong. Sample-size consistency across sections is covered by indicator R2; R5 focuses on the presence and plausibility of the power justification.
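Since R5 does not recompute the sample size, a reader following up on a flag may want to do so by hand. The sketch below is one way to do that, assuming a two-sided, two-group comparison of means and the normal approximation n = 2 (z_{1-alpha/2} + z_{power})^2 / d^2 per group. It is not part of the indicator, the function name is hypothetical, and the approximation gives slightly smaller values than an exact t-test calculation.

```python
import math
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group N for a two-sided, two-sample comparison of means.

    Uses the normal approximation; d is the assumed Cohen's d.
    """
    if d <= 0:
        raise ValueError("effect size d must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


# A medium effect versus an effect at the d = 1.5 flagging threshold.
print(n_per_group(0.5))
print(n_per_group(1.5))
```

The comparison also shows why an inflated assumed effect size understates the required sample: under this approximation d = 0.5 calls for roughly 63 participants per group, while d = 1.5 calls for only about 7.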

References

  1. Cohen J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Lawrence Erlbaum Associates
  2. Button KS, Ioannidis JPA, Mokrysz C, Nosek BA, Flint J, Robinson ESJ, Munafò MR. (2013). Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience
  3. Schulz KF, Altman DG, Moher D (CONSORT Group). (2010). CONSORT 2010 Statement: updated guidelines for reporting parallel group randomised trials. BMJ
  4. Schulz KF, Grimes DA. (2005). Sample size calculations in randomised trials: mandatory and mystical. The Lancet 365(9467):1348-1353
  5. Lakens D. (2022). Sample size justification. Collabra: Psychology 8(1):33267
  6. Mansournia MA, Collins GS, Nielsen RO, Nazemipour M, Jewell NP, Altman DG, Campbell MJ. (2021). CHecklist for statistical Assessment of Medical Papers: the CHAMP statement. British Journal of Sports Medicine 55(18):1002-1003
  7. Parker L, Boughton S, Lawrence R, Bero L. (2022). Experts identified warning signs of fraudulent research: a qualitative study to inform a screening tool. Journal of Clinical Epidemiology 151:1-17
  8. Wilkinson J, Heal C, Antoniou GA, et al. (2024). A survey of experts to identify methods to detect problematic studies: stage 1 of the INveStigating ProblEmatic Clinical Trials in Systematic Reviews project. Journal of Clinical Epidemiology 175:111512
  9. Crone G, Green CD. (2025). Tools of the data detective: A review of statistical methods to detect data and result anomalies in psychology. Theory & Psychology 35(3):359-380