Sample Size Consistency
Checks that the number of participants a paper reports is the same wherever it appears. When the abstract says 120 were studied, the methods say 115, and a table totals 110, the discrepancy points to careless editing, undocumented exclusions, or data manipulation. The indicator extracts the sample size from each section and from tables and flags drift between them, and treats a count of analysed participants larger than the number enrolled as impossible. It reads the article text and tables.
Technical description
R2 is a deterministic check that the reported sample size is consistent across the sections of an article and its tables. It extracts candidate N values from each section with a set of patterns covering the common phrasings: the form "N = number", a number followed by "participants", "patients", or "subjects", a stated sample size, and an enrolled count. It also reads N values from any table column headed N, sample size, or count. For each section it treats the largest extracted value as the reported total sample size, on the assumption that smaller values are per-arm or subgroup counts, and compares these totals across the abstract, methods, and results, and between each section and the tables. A drift above twenty percent is treated as a serious inconsistency and a smaller non-zero drift as a minor one. Separately, it flags the logically impossible case in which the number analysed exceeds the number enrolled. It also reconciles the participant flow: it extracts the reported losses (exclusions, withdrawals, dropouts, losses to follow-up) and flags a flow in which the enrolled count minus those losses does not equal the analysed count.
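The extraction step can be illustrated with a minimal Python sketch. The regular expressions and the names `N_PATTERNS`, `extract_n_values`, and `section_total` are assumptions made for illustration; the indicator's actual pattern set is not reproduced here and is likely broader.

```python
import re
from typing import List, Optional

# Hypothetical patterns for the phrasings described in the text.
# Each pattern captures the number in group 1.
N_PATTERNS = [
    re.compile(r"\b[Nn]\s*=\s*(\d[\d,]*)"),                                    # "N = 120"
    re.compile(r"\b(\d[\d,]*)\s+(?:participants|patients|subjects)\b", re.I),  # "115 patients"
    re.compile(r"\bsample size\s+(?:of|was|is)\s+(\d[\d,]*)", re.I),           # "sample size of 120"
    re.compile(r"\benrolled\s+(\d[\d,]*)", re.I),                              # "enrolled 120"
]


def extract_n_values(text: str) -> List[int]:
    """Return every candidate N matched by any pattern (duplicates allowed)."""
    values = []
    for pattern in N_PATTERNS:
        for match in pattern.finditer(text):
            # Drop thousands separators (and any trailing comma) before converting.
            values.append(int(match.group(1).replace(",", "")))
    return values


def section_total(text: str) -> Optional[int]:
    """Treat the largest candidate as the section's total; None if nothing matched."""
    values = extract_n_values(text)
    return max(values) if values else None
```

Because only the maximum is used, the same number matched by two patterns does no harm. The sketch also shows why the heuristic can misfire: any number that happens to precede "patients" is captured, whether or not it is a sample size.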
How it works
The patterns extract a set of N values per section, and the largest is taken as that section's total. For the section pairs abstract-methods, abstract-results, and methods-results, and for each section against the tables, the drift is the absolute difference between the two totals divided by the larger of them. A drift above twenty percent adds an error-severity finding and raises the score to at least 4.0; a smaller positive drift adds a warning and raises it to at least 2.0; pairs with equal totals are skipped. The N values from enrollment sections (abstract, methods, study design) and from analysis sections (results, findings) are then pooled separately. If any analysed N exceeds the maximum enrolled N, an error finding is added and the score is raised to at least 4.0, since an analysis cannot include more participants than were enrolled. The score is the maximum across all triggered checks, capped at 5.0. The indicator returns zero when no sections, no N values, or only undivided full text are available. The metadata records the N values found per section, the inconsistency count, and, as a participant-flow diagnostic, the attrition rate: the difference between the enrolled and analysed totals divided by the enrolled total, recorded only when both totals are present and the analysed total does not exceed the enrolled one. Finally, when both totals are present, the indicator checks that the enrolled total minus the reported losses (exclusions, withdrawals, dropouts, losses to follow-up) equals the analysed total. A flow that does not reconcile adds a warning, and the metadata records the total lost and whether the flow reconciles.
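The drift comparison and the impossibility check can be sketched as follows. This is an illustrative reading of the description, not the indicator's code: the function names, the finding tuples, and the `totals` dictionary keyed by section name are assumptions, and the flow-reconciliation warning is left out.

```python
from itertools import combinations

SERIOUS_DRIFT = 0.20  # "above twenty percent" in the description


def drift(a: int, b: int) -> float:
    """Absolute difference divided by the larger of the two totals."""
    return abs(a - b) / max(a, b)


def score_totals(totals, enrolled, analysed):
    """totals: section name -> total N (tables included as one entry).
    enrolled / analysed: pooled N values from enrollment and analysis sections.
    Returns (score, findings), where findings are (severity, message) tuples."""
    score = 0.0
    findings = []

    # With abstract, methods, results, and tables as keys, this covers the three
    # section pairs plus each section against the tables.
    for (name_a, n_a), (name_b, n_b) in combinations(totals.items(), 2):
        if n_a == n_b:
            continue  # equal totals are skipped
        d = drift(n_a, n_b)
        message = f"{name_a} N={n_a} vs {name_b} N={n_b} (drift {d:.1%})"
        if d > SERIOUS_DRIFT:
            findings.append(("error", message))
            score = max(score, 4.0)
        else:
            findings.append(("warning", message))
            score = max(score, 2.0)

    # Hard constraint: no analysed N may exceed the largest enrolled N.
    if enrolled and analysed and max(analysed) > max(enrolled):
        findings.append(("error", "analysed N exceeds enrolled N"))
        score = max(score, 4.0)

    return min(score, 5.0), findings
```

Applied to the example of an abstract reporting 120, methods reporting 115, and results and tables both reporting 110, every non-equal pair drifts by less than ten percent, so the sketch returns a score of 2.0 with five warnings.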
Score thresholds
| Score | Meaning |
|---|---|
| 0 | Sample-size totals agree across sections and tables, or none could be extracted. |
| 2 | A minor drift in the reported total between sections or tables. |
| 4 to 5 | A drift above twenty percent, or an analysed count exceeding the enrolled count. |
Why this matters
Accounting consistently for every participant is a core reporting requirement and a sensitive integrity signal. The CONSORT statement requires authors to report the numbers of participants enrolled, allocated, and analysed and to reconcile them in a flow diagram, precisely because unexplained changes in the denominator between sections obscure attrition and can hide selective exclusion [1]. Strasak and colleagues list inconsistent or unclear reporting of sample size and of the participants actually analysed among the recurring statistical faults in medical research, since a denominator that shifts between the abstract, methods, and results undermines every statistic computed from it [2]. The broader literature on within-paper numerical inconsistency shows how common such discrepancies are and how readily automated checking surfaces them: Nuijten and colleagues found that internal inconsistencies between reported numbers are pervasive in the published record, which motivates a mechanical cross-section check of the most basic number a study reports [3]. An analysed count larger than the enrolled count is the strongest form of the signal, because it is arithmetically impossible and indicates either a transcription error or fabricated data. Targeted data-integrity audits make the sample-size and participant-flow reconciliation a routine check: Bordewijk and colleagues compared enrollment and outcome counts across trials and found denominators inconsistent with proper conduct [4], the CHAMP checklist lists participant accounting among the items reviewers should verify [5], and expert-derived warning-sign tools [6], audits of fabricated trials [7], the INSPECT-SR instrument [8], and reviews of the statistical data-detective toolkit [9] all treat an unexplained shift in the reported sample size as a marker worth examining.
Limitations
Extraction is pattern-based, so an N expressed in an unanticipated phrasing is missed, and a number that matches a pattern but is not a sample size, such as a count of events or a dosage followed by the word patients, can be captured in error. Taking the largest value as the total is a heuristic that fails when a section reports only subgroup counts without the total, or when the largest number is not the sample size. Legitimate attrition produces a real difference between enrolled and analysed counts, which the indicator reports as a drift (a minor one when it stays at or below twenty percent) rather than as an impossibility, so a flag is a prompt to check the participant flow rather than proof of misconduct. The check operates on section text, so it depends on the document having been segmented into sections; undivided text is skipped. Whether the reported statistics are internally consistent with the sample size is the domain of the granularity indicators, so R2 focuses on the agreement of the sample size itself across the paper.
Theoretical background
R2 rests on the principle that a single study has one enrollment and one analysed cohort, so the integers describing them must be stable wherever the paper restates them, with the only legitimate change being a documented reduction from enrolled to analysed through attrition or exclusion. The set of numbers a section emits mixes this total with derived counts, per-arm sizes, completer counts, and event tallies, so the informative quantity is the maximum, which corresponds to the broadest population the section refers to and is the natural anchor for comparison. Drift between these anchors across sections measures how far the paper contradicts itself about its own size, and the percentage form makes the threshold scale-free, so that a five-participant change matters more in a study of twenty than in one of a thousand. An analysed count exceeding the enrolled count is different in kind: it violates a hard constraint rather than constituting a soft drift, because no analysis can include participants the study never enrolled. Flagging it separately and at high severity reflects that it cannot arise from ordinary attrition and must be an error or a fabrication. Comparing totals rather than every pairwise combination of extracted numbers is what keeps the check from mistaking the normal coexistence of a total and its subgroups for a discrepancy. The flow reconciliation applies the same accounting forward: every enrolled participant is either analysed or accounted for as a documented loss, so when the reported exclusions and dropouts do not bridge the gap between enrolled and analysed, the participant flow is internally inconsistent and lacks the accounting integrity that a CONSORT flow diagram is meant to make transparent.
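The flow accounting and the attrition diagnostic reduce to simple arithmetic, sketched below. The `FlowCheck` container, the `check_flow` function, and the dictionary of losses keyed by category are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FlowCheck:
    total_lost: int                  # sum of all reported losses
    reconciles: bool                 # enrolled - losses == analysed
    attrition_rate: Optional[float]  # None when analysed exceeds enrolled


def check_flow(enrolled: int, analysed: int, losses: Dict[str, int]) -> FlowCheck:
    """Reconcile the participant flow from an enrolled total, an analysed total,
    and reported losses by category (exclusions, withdrawals, dropouts, ...)."""
    total_lost = sum(losses.values())
    reconciles = (enrolled - total_lost) == analysed

    # The attrition rate is only meaningful when the analysed total does not
    # exceed the enrolled total; otherwise the impossibility check applies.
    if enrolled > 0 and analysed <= enrolled:
        attrition_rate = (enrolled - analysed) / enrolled
    else:
        attrition_rate = None

    return FlowCheck(total_lost, reconciles, attrition_rate)
```

For a study that enrolled 120, analysed 110, and reported 4 exclusions and 6 dropouts, the flow reconciles with an attrition rate of 10/120, about 8.3 percent. If the same study reported only 3 withdrawals, seven participants would be unaccounted for and the flow would not reconcile.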
References
1. Schulz KF, Altman DG, Moher D; CONSORT Group. CONSORT 2010 Statement: updated guidelines for reporting parallel group randomised trials. BMJ. 2010;340:c332. DOI: 10.1136/bmj.c332
2. Strasak AM, Zaman Q, Pfeiffer KP, Göbel G, Ulmer H. Statistical errors in medical research: a review of common pitfalls. Swiss Medical Weekly. 2007;137(3-4):44-49. https://smw.ch/index.php/smw/article/view/693
3. Nuijten MB, Hartgerink CHJ, van Assen MALM, Epskamp S, Wicherts JM. The prevalence of statistical reporting errors in psychology (1985-2013). Behavior Research Methods. 2016;48(4):1205-1226. DOI: 10.3758/s13428-015-0664-2
4. Bordewijk EM, Wang R, Askie LM, et al. Data integrity of 35 randomised controlled trials in women's health. European Journal of Obstetrics & Gynecology and Reproductive Biology. 2020;249:72-83. DOI: 10.1016/j.ejogrb.2020.04.016
5. Mansournia MA, Collins GS, Nielsen RO, et al. CHecklist for statistical Assessment of Medical Papers: the CHAMP statement. British Journal of Sports Medicine. 2021;55(18):1002-1003. DOI: 10.1136/bjsports-2020-103651
6. Parker L, Boughton S, Lawrence R, Bero L. Experts identified warning signs of fraudulent research: a qualitative study to inform a screening tool. Journal of Clinical Epidemiology. 2022;151:1-17. DOI: 10.1016/j.jclinepi.2022.07.006
7. Carlisle JB. False individual patient data and zombie randomised controlled trials submitted to Anaesthesia. Anaesthesia. 2021;76(4):472-479. DOI: 10.1111/anae.15263
8. Wilkinson J, Heal C, Antoniou GA, et al. A survey of experts to identify methods to detect problematic studies: stage 1 of the INveStigating ProblEmatic Clinical Trials in Systematic Reviews project. Journal of Clinical Epidemiology. 2024;175:111512. DOI: 10.1016/j.jclinepi.2024.111512
9. Crone G, Green CD. Tools of the data detective: a review of statistical methods to detect data and result anomalies in psychology. Theory & Psychology. 2025;35(3):359-380. DOI: 10.1177/09593543241311861