R7 | Statistical analysis | Methodological Coherence | Layer 1 (Deterministic)

Software Declaration

Checks that the paper names the statistical software it used, gives its version, and that the named software can actually do the analyses described. Reporting the software and version is a basic reproducibility requirement, and a tool with known limitations, such as a spreadsheet used for advanced modelling, raises doubt about how the analysis was really done. The indicator detects the software, looks for a version, and compares the described methods against the software's documented capabilities. It reads the article text.

Technical description

R7 is a deterministic check of the statistical-software declaration. It loads a dictionary that maps each software package to its aliases, a version-detection pattern, a list of capabilities, and a list of limitations. It searches the text for each package by name and alias, using strict boundaries for one- and two-character names such as R so that a primer or gene label is not mistaken for the software. For each detected package it looks for a version number using that package's version pattern. It then scans the text for cues to six kinds of analysis: Bayesian methods, mixed-effects models, logistic regression, survival analysis, structural equation modelling, and advanced or multiple regression. A described analysis counts as a capability mismatch only when none of the declared software can perform it. The score reflects, in order of severity, the absence of any software, a capability mismatch, a missing version, and a partial version declaration. Independently of the score, the indicator detects whether the paper provides a code or data availability statement (a sharing phrase, a public-repository reference such as GitHub, OSF, or Zenodo, or an accession number); software declared with neither code nor data shared draws an informational reproducibility note.

How it works

Each package's key and aliases are matched case-insensitively, with word boundaries for ordinary names and a stricter non-letter, non-hyphen boundary for names of two characters or fewer. If no software is found at all, the score is 4.0 with a finding. Otherwise each detected package is tested for a version using its version pattern, and the described methods are collected. A package supports a method if it has the catch-all capability or lists none of that method's limitation tags. A method is a mismatch only if no detected package supports it, in which case the score is 3.0 with a finding that names the unsupported analyses and the declared software. If there is no mismatch, the score is 2.0 when no package has a version, 1.0 when some have a version and some do not, and 0.0 when every package has one; each versionless package also produces an informational finding. The score is capped at 5.0. The metadata records the software found, whether any version was detected, whether a capability mismatch was found, the software lacking a version, the analysis methods detected, any methods unsupported by every declared tool, and whether a code or data availability statement is present. When software is declared but neither code nor data is shared, an informational finding is added, since reproducibility depends on sharing the analysis code and data as well as naming the tool.
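The scoring rules above can be written out as a short decision ladder. The sketch below assumes a simplified data model: `DetectedTool`, `score_declaration`, the `"all"` catch-all name, and the `mixed_effects` limitation tag are all hypothetical names chosen for illustration, and the finding texts are placeholders.

```python
from dataclasses import dataclass, field
from typing import Optional

CATCH_ALL = "all"  # assumed name for the catch-all capability


@dataclass
class DetectedTool:
    version: Optional[str] = None
    capabilities: set = field(default_factory=set)
    limitations: set = field(default_factory=set)


def supports(tool: DetectedTool, method_tags: set) -> bool:
    """A tool supports a method if it is catch-all or lists none of the method's limitation tags."""
    return CATCH_ALL in tool.capabilities or not (tool.limitations & method_tags)


def score_declaration(tools: dict, methods: dict, shares_code_or_data: bool = False):
    """Return (score, findings). `methods` maps a method name to its limitation tags."""
    if not tools:
        return 4.0, ["No statistical software declared."]

    findings = []
    # A method is a mismatch only if no declared tool supports it.
    unsupported = sorted(
        name for name, tags in methods.items()
        if not any(supports(tool, tags) for tool in tools.values())
    )
    versionless = sorted(name for name, tool in tools.items() if tool.version is None)
    findings += [f"info: no version found for {name}" for name in versionless]
    if not shares_code_or_data:
        findings.append("info: software declared but neither code nor data shared")

    if unsupported:
        findings.append(f"unsupported by {sorted(tools)}: {unsupported}")
        score = 3.0
    elif len(versionless) == len(tools):
        score = 2.0  # no tool has a version
    elif versionless:
        score = 1.0  # partial version declaration
    else:
        score = 0.0  # every tool has a version
    return min(score, 5.0), findings


if __name__ == "__main__":
    excel = DetectedTool(limitations={"mixed_effects"})
    r_env = DetectedTool(version="4.2.1", capabilities={CATCH_ALL})
    methods = {"mixed-effects model": {"mixed_effects"}}
    print(score_declaration({"Excel": excel}, methods))
    print(score_declaration({"Excel": excel, "R": r_env}, methods))
```

The second call illustrates the set-level judgement: the limited tool alone triggers a mismatch, but adding a capable tool removes it and leaves only the partial-version score.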

Score thresholds

Score	Meaning
0	Software and version declared, and the declared tools can perform the analyses.
1 to 2	A version is missing for some or all declared software.
3	A described analysis is supported by none of the declared software.
4 to 5	No statistical software is declared at all.

Why this matters

Naming the statistical software and its version is a basic condition for reproducibility, because results can depend on the implementation and even the release. The SAMPL reporting guidance of Lang and Altman directs authors to state the statistical software, including its version, used for the analyses, so an absent declaration or a missing version is a documented reporting gap [1]. The capability check rests on the fact that software differs in what it can do correctly: McCullough and Heiser documented that a spreadsheet widely used for analysis fails standard accuracy tests for statistical distributions, random-number generation, and estimation, so an advanced analysis attributed to such a tool is implausible and worth questioning [2]. More broadly, a manifesto for reproducible science by Munafò and colleagues places transparent reporting of methods and analytic tools among the core measures for credible research, framing the declaration of software as part of the methodological record rather than an optional detail [3]. Flagging a capability mismatch only when no declared tool can perform the analysis reflects that researchers routinely combine programs, using one for description and another for modelling, so the concern is the absence of any capable tool, not the presence of a limited one. That spreadsheets remain a poor fit for research computing is not merely historical: Abeysooriya and colleagues found that automatic gene-name corruption in spreadsheet supplements continued to rise years after the problem was first reported, a concrete reminder that the tool named matters [4]. 
Broader reproducibility scholarship places transparent reporting of analytic software among the practices that make findings credible [5], and the CHAMP checklist asks reviewers to confirm that the software and version are stated [6]. Research-integrity screening treats an absent or implausible software declaration as a transparency signal, as reflected in expert-derived warning signs [7], the INSPECT-SR expert survey [8], and a review of the data-detective toolkit [9].

Limitations

Detection depends on the software dictionary, so an unlisted program is treated as no declaration, and version detection depends on the version appearing in a recognised form. The capability mapping is coarse: it knows a fixed set of methods and their limitation tags, so an analysis outside that set is not checked, and a tool's listed limitations are a simplification of its real abilities, which extensions and add-on packages can change. The analysis cues are matched anywhere in the text, so a method mentioned in the background rather than performed can be read as a described analysis, and the indicator cannot tell which declared tool actually ran which analysis, only whether some declared tool could. A catch-all capability exempts a tool entirely, so a flexible environment is never flagged even if a specific analysis would be unusual in it. The indicator checks the declaration, not whether the software was used correctly.

Theoretical background

R7 treats the software declaration as part of the reproducible record of an analysis. A statistical result is the output of a procedure implemented in a particular program at a particular version, and because implementations differ in algorithms, defaults, and numerical accuracy, the program and version are part of the information needed to reproduce or scrutinise the result. The indicator therefore checks three nested conditions: that a tool is named, that its version is given, and that the named tool is capable of the stated analysis. The capability condition encodes a consistency requirement between the methods section and the tools section, since a described analysis that no declared program can perform implies either an undeclared tool or a misdescribed method, both of which weaken the account of how the result was produced. The decision to judge capability against the whole set of declared tools, rather than each tool in isolation, follows from how analyses are actually conducted: a study that names a spreadsheet for tabulation and a full statistical environment for modelling is consistent, and only a study whose described analysis exceeds every tool it names exhibits the gap the indicator is meant to detect. The graded score orders the conditions by severity, from the complete absence of a declaration, through a capability mismatch, down to a missing or partial version. Naming the tool is necessary but not sufficient for reproducibility, so the indicator separately notes whether the analysis code and the data are shared, the transparency practices that let an independent reader rerun and check the analysis.
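The separate code-and-data check can be illustrated with a few regular expressions. This is only a sketch under stated assumptions: the sharing-phrase pattern, the function names, and the choice of the GEO series format (`GSE` followed by digits) as the sole accession example are illustrative, and the indicator's actual patterns are not given in the text.

```python
import re

# Assumed sharing phrase: a code/data noun followed, within one sentence, by "is/are ... available".
SHARING_PHRASE = re.compile(
    r"\b(?:code|scripts?|data|datasets?)\b[^.]{0,80}?\b(?:is|are)\s+"
    r"(?:publicly\s+|freely\s+|openly\s+)?available\b",
    re.IGNORECASE,
)
# Public repositories named in the description above.
REPOSITORY = re.compile(r"\b(?:github|osf|zenodo)\b", re.IGNORECASE)
# One example accession format (GEO series); other formats would need their own patterns.
ACCESSION = re.compile(r"\bGSE\d{3,}\b")


def has_availability_statement(text: str) -> bool:
    """True if the text contains a sharing phrase, a repository reference, or an accession number."""
    return any(p.search(text) for p in (SHARING_PHRASE, REPOSITORY, ACCESSION))


def reproducibility_note(software_declared: bool, text: str):
    """Return an informational note when software is declared but nothing is shared, else None."""
    if software_declared and not has_availability_statement(text):
        return "info: software declared but neither code nor data shared"
    return None


if __name__ == "__main__":
    print(has_availability_statement("The analysis scripts were deposited on Zenodo."))
    print(reproducibility_note(True, "Data were analysed in R version 4.2.1."))
```

The note is kept apart from the score in this sketch, mirroring the description of the availability check as informational only.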

References

[1] Lang TA, Altman DG. Basic statistical reporting for articles published in biomedical journals: the Statistical Analyses and Methods in the Published Literature (SAMPL) guidelines. In: Smart P, Maisonneuve H, Polderman A, eds. Science Editors' Handbook. European Association of Science Editors; 2013. https://www.equator-network.org/reporting-guidelines/sampl/
[2] McCullough BD, Heiser DA. On the accuracy of statistical procedures in Microsoft Excel 2007. Computational Statistics and Data Analysis. 2008;52(10):4570-4578. DOI: 10.1016/j.csda.2008.03.004
[3] Munafò MR, Nosek BA, Bishop DVM, Button KS, Chambers CD, Percie du Sert N, et al. A manifesto for reproducible science. Nature Human Behaviour. 2017;1(1):0021. DOI: 10.1038/s41562-016-0021
[4] Abeysooriya M, Soria M, Kasu MS, Ziemann M. Gene name errors: lessons not learned. PLoS Computational Biology. 2021;17(7):e1008984. DOI: 10.1371/journal.pcbi.1008984
[5] Nosek BA, Hardwicke TE, Moshontz H, et al. Replicability, robustness, and reproducibility in psychological science. Annual Review of Psychology. 2022;73:719-748. DOI: 10.1146/annurev-psych-020821-114157
[6] Mansournia MA, Collins GS, Nielsen RO, et al. CHecklist for statistical Assessment of Medical Papers: the CHAMP statement. British Journal of Sports Medicine. 2021;55(18):1002-1003. DOI: 10.1136/bjsports-2020-103651
[7] Parker L, Boughton S, Lawrence R, Bero L. Experts identified warning signs of fraudulent research: a qualitative study to inform a screening tool. Journal of Clinical Epidemiology. 2022;151:1-17. DOI: 10.1016/j.jclinepi.2022.07.006
[8] Wilkinson J, Heal C, Antoniou GA, et al. A survey of experts to identify methods to detect problematic studies: stage 1 of the INveStigating ProblEmatic Clinical Trials in Systematic Reviews project. Journal of Clinical Epidemiology. 2024;175:111512. DOI: 10.1016/j.jclinepi.2024.111512
[9] Crone G, Green CD. Tools of the data detective: a review of statistical methods to detect data and result anomalies in psychology. Theory & Psychology. 2025;35(3):359-380. DOI: 10.1177/09593543241311861