ResAIKit
Research Integrity Toolkit
R7 | Statistical analysis | Methodological Coherence | Layer 1 (Deterministic)

Software Declaration

Checks whether the paper mentions what statistical software was used, and whether that software is capable of performing the analyses described in the paper.

Technical description

R7 checks the statistical-software declaration. It loads a dictionary that maps each software package to its aliases, a version-detection pattern, a list of capabilities, and a list of limitations. It searches the text for each package by name and alias, applying strict boundaries to one- and two-character names such as R so that a primer or gene label is not mistaken for the software, and then looks for a version using that package's pattern. It also scans the text for analysis cues (Bayesian, mixed-effects, logistic regression, survival analysis, structural equation modelling, advanced or multiple regression) and treats a described analysis as a capability mismatch only when none of the declared packages can perform it. The score reflects the first of these conditions that applies, in order: no software declared, a capability mismatch, no version for any package, and a version for only some packages. Finally, it detects code or data availability statements (sharing phrases, repository references such as GitHub, OSF, or Zenodo, accession numbers) and notes when software is declared but neither code nor data is shared.
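The name and version matching described above can be sketched as follows. This is a minimal illustration, not the toolkit's code: the `SOFTWARE` entries, their aliases, the version patterns, and the function names `name_pattern` and `detect_software` are all assumptions made for the example.

```python
import re
from typing import Dict, Optional

# Hypothetical dictionary entries; the toolkit's real dictionary is not shown here.
SOFTWARE = {
    "R": {
        "aliases": ["R software", "R Foundation"],
        "version": r"\bR\s+(?:version\s+)?v?(\d+\.\d+(?:\.\d+)?)",
    },
    "SPSS": {
        "aliases": ["IBM SPSS"],
        "version": r"SPSS(?:\s+Statistics)?\s+(?:version\s+)?v?(\d+(?:\.\d+)*)",
    },
}


def name_pattern(name: str) -> "re.Pattern[str]":
    """Build a case-insensitive pattern for one software name or alias."""
    escaped = re.escape(name)
    if len(name) <= 2:
        # Short names: require a non-letter, non-hyphen character (or the
        # text edge) on both sides, so "R-primer" or "TGF-R" does not match.
        return re.compile(rf"(?<![A-Za-z-]){escaped}(?![A-Za-z-])", re.IGNORECASE)
    # Ordinary names: plain word boundaries.
    return re.compile(rf"\b{escaped}\b", re.IGNORECASE)


def detect_software(text: str) -> Dict[str, Optional[str]]:
    """Return {software: version or None} for each software named in the text."""
    found: Dict[str, Optional[str]] = {}
    for key, entry in SOFTWARE.items():
        names = [key, *entry["aliases"]]
        if any(name_pattern(n).search(text) for n in names):
            match = re.search(entry["version"], text, re.IGNORECASE)
            found[key] = match.group(1) if match else None
    return found


print(detect_software("Analyses were run in R version 4.3.1 and IBM SPSS Statistics."))
print(detect_software("The R-primer amplified the TGF-R region."))
```

In this sketch the first sentence yields R with a version and SPSS without one, while the second yields nothing because every "R" touches a hyphen or a letter. Because matching is case-insensitive, a standalone lowercase "r" would still match in this simplified version.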

How it works

Layer 1 (deterministic): each software key and alias is matched case-insensitively, using word boundaries for ordinary names and a stricter non-letter, non-hyphen boundary for names of two characters or fewer. If no software is found at all, the score is 4.0. Otherwise each detected package is tested for a version, and the described methods are collected. A package supports a method if it has the catch-all capability or lists none of that method's limitation tags; a method counts as a mismatch only if no detected package supports it, which scores 3.0 with a finding that names the unsupported analyses. Without a mismatch, the score is 2.0 if no package has a version, 0.0 if every package has one, and 1.0 for a mix, with an informational finding for each versionless tool. The score is capped at 5.0. The metadata records the software found, whether any version was detected, whether a capability mismatch was found, the software lacking a version, the methods detected, any methods unsupported by every declared tool, and whether a code or data availability statement is present; an informational note is added when software is declared but neither code nor data is shared.
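The scoring order described above can be written out as a short sketch. The tag names, the two software profiles, and the functions `supports` and `score_declaration` are hypothetical and chosen only to illustrate the rule; they are not the toolkit's actual capability mapping.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical limitation tags that would rule a tool out for each method.
METHOD_LIMITATION_TAGS = {
    "bayesian": {"no_bayesian"},
    "mixed_effects": {"no_mixed_models"},
    "survival": {"no_survival"},
}

# Hypothetical profiles: "all" stands in for the catch-all capability.
SOFTWARE_PROFILE = {
    "R": {"capabilities": {"all"}, "limitations": set()},
    "Excel": {
        "capabilities": {"descriptive"},
        "limitations": {"no_bayesian", "no_mixed_models", "no_survival"},
    },
}


def supports(software: str, method: str) -> bool:
    """A tool supports a method if it is catch-all or lists none of the method's tags."""
    profile = SOFTWARE_PROFILE[software]
    if "all" in profile["capabilities"]:
        return True
    return not (profile["limitations"] & METHOD_LIMITATION_TAGS[method])


def score_declaration(
    versions: Dict[str, Optional[str]], methods: List[str]
) -> Tuple[float, List[str]]:
    """Return (score, unsupported methods); versions maps software to version or None."""
    if not versions:
        return 4.0, []  # no software declared at all
    # A method is a mismatch only if no declared tool supports it.
    unsupported = [m for m in methods if not any(supports(s, m) for s in versions)]
    if unsupported:
        return 3.0, unsupported
    versioned = [s for s, v in versions.items() if v]
    if not versioned:
        score = 2.0  # no version for any tool
    elif len(versioned) == len(versions):
        score = 0.0  # every tool has a version
    else:
        score = 1.0  # some tools have a version, some do not
    return min(score, 5.0), []


print(score_declaration({"Excel": "2019"}, ["mixed_effects"]))
print(score_declaration({"Excel": "2019", "R": None}, ["mixed_effects"]))
```

Under these assumed profiles, the first call reports a mismatch because the only declared tool lists a limitation tag for mixed-effects models. The second call has no mismatch, since one declared tool is catch-all, and falls through to the partial-version score.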

Why this matters

Naming the statistical software and version is a basic condition for reproducibility, because results can depend on the implementation and release. Reporting guidance directs authors to state the software and its version, so an absent declaration or missing version is a documented gap. Software also differs in what it can do correctly: a spreadsheet widely used for analysis fails standard accuracy tests, so an advanced analysis attributed to such a tool is implausible. Flagging a mismatch only when no declared tool can perform the analysis reflects that researchers combine programs, using one for description and another for modelling, so the concern is the absence of any capable tool.

Score thresholds

0: Software and version declared, and the declared tools can perform the analyses
1-2: A version is missing for some or all declared software
3: A described analysis is supported by none of the declared software
4-5: No statistical software is declared at all

Limitations

Detection depends on the software dictionary, so an unlisted program is treated as no declaration, and version detection depends on a recognised form. The capability mapping is coarse: it knows a fixed set of methods and limitation tags, an analysis outside that set is not checked, and listed limitations simplify real abilities that extensions and add-on packages can change. The analysis cues are matched anywhere, so a method mentioned in the background can be read as performed, and the indicator cannot tell which declared tool ran which analysis, only whether some declared tool could. A catch-all capability exempts a tool entirely. The indicator checks the declaration, not whether the software was used correctly.
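The cue-matching limitation is easy to reproduce in a small sketch. The cue patterns and the function `detect_methods` below are assumptions for illustration; the toolkit's actual cue list is only summarised in the description.

```python
import re
from typing import List

# Hypothetical cue patterns for three of the analysis types.
ANALYSIS_CUES = {
    "bayesian": r"\bbayesian\b",
    "mixed_effects": r"\bmixed[- ]effects?\b",
    "survival": r"\bsurvival analys[ie]s\b",
}


def detect_methods(text: str) -> List[str]:
    """Return every method whose cue appears anywhere in the text."""
    return sorted(
        method
        for method, pattern in ANALYSIS_CUES.items()
        if re.search(pattern, text, re.IGNORECASE)
    )


paper = (
    "Earlier trials relied on Bayesian models. "
    "We compared the two groups with t-tests."
)
# The Bayesian cue comes from the background sentence, not from the
# analysis the authors actually ran, yet it is still collected.
print(detect_methods(paper))
```

Because the search has no notion of section or context, the background mention is collected as a described method. If the only declared tool could not perform it, the paper would be flagged for an analysis it never ran.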

References

  1. Lang TA, Altman DG. (2013). Basic statistical reporting for articles published in biomedical journals: the SAMPL guidelines. Science Editors' Handbook (European Association of Science Editors)
  2. McCullough BD, Heiser DA. (2008). On the accuracy of statistical procedures in Microsoft Excel 2007. Computational Statistics and Data Analysis
  3. Munafo MR, Nosek BA, Bishop DVM, Button KS, Chambers CD, et al. (2017). A manifesto for reproducible science. Nature Human Behaviour
  4. Abeysooriya M, Soria M, Kasu MS, Ziemann M. (2021). Gene name errors: lessons not learned. PLoS Computational Biology 17(7):e1008984
  5. Nosek BA, Hardwicke TE, Moshontz H, et al. (2022). Replicability, robustness, and reproducibility in psychological science. Annual Review of Psychology 73:719-748
  6. Mansournia MA, Collins GS, Nielsen RO, Nazemipour M, Jewell NP, Altman DG, Campbell MJ. (2021). CHecklist for statistical Assessment of Medical Papers: the CHAMP statement. British Journal of Sports Medicine 55(18):1002-1003
  7. Parker L, Boughton S, Lawrence R, Bero L. (2022). Experts identified warning signs of fraudulent research: a qualitative study to inform a screening tool. Journal of Clinical Epidemiology 151:1-17
  8. Wilkinson J, Heal C, Antoniou GA, et al. (2024). A survey of experts to identify methods to detect problematic studies: stage 1 of the INveStigating ProblEmatic Clinical Trials in Systematic Reviews project. Journal of Clinical Epidemiology 175:111512
  9. Crone G, Green CD. (2025). Tools of the data detective: A review of statistical methods to detect data and result anomalies in psychology. Theory & Psychology 35(3):359-380