ResAIKit
Research Integrity Toolkit
R6 | Statistical analysis | Methodological Coherence | Layer 2 (Contextual)

Descriptive Reporting

Checks that the descriptive statistics a paper reports suit the shape of the data. A strictly positive variable whose mean lies within two standard deviations of zero is likely skewed, and almost certainly so once the standard deviation exceeds the mean. Summarising such a variable with a mean plus or minus a standard deviation, in a small sample and without acknowledging the skew, can mislead; reporting a median and interquartile range would be more honest. The indicator also flags a parametric test run on a small sample with no normality verification. It reads the extracted summary statistics and the article text.

Technical description

R6 is a contextual check that the chosen descriptive statistics match the underlying distribution. For each extracted mean-SD-N triplet with a positive mean and standard deviation, it computes the coefficient of variation: the standard deviation divided by the mean. When the mean of a positive variable lies within two standard deviations of zero (equivalently, when the coefficient of variation exceeds 0.5), a symmetric distribution would place appreciable mass below zero, which a positive quantity cannot have, so the data are likely skewed. When this occurs in a small sample (N below fifteen) and the text contains no acknowledgement of non-normality, the indicator flags the mean-plus-or-minus-standard-deviation report as potentially misleading. This mean-minus-two-standard-deviations criterion supersedes the earlier fixed coefficient-of-variation cutoff of 1. Separately, the indicator checks whether parametric-test cues, such as t-test, ANOVA, or Pearson correlation, appear alongside a small-sample triplet without any normality-verification cue such as Shapiro-Wilk, Kolmogorov-Smirnov, a normality test, or a stated normal distribution. Each issue contributes to the score.
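To make "appreciable mass below zero" concrete, the sketch below computes how much probability a normal model with a given mean and standard deviation would assign to negative values. It is an illustration only, not part of the indicator; the function name `normal_mass_below_zero` is hypothetical.

```python
import math


def normal_mass_below_zero(mean: float, sd: float) -> float:
    """P(X < 0) for X ~ Normal(mean, sd), i.e. Phi(-mean / sd)."""
    z = -mean / sd
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# With the mean fixed at 1, the SD equals the coefficient of variation.
for cv in (0.25, 0.5, 1.0):
    share = normal_mass_below_zero(1.0, cv)
    print(f"CV = {cv}: {share:.4f} of the normal model lies below zero")
```

At a coefficient of variation of 0.5 the normal model puts roughly 2.3% of its mass below zero, and at 1.0 about 16%, which is impossible for a positive quantity.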

How it works

Each triplet whose mean and standard deviation are both positive is examined; the coefficient of variation is the standard deviation divided by the mean. If the mean minus two standard deviations falls below zero (a coefficient of variation above 0.5) and the triplet's sample size is below fifteen, the text is searched for an acknowledgement of non-normality. The cues are "non-normally distributed", "skewed", "skewness", "interquartile" or "IQR", and "log-transform"; if none is present, the variable is counted as likely-skewed-but-reported-parametrically and a warning is added. The acknowledgement cues deliberately exclude a bare mention of the median, which is too common for unrelated reasons to signal that this variable was reported non-parametrically. The parametric check fires once if a parametric-test cue is present, at least one triplet has a sample size below fifteen, and no normality cue appears. The score is 0.0 for no issue, 2.0 for one or two, and 4.0 for three or more, capped at 5.0. The metadata records the number of triplets checked, the count flagged as likely skewed, and whether the parametric-without-normality issue fired.
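The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the toolkit's actual code: the names `Triplet` and `score_r6`, the metadata keys, and the exact regular expressions are hypothetical, and the cue patterns only approximate the cue lists described in the text.

```python
import re
from dataclasses import dataclass

SMALL_N = 15  # "small sample" means N below fifteen, as stated in the text

# Assumed cue patterns; the real indicator's patterns may differ.
NON_NORMALITY_CUES = re.compile(
    r"non-?normal|skew|interquartile|\bIQR\b|log[- ]transform", re.IGNORECASE)
PARAMETRIC_CUES = re.compile(
    r"\bt-?test|\bANOVA\b|analysis of variance|pearson", re.IGNORECASE)
# The lookbehind keeps "non-normally distributed" from counting as a normality cue.
NORMALITY_CUES = re.compile(
    r"shapiro|kolmogorov|normality test|(?<!non-)normal(?:ly)? distribut",
    re.IGNORECASE)


@dataclass
class Triplet:
    mean: float
    sd: float
    n: int


def score_r6(triplets: list[Triplet], text: str) -> dict:
    # Cues are searched over the whole text, not per variable.
    acknowledged = bool(NON_NORMALITY_CUES.search(text))
    checked = 0
    skewed = 0
    for t in triplets:
        if t.mean <= 0 or t.sd <= 0:
            continue  # only triplets with positive mean and SD are examined
        checked += 1
        likely_skewed = t.mean - 2 * t.sd < 0  # same as sd / mean > 0.5
        if likely_skewed and t.n < SMALL_N and not acknowledged:
            skewed += 1
    # The parametric check fires at most once per paper.
    parametric = (
        bool(PARAMETRIC_CUES.search(text))
        and any(t.n < SMALL_N for t in triplets)
        and not NORMALITY_CUES.search(text)
    )
    issues = skewed + int(parametric)
    score = 0.0 if issues == 0 else 2.0 if issues <= 2 else 4.0
    return {
        "score": min(score, 5.0),
        "triplets_checked": checked,
        "likely_skewed": skewed,
        "parametric_without_normality": parametric,
    }


result = score_r6([Triplet(4.0, 3.5, 10)], "Groups were compared with a t-test.")
print(result)
```

In this example the single triplet is flagged as likely skewed and the parametric check also fires, giving two issues. A text that mentions only a median follow-up time would not count as an acknowledgement, which matches the deliberate exclusion of a bare median.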

Score thresholds

Score     Meaning
0         Descriptive statistics suit the data, or non-normality is acknowledged.
2 to 3    One or two variables are reported with mean and SD despite likely skew, or a parametric test lacks a normality check.
4 to 5    Three or more such reporting problems.

Why this matters

How a distribution is summarised determines whether readers can interpret it, and the wrong summary for a skewed variable distorts the picture. Altman and Bland showed how skewness can be detected from summary information alone: for a positive quantity, a mean lying within two standard deviations of zero implies a normal model that would extend below zero, a clear sign of a skewed distribution for which the mean and standard deviation are a poor description [1]. This is the inference that the indicator's mean-minus-two-standard-deviations rule encodes. The reporting guidelines act on the same principle: the SAMPL guidance of Lang and Altman directs authors to report a median and interquartile range for skewed data rather than a mean and standard deviation, and to state how distributional assumptions were checked before applying parametric tests [2]. Strasak and colleagues list both failures, summarising skewed data with the mean and applying parametric tests without verifying their assumptions, among the recurrent statistical errors in medical research, since each can make an analysis look sounder than it is [3]. Requiring a genuine non-normality cue, rather than any mention of the median, keeps the check from being silently disabled by the median values that papers report for unrelated quantities such as follow-up time. The CHAMP checklist for statistical assessment asks reviewers to confirm that the summary statistics and tests suit the data distribution [4], and research-integrity screening treats a mismatch between data shape and chosen summary as a trustworthiness signal: expert-derived warning signs [5], audits of fabricated trials [6], the INSPECT-SR instrument [7], and reviews of the statistical data-detective toolkit [8] all examine whether the reporting fits the distribution.

Limitations

The criterion is a heuristic: a positive variable whose mean is within two standard deviations of zero is probably but not certainly skewed, and a genuinely skewed variable whose mean stays beyond that bound is not caught, so the check trades completeness for specificity. The small-sample gate means the same skewed reporting in a large sample is not flagged, on the rationale that the concern is most acute when few observations are available. The acknowledgement and the parametric and normality cues are searched over the whole text, so an acknowledgement of one variable's skew, or a normality test reported for one analysis, exempts the others, and the parametric and small-sample signals may belong to different analyses. The indicator reads summary triplets and text, not the raw data, so it cannot confirm skewness directly. Whether the reported summary statistics are internally consistent is the domain of the granularity indicators, so R6 focuses on whether the descriptive choice fits the distribution.
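The first limitation, a skewed variable that the criterion does not catch, can be illustrated with a log-normal distribution. The sketch below uses the standard closed-form expressions for the coefficient of variation and skewness of a log-normal variable; the function name and the choice of sigma = 0.45 are assumptions made for illustration.

```python
import math


def lognormal_cv_and_skewness(sigma: float) -> tuple[float, float]:
    """Coefficient of variation and skewness of a log-normal variable.

    Both depend only on sigma, the SD of the underlying normal on the log scale.
    """
    w = math.exp(sigma ** 2)
    cv = math.sqrt(w - 1.0)
    skewness = (w + 2.0) * math.sqrt(w - 1.0)
    return cv, skewness


cv, skewness = lognormal_cv_and_skewness(0.45)
print(f"CV = {cv:.2f}, skewness = {skewness:.2f}")
print("flagged by the two-SD rule:", cv > 0.5)
```

Here the coefficient of variation is about 0.47, just under the 0.5 bound, while the skewness is about 1.5, so a clearly right-skewed variable would pass unflagged.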

Theoretical background

R6 rests on the relationship between a variable's support and the summaries appropriate to it. For a quantity that cannot be negative, the mean and standard deviation are adequate only when the distribution is roughly symmetric, and symmetry becomes implausible once the mean falls within two standard deviations of zero, because the lower tail would then extend below zero; the mean-minus-two-standard-deviations criterion therefore acts as a detector of skewness that needs only the reported summary, which is exactly the diagnostic Altman and Bland describe. When data are skewed, the mean is pulled toward the long tail and the standard deviation overstates the typical spread, so the median and interquartile range, being order statistics, describe the distribution more faithfully. The parametric branch encodes a parallel principle: tests such as the t-test and analysis of variance derive their reference distributions from a normality assumption that is most fragile at small sample sizes, where neither the central limit theorem nor a visual check offers much protection, so the assumption should be verified explicitly. The deliberate exclusion of a bare median from the acknowledgement cues reflects that the informative signal is the dispersion measure that accompanies a median, the interquartile range, rather than the median itself, which appears throughout papers for quantities that have nothing to do with the variable in question.
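A small worked example shows how the two kinds of summary diverge on skewed data. The sample below is invented for illustration (it is not taken from any study), and the quartiles use the default method of Python's `statistics.quantiles`, which is one of several conventions.

```python
import statistics

# Hypothetical right-skewed sample of a positive quantity, N = 10.
values = [1, 1, 2, 2, 3, 3, 4, 5, 8, 40]

mean = statistics.mean(values)
sd = statistics.stdev(values)          # sample standard deviation
median = statistics.median(values)
q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points

print(f"mean = {mean:.1f}, SD = {sd:.1f}, mean - 2 SD = {mean - 2 * sd:.1f}")
print(f"median = {median}, IQR = {q1} to {q3}")
```

The single large value pulls the mean to 6.9, well above the median of 3, and inflates the standard deviation beyond the mean, so the mean minus two standard deviations is negative. The median and quartiles stay close to where most observations lie.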

References

  1. Altman DG, Bland JM. Statistics notes: detecting skewness from summary information. BMJ. 1996;313(7066):1200. DOI: 10.1136/bmj.313.7066.1200
  2. Lang TA, Altman DG. Basic statistical reporting for articles published in biomedical journals: the Statistical Analyses and Methods in the Published Literature (SAMPL) guidelines. In: Smart P, Maisonneuve H, Polderman A, eds. Science Editors' Handbook. European Association of Science Editors; 2013. https://www.equator-network.org/reporting-guidelines/sampl/
  3. Strasak AM, Zaman Q, Pfeiffer KP, Göbel G, Ulmer H. Statistical errors in medical research: a review of common pitfalls. Swiss Medical Weekly. 2007;137(3-4):44-49. https://smw.ch/index.php/smw/article/view/693
  4. Mansournia MA, Collins GS, Nielsen RO, et al. CHecklist for statistical Assessment of Medical Papers: the CHAMP statement. British Journal of Sports Medicine. 2021;55(18):1002-1003. DOI: 10.1136/bjsports-2020-103651
  5. Parker L, Boughton S, Lawrence R, Bero L. Experts identified warning signs of fraudulent research: a qualitative study to inform a screening tool. Journal of Clinical Epidemiology. 2022;151:1-17. DOI: 10.1016/j.jclinepi.2022.07.006
  6. Carlisle JB. False individual patient data and zombie randomised controlled trials submitted to Anaesthesia. Anaesthesia. 2021;76(4):472-479. DOI: 10.1111/anae.15263
  7. Wilkinson J, Heal C, Antoniou GA, et al. A survey of experts to identify methods to detect problematic studies: stage 1 of the INveStigating ProblEmatic Clinical Trials in Systematic Reviews project. Journal of Clinical Epidemiology. 2024;175:111512. DOI: 10.1016/j.jclinepi.2024.111512
  8. Crone G, Green CD. Tools of the data detective: a review of statistical methods to detect data and result anomalies in psychology. Theory & Psychology. 2025;35(3):359-380. DOI: 10.1177/09593543241311861