Descriptive Reporting
Checks whether the descriptive statistics suit the data distribution: flags mean and SD reporting for likely-skewed small-sample variables and parametric tests run without a normality check.
Technical description
R6 checks that the chosen descriptive statistics match the underlying distribution. For each mean-SD-N triplet with a positive mean and standard deviation, it computes the coefficient of variation (standard deviation divided by mean). When the mean of a positive variable lies within two standard deviations of zero (equivalently, a coefficient of variation above 0.5), a symmetric distribution would place mass below zero, which a positive quantity cannot have, so the data are likely skewed. When this occurs in a small sample (N below fifteen) with no acknowledgement of non-normality, the mean-plus-or-minus-SD report is flagged as potentially misleading. This mean-minus-two-SD criterion supersedes the earlier fixed coefficient-of-variation cutoff of 1. Separately, R6 flags parametric-test cues (t-test, ANOVA, Pearson correlation) that appear alongside a small-sample triplet but without any normality-verification cue (Shapiro-Wilk, Kolmogorov-Smirnov, normality test, normally distributed).
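The equivalence between the mean-minus-two-SD form and the coefficient-of-variation form of the criterion can be written out in one line. The symbols below (mean $\bar{x}$, standard deviation $s$, and $\mathrm{CV}$) are notation introduced here for illustration only, and the numbers in the second line are a made-up example, not data from any study.

```latex
\bar{x} - 2s < 0
\;\iff\; 2s > \bar{x}
\;\iff\; \mathrm{CV} = \frac{s}{\bar{x}} > \frac{1}{2}
\qquad (\bar{x} > 0,\; s > 0)

% Hypothetical example: \bar{x} = 4.2,\; s = 3.1
\bar{x} - 2s = 4.2 - 6.2 = -2.0 < 0,
\qquad \mathrm{CV} = \frac{3.1}{4.2} \approx 0.74 > 0.5
```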
How it works
Layer 2 (contextual): each triplet with a positive mean and SD is examined, and the coefficient of variation is computed as their ratio. If the mean minus two standard deviations falls below zero (a coefficient of variation above 0.5) and the sample size is below fifteen, the text is searched for a non-normality cue (non-normally distributed, skewed, skewness, interquartile or IQR, log-transform). If no such cue is found, the variable is counted as likely skewed but reported parametrically, and a warning is issued. A bare mention of the median is deliberately excluded from the cues as too common to be informative. The parametric check fires once if a parametric-test cue is present, at least one triplet has a sample size below fifteen, and no normality cue appears. The score is 0.0 when nothing is flagged, 2.0 for one or two flagged problems, and 4.0 for three or more, capped at 5.0.
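The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the indicator's actual code: the names `Triplet`, `count_problems`, and `score`, and the regular expressions, are hypothetical stand-ins that paraphrase the cue lists in the text. It also assumes that the skew flags and the single parametric flag are added into one problem count before scoring.

```python
import re
from dataclasses import dataclass

# Hypothetical cue patterns paraphrasing the cue lists described in the text.
NON_NORMALITY_CUE = re.compile(
    r"non-normally distributed|skew|interquartile|\bIQR\b|log-transform",
    re.IGNORECASE,
)
PARAMETRIC_CUE = re.compile(r"t-test|\bANOVA\b|Pearson correlation", re.IGNORECASE)
NORMALITY_CUE = re.compile(
    r"Shapiro-Wilk|Kolmogorov-Smirnov|normality test|normally distributed",
    re.IGNORECASE,
)

SMALL_N = 15  # sample sizes below this count as small


@dataclass
class Triplet:
    mean: float
    sd: float
    n: int


def count_problems(triplets, text):
    """Count likely-skewed small-sample triplets plus one parametric flag."""
    problems = 0

    # Skew check: only runs when the text carries no non-normality cue.
    if not NON_NORMALITY_CUE.search(text):
        for t in triplets:
            positive = t.mean > 0 and t.sd > 0
            if positive and t.mean - 2 * t.sd < 0 and t.n < SMALL_N:
                problems += 1

    # Parametric check: fires at most once for the whole text.
    has_small_sample = any(t.n < SMALL_N for t in triplets)
    if (
        PARAMETRIC_CUE.search(text)
        and has_small_sample
        and not NORMALITY_CUE.search(text)
    ):
        problems += 1

    return problems


def score(problems):
    """Map the problem count to the score bands given in the text."""
    if problems == 0:
        raw = 0.0
    elif problems <= 2:
        raw = 2.0
    else:
        raw = 4.0
    return min(raw, 5.0)  # cap stated in the text


if __name__ == "__main__":
    text = "Values are mean and SD. Groups were compared with a t-test."
    triplets = [Triplet(4.2, 3.1, 10), Triplet(50.0, 5.0, 12)]
    print(score(count_problems(triplets, text)))  # 2.0
```

In the usage example, the first triplet trips the skew check (4.2 minus 6.2 is below zero, N is 10) and the t-test mention trips the parametric check, giving two problems and a score of 2.0.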
Why this matters
The wrong summary for a skewed variable distorts interpretation. On a positive quantity a standard deviation as large as the mean is a clear sign of skew for which mean and SD are a poor description, and reporting guidelines direct authors to use a median and interquartile range for skewed data and to state how distributional assumptions were checked before parametric tests. Summarising skewed data with the mean, and applying parametric tests without verifying their assumptions, are recurrent statistical errors that make an analysis look sounder than it is. Requiring a genuine non-normality cue, not any mention of the median, keeps the check from being disabled by medians reported for unrelated quantities.
Score thresholds
- 0: Descriptive statistics suit the data, or non-normality is acknowledged
- 2-3: One or two variables reported with mean and SD despite likely skew, or a parametric test without a normality check
- 4-5: Three or more such reporting problems
Limitations
The coefficient-of-variation rule is a heuristic: a positive variable with a coefficient of variation above 0.5 is probably but not certainly skewed, and a skewed variable with a coefficient of variation below 0.5 is not caught, trading completeness for specificity. The small-sample gate means the same skewed reporting in a large sample is not flagged. The acknowledgement cues and the parametric and normality cues are searched over the whole text, so an acknowledgement or a normality test for one analysis exempts all the others, and the parametric and small-sample signals may belong to different analyses. The indicator reads summary triplets and text, not the raw data, so it cannot confirm skewness directly. Internal consistency of the summary statistics is the domain of the granularity indicators; R6 focuses on whether the descriptive choice fits the distribution.
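To illustrate the first limitation, the sketch below uses a log-normal variable, chosen here purely as a convenient example of a skewed positive distribution and not taken from the indicator itself. It relies on the standard closed forms for the log-normal coefficient of variation and skewness. The function names and the shape parameter `sigma = 0.4` are assumptions for illustration. The point is that a clearly right-skewed variable can sit below the 0.5 coefficient-of-variation threshold given in the technical description and so go unflagged.

```python
import math


def lognormal_cv(sigma):
    """Coefficient of variation of a log-normal: sqrt(exp(sigma^2) - 1)."""
    return math.sqrt(math.exp(sigma ** 2) - 1.0)


def lognormal_skewness(sigma):
    """Skewness of a log-normal: (exp(sigma^2) + 2) * sqrt(exp(sigma^2) - 1)."""
    return (math.exp(sigma ** 2) + 2.0) * math.sqrt(math.exp(sigma ** 2) - 1.0)


sigma = 0.4                     # hypothetical shape parameter
cv = lognormal_cv(sigma)        # falls below 0.5
skew = lognormal_skewness(sigma)  # well above zero, i.e. right-skewed
caught = cv > 0.5               # the heuristic's criterion

print(f"CV = {cv:.3f}, skewness = {skew:.3f}, caught = {caught}")
```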
References
- Altman DG, Bland JM. (1996). Statistics notes: detecting skewness from summary information. BMJ
- Lang TA, Altman DG. (2013). Basic statistical reporting for articles published in biomedical journals: the SAMPL guidelines. Science Editors' Handbook (European Association of Science Editors)
- Strasak AM, Zaman Q, Pfeiffer KP, Goebel G, Ulmer H. (2007). Statistical errors in medical research: a review of common pitfalls. Swiss Medical Weekly
- Mansournia MA, Collins GS, Nielsen RO, Nazemipour M, Jewell NP, Altman DG, Campbell MJ. (2021). CHecklist for statistical Assessment of Medical Papers: the CHAMP statement. British Journal of Sports Medicine 55(18):1002-1003
- Parker L, Boughton S, Lawrence R, Bero L. (2022). Experts identified warning signs of fraudulent research: a qualitative study to inform a screening tool. Journal of Clinical Epidemiology 151:1-17
- Carlisle JB. (2021). False individual patient data and zombie randomised controlled trials submitted to Anaesthesia. Anaesthesia 76(4):472-479
- Wilkinson J, Heal C, Antoniou GA, et al. (2024). A survey of experts to identify methods to detect problematic studies: stage 1 of the INveStigating ProblEmatic Clinical Trials in Systematic Reviews project. Journal of Clinical Epidemiology 175:111512
- Crone G, Green CD. (2025). Tools of the data detective: A review of statistical methods to detect data and result anomalies in psychology. Theory & Psychology 35(3):359-380