R6 | Statistical analysis | Methodological Coherence | Layer 2 (Contextual)

Descriptive Reporting

Checks whether the descriptive statistics suit the data distribution: flags mean and SD reporting for likely-skewed small-sample variables and parametric tests run without a normality check.

Technical description

R6 checks that the chosen descriptive statistics match the underlying distribution. For each mean-SD-N triplet with positive mean and standard deviation, it computes the coefficient of variation (standard deviation divided by mean). When the mean of a positive variable lies within two standard deviations of zero (equivalently, a coefficient of variation above 0.5), a symmetric distribution would place mass below zero, which a positive quantity cannot have, so the data are likely skewed. When this occurs in a small sample (N below fifteen) with no acknowledgement of non-normality, the mean ± SD report is flagged as potentially misleading. This mean-minus-two-SD criterion supersedes the earlier fixed coefficient-of-variation cutoff of 1. Separately, R6 flags parametric-test cues (t-test, ANOVA, Pearson correlation) that appear alongside a small-sample triplet but without a normality-verification cue (Shapiro-Wilk, Kolmogorov-Smirnov, normality test, normally distributed).
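The skew criterion can be illustrated with a minimal Python sketch. The function names `coefficient_of_variation` and `likely_skewed` are hypothetical and chosen for illustration; they are not taken from the R6 implementation.

```python
def coefficient_of_variation(mean: float, sd: float) -> float:
    """Standard deviation divided by mean (hypothetical helper name)."""
    return sd / mean


def likely_skewed(mean: float, sd: float) -> bool:
    """Apply the mean-minus-two-SD criterion to one summary.

    Returns True when mean - 2*SD falls below zero, which for a positive
    mean is the same condition as a coefficient of variation above 0.5.
    Summaries without a positive mean and SD are not examined.
    """
    if mean <= 0 or sd <= 0:
        return False
    return mean - 2 * sd < 0


# Mean 10, SD 6: 10 - 12 < 0, coefficient of variation 0.6 -> flagged.
print(likely_skewed(10, 6), coefficient_of_variation(10, 6))
# Mean 10, SD 4: 10 - 8 > 0, coefficient of variation 0.4 -> not flagged.
print(likely_skewed(10, 4), coefficient_of_variation(10, 4))
```

The comparison is strict, so a summary with SD exactly half the mean is not flagged in this sketch; how the actual indicator treats that boundary is not stated here.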

How it works

Layer 2 (contextual): each triplet with positive mean and SD is examined, and the coefficient of variation is computed as their ratio (SD over mean). If the mean minus two standard deviations falls below zero (a coefficient of variation above 0.5) and the sample size is below fifteen, the text is searched for a non-normality cue (non-normally distributed, skewed, skewness, interquartile or IQR, log-transform). If none is found, the variable is counted as likely-skewed-but-reported-parametrically and a warning is raised; a bare mention of the median is deliberately not accepted as a cue, because it is too common to be informative. The parametric check fires once if a parametric-test cue is present, at least one triplet has a sample size below fifteen, and no normality cue appears. The score is 0.0 for no flagged problems, 2.0 for one or two, and 4.0 for three or more, capped at 5.0.
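The procedure above could be sketched roughly as follows. This is an illustration under stated assumptions, not the indicator's actual code: the `Triplet` class, the `r6_score` function, and the regular expressions are hypothetical, the cue patterns are simplified guesses at how the listed cues might be matched, and the sketch assumes that skew flags and the parametric flag are added into a single problem count.

```python
import re
from dataclasses import dataclass

SMALL_N = 15  # sample sizes below this count as small

# Hypothetical patterns approximating the cue lists in the text.
NON_NORMALITY_CUES = re.compile(
    r"non-normally distributed|skew|interquartile|\bIQR\b|log-transform",
    re.IGNORECASE,
)
PARAMETRIC_CUES = re.compile(r"t-test|\bANOVA\b|Pearson", re.IGNORECASE)
NORMALITY_CHECK_CUES = re.compile(
    r"Shapiro-Wilk|Kolmogorov-Smirnov|normality test|normally distributed",
    re.IGNORECASE,
)


@dataclass
class Triplet:
    mean: float
    sd: float
    n: int


def r6_score(triplets: list[Triplet], text: str) -> tuple[float, int]:
    """Return (score, number of flagged problems) for one document."""
    # Cues are searched over the whole text, not per analysis.
    acknowledged = bool(NON_NORMALITY_CUES.search(text))
    problems = 0

    # Skew check: one problem per likely-skewed small-sample variable.
    for t in triplets:
        if t.mean <= 0 or t.sd <= 0:
            continue  # only positive mean and SD are examined
        if t.mean - 2 * t.sd < 0 and t.n < SMALL_N and not acknowledged:
            problems += 1

    # Parametric check: fires at most once per document.
    has_small_sample = any(t.n < SMALL_N for t in triplets)
    if (
        PARAMETRIC_CUES.search(text)
        and has_small_sample
        and not NORMALITY_CHECK_CUES.search(text)
    ):
        problems += 1

    if problems == 0:
        score = 0.0
    elif problems <= 2:
        score = 2.0
    else:
        score = 4.0
    return min(score, 5.0), problems


# One skewed small-sample variable plus an unchecked t-test: two problems.
print(r6_score([Triplet(10, 8, 12)], "Groups were compared with a t-test."))
```

Because the cues are matched against the whole text, a single acknowledgement anywhere suppresses every skew flag in this sketch, which mirrors the whole-text search described for the indicator.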

Why this matters

The wrong summary for a skewed variable distorts interpretation. On a positive quantity a standard deviation as large as the mean is a clear sign of skew for which mean and SD are a poor description, and reporting guidelines direct authors to use a median and interquartile range for skewed data and to state how distributional assumptions were checked before parametric tests. Summarising skewed data with the mean, and applying parametric tests without verifying their assumptions, are recurrent statistical errors that make an analysis look sounder than it is. Requiring a genuine non-normality cue, not any mention of the median, keeps the check from being disabled by medians reported for unrelated quantities.

Score thresholds

0: Descriptive statistics suit the data, or non-normality is acknowledged
2-3: One or two variables reported with mean and SD despite likely skew, or a parametric test without a normality check
4-5: Three or more such reporting problems

Limitations

The coefficient-of-variation rule is a heuristic: a positive variable with a coefficient of variation above 0.5 is probably but not certainly skewed, and a skewed variable with a coefficient of variation below 0.5 is not caught, trading completeness for specificity. The small-sample gate means the same skewed reporting in a large sample is not flagged. The acknowledgement and the parametric and normality cues are searched over the whole text, so an acknowledgement or a normality test for one analysis exempts the others, and the parametric and small-sample signals may belong to different analyses. The indicator reads summary triplets and text, not the raw data, so it cannot confirm skewness directly. Internal consistency of the summary statistics is the domain of the granularity indicators; R6 focuses on whether the descriptive choice fits the distribution.