ResAIKit
Research Integrity Toolkit
D15 | Statistical analysis | Fabrication Detection | Layer 2 (Contextual)

Variance of Variances Inconsistent

Looks at how much the variances of a dataset's columns differ from one another. Different measurements naturally have different amounts of spread, so the variances across columns should themselves vary. When many columns share almost the same variance, or when the variance within a single column stays implausibly constant across subsets of the data, the values may have been generated from one fixed distribution rather than measured. The indicator compares the column variances, checks how stable the variance is within each column, and flags suspicious uniformity. It works on individual-patient data with at least five numeric variables.

Technical description

D15 is a contextual screen for suspiciously uniform variance structure in individual-patient data. It requires at least five numeric columns, each with non-zero variance and at least twenty non-missing rows. Constant columns are excluded because a zero variance both masks genuine uniformity, by adding an outlying zero to the set of variances, and forces the across-column ratio to infinity. The indicator computes each column's variance and derives two summaries: the coefficient of variation of those variances, a scale-relative measure of how much they differ, and the ratio of the largest to the smallest column variance. For columns with at least fifty rows it also splits each column into five sequential chunks, computes the variance of each chunk, and measures the coefficient of variation of those chunk variances within the column, then averages that value across columns. A very low coefficient of variation of the column variances, a near-unity max-to-min ratio, and implausibly stable within-column variance each contribute to the score, and their combination indicates data drawn from a fixed generating distribution.
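
The quantities described above can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the toolkit's implementation: the function names (`variance_profile`, `sequential_chunks`, `coefficient_of_variation`), the dict-of-lists input format, the use of the sample variance and sample standard deviation, and the way chunks are sized when the row count is not a multiple of five are all assumptions. Only the numeric limits (five columns, twenty rows, fifty rows, five chunks) come from the description.

```python
import statistics

MIN_COLUMNS = 5       # minimum qualifying columns (from the description)
MIN_ROWS = 20         # minimum non-missing rows per column (from the description)
CHUNK_MIN_ROWS = 50   # minimum rows for the within-column chunk check
N_CHUNKS = 5          # number of sequential chunks per column


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumed convention)."""
    mean = statistics.fmean(values)
    return statistics.stdev(values) / mean if mean > 0 else float("nan")


def sequential_chunks(values, n_chunks=N_CHUNKS):
    """Split values into n_chunks contiguous pieces of near-equal length."""
    size, extra = divmod(len(values), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


def variance_profile(columns):
    """columns: dict of column name -> list of numbers, with None for missing."""
    variances, within_cvs = {}, []
    for name, raw in columns.items():
        values = [v for v in raw if v is not None]
        if len(values) < MIN_ROWS:
            continue
        var = statistics.variance(values)
        if var == 0:  # constant column: excluded from the screen
            continue
        variances[name] = var
        if len(values) >= CHUNK_MIN_ROWS:
            chunk_vars = [statistics.variance(c) for c in sequential_chunks(values)]
            within_cvs.append(coefficient_of_variation(chunk_vars))
    if len(variances) < MIN_COLUMNS:
        return None  # too few qualifying columns to run the screen
    vs = list(variances.values())
    return {
        "n_columns": len(vs),
        "variances": variances,
        "cv_of_variances": coefficient_of_variation(vs),
        "max_min_ratio": max(vs) / min(vs),
        "mean_within_cv": statistics.fmean(within_cvs) if within_cvs else None,
    }
```

Returning `None` when fewer than five columns qualify is one possible way to signal that the screen does not apply; a real implementation might instead return a zero score with an explanatory note.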

How it works

The per-column variances are computed on the non-missing values. A coefficient of variation of these variances below 0.10 adds 3.0 and one below 0.30 adds 1.5, flagging columns whose variances are nearly identical. A maximum-to-minimum variance ratio below 1.5, with at least five columns, adds 1.5, since real, diverse variables usually span a wider range of variance. For columns of at least fifty rows, a within-column chunk-variance coefficient of variation below 0.05, averaged across columns, adds 0.5, flagging variance that does not fluctuate across subsets as it does in real data. Bartlett's test of homogeneity of variance across the columns adds a further 0.5 when its p-value exceeds 0.95, meaning the column variances are statistically indistinguishable: genuinely multi-scale variables reject a common variance with a p-value near zero, so an implausibly high p-value corroborates a single uniform generating spread. The total is capped at 5.0, and each condition that is met raises a finding. The metadata records the number of qualifying columns, the coefficient of variation of the variances, the max-to-min ratio, the mean and median within-column coefficient of variation, Bartlett's p-value, and the column variances.
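
The scoring rules can be written out as a small function. This is a hedged sketch: the function name `d15_score`, its parameters, and the finding messages are hypothetical, and it assumes the two coefficient-of-variation tiers are mutually exclusive (a value below 0.10 earns 3.0 only, not 3.0 plus 1.5), which the description does not state explicitly. The thresholds, point values, and the 5.0 cap are taken from the text. Bartlett's p-value is passed in rather than computed here.

```python
def d15_score(cv_of_variances, max_min_ratio, n_columns,
              mean_within_cv=None, bartlett_p=None):
    """Combine the D15 signals into a capped score and a list of findings."""
    score, findings = 0.0, []

    # Across-column uniformity: tiers assumed to be mutually exclusive.
    if cv_of_variances < 0.10:
        score += 3.0
        findings.append("column variances nearly identical (CV < 0.10)")
    elif cv_of_variances < 0.30:
        score += 1.5
        findings.append("column variances unusually similar (CV < 0.30)")

    # Coarse spread check, applied only with at least five columns.
    if n_columns >= 5 and max_min_ratio < 1.5:
        score += 1.5
        findings.append("max-to-min variance ratio below 1.5")

    # Within-column stability, available only when chunk analysis ran.
    if mean_within_cv is not None and mean_within_cv < 0.05:
        score += 0.5
        findings.append("within-column variance implausibly stable")

    # Bartlett's test: an implausibly high p-value corroborates uniformity.
    if bartlett_p is not None and bartlett_p > 0.95:
        score += 0.5
        findings.append("Bartlett p-value above 0.95")

    return min(score, 5.0), findings
```

With every condition met the raw sum is 5.5, so the cap at 5.0 is what keeps the result on the documented scale.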

Score thresholds

| Score | Meaning |
|---|---|
| 0 | Column variances differ as expected of diverse real measurements. |
| 1 to 2 | Column variances are somewhat more uniform than expected, or within-column variance is very stable. |
| 3 to 5 | Column variances are nearly identical, a strong sign of values drawn from one fixed distribution. |

Why this matters

Different quantities in a real study have genuinely different variability, both because they are measured on different scales and because their underlying processes differ. The set of column variances is therefore itself heterogeneous, and the variance within a column fluctuates across subsamples as real sampling dictates. Data generated by sampling from a single fixed distribution, or by independently drawing each column from a similar template, flattens this structure: the variances become alike and stop fluctuating. Taloni and colleagues showed that a model can fabricate a clinical dataset whose statistical structure is unrealistically regular [1]. The use of variance structure to separate genuine from invented data is established: Al-Marzouki and colleagues exploited the variances of trial variables [2], and Simonsohn detected fabrication from implausibly low variability [3]. The within-column stability check adds a second, scale-free angle: even when columns differ in scale, a genuine variable's variance should not be near-constant across every subset, so its stability points to artificial generation. Recent forensic re-analyses, scoping reviews, and trustworthiness instruments count implausibly uniform variance structure among the standard screens for fabricated and machine-generated data [4, 5, 6, 7, 8].

Limitations

The cross-column comparisons of raw variance are scale-dependent: variables measured in different units have different variances for reasons unrelated to fabrication. A real dataset whose variables happen to share a scale, such as a battery of similar instruments or standardised scores, can therefore show a low coefficient of variation of variances and be flagged, while a real dataset of mixed scales will never trip the max-to-min check. The within-column chunk check assumes the row order is meaningful, as in longitudinal data, and is less interpretable when rows are an arbitrary patient ordering. The check requires at least five non-constant numeric columns with twenty non-missing rows each, and the chunk analysis needs fifty rows. The thresholds, a coefficient of variation of 0.10 and 0.30, a ratio of 1.5, and a within-column coefficient of variation of 0.05, are heuristic. Univariate excessive normality and clustering are covered by indicators D2 and D4, and the related too-clean distributional check by D13, so D15 focuses on the dispersion of variances across and within columns.
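
The scale dependence is easy to demonstrate. In the sketch below, the column names and values are synthetic placeholders generated by simple arithmetic patterns, not study data, and the helper names `variance_spread` and `standardise` are hypothetical. The same five columns are summarised twice: once in their raw units, and once after each has been converted to z-scores, which forces every column variance to one regardless of how the values arose.

```python
import statistics


def variance_spread(columns):
    """Return (CV of the column variances, max-to-min variance ratio)."""
    variances = [statistics.variance(values) for values in columns.values()]
    cv = statistics.stdev(variances) / statistics.fmean(variances)
    return cv, max(variances) / min(variances)


def standardise(values):
    """Convert a column to z-scores (mean 0, sample variance 1)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Synthetic placeholder columns on deliberately different scales.
raw_columns = {
    "age_years": [20 + (i * 7) % 50 for i in range(60)],
    "weight_kg": [55 + (i * 11) % 40 for i in range(60)],
    "sbp_mmhg": [100 + (i * 13) % 60 for i in range(60)],
    "creatinine": [0.6 + ((i * 3) % 10) / 10 for i in range(60)],
    "ldl_mmol_l": [2.0 + ((i * 17) % 30) / 10 for i in range(60)],
}
z_columns = {name: standardise(values) for name, values in raw_columns.items()}

raw_cv, raw_ratio = variance_spread(raw_columns)  # wide spread of variances
z_cv, z_ratio = variance_spread(z_columns)        # variances all close to 1
```

After standardisation both across-column metrics fall below their thresholds even though the underlying values are unchanged, which is why a dataset of standardised scores can be flagged without any fabrication.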

Theoretical background

D15 rests on the idea that variance is itself a random and heterogeneous quantity in real data. Across columns, each variable's variance is set by its measurement scale and its generating process, so a collection of genuine variables has variances spread over a wide range, and the coefficient of variation of those variances, which divides their standard deviation by their mean, is substantial. When data are drawn from a single distribution or from independent draws of a common template, the variances collapse toward a common value and that coefficient of variation falls toward zero, which is the primary signal. The max-to-min ratio is a complementary, coarser reading of the same spread, expected to be well above one for diverse variables. Within a column, the variance estimated on a subsample is itself a random variable whose value fluctuates from chunk to chunk under genuine sampling, so a near-constant chunk variance, captured by a very small within-column coefficient of variation, indicates a generator that holds the spread fixed rather than letting it vary. Because the cross-column metrics carry the scale of the variables, they are most reliable as a relative within-dataset comparison and are interpreted alongside the scale-free within-column check; excluding constant columns keeps the variance set free of the degenerate zeros that would otherwise dominate both the coefficient of variation and the ratio. Bartlett's test casts the same comparison as a formal hypothesis test of equal variances across the columns: real variables on different scales reject the null decisively, so a failure to reject it, an implausibly high p-value, is itself the diagnostic, turning the heuristic uniformity check into a calibrated statement that the variances cannot be distinguished [9].
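
For reference, the quantities discussed above can be written compactly. The notation is introduced here for illustration and is not taken from the toolkit: $s_j^2$ is the sample variance of column $j$, $n_j$ its number of non-missing rows, $k$ the number of qualifying columns, and $N = \sum_j n_j$. The $k-1$ divisor in the coefficient of variation is an assumed convention. The last expression is the standard form of Bartlett's statistic [9], which is approximately chi-squared with $k-1$ degrees of freedom under the null hypothesis of equal variances when the data are normally distributed.

```latex
% Mean of the column variances and their coefficient of variation
\bar{v} = \frac{1}{k}\sum_{j=1}^{k} s_j^2,
\qquad
\mathrm{CV} = \frac{1}{\bar{v}}
  \sqrt{\frac{1}{k-1}\sum_{j=1}^{k}\left(s_j^2 - \bar{v}\right)^2}

% Max-to-min variance ratio
R = \frac{\max_j s_j^2}{\min_j s_j^2}

% Bartlett's statistic, with pooled variance s_p^2
s_p^2 = \frac{1}{N-k}\sum_{j=1}^{k}(n_j - 1)\,s_j^2,
\qquad
T = \frac{(N-k)\ln s_p^2 - \sum_{j=1}^{k}(n_j - 1)\ln s_j^2}
         {1 + \dfrac{1}{3(k-1)}\left(\sum_{j=1}^{k}\dfrac{1}{n_j - 1} - \dfrac{1}{N-k}\right)}
```

When all $s_j^2$ are equal, $\mathrm{CV} = 0$, $R = 1$, and $T = 0$, so the p-value of Bartlett's test approaches one; this is the configuration that all three across-column signals are designed to detect.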

References

  1. Taloni A, Scorcia V, Giannaccare G. Large Language Model Advanced Data Analysis Abuse to Create a Fake Data Set in Medical Research. JAMA Ophthalmology. 2023;141(12):1174-1175. DOI: 10.1001/jamaophthalmol.2023.5162
  2. Al-Marzouki S, Evans S, Marshall T, Roberts I. Are these data real? Statistical methods for the detection of data fabrication in clinical trials. BMJ. 2005;331(7511):267-270. DOI: 10.1136/bmj.331.7511.267
  3. Simonsohn U. Just Post It: The Lesson From Two Cases of Fabricated Data Detected by Statistics Alone. Psychological Science. 2013;24(10):1875-1888. DOI: 10.1177/0956797613480366
  4. Micceri T. The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin. 1989;105(1):156-166. DOI: 10.1037/0033-2909.105.1.156
  5. Carlisle JB. Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals. Anaesthesia. 2017;72(8):944-952. DOI: 10.1111/anae.13938
  6. Bordewijk EM, Li W, van Eekelen R, Wang R, Showell M, Mol BW, van Wely M. Methods to assess research misconduct in health-related research: A scoping review. Journal of Clinical Epidemiology. 2021;136:189-202. DOI: 10.1016/j.jclinepi.2021.05.012
  7. Wilkinson J, Heal C, Antoniou GA, et al. A survey of experts to identify methods to detect problematic studies: stage 1 of the INveStigating ProblEmatic Clinical Trials in Systematic Reviews project. Journal of Clinical Epidemiology. 2024;175:111512. DOI: 10.1016/j.jclinepi.2024.111512
  8. Crone G, Green CD. Tools of the data detective: A review of statistical methods to detect data and result anomalies in psychology. Theory & Psychology. 2025;35(3):359-380. DOI: 10.1177/09593543241311861
  9. Bartlett MS. Properties of Sufficiency and Statistical Tests. Proceedings of the Royal Society of London. Series A. 1937;160(901):268-282. DOI: 10.1098/rspa.1937.0109