Variance of Variances Inconsistent
Looks at how much the variances of a dataset's columns differ from one another. Different measurements naturally have different amounts of spread, so the variances across columns should themselves vary. When many columns share almost the same variance, or the variance within a single column stays implausibly constant across subsets of the data, it suggests the values were generated from one fixed distribution rather than measured. The indicator compares the column variances and the within-column variance stability and flags suspicious uniformity.
Technical description
A contextual screen for uniform variance structure in individual-patient data. It requires at least five numeric columns with non-zero variance and twenty non-missing rows each (excluding constant columns, whose zero variance masks uniformity and forces the across-column ratio to infinity). It computes each column's variance and derives the coefficient of variation of those variances and the ratio of the largest to the smallest. For columns with at least fifty rows it splits each into five sequential chunks, computes each chunk's variance, and measures the within-column coefficient of variation of those chunk variances, averaged across columns. A very low coefficient of variation of column variances, a near-unity max-to-min ratio, or implausibly stable within-column variance each contribute to the score.
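The metrics described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the indicator's actual implementation. The function names (`cv`, `variance_metrics`), the dict-of-lists input, the use of sample (n-1) variance, and the choice to drop leftover rows when a column does not divide evenly into five chunks are all assumptions made for the example.

```python
import statistics


def cv(values):
    """Coefficient of variation: sample standard deviation over the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def variance_metrics(columns, min_cols=5, min_rows=20, chunk_rows=50, n_chunks=5):
    """columns maps a column name to a list of numbers (None marks a missing value).

    Returns None when fewer than min_cols columns qualify.
    """
    col_variances = {}
    within_cvs = []
    for name, raw in columns.items():
        values = [v for v in raw if v is not None]
        if len(values) < min_rows:
            continue  # too few non-missing rows
        var = statistics.variance(values)
        if var == 0:
            continue  # constant column: excluded from the screen
        col_variances[name] = var

        # Within-column stability: five sequential chunks, in row order.
        if len(values) >= chunk_rows:
            size = len(values) // n_chunks  # assumption: leftover rows are dropped
            chunk_vars = [
                statistics.variance(values[i * size:(i + 1) * size])
                for i in range(n_chunks)
            ]
            if statistics.mean(chunk_vars) > 0:
                within_cvs.append(cv(chunk_vars))

    if len(col_variances) < min_cols:
        return None

    variances = list(col_variances.values())
    return {
        "n_qualifying_columns": len(variances),
        "cv_of_variances": cv(variances),
        "max_min_ratio": max(variances) / min(variances),
        "mean_cv_within": statistics.mean(within_cvs) if within_cvs else None,
        "col_variances": col_variances,
    }
```

Because constant columns are skipped before the ratio is taken, `min(variances)` is always positive and the max-to-min ratio stays finite.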
How it works
Layer 2 (contextual): per-column variances are computed on non-missing values. A coefficient of variation of these variances below 0.10 adds 3.0 and below 0.30 adds 1.5; a max-to-min variance ratio below 1.5 (with at least five columns) adds 1.5; for columns of at least fifty rows, a within-column chunk-variance coefficient of variation below 0.05 (averaged across columns) adds 0.5; and Bartlett's test of homogeneity of variance across the columns adds 0.5 when its p-value exceeds 0.95 (the variances are statistically indistinguishable, whereas genuinely multi-scale data would reject homogeneity with a p-value near zero). Each condition that fires also raises a finding, and the total score is capped at 5.0. Metadata records n_qualifying_columns, cv_of_variances, max_min_ratio, mean_cv_within, median_cv_within (a robust complement to the mean), bartlett_pvalue, and col_variances.
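The scoring rules above can be sketched as a small function. This is an illustration under stated assumptions rather than the indicator's own code: the function name, its signature, and the finding messages are hypothetical, and the two coefficient-of-variation thresholds are treated as tiers (only the stronger one applies), which is one reading of the description. The Bartlett p-value is taken as an input here rather than computed.

```python
def score_variance_uniformity(cv_of_variances, max_min_ratio, n_columns,
                              mean_cv_within=None, bartlett_pvalue=None):
    """Return (score, findings) from the metrics described in the text."""
    score = 0.0
    findings = []

    # Assumption: the 0.10 and 0.30 thresholds are tiers, not cumulative.
    if cv_of_variances < 0.10:
        score += 3.0
        findings.append("column variances nearly identical (CV < 0.10)")
    elif cv_of_variances < 0.30:
        score += 1.5
        findings.append("column variances unusually uniform (CV < 0.30)")

    if n_columns >= 5 and max_min_ratio < 1.5:
        score += 1.5
        findings.append("max-to-min variance ratio below 1.5")

    # Only available when at least one column has fifty or more rows.
    if mean_cv_within is not None and mean_cv_within < 0.05:
        score += 0.5
        findings.append("within-column variance implausibly stable")

    if bartlett_pvalue is not None and bartlett_pvalue > 0.95:
        score += 0.5
        findings.append("Bartlett p-value above 0.95")

    return min(score, 5.0), findings
```

When every condition fires the raw sum is 5.5 under this reading, so the cap at 5.0 does take effect.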
Why this matters
Different quantities in a real study have genuinely different variability (different scales and processes), so the set of column variances is heterogeneous, and the variance within a column fluctuates across subsamples as real sampling dictates. Data from a single fixed distribution, or independent draws of a similar template, flattens this: variances become alike and stop fluctuating. Taloni and colleagues showed a model can fabricate a clinical dataset with unrealistically regular structure, and the use of variance structure to separate genuine from invented data is established (Al-Marzouki and colleagues; Simonsohn). The within-column stability check adds a scale-free angle: even when columns differ in scale, a genuine variable's variance should not be near-constant across every subset, so its stability points to artificial generation.
Score thresholds
- 0: Column variances differ as expected of diverse real measurements.
- 1-2: Column variances are somewhat more uniform than expected, or within-column variance is very stable.
- 3-5: Column variances are nearly identical, a strong sign of values drawn from one fixed distribution.
Limitations
The cross-column comparisons of raw variance are scale-dependent: variables on different units have different variances for reasons unrelated to fabrication. A real dataset whose variables share a scale (a battery of similar instruments, or standardised scores) can therefore show a low coefficient of variation of variances and be flagged, while a mixed-scale dataset never trips the max-to-min check. The within-column chunk check assumes a meaningful row order (for example, longitudinal data) and is less interpretable when patients are ordered arbitrarily. The check requires at least five non-constant numeric columns with twenty non-missing rows each; the chunk analysis needs fifty. The thresholds (coefficient of variation 0.10 and 0.30, ratio 1.5, within-column coefficient 0.05) are heuristic. Univariate excessive-normality and clustering signals are covered by D2 and D4, and the related too-clean check by D13.
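The scale dependence can be seen in a small sketch. The values below are synthetic draws from a seeded random generator, used purely for illustration, and the column names and helper functions are hypothetical. Standardising each column to z-scores forces every sample variance to 1, so an otherwise heterogeneous dataset ends up with a max-to-min ratio of about 1 and would trip the uniformity conditions.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed; synthetic values for illustration only

# Five hypothetical columns with deliberately different spreads.
spreads = {"col_a": 15.0, "col_b": 12.0, "col_c": 0.8, "col_d": 10.0, "col_e": 4.0}
raw = {name: [rng.gauss(0.0, sd) for _ in range(40)] for name, sd in spreads.items()}


def z_scores(values):
    """Standardise to mean 0 and sample standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def max_min_variance_ratio(columns):
    variances = [statistics.variance(col) for col in columns.values()]
    return max(variances) / min(variances)


standardised = {name: z_scores(col) for name, col in raw.items()}

raw_ratio = max_min_variance_ratio(raw)           # far above 1.5: not flagged
std_ratio = max_min_variance_ratio(standardised)  # about 1.0: flagged

print(f"raw ratio: {raw_ratio:.1f}, standardised ratio: {std_ratio:.6f}")
```

A dataset supplied as standardised scores is therefore expected to score highly on the cross-column conditions regardless of whether it is genuine.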
References
- Taloni A, Scorcia V, Giannaccare G. (2023). Large Language Model Advanced Data Analysis Abuse to Create a Fake Data Set in Medical Research. JAMA Ophthalmology 141(12):1174-1175
- Al-Marzouki S, Evans S, Marshall T, Roberts I. (2005). Are these data real? Statistical methods for the detection of data fabrication in clinical trials. BMJ 331(7511):267-270
- Simonsohn U. (2013). Just Post It: The Lesson From Two Cases of Fabricated Data Detected by Statistics Alone. Psychological Science 24(10):1875-1888
- Micceri T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin 105(1):156-166
- Carlisle JB. (2017). Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals. Anaesthesia 72(8):944-952
- Bordewijk EM, Li W, van Eekelen R, Wang R, Showell M, Mol BW, van Wely M. (2021). Methods to assess research misconduct in health-related research: A scoping review. Journal of Clinical Epidemiology 136:189-202
- Wilkinson J, Heal C, Antoniou GA, et al. (2024). A survey of experts to identify methods to detect problematic studies: stage 1 of the INveStigating ProblEmatic Clinical Trials in Systematic Reviews project. Journal of Clinical Epidemiology 175:111512
- Crone G, Green CD. (2025). Tools of the data detective: A review of statistical methods to detect data and result anomalies in psychology. Theory & Psychology 35(3):359-380
- Bartlett MS. (1937). Properties of Sufficiency and Statistical Tests. Proceedings of the Royal Society of London. Series A 160(901):268-282