Multicenter Anomalies
Compares the data contributed by each site of a multi-center study against the rest. Real sites differ from one another in natural ways, because they recruit different patients, use slightly different equipment, and have different missing-data habits. A site stands out as possibly fabricated by a single source when its data is statistically too divergent or too uniform, shows a digit-preference fingerprint, or is implausibly complete while other sites have gaps. The indicator runs four per-site checks and scores the dataset by how anomalous its sites are. It works on individual-patient data when a site identifier is present.
Technical description
D6 is a contextual screen for site-level anomalies in multi-center individual-patient data, the kind that appear when one site's data is invented rather than independently collected. It detects a site or center identifier column and, with at least two sites, compares each site against the pooled remainder on the numeric variables, excluding the site identifier itself so that a numeric site code is not mistaken for a measurement. Four checks run per site. The distribution check applies a two-sample Kolmogorov-Smirnov test for each variable and flags a site whose distribution diverges from the rest at p below 0.001 on more than three variables. The variability check flags a site whose within-site standard deviation is below thirty percent of the overall standard deviation on any variable, indicating implausibly low spread. The terminal-digit check pools the reported-looking values at the site and flags a non-uniform last-digit distribution by a chi-squared test at p below 0.01. The missing-data check flags a site with zero missing data while another site exceeds ten percent. Each fired check adds a penalty, and the penalties sum to the score.
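The variability and missing-data checks described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the indicator's actual code: the function names, the use of the sample standard deviation, and the treatment of `None` and NaN as missing are all assumptions made for the example.

```python
import math
import statistics

SD_RATIO_THRESHOLD = 0.30   # within-site SD below 30% of the overall SD
MISSING_THRESHOLD = 0.10    # another site with more than 10% missing data


def is_missing(value):
    """Treat None and NaN as missing (an assumption about the encoding)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def low_variability(site_values, all_values):
    """Flag a variable whose within-site SD is under 30% of the overall SD."""
    site = [v for v in site_values if not is_missing(v)]
    overall = [v for v in all_values if not is_missing(v)]
    if len(site) < 2 or len(overall) < 2:
        return False  # too few values to estimate a standard deviation
    overall_sd = statistics.stdev(overall)
    if overall_sd == 0:
        return False  # a constant column carries no information
    return statistics.stdev(site) < SD_RATIO_THRESHOLD * overall_sd


def zero_missing_against_others(missing_rate_by_site, site):
    """Flag a site with no missing data while some other site exceeds 10%."""
    if missing_rate_by_site[site] > 0:
        return False
    return any(rate > MISSING_THRESHOLD
               for other, rate in missing_rate_by_site.items()
               if other != site)
```

Here `missing_rate_by_site` is a hypothetical mapping from site label to the fraction of missing cells at that site; how that fraction is computed (per variable or across all variables) is not specified in the description above.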
How it works
For each site, the four checks contribute penalties: a Kolmogorov-Smirnov divergence on more than three variables adds 1.5, a low within-site standard deviation on any variable adds 1.5, a terminal-digit bias adds 1.0, and the zero-missing-against-others pattern adds 1.5. Alongside the raw Kolmogorov-Smirnov count, the per-variable KS p-values for each site are also subjected to a Benjamini-Hochberg false-discovery-rate correction; the number of variables that survive it is reported together with the smallest KS p-value as a multiple-comparison-aware diagnostic of how genuinely the site diverges [9]. The terminal-digit check considers only values that look reported, meaning integers or values with at most two decimal places, because the trailing digits of raw floating-point numbers are precision artefacts; it also requires at least thirty such values. The penalties accumulate across sites and the total is capped at 5.0. Each anomalous site produces a finding listing the checks it tripped, with severity rising when a site trips two or more. The metadata records the number of sites, the list of anomalous sites, the flags per site, the per-site Kolmogorov-Smirnov diagnostics (smallest p-value and false-discovery-rate-significant count), and the total accumulated penalty before the 5.0 cap.
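A minimal sketch of the terminal-digit check, assuming one plausible reading of "values that look reported". The helper names are hypothetical, and the rule for extracting the last digit (format to two decimals, then drop trailing zeros) is an assumption rather than the indicator's documented behaviour. To stay within the standard library, the chi-squared statistic is compared with the 9-degrees-of-freedom critical value for p = 0.01 (about 21.666) instead of computing a p-value.

```python
from collections import Counter

MIN_VALUES = 30               # minimum number of reported-looking values
CHI2_CRIT_DF9_P01 = 21.666    # chi-squared critical value, 9 df, p = 0.01


def looks_reported(x, tol=1e-6):
    """True for integers or values with at most two decimal places."""
    scaled = x * 100
    return abs(scaled - round(scaled)) < tol


def last_digit(x):
    """Last digit of the value written to two decimals, trailing zeros dropped."""
    text = f"{abs(x):.2f}".rstrip("0").rstrip(".")
    return int(text[-1])


def terminal_digit_flag(values):
    """Return (flagged, chi2) for a site's pooled values, or None if too few."""
    digits = [last_digit(v) for v in values if looks_reported(v)]
    if len(digits) < MIN_VALUES:
        return None
    counts = Counter(digits)
    expected = len(digits) / 10  # uniform expectation over the digits 0-9
    chi2 = sum((counts.get(d, 0) - expected) ** 2 / expected
               for d in range(10))
    return chi2 > CHI2_CRIT_DF9_P01, chi2
```

For example, forty values that all end in 0 or 5 give a statistic of 160 and are flagged, while the integers 0 to 99 give a statistic of 0 and are not.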
Score thresholds
| Score | Meaning |
|---|---|
| 0 | Sites differ in the natural ways expected of independent centers. |
| 2 to 3 | One site is mildly anomalous on one or two checks. |
| 4 to 5 | One or more sites are strongly anomalous, for example combining zero variability with zero missing data. |
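
The scores in the table arise from the penalty scheme described under How it works. The following sketch shows how per-site flags could be turned into a capped score; the flag names and the function are hypothetical, and only the penalty values and the 5.0 cap come from the description.

```python
PENALTIES = {
    "ks_divergence": 1.5,     # KS divergence on more than three variables
    "low_variability": 1.5,   # low within-site SD on any variable
    "terminal_digit": 1.0,    # non-uniform last digits
    "zero_missing": 1.5,      # zero missing data while another site has gaps
}
SCORE_CAP = 5.0


def score_sites(flags_by_site):
    """Sum penalties over all sites.

    Returns (capped score, raw total before the cap, anomalous sites).
    """
    raw_total = 0.0
    anomalous = []
    for site, flags in flags_by_site.items():
        penalty = sum(PENALTIES[flag] for flag in flags)
        if penalty > 0:
            anomalous.append(site)
        raw_total += penalty
    return min(raw_total, SCORE_CAP), raw_total, anomalous


score, raw, sites = score_sites({
    "A": ["low_variability", "zero_missing"],
    "B": [],
    "C": ["terminal_digit"],
})
print(score, raw, sites)  # 4.0 4.0 ['A', 'C']
```

Because penalties accumulate across sites, several mildly anomalous sites can reach the same score as one strongly anomalous site, which is why the raw total and the per-site flags are worth reading alongside the score.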
Why this matters
Multi-center trials offer a powerful internal control: genuine sites are independent samples of the same protocol, so they should resemble each other in distribution while differing in the small idiosyncratic ways that real operations produce. This is the basis of central statistical monitoring, which Venet and colleagues formalised as a way to find data-quality problems and fabrication by comparing each center's data against the others using exactly these kinds of distributional, variance, and digit tests [1]. Carlisle's forensic re-analyses repeatedly found fabricated trials in which one contributing source was statistically out of step with the rest [2], and the classic biostatistical account of fraud lists anomalous center effects among the detectable signatures of invented data [3]. The intuition is that a fabricator producing one site's data cannot match the natural heterogeneity of real centers: the invented site is often too clean, too uniform, too complete, or carries the digit-preference fingerprint of hand-entry, and any of these makes it stand out against its genuine peers. Subsequent work refined these center-comparison methods, applying a battery of statistical tests and mixed-effects models across centers [4, 5], and recent scoping reviews and trustworthiness instruments place central statistical monitoring among the standard screens for problematic trials [6, 7, 8].
Limitations
The check requires individual-patient data with a recognisable site or center column and at least two sites, so single-center or summary-only data is outside its scope. Only the site column itself is excluded from analysis; every other numeric column is treated as a measurement, so a numeric identifier such as a patient ID can still be analysed spuriously. Real sites can differ for legitimate reasons: a specialist center may genuinely have a narrower case mix or more complete data collection, so a flag is a prompt to investigate rather than proof of fabrication. The Kolmogorov-Smirnov and chi-squared tests need adequate per-site sample sizes to be meaningful, and small sites may be under-powered. The thresholds (a Kolmogorov-Smirnov p of 0.001 on more than three variables, a standard-deviation ratio of 0.3, a terminal-digit p of 0.01, and the missing-data percentages) are directional guides rather than calibrated cut-offs. The single-dataset terminal-digit and variability checks are covered by other indicators, so D6 stays focused on between-site comparison.
Theoretical background
D6 rests on the exchangeability that randomisation and a common protocol are supposed to confer across centers. If every site samples from the same population and follows the same procedures, then for each variable the site's distribution should be a sample from the same law as the rest, its within-site variability should be comparable to the overall variability, its reported digits should be uniform, and its missing-data rate should be comparable to that of the other sites. Each check tests one of these expectations. The Kolmogorov-Smirnov statistic measures the largest gap between a site's empirical distribution function and that of the pooled remainder, so a fabricated site that draws from a different or narrower law is detected across multiple variables at once. The standard-deviation ratio detects a site whose values barely vary, the hallmark of invented or copied data. The terminal-digit chi-squared test detects the human digit preference that creeps into hand-entered numbers, which an honest instrument-recorded site does not show. The missing-data contrast detects the implausible perfection of a fabricated site against the inevitable gaps of real collection. Excluding the site identifier from the analysed variables is essential, because a numeric site code is constant within a site and would trivially and falsely register as both divergent in distribution and devoid of spread for every site, manufacturing the very anomaly the indicator seeks. Comparing each site on many variables at once raises the multiple-comparison problem, so the per-variable Kolmogorov-Smirnov p-values are additionally read through a Benjamini-Hochberg false-discovery-rate correction, which reports how many variables remain divergent once the number of tests is accounted for, distinguishing a site that is genuinely out of step from one that shows a stray low p-value by chance [9].
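The Kolmogorov-Smirnov comparison and the Benjamini-Hochberg correction can be sketched in plain Python. This is an illustration under stated assumptions, not the indicator's implementation: the function names are hypothetical, the p-value uses the asymptotic Kolmogorov series (a library routine such as `scipy.stats.ks_2samp` would normally be preferred), and the false-discovery-rate level of 0.05 is assumed because the description does not state one.

```python
import bisect
import math


def ks_statistic(a, b):
    """Largest gap between the empirical distribution functions of a and b."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:
        fa = bisect.bisect_right(a, x) / len(a)
        fb = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def ks_pvalue_asymptotic(d, n, m, terms=100):
    """Asymptotic p-value: Q(lam) = 2 * sum_k (-1)^(k-1) * exp(-2 k^2 lam^2)."""
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.1:
        return 1.0  # the series converges slowly here and Q is essentially 1
    total = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                    for k in range(1, terms + 1))
    return min(max(total, 0.0), 1.0)


def bh_significant(pvalues, alpha=0.05):
    """Count p-values rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    rejected = 0
    for rank, p in enumerate(sorted(pvalues), start=1):
        if p <= alpha * rank / m:
            rejected = rank  # keep the largest rank that meets its threshold
    return rejected


def ks_diagnostics(site_vars, rest_vars, alpha=0.05, raw_threshold=0.001):
    """Per-site diagnostics: raw count, FDR-significant count, smallest p."""
    pvals = []
    for name, site_values in site_vars.items():
        rest_values = rest_vars[name]
        d = ks_statistic(site_values, rest_values)
        pvals.append(ks_pvalue_asymptotic(d, len(site_values), len(rest_values)))
    return {
        "raw_count": sum(p < raw_threshold for p in pvals),
        "fdr_count": bh_significant(pvals, alpha),
        "min_p": min(pvals),
    }
```

In this sketch `site_vars` and `rest_vars` map each variable name to the site's values and to the pooled values from all other sites, with missing values already removed. Under the rule described earlier, the distribution check would fire when `raw_count` exceeds three, while `fdr_count` and `min_p` serve as the reported diagnostics.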
References
1. Venet D, Doffagne E, Burzykowski T, et al. A statistical approach to central monitoring of data quality in clinical trials. Clinical Trials. 2012;9(6):705-713. DOI: 10.1177/1740774512447898
2. Carlisle JB. Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals. Anaesthesia. 2017;72(8):944-952. DOI: 10.1111/anae.13938
3. Buyse M, George SL, Evans S, et al. The role of biostatistics in the prevention, detection and treatment of fraud in clinical trials. Statistics in Medicine. 1999;18(24):3435-3451. https://pubmed.ncbi.nlm.nih.gov/10611617/
4. Kirkwood AA, Cox T, Hackshaw A. Application of methods for central statistical monitoring in clinical trials. Clinical Trials. 2013;10(5):783-806. DOI: 10.1177/1740774513494504
5. Desmet L, Venet D, Doffagne E, et al. Linear mixed-effects models for central statistical monitoring of multicenter clinical trials. Statistics in Medicine. 2014;33(30):5265-5279. DOI: 10.1002/sim.6294
6. Bordewijk EM, Li W, van Eekelen R, Wang R, Showell M, Mol BW, van Wely M. Methods to assess research misconduct in health-related research: A scoping review. Journal of Clinical Epidemiology. 2021;136:189-202. DOI: 10.1016/j.jclinepi.2021.05.012
7. Wilkinson J, Heal C, Antoniou GA, et al. A survey of experts to identify methods to detect problematic studies: stage 1 of the INveStigating ProblEmatic Clinical Trials in Systematic Reviews project. Journal of Clinical Epidemiology. 2024;175:111512. DOI: 10.1016/j.jclinepi.2024.111512
8. Crone G, Green CD. Tools of the data detective: A review of statistical methods to detect data and result anomalies in psychology. Theory & Psychology. 2025;35(3):359-380. DOI: 10.1177/09593543241311861
9. Benjamini Y, Hochberg Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society: Series B. 1995;57(1):289-300. DOI: 10.1111/j.2517-6161.1995.tb02031.x