D6 | Statistical analysis | Fabrication Detection | Layer 2 (Contextual)

Multicenter Anomalies

Compares the data contributed by each site of a multi-center study against the rest. Real sites differ in natural ways (different patients, equipment, missing-data habits). A site stands out as possibly fabricated by a single source when its data is statistically too divergent or too uniform, shows a digit-preference fingerprint, or is implausibly complete while other sites have gaps. The indicator runs four per-site checks and scores the dataset by how anomalous its sites are.

Technical description

A contextual screen for site-level anomalies in multi-center individual-patient data. It detects a site/center identifier column and, when at least two sites are present, compares each site against the pooled remainder on the numeric variables. The site identifier itself is excluded, so a numeric site code is not analysed as a measurement. The four per-site checks are: a two-sample Kolmogorov-Smirnov test flagging a site that diverges at p < 0.001 on more than three variables; a variability check flagging a within-site SD below thirty percent of the overall SD on any variable; a terminal-digit chi-squared test (on at least thirty reported-looking values) flagging non-uniformity at p < 0.01; and a missing-data check flagging a site with zero missing values while another site exceeds ten percent. Each fired check adds a penalty, and the penalties sum to the score.
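The four checks can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the indicator's actual code: all function and constant names are hypothetical, the KS p-value uses a common asymptotic approximation, the chi-squared decision compares the statistic with the tabulated critical value for 9 degrees of freedom at p = 0.01 instead of computing a p-value, and the rule for picking the terminal digit (last digit after dropping trailing decimal zeros) is an assumption.

```python
import math
import statistics
from collections import Counter

KS_ALPHA, KS_MIN_VARS = 0.001, 3  # flag when p < 0.001 on MORE than 3 variables
SD_RATIO = 0.3                    # within-site SD below 30% of the overall SD
MIN_DIGIT_VALUES = 30             # terminal-digit test needs at least 30 values
CHI2_CRIT_DF9 = 21.666            # chi-squared critical value, df = 9, p = 0.01
MISSING_LIMIT = 0.10              # "another site exceeds ten percent"


def ks_pvalue(a, b):
    """Two-sample KS test with an asymptotic (approximate) p-value."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))  # largest gap between the two ECDFs
    root = math.sqrt(n * m / (n + m))
    lam = (root + 0.12 + 0.11 / root) * d
    if lam < 0.3:  # the series converges poorly here; the p-value is about 1
        return 1.0
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * (k * lam) ** 2) for k in range(1, 101))
    return min(1.0, max(0.0, p))


def ks_divergent(site_cols, rest_cols):
    """Both arguments map variable name -> values (site vs pooled remainder)."""
    hits = sum(ks_pvalue(site_cols[v], rest_cols[v]) < KS_ALPHA for v in site_cols)
    return hits > KS_MIN_VARS


def low_variability(site_vals, all_vals):
    """True when the within-site SD is below SD_RATIO times the overall SD."""
    overall = statistics.stdev(all_vals)
    return overall > 0 and statistics.stdev(site_vals) < SD_RATIO * overall


def terminal_digit_biased(values):
    """Chi-squared test of terminal-digit uniformity on reported-looking values."""
    reported = [v for v in values if abs(v - round(v, 2)) < 1e-9]
    if len(reported) < MIN_DIGIT_VALUES:
        return False
    # Assumption: terminal digit = last digit after dropping trailing decimal zeros.
    digits = [f"{v:.2f}".rstrip("0").rstrip(".")[-1] for v in reported]
    counts = Counter(digits)
    expected = len(digits) / 10
    chi2 = sum((counts.get(str(d), 0) - expected) ** 2 / expected for d in range(10))
    return chi2 > CHI2_CRIT_DF9


def zero_missing_sites(missing_fraction):
    """missing_fraction maps site -> fraction of missing cells at that site."""
    return {
        site for site, frac in missing_fraction.items()
        if frac == 0
        and any(f > MISSING_LIMIT for other, f in missing_fraction.items() if other != site)
    }
```

Where SciPy is available, `scipy.stats.ks_2samp` and `scipy.stats.chisquare` would be the natural replacements for the hand-written test statistics here.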

How it works

This is a Layer 2 (contextual) indicator. For each site, every fired check contributes a penalty: a Kolmogorov-Smirnov divergence on more than three variables adds 1.5, a low within-site SD on any variable adds 1.5, a terminal-digit bias adds 1.0, and zero missing data while another site exceeds ten percent adds 1.5. The terminal-digit check considers only reported-looking values (integers or values with at most two decimals) and needs at least thirty of them. Penalties accumulate across sites, and the resulting score is capped at 5.0. The per-variable KS p-values for each site are additionally corrected with a Benjamini-Hochberg false-discovery-rate procedure, which is reported as a multiple-comparison diagnostic. Each anomalous site yields a finding listing the checks it tripped, with severity rising when it trips two or more. Metadata records sites_found, anomalous_sites, flags_per_site, ks_diagnostics (per-site smallest KS p-value and FDR-significant count), and total_penalty (the accumulated evidence before the 5.0 cap).
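The scoring and the multiple-comparison diagnostic can be illustrated as follows. This is a sketch under assumptions, not the indicator's implementation: the flag names, the function names, the "score", "min_p" and "fdr_significant" keys, and the 0.05 FDR level are all hypothetical. Only the penalty weights, the 5.0 cap, and the metadata names sites_found, anomalous_sites, flags_per_site, and total_penalty come from the description above.

```python
PENALTIES = {  # penalty per fired check; the flag names are hypothetical
    "ks_divergence": 1.5,
    "low_sd": 1.5,
    "terminal_digit": 1.0,
    "zero_missing": 1.5,
}
SCORE_CAP = 5.0


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def ks_diagnostics(pvalues, fdr_alpha=0.05):
    """Summarise one site's per-variable KS p-values (fdr_alpha is an assumption)."""
    adjusted = benjamini_hochberg(pvalues)
    return {
        "min_p": min(pvalues),
        "fdr_significant": sum(q < fdr_alpha for q in adjusted),
    }


def score_sites(flags_per_site):
    """flags_per_site maps site -> list of fired check names."""
    total = sum(PENALTIES[f] for flags in flags_per_site.values() for f in flags)
    return {
        "sites_found": len(flags_per_site),
        "anomalous_sites": [s for s, flags in flags_per_site.items() if flags],
        "flags_per_site": flags_per_site,
        "total_penalty": total,           # evidence before the cap
        "score": min(total, SCORE_CAP),   # reported score, capped at 5.0
    }
```

For example, a site that trips both the low-SD and zero-missing checks contributes 3.0 on its own, and a second site with terminal-digit bias brings the total to 4.0. Keeping total_penalty alongside the capped score preserves how far past 5.0 the evidence went.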

Why this matters

Multi-center trials offer a powerful internal control: genuine sites are independent samples of the same protocol, so they should resemble each other in distribution while differing in small idiosyncratic ways. This is the basis of central statistical monitoring, which Venet and colleagues formalised to find data-quality problems and fabrication by comparing each center against the others with distributional, variance, and digit tests. Carlisle's re-analyses repeatedly found fabricated trials where one source was statistically out of step, and the classic biostatistical account lists anomalous center effects among detectable fabrication signatures. A fabricator producing one site's data cannot match the natural heterogeneity of real centers: the invented site is often too clean, too uniform, too complete, or carries the digit-preference fingerprint of hand-entry.

Score thresholds

0: Sites differ in the natural ways expected of independent centers.
2-3: One site is mildly anomalous on one or two checks.
4-5: One or more sites are strongly anomalous, for example combining zero variability with zero missing data.

Limitations

Requires individual-patient data with a recognisable site/center column and at least two sites, so single-center or summary-only data is out of scope. The checks treat each numeric column as a measurement, so a numeric identifier other than the site column (such as a patient ID) can still be analysed spuriously; only the site column itself is excluded. Real sites can differ legitimately (a specialist center may have a narrower case mix or more complete collection), so a flag is a prompt for investigation. The Kolmogorov-Smirnov and chi-squared tests need adequate per-site sample sizes, and small sites may be under-powered. The thresholds (KS p < 0.001 on more than three variables, SD ratio below 0.3, terminal-digit p < 0.01, and the missing-data percentages) are directional. Terminal-digit and variability checks on a single dataset are covered by other indicators.