D26 | Statistical analysis | Fabrication Extended | Layer 2 (Contextual)

Tail Dependence Absent

Tests for extreme-value co-occurrence between related variables. Real data often shows tail dependence (extreme values cluster together), which multivariate normal generators do not reproduce.

Technical description

Tail dependence is the tendency of two variables to take extreme values together. The lower-tail coefficient is the conditional probability that one variable is in its lower tail given that the other is, taken in the limit as the tail shrinks. The upper-tail coefficient is the same quantity for the upper tail. In many real multivariate datasets, correlated measurements reach their extremes together more often than chance would predict. Data drawn as correlated Gaussians can reproduce the average correlation without this clustering of extremes, and independently drawn columns reproduce neither. D26 rank-transforms each numeric column of the individual-patient data (IPD) and considers pairs whose absolute Pearson correlation exceeds 0.25. For each such pair it estimates the empirical lower- and upper-tail conditional probabilities at the 0.20 quantile and compares the larger of the two against a correlation-aware Gaussian-copula benchmark.
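The quantities in this section can be written in symbols. Here U and V are the two rank-transformed columns, and u_i and v_i are their values in row i. Φ is the standard normal CDF and Φ_2(·, ·; ρ) is the standard bivariate normal CDF with correlation ρ. This notation is chosen here for illustration and is not taken from D26's code.

```latex
\lambda_L = \lim_{q \to 0^{+}} \Pr\left(U < q \mid V < q\right), \qquad
\lambda_U = \lim_{q \to 0^{+}} \Pr\left(U > 1 - q \mid V > 1 - q\right)

\hat{\lambda}_L(q) = \frac{\#\{i : u_i < q,\ v_i < q\}}{\#\{i : v_i < q\}}, \qquad
\hat{\lambda}_U(q) = \frac{\#\{i : u_i > 1 - q,\ v_i > 1 - q\}}{\#\{i : v_i > 1 - q\}}, \qquad
T = \max\{\hat{\lambda}_L(q),\ \hat{\lambda}_U(q)\}, \quad q = 0.20

B(\rho) = \frac{\Phi_2\left(\Phi^{-1}(q),\ \Phi^{-1}(q);\ \rho\right)}{q}, \qquad
\lim_{q \to 0^{+}} B(\rho) = 0 \quad \text{for } |\rho| < 1
```

The last limit is the standard result that a Gaussian copula has no asymptotic tail dependence. At the fixed q = 0.20 used here, B(ρ) still exceeds q for any positive ρ.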

How it works

Layer 2 (contextual): requires at least three numeric columns and thirty rows. Each column is rank-transformed to the unit interval, and zero-variance columns are skipped.

The check uses q = 0.20 and examines each pair with absolute Pearson correlation above 0.25. The lower-tail estimate is the fraction of rows with the first variable below q among those with the second variable below q. The upper-tail estimate is the analogous fraction above 1 − q. The pair statistic is the larger of the two.

The benchmark is the Gaussian-copula lower-tail conditional probability at the pair's correlation: the bivariate-normal probability that both variables fall below their q quantile, divided by q. A pair is flagged only when two conditions hold. Its statistic must be below 0.65 times this benchmark, and it must also be at or below 0.25. This independence-plus floor sits just above the value of 0.20 (that is, q) expected for unrelated variables, so a pair whose extremes co-occur clearly more often than chance is never flagged.

The proportion of flagged pairs sets the score: 4.0 above eighty percent, 3.0 above sixty, 2.0 above forty, and 1.0 above twenty. Another 0.5 is added when the mean tail statistic across the examined pairs is below 0.22, and the score is capped at 5.0. The indicator is skipped when fewer than two correlated pairs exist. Metadata records the mean observed tail dependence and the mean Gaussian-copula benchmark, alongside the pair counts and thresholds.
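The procedure above can be sketched in Python with NumPy and the standard library. This is a minimal illustration under stated assumptions, not D26's implementation, and all function names are hypothetical. The sketch makes these assumptions:

- Pearson correlation is computed on the raw column values.
- Ranks are scaled as rank/(n + 1).
- A score of 0.0 is assumed when no band is exceeded.
- For a negatively correlated pair, one variable is mirrored (v → 1 − v) so that its extremes are compared in the concordant direction. The description does not specify this.

The bivariate-normal probability is obtained with Simpson's rule from the identity P(X < a, Y < a) = ∫ φ(x) Φ((a − ρx)/√(1 − ρ²)) dx over x < a.

```python
import math
from itertools import combinations
from statistics import NormalDist

import numpy as np

_STD_NORMAL = NormalDist()
# Element-wise standard normal CDF (math.erf only accepts scalars).
_norm_cdf = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))


def to_unit_ranks(col):
    """Map a column to (0, 1) via ranks; ties are broken by row order."""
    ranks = np.argsort(np.argsort(col, kind="stable"), kind="stable")
    return (ranks + 1.0) / (len(col) + 1.0)


def gaussian_tail_benchmark(rho, q=0.20, n_steps=2000):
    """Gaussian-copula P(U < q | V < q) = Phi2(a, a; rho) / q with a = Phi^-1(q).

    Phi2(a, a; rho) is computed by Simpson's rule on
    integral_{-10}^{a} phi(x) * Phi((a - rho * x) / sqrt(1 - rho^2)) dx.
    """
    rho = float(np.clip(rho, -0.999, 0.999))  # avoid dividing by ~0 near |rho| = 1
    a = _STD_NORMAL.inv_cdf(q)
    x = np.linspace(-10.0, a, n_steps + 1)  # n_steps must be even for Simpson's rule
    pdf = np.exp(-0.5 * x**2) / math.sqrt(2.0 * math.pi)
    f = pdf * _norm_cdf((a - rho * x) / math.sqrt(1.0 - rho**2))
    h = (a + 10.0) / n_steps
    joint = h / 3.0 * (f[0] + 4.0 * f[1:-1:2].sum() + 2.0 * f[2:-1:2].sum() + f[-1])
    return joint / q


def tail_statistic(u, v, q=0.20):
    """Larger of the empirical lower- and upper-tail conditional probabilities."""
    lower = np.mean(u[v < q] < q)          # P(U < q | V < q)
    upper = np.mean(u[v > 1 - q] > 1 - q)  # P(U > 1-q | V > 1-q)
    return float(max(lower, upper))


def d26_score(data, q=0.20, r_min=0.25, frac=0.65, floor=0.25):
    """Hypothetical D26-style scorer; returns None when the check is skipped."""
    data = np.asarray(data, dtype=float)
    n_rows, n_cols = data.shape
    if n_rows < 30 or n_cols < 3:
        return None
    keep = [j for j in range(n_cols) if np.std(data[:, j]) > 0]  # skip zero-variance columns
    u = {j: to_unit_ranks(data[:, j]) for j in keep}
    stats, benches, flagged = [], [], 0
    for i, j in combinations(keep, 2):
        r = np.corrcoef(data[:, i], data[:, j])[0, 1]  # Pearson on raw values (assumption)
        if abs(r) <= r_min:
            continue
        v = u[j] if r > 0 else 1.0 - u[j]  # mirror negatively correlated pairs (assumption)
        t = tail_statistic(u[i], v, q)
        b = gaussian_tail_benchmark(abs(r), q)
        stats.append(t)
        benches.append(b)
        if t < frac * b and t <= floor:
            flagged += 1
    if len(stats) < 2:
        return None  # fewer than two correlated pairs
    prop = flagged / len(stats)
    bands = [(0.8, 4.0), (0.6, 3.0), (0.4, 2.0), (0.2, 1.0)]
    score = next((s for cut, s in bands if prop > cut), 0.0)  # 0.0 baseline assumed
    if np.mean(stats) < 0.22:
        score += 0.5
    return {
        "score": min(score, 5.0),
        "n_pairs": len(stats),
        "n_flagged": flagged,
        "mean_tail": float(np.mean(stats)),
        "mean_benchmark": float(np.mean(benches)),
    }
```

The sketch can be checked on a constructed example. Take a column x = 0, 1, …, 99. Form y by swapping the bottom two blocks of twenty values with each other, and likewise the top two blocks. The Pearson correlation stays near 0.81, but every extreme co-occurrence is removed. With columns (x, y, x), two of the three pairs are flagged and the sketch returns a score of 3.0. Correlated Gaussian data of a few thousand rows, by contrast, scores 0.0.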

Why this matters

A fabricator who draws correlated normals, or who clips or smooths the extremes, can match the reported correlations while leaving the joint tails too sparse. A model asked to generate plausible data tends to produce the same smooth elliptical structure. Jointly Gaussian variables have zero asymptotic tail dependence for any correlation short of ±1. A check on extreme co-occurrence therefore targets multivariate structure that marginal and correlation checks miss. It is especially diagnostic for a strongly correlated pair whose extremes nonetheless fail to coincide.
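The contrast described in the preceding paragraph can be seen in simulation. This sketch is an illustration only and is not part of D26. A bivariate Student-t sample with ν = 3 stands in for tail-dependent data. It is built by dividing the same Gaussian draws by a shared chi-square scale, so both samples have correlation 0.7. The sketch then estimates how often the second variable is below its q quantile given that the first is. As q shrinks, the Gaussian rate keeps falling, while the Student-t rate stays comparatively high. The sample size, seed and ν are arbitrary choices.

```python
import numpy as np


def joint_tail_rate(x, y, q):
    """Empirical P(Y below its q-quantile | X below its q-quantile)."""
    below_x = x < np.quantile(x, q)
    below_y = y < np.quantile(y, q)
    return float(below_y[below_x].mean())


rng = np.random.default_rng(1)
n, rho, nu = 200_000, 0.7, 3
cov = np.array([[1.0, rho], [rho, 1.0]])
z = rng.multivariate_normal([0.0, 0.0], cov, size=n)  # Gaussian: no asymptotic tail dependence
w = rng.chisquare(nu, size=n) / nu
t = z / np.sqrt(w)[:, None]  # bivariate Student-t with the same correlation (nu > 2)

qs = (0.2, 0.05, 0.01)
cond_g = {q: joint_tail_rate(z[:, 0], z[:, 1], q) for q in qs}
cond_t = {q: joint_tail_rate(t[:, 0], t[:, 1], q) for q in qs}
for q in qs:
    print(f"q={q:<5} gaussian={cond_g[q]:.3f} student_t={cond_t[q]:.3f}")
```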

Score thresholds

0-1: Correlated pairs co-occur at their extremes about as expected
2-3: A substantial share of correlated pairs show too little extreme co-occurrence
4-5: Most correlated pairs lack tail dependence, consistent with independently generated or tail-clipped data

Limitations

The benchmark is the Gaussian copula, which itself has no asymptotic tail dependence. The indicator therefore detects data falling below even that modest finite-sample expectation, rather than identifying a specific generator. The tail estimates use a fixed 0.20 quantile and are noisy at small samples. The thirty-row minimum only partly mitigates this: at thirty rows each tail holds about six rows, so a single row shifts a conditional estimate by roughly one sixth. Rank transformation breaks ties arbitrarily, so columns with many ties or few distinct values yield noisy tail membership. Only pairs with absolute Pearson correlation above 0.25 are examined. Dependence concentrated in the tails, with little overall linear correlation, is therefore not assessed. The benchmark fraction and score bands are heuristic. Absent linear correlation is covered by indicator D01 and conditional independence by indicator D20. D26 focuses on extreme co-occurrence among already-correlated pairs in the IPD.
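The small-sample noise mentioned above can be gauged by simulation. This hypothetical sketch draws bivariate normal pairs with correlation 0.5 and computes the lower-tail conditional estimate at q = 0.20 many times. It then compares the standard deviation of that estimate at 30 rows with its standard deviation at 1000 rows. The correlation, replicate count and seed are illustrative assumptions.

```python
import numpy as np


def lower_tail_rate(x, y, q=0.20):
    """Empirical P(U < q | V < q) after rank-transforming both columns to (0, 1)."""
    n = len(x)
    u = (np.argsort(np.argsort(x)) + 1) / (n + 1)
    v = (np.argsort(np.argsort(y)) + 1) / (n + 1)
    return float(np.mean(u[v < q] < q))


def spread(n, rho=0.5, reps=500, seed=0):
    """Standard deviation of the lower-tail estimate across simulated Gaussian samples."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    est = [lower_tail_rate(*rng.multivariate_normal([0.0, 0.0], cov, size=n).T)
           for _ in range(reps)]
    return float(np.std(est))


print(f"sd at n=30: {spread(30):.3f}, sd at n=1000: {spread(1000):.3f}")
```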