ResAIKit
Research Integrity Toolkit
T5 | Image forensics | Table Analysis | Layer 2 (Contextual)

RCT Baseline

Checks whether a randomized trial's baseline table shows the amount of chance variation that genuine random allocation produces. Under genuine randomization, the test statistics comparing groups across baseline variables scatter like standard normal draws. Groups that are too similar (under-dispersed statistics) suggest gamed randomization or fabrication, while groups that are too different (over-dispersed statistics) suggest an allocation problem.

Technical description

Extracts the table grid by OCR, confirms that it holds statistical data, identifies the group column and the numeric variable columns, and computes a Welch t-statistic per variable comparing the groups. Under randomization these t-statistics are approximately N(0,1), so their standard deviation should be near 1 and the sum of their squares is approximately chi-square with k degrees of freedom, where k is the number of variables; a two-sided tail probability of that sum quantifies how unusual the dispersion is. Scoring is by SD: 0.5-2.0 scores 0; 0.3-0.5 or 2.0-3.0 scores 2.0; below 0.3 or above 3.0 scores 4.0. At least three variables are required.
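The statistical core of this description can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the toolkit's actual code: the function names (`welch_t`, `chi2_cdf`, `baseline_dispersion`) are hypothetical, the SD is assumed to be the sample SD, and the two-sided tail probability is assumed to be twice the smaller tail. OCR, the statistical-data gate, and column detection are left out; the input is taken to be already-parsed per-group values.

```python
import math
import statistics

def welch_t(a, b):
    """Welch t-statistic for two samples with possibly unequal variances.

    Each sample needs at least two values, and at least one of them
    must have non-zero variance (otherwise the standard error is zero).
    """
    va, vb = statistics.variance(a), statistics.variance(b)
    se = math.sqrt(va / len(a) + vb / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

def chi2_cdf(x, k):
    """Chi-square CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = k / 2, x / 2
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    return min(1.0, total * math.exp(a * math.log(z) - z - math.lgamma(a)))

def baseline_dispersion(variables):
    """variables: list of (group_a_values, group_b_values), one pair per variable."""
    if len(variables) < 3:
        return None  # the check requires at least three variables
    ts = [welch_t(a, b) for a, b in variables]
    sd = statistics.stdev(ts)        # assumption: sample SD (n - 1 denominator)
    q = sum(t * t for t in ts)       # approx. chi-square with k = len(ts) d.f.
    cdf = chi2_cdf(q, len(ts))
    p = 2 * min(cdf, 1 - cdf)        # assumption: twice the smaller tail
    return sd, p
```

A small p together with an SD well below 1 points to under-dispersion; a small p with an SD well above 1 points to over-dispersion.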

How it works

Layer 2 (statistical): finds the group column and numeric variables, computes a t-statistic per variable, and reads the standard deviation of those statistics plus a chi-square tail probability for the overall dispersion. An SD between 0.5 and 2.0 scores 0; an SD moderately outside that range (0.3-0.5 or 2.0-3.0) scores 2.0; an SD far outside it (below 0.3 or above 3.0) scores 4.0. Under-dispersion and over-dispersion each raise a finding that reports the SD and the chi-square probability.
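The SD-to-score mapping can be written as a small function. This is a sketch only: `score_from_sd` is a hypothetical name, and because the text does not say which band owns the exact boundary values (0.3, 0.5, 2.0, 3.0), assigning them to the milder band is an assumption.

```python
def score_from_sd(sd, n_variables):
    """Map the SD of the baseline t-statistics to a score of 0, 2.0 or 4.0."""
    if n_variables < 3:
        return None          # too few variables to run the check
    if 0.5 <= sd <= 2.0:
        return 0.0           # dispersion as randomization predicts
    if 0.3 <= sd <= 3.0:
        return 2.0           # moderately low (0.3-0.5) or high (2.0-3.0)
    return 4.0               # below 0.3 or above 3.0
```

For example, `score_from_sd(0.4, 8)` returns 2.0 and `score_from_sd(0.1, 8)` returns 4.0.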

Why this matters

A randomized trial's integrity rests on allocation actually being random, and the baseline table lets that be audited from published numbers alone. Carlisle showed that the probability of a baseline pattern under random sampling can be calculated, and that distributions far too similar between groups point to non-random allocation or fabrication. Applying the method across more than 5,000 trials flagged a number with highly improbable baseline tables and contributed to retractions. Under-dispersed baseline statistics are a strong, scalable signal of the most consequential form of trial misconduct.

Score thresholds

0-1: Baseline statistics scatter as randomization predicts, with a standard deviation near one
2-3: The dispersion is moderately low or high, a possible randomization or balance problem
4-5: Statistics far too uniform or far too spread, consistent with gamed randomization or fabrication, or a serious allocation error

Limitations

The test needs per-subject or comparable per-group values split by a recognised group column, so a table reporting only mean and SD per group is only partially served by this row-wise implementation. It compares the first two groups, reducing a multi-arm trial to a pairwise comparison. At least three variables are required, and with few variables the SD is noisy, which the chi-square probability helps interpret. It depends on OCR and on identifying the group column. Correlated baseline variables violate the independence assumption. Over-dispersion is a weaker fabrication signal than under-dispersion. The statistical-data gate skips mostly-text tables.
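A short simulation illustrates how noisy the SD is when there are few variables. It is a sketch under idealized assumptions: the t-statistics are drawn as independent N(0,1) values (an honest trial with uncorrelated variables), the SD is the sample SD, and `flag_rate` is a hypothetical helper that counts how often the SD falls outside the 0.5-2.0 band.

```python
import random
import statistics

def flag_rate(k, trials=5000, seed=0):
    """Share of simulated honest trials whose SD of k t-statistics leaves 0.5-2.0."""
    rng = random.Random(seed)
    flagged = 0
    for _ in range(trials):
        ts = [rng.gauss(0.0, 1.0) for _ in range(k)]  # honest, independent draws
        if not 0.5 <= statistics.stdev(ts) <= 2.0:
            flagged += 1
    return flagged / trials

rate_few = flag_rate(3)    # minimum number of variables the check accepts
rate_many = flag_rate(20)  # a more typical baseline table
```

Under these assumptions roughly a quarter of honest three-variable tables fall outside the band by chance alone, while with twenty variables the rate is close to zero, which is why the chi-square probability matters most for small tables.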

References

  1. Carlisle JB, Dexter F, Pandit JJ, Shafer SL, Yentis SM. (2015). Calculating the probability of random sampling for continuous variables in submitted or published randomised controlled trials. Anaesthesia 70(7):848-858
  2. Carlisle JB. (2017). Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals. Anaesthesia 72(8):944-952
  3. Bolland MJ, Avenell A, Gamble GD, Grey A. (2016). Systematic review and statistical analysis of the integrity of 33 randomized controlled trials. Neurology 87(23):2391-2402