Timeline Implausibility
Reads the key dates a paper reports (ethics approval, the start and end of data collection, trial registration, and submission) and checks that they fall in a possible order. Ethics approval cannot follow the start of data collection, collection cannot end before it begins, and a trial should be registered before its results exist. It also estimates the recruitment rate from the sample size and the collection window and flags a rate that is implausibly fast for a single site. It works on the dates found in the article text.
Technical description
D9 is a contextual screen on the chronology a paper describes. It extracts month-and-year dates from the text using targeted patterns for ethics or institutional-review-board approval, the data-collection range, trial registration, and submission or receipt, and parses them to comparable dates. It then checks the expected ordering: ethics approval should not come after collection starts, collection should start before it ends, collection should end before submission, and registration should precede the end of collection. Each hard ordering violation is a serious flag. When the ordering is intact, it notes tight-but-possible gaps, such as ethics approval and collection start in the same or adjacent month, or collection end and submission almost coincident, as low-level observations. Finally, when both a collection duration and a sample size are available, it computes a recruitment rate in patients per month and flags a rate above fifty per month, unless the text indicates the study ran at more than one site, since a high rate is implausible for a single center but routine for a multi-center trial. The violations and the recruitment flag set the score.
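The extraction step can be illustrated with a minimal sketch. The regular expressions, the 80-character sentence window, and the names `SINGLE_DATE_PATTERNS`, `COLLECTION_RANGE`, `to_month`, and `extract_dates` are all assumptions made for illustration; the indicator's actual patterns are not given here. Dates are represented as `(year, month)` tuples so that ordinary comparison gives chronological order.

```python
import re

MONTHS = {
    name: number
    for number, name in enumerate(
        ["january", "february", "march", "april", "may", "june", "july",
         "august", "september", "october", "november", "december"],
        start=1,
    )
}

# Month name followed by a four-digit year, e.g. "March 2019".
_DATE = r"\b(" + "|".join(MONTHS) + r")\s+(\d{4})"
# Up to 80 non-period characters, so a match stays inside one sentence.
_GAP = r"[^.]{0,80}?"

# Hypothetical patterns keyed to the language around each event.
SINGLE_DATE_PATTERNS = {
    "ethics": re.compile(
        r"(?:ethic\w*|institutional review board|\bIRB\b)" + _GAP + _DATE, re.I),
    "registration": re.compile(r"regist\w*" + _GAP + _DATE, re.I),
    "submission": re.compile(r"(?:submitted|received)" + _GAP + _DATE, re.I),
}
COLLECTION_RANGE = re.compile(
    r"(?:recruit\w*|enrol\w*|collect\w*)" + _GAP
    + r"(?:from|between)\s+" + _DATE
    + r"\s+(?:to|and|until|through)\s+" + _DATE,
    re.I,
)


def to_month(month_name, year):
    """Return a comparable (year, month) tuple at month resolution."""
    return (int(year), MONTHS[month_name.lower()])


def extract_dates(text):
    """Map each recognised event to its (year, month); missing events are omitted."""
    dates = {}
    for event, pattern in SINGLE_DATE_PATTERNS.items():
        match = pattern.search(text)
        if match:
            dates[event] = to_month(match.group(1), match.group(2))
    match = COLLECTION_RANGE.search(text)
    if match:
        dates["collection_start"] = to_month(match.group(1), match.group(2))
        dates["collection_end"] = to_month(match.group(3), match.group(4))
    return dates
```

Because the tuples compare chronologically, a check such as `dates["ethics"] > dates["collection_start"]` expresses "ethics approval after collection start" directly.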
How it works
Dates are captured by regular expressions keyed to the surrounding language and converted to month-resolution dates. Four ordering violations are checked: ethics approval after collection start, collection start after collection end, collection end after submission, and registration after collection end. Each violation adds 4.0 and produces an error or warning finding that names the two dates involved. With no ordering violation, an ethics-to-collection gap of zero or one month and a collection-to-submission gap of zero or one month each add 1.0 as an informational note. The recruitment rate is the largest reported sample size divided by the collection duration in months; if it exceeds fifty per month and no multi-site language is present, it adds 2.0. The total is capped at 5.0. The metadata records the dates found, the number of ordering violations, the recruitment rate, whether multi-site language was detected (which suppresses the recruitment-rate flag), and the inter-milestone gaps in months between the recognised dates.
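The scoring rules can be sketched as follows. This is an illustration under stated assumptions, not the indicator's code: dates are `(year, month)` tuples, the function and constant names are hypothetical, and a collection window shorter than one month is floored at one month to avoid dividing by zero, a case the description does not specify.

```python
def month_index(d):
    """Convert a (year, month) tuple to a running month count."""
    year, month = d
    return year * 12 + month


# (earlier event, later event): a violation when the first falls after the second.
ORDER_CHECKS = [
    ("ethics", "collection_start"),
    ("collection_start", "collection_end"),
    ("collection_end", "submission"),
    ("registration", "collection_end"),
]
TIGHT_GAP_CHECKS = [
    ("ethics", "collection_start"),
    ("collection_end", "submission"),
]


def score_timeline(dates, sample_size=None, multi_site=False):
    score = 0.0
    violations = 0
    for earlier, later in ORDER_CHECKS:
        if earlier in dates and later in dates and dates[earlier] > dates[later]:
            violations += 1
            score += 4.0

    # Tight gaps are only noted when the ordering is intact,
    # so every gap computed here is zero or positive.
    if violations == 0:
        for earlier, later in TIGHT_GAP_CHECKS:
            if earlier in dates and later in dates:
                gap = month_index(dates[later]) - month_index(dates[earlier])
                if gap <= 1:
                    score += 1.0

    rate = None
    if sample_size and "collection_start" in dates and "collection_end" in dates:
        duration = (month_index(dates["collection_end"])
                    - month_index(dates["collection_start"]))
        duration = max(duration, 1)  # assumption: floor at one month
        rate = sample_size / duration
        if rate > 50 and not multi_site:
            score += 2.0

    return {
        "score": min(score, 5.0),  # total is capped at 5.0
        "violations": violations,
        "recruitment_rate": rate,
    }
```

For instance, a paper with ethics approval one month before collection starts, a two-month collection window, and 600 patients at a single site would receive 1.0 for the tight gap plus 2.0 for the recruitment rate; the same paper described as multi-center would receive only the 1.0.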
Score thresholds
| Score | Meaning |
|---|---|
| 0 | The reported dates are consistent and the recruitment rate is plausible. |
| 1 to 3 | Tight but possible gaps, an implausibly fast single-site recruitment rate, or both. |
| 4 to 5 | An impossible ordering of dates, or a recruitment-rate flag combined with two tight gaps. |
Why this matters
The chronology of a study is a web of constraints that fabricated or hastily assembled papers often violate, because invented dates are not checked against one another the way real administrative records are. Carlisle's examination of trials submitted with individual-patient data found that impossible and inconsistent timelines were among the features that exposed false data and zombie trials [1]. Reviews of clinical-trial fraud list timeline and recruitment anomalies, such as enrolling more patients than a site could plausibly see, among the recognised markers of invented studies [2], and Carlisle's large-scale re-analyses treated such impossibilities as integrity signals across the literature [3]. The ordering checks encode hard logical requirements: a study cannot collect data before it is approved, cannot end collection before starting, and cannot be submitted before collection finishes. A violation of any of these is not improbable but impossible. The recruitment-rate check encodes a softer plausibility bound on how fast a single site can enroll, which is why it is suspended when the study is multi-center. Finally, registration after the end of collection signals retrospective registration, which undermines the safeguard against selective reporting that the International Committee of Medical Journal Editors established by requiring prospective registration [4]. Recent forensic re-analyses, scoping reviews, and trustworthiness instruments likewise treat timeline and recruitment implausibility as standard screens for problematic studies [5, 6, 7, 8].
Limitations
The check depends on dates being stated in the text in a recognisable month-and-year form and on the extraction patterns associating each date with the right event, so unusual phrasing, day-level dates, or dates in tables can be missed or mis-assigned. It resolves dates only to the month, so same-month orderings are treated as consistent. The recruitment-rate check uses the largest sample size found anywhere in the text, which may not be the enrolled count, and divides by the collection window, so a mis-extracted denominator distorts it; the multi-site suppression depends on the text actually describing the study as multi-center. Retrospective registration is common and not always misconduct, so that flag is directional. The thresholds, a fifty-per-month rate and the one-month tight-gap window, are heuristic. This indicator reads dates from the narrative text; date-level anomalies within individual-patient records, such as weekend visit clustering, are handled by the demographic indicator D3, so D9 stays on the reported study chronology.
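The month-resolution limitation can be made concrete with a small sketch. The dates below are invented for illustration, and `to_month` is a hypothetical helper that truncates a day-level date to the `(year, month)` form assumed for the indicator's comparisons.

```python
from datetime import date


def to_month(d):
    """Truncate a day-level date to month resolution."""
    return (d.year, d.month)


# Invented example: approval granted 25 days after collection began.
ethics_approval = date(2020, 5, 28)
collection_start = date(2020, 5, 3)

# At day resolution the reversal is visible.
reversed_at_day_level = ethics_approval > collection_start

# At month resolution both dates collapse to (2020, 5), so no violation is seen.
reversed_at_month_level = to_month(ethics_approval) > to_month(collection_start)
```

Here `reversed_at_day_level` is true while `reversed_at_month_level` is false, which is the sense in which same-month orderings are treated as consistent.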
Theoretical background
D9 rests on the distinction between logically impossible and merely implausible chronologies. The ordering relations among approval, collection, registration, and submission are not statistical tendencies but causal necessities: approval authorises collection, collection produces the data, and submission reports them, so their dates must respect that sequence, and a reversal is proof of error or fabrication rather than evidence of it. These checks therefore carry the heaviest weight. Recruitment rate is different in kind: it is bounded not by logic but by the physical capacity of a site to identify, consent, and enrol eligible participants, a capacity that scales with the number of sites, so a rate that is impossible for one center becomes ordinary for many. Recognising multi-site language is essential to keep this check honest, because otherwise a legitimate large multi-center trial would be flagged for the very feature that makes it valuable. Registration timing occupies a middle ground: prospective registration is a safeguard adopted precisely because retrospective registration enables undisclosed changes to outcomes, so registration after data collection is a meaningful warning even though it is not, by itself, impossible. Reading these together lets the indicator separate the decisive evidence of an impossible order from the softer signals of a rushed or retrofitted timeline. The gaps in months between successive milestones, computed to evaluate these orderings, are also surfaced in the metadata, since the spacing between approval, collection, registration, and submission is the raw evidence a reviewer needs to judge a timeline that is ordered yet improbably compressed.
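The inter-milestone gaps surfaced in the metadata can be sketched as signed month differences. The choice of pairs, the key format, and the name `milestone_gaps` are assumptions for illustration; dates are again taken to be `(year, month)` tuples.

```python
# Assumed pairs: the same orderings the indicator is described as checking.
MILESTONE_PAIRS = [
    ("ethics", "collection_start"),
    ("collection_start", "collection_end"),
    ("collection_end", "submission"),
    ("registration", "collection_end"),
]


def milestone_gaps(dates):
    """Signed gap in months for each pair whose two dates were both found.

    A negative gap means the pair is out of order; a gap of zero or one
    means the pair is ordered but tightly spaced.
    """
    gaps = {}
    for earlier, later in MILESTONE_PAIRS:
        if earlier in dates and later in dates:
            (y1, m1), (y2, m2) = dates[earlier], dates[later]
            gaps[f"{earlier}->{later}"] = (y2 - y1) * 12 + (m2 - m1)
    return gaps
```

A timeline in which every gap is non-negative but several are zero is the "ordered yet improbably compressed" case: no ordering check fires, and the gaps are what a reviewer has to go on.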
References
1. Carlisle JB. False individual patient data and zombie randomised controlled trials submitted to Anaesthesia. Anaesthesia. 2021;76(4):472-479. DOI: 10.1111/anae.15263
2. George SL, Buyse M. Data fraud in clinical trials. Clinical Investigation. 2015;5(2):161-173. DOI: 10.4155/cli.14.116
3. Carlisle JB. Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals. Anaesthesia. 2017;72(8):944-952. DOI: 10.1111/anae.13938
4. De Angelis C, Drazen JM, Frizelle FA, et al. Clinical trial registration: a statement from the International Committee of Medical Journal Editors. New England Journal of Medicine. 2004;351(12):1250-1251. https://pubmed.ncbi.nlm.nih.gov/15356289/
5. Al-Marzouki S, Evans S, Marshall T, Roberts I. Are these data real? Statistical methods for the detection of data fabrication in clinical trials. BMJ. 2005;331(7511):267-270. DOI: 10.1136/bmj.331.7511.267
6. Bordewijk EM, Li W, van Eekelen R, Wang R, Showell M, Mol BW, van Wely M. Methods to assess research misconduct in health-related research: A scoping review. Journal of Clinical Epidemiology. 2021;136:189-202. DOI: 10.1016/j.jclinepi.2021.05.012
7. Wilkinson J, Heal C, Antoniou GA, et al. A survey of experts to identify methods to detect problematic studies: stage 1 of the INveStigating ProblEmatic Clinical Trials in Systematic Reviews project. Journal of Clinical Epidemiology. 2024;175:111512. DOI: 10.1016/j.jclinepi.2024.111512
8. Crone G, Green CD. Tools of the data detective: A review of statistical methods to detect data and result anomalies in psychology. Theory & Psychology. 2025;35(3):359-380. DOI: 10.1177/09593543241311861