ResAIKit
Research Integrity Toolkit
G4 · Text analysis · Hallucination · Layer 3

Citation Support Verification

Checks whether each cited paper actually supports the claim it is attached to, by comparing the claim with the cited paper's own title and abstract.

Technical description

Covers the support question that the internal and existence checks leave open: given that a reference is real and correctly identified, does the cited paper actually say what the citing sentence claims? For each citation, the cited paper's title and abstract are retrieved from an external scholarly index and compared with the claim sentence. Support is measured directionally, as the share of the claim's meaningful terms that appear in the evidence (the retrieved title and abstract), with distinctive terms such as named entities and specific numbers weighted more heavily so that a citation cannot pass on generic vocabulary alone. Each citation is graded as strong, partial, or low support.
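A minimal sketch of directional, weighted coverage follows. The stopword list, the tokeniser, the weight of 3.0 for distinctive terms and the capitalisation/digit heuristic standing in for named-entity and number detection are all assumptions for illustration, not the toolkit's actual settings. Coverage is normalised over the claim only, so an abstract that says much more than the claim is not penalised.

```python
import re

# Hypothetical stopword list; a real screen would use a fuller one.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "been", "by", "for", "from",
    "has", "have", "in", "is", "it", "its", "of", "on", "or", "that", "the",
    "this", "to", "was", "we", "were", "which", "with",
}
DISTINCTIVE_WEIGHT = 3.0  # assumed weight for numbers and name-like terms
GENERIC_WEIGHT = 1.0

TOKEN_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9.\-]*")


def weighted_terms(text: str) -> dict[str, float]:
    """Map each meaningful term (lowercased) to its weight."""
    weights: dict[str, float] = {}
    for i, match in enumerate(TOKEN_RE.finditer(text)):
        raw = match.group().rstrip(".-")
        key = raw.lower()
        if not key or key in STOPWORDS:
            continue
        # Crude stand-in for entity/number detection: the term contains a digit,
        # is an acronym, or is capitalised somewhere other than the first word.
        distinctive = (
            any(ch.isdigit() for ch in raw)
            or (len(raw) > 1 and raw.isupper())
            or (i > 0 and raw[0].isupper())
        )
        weight = DISTINCTIVE_WEIGHT if distinctive else GENERIC_WEIGHT
        weights[key] = max(weights.get(key, 0.0), weight)
    return weights


def coverage(claim: str, evidence: str) -> float:
    """Directional, weighted share of the claim's terms that occur in the evidence.

    Only the claim is normalised over: extra terms in the abstract cost nothing,
    but every claim term missing from the abstract lowers the score.
    """
    claim_weights = weighted_terms(claim)
    if not claim_weights:
        return 0.0
    evidence_terms = set(weighted_terms(evidence))
    covered = sum(w for term, w in claim_weights.items() if term in evidence_terms)
    return covered / sum(claim_weights.values())


if __name__ == "__main__":
    claim = "Transformer models improved BLEU by 2.0 points."
    on_topic = "We show that Transformer models improve BLEU scores on translation."
    generic = "Models were improved on several points."
    print(coverage(claim, on_topic))  # 0.5
    print(coverage(claim, generic))   # 0.3
```

Both evidence texts share three claim words, but the on-topic abstract shares the acronym "BLEU" and scores higher, while the generic one shares only common vocabulary. The sketch does no stemming, so "improve" does not match "improved".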

How it works

Layer 3 (external lookup, no language model):
  1. Extracts author-year and identifier citations with their positions.
  2. Takes the sentence around each citation as the claim.
  3. Retrieves the cited paper's title and abstract from an external scholarly index.
  4. Measures directional coverage of the claim by the evidence, weighting distinctive terms more.
  5. Grades each citation as strong, partial, or low support and flags the weak ones, with the score capped per citation and overall.
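The first two steps can be sketched as below. The citation patterns (parenthetical and narrative author-year forms, DOIs and arXiv IDs) and the sentence splitter are simplified assumptions, and stripping the citation markers out of the claim, so that author names do not count as claim terms, is an illustrative choice rather than documented toolkit behaviour. Retrieval from the scholarly index is left out because the page does not name the index.

```python
import re
from dataclasses import dataclass

# Assumed, deliberately simple patterns; real citation styles vary far more.
_NAME = r"[A-Z][A-Za-z\-]+(?: et al\.| (?:and|&) [A-Z][A-Za-z\-]+)?"
AUTHOR_YEAR_RE = re.compile(
    rf"\({_NAME},? \d{{4}}[a-z]?\)"   # parenthetical: (Smith et al., 2020)
    rf"|{_NAME} \(\d{{4}}[a-z]?\)"    # narrative: Smith and Jones (2019)
)
IDENTIFIER_RE = re.compile(r"\b(?:10\.\d{4,9}/[^\s,;()]+|arXiv:\d{4}\.\d{4,5})")
# Break after . ! ? only when the next sentence starts with a capital letter,
# so "et al. (2020)" does not end a sentence.
SENTENCE_BREAK_RE = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


@dataclass
class Citation:
    text: str   # matched citation string
    start: int  # character offset of the citation in the document
    claim: str  # surrounding sentence with citation markers removed


def _sentence_spans(text: str) -> list[tuple[int, int]]:
    """Return (start, end) character offsets of each sentence."""
    spans, start = [], 0
    for m in SENTENCE_BREAK_RE.finditer(text):
        spans.append((start, m.start()))
        start = m.end()
    spans.append((start, len(text)))
    return spans


def _strip_citations(sentence: str) -> str:
    """Remove citation markers so author names do not count as claim terms."""
    s = IDENTIFIER_RE.sub("", AUTHOR_YEAR_RE.sub("", sentence))
    s = re.sub(r"\s+", " ", s)
    return re.sub(r"\s+([.,;:])", r"\1", s).strip()


def extract_citations(text: str) -> list[Citation]:
    """Find citations with their positions and pair each with its claim sentence."""
    spans = _sentence_spans(text)
    found = []
    for regex in (AUTHOR_YEAR_RE, IDENTIFIER_RE):
        for m in regex.finditer(text):
            sentence = next((text[s:e] for s, e in spans if s <= m.start() < e), "")
            found.append(Citation(m.group().rstrip("."), m.start(), _strip_citations(sentence)))
    return sorted(found, key=lambda c: c.start)


if __name__ == "__main__":
    doc = (
        "Transformers improved BLEU by 2.0 points (Vaswani et al., 2017). "
        "Later work confirmed this, see 10.1000/xyz123. "
        "Smith and Jones (2019) reported the opposite."
    )
    for c in extract_citations(doc):
        print(c.start, c.text, "->", c.claim)
```

Stripping a narrative citation also removes the sentence's grammatical subject ("Smith and Jones"), which is harmless for a bag-of-terms comparison but would matter for anything that parses the claim.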

Why this matters

A reference that exists but does not support its claim is now one of the most common citation failures and is far harder to spot than an outright fake. Recent benchmarks built to ask whether authors actually read what they cited find that a large share of machine-generated citations do not fully support the claims they are attached to, even when the cited papers are real. Surfacing on-topic but unsupportive citations lets reviewers focus their checking where it matters.

Score thresholds

0-1: Checked citations are well covered by their sources
2-3: Several citations cover their claims only partially or weakly
4-5: Most checked citations are barely reflected in their sources
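One way grading and the capped score could fit together is sketched below. The grade cut-offs (0.6 and 0.3), the per-citation penalties and the half-up rounding are hypothetical, since the page does not publish them; the sketch only shows how a per-citation cap of one point and an overall cap of five map onto the bands above. It takes per-citation coverage values in [0, 1] as input.

```python
# Hypothetical cut-offs: the page does not publish its thresholds.
STRONG_MIN = 0.6
PARTIAL_MIN = 0.3
# Each citation adds at most 1 point (the per-citation cap).
PENALTY = {"strong": 0.0, "partial": 0.5, "low": 1.0}
MAX_SCORE = 5  # overall cap

# Upper bound of each score band and its description.
BANDS = [
    (1, "Checked citations are well covered by their sources"),
    (3, "Several citations cover their claims only partially or weakly"),
    (5, "Most checked citations are barely reflected in their sources"),
]


def grade(coverage: float) -> str:
    """Map a coverage value in [0, 1] to a support grade."""
    if coverage >= STRONG_MIN:
        return "strong"
    if coverage >= PARTIAL_MIN:
        return "partial"
    return "low"


def document_score(coverages: list[float]) -> tuple[int, str, list[int]]:
    """Return (score, band description, indices of flagged citations)."""
    grades = [grade(c) for c in coverages]
    total = sum(PENALTY[g] for g in grades)
    score = min(MAX_SCORE, int(total + 0.5))  # round half up, then apply the overall cap
    band = next(text for upper, text in BANDS if score <= upper)
    flagged = [i for i, g in enumerate(grades) if g != "strong"]  # partial or low
    return score, band, flagged


if __name__ == "__main__":
    print(document_score([0.9, 0.5, 0.1, 0.05]))
```

With one strong, one partial and two low citations the penalties sum to 2.5, which rounds to 3 and lands in the middle band; the three non-strong citations are flagged for manual checking.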

Limitations

Reasons from words, not meaning, so it does not understand negation or direction: a claim that a treatment did not work and an abstract reporting that it did work share almost all of their terms, so the citation can pass. It depends on an external index, so papers that are not indexed or have no abstract are set aside rather than penalised. It checks only a capped number of citations per document, so long reference lists are sampled. It is a screen that points reviewers at the citations most worth verifying by hand, not a final judgement.
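The negation blind spot can be illustrated with a plain, unweighted version of term coverage. The stopword lists and the example sentences are hypothetical; the point is that a negated claim is fully covered when "not" is treated as a stopword, as many standard lists do, and still scores highly when it is kept.

```python
import re

# Hypothetical stopword lists. Many standard lists treat "not" as a stopword.
STOPWORDS = {"a", "an", "the", "in", "of", "was", "with", "not", "no"}
STOPWORDS_KEEP_NEGATION = STOPWORDS - {"not", "no"}


def terms(text: str, stopwords: set[str]) -> set[str]:
    """Lowercased word set with stopwords removed."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in stopwords}


def coverage(claim: str, evidence: str, stopwords: set[str]) -> float:
    """Unweighted directional coverage: share of claim terms found in the evidence."""
    claim_terms = terms(claim, stopwords)
    if not claim_terms:
        return 0.0
    return len(claim_terms & terms(evidence, stopwords)) / len(claim_terms)


abstract = "Treatment X was associated with reduced mortality in adults with sepsis."
negated_claim = "Treatment X was not associated with reduced mortality in adults with sepsis."

if __name__ == "__main__":
    print(coverage(negated_claim, abstract, STOPWORDS))                # 1.0
    print(coverage(negated_claim, abstract, STOPWORDS_KEEP_NEGATION))  # 0.875
```

Even when "not" counts as a term, seven of the claim's eight terms appear in the abstract, so the contradicting citation would still look well supported.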

References

  1. Shi K, Sun W, Zhang Z, Sun L, Chawla NV, Ye Y. (2026). CiteAudit: you cited it, but did you read it? A benchmark for verifying scientific references in the LLM era. arXiv preprint arXiv:2602.23452
  2. Haan S. (2025). SemanticCite: citation verification with AI-powered full-text analysis and evidence-based reasoning. arXiv preprint arXiv:2511.16198
  3. Onweller H, Lumer E, Huber A, Ramchandani P, Subbiah VK, Feld C. (2026). Cited but not verified: parsing and evaluating source attribution in LLM deep research agents. arXiv preprint arXiv:2605.06635
  4. Wadden D, Lin S, Lo K, Wang LL, van Zuylen M, Cohan A, Hajishirzi H. (2020). Fact or fiction: verifying scientific claims. Proceedings of EMNLP 2020
  5. Nicholson JM, Mordaunt M, Lopez P, Uppala A, Rosati D, Rodrigues NP, Grabitz P, Rife SC. (2021). scite: a smart citation index that displays the context of citations and classifies their intent using deep learning. Quantitative Science Studies