ResAIKit
Research Integrity Toolkit
C6 · Text analysis · Stylistic · Layer 2 (Contextual)

Local Coherence

Detects decorative connectors and logic jumps between adjacent sentences, paragraphs, and IMRaD (Introduction, Methods, Results and Discussion) sections. AI-generated text often uses linking words without genuine semantic continuity.

Technical description

C6 operationalises the "local coherence" dimension from the Anti-AI Vibe Review spec. It asks four questions: (1) do connector words link semantically related content, or are they decorative; (2) is the document's terminology consistent across the text; (3) do adjacent paragraphs bridge their topics naturally; and (4) are IMRaD section transitions coherent or abrupt. The four sub-checks operate independently, and their contributions are summed and clamped to a single score between 0 and 5. The indicator runs at Layer 2 because it requires sentence segmentation, lemmatisation, and part-of-speech tagging.

How it works

Sub-check 1, connector validation via lemma overlap. Each sentence is scanned for connector words from the per-language connectors dictionary (16 entries for English, including furthermore, moreover, additionally, however, therefore, thus, hence, and consequently; 50 entries for Romanian). When a connector is found, the lemma overlap between the current sentence and the preceding sentence is computed. Only content words (nouns, verbs, and adjectives, excluding stop words) participate in the overlap. An overlap ratio below 0.15 flags the connector as decorative: it signals a logical relationship that the actual content does not support. The share of decorative connectors among all connectors found drives the score: a share above 40% contributes +2.0, a share above 20% but not above 40% contributes +1.0, and a lower share contributes nothing.
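The connector check can be sketched roughly as follows. This is a minimal illustration, not the toolkit's code: the function names, the shortened connector list, and the input shape are hypothetical, and the overlap formula (shared lemmas divided by the size of the smaller set) is an assumption, since the description above does not define the ratio. Sentences are assumed to arrive already tokenised and lemmatised, and a sentence with several connectors is counted once.

```python
# Hypothetical sketch of sub-check 1. The connector list is a shortened
# stand-in for the per-language dictionary; the overlap formula is assumed.
CONNECTORS = {"furthermore", "moreover", "additionally", "however",
              "therefore", "thus", "hence", "consequently"}
DECORATIVE_THRESHOLD = 0.15


def overlap_ratio(a: set[str], b: set[str]) -> float:
    """Shared content lemmas divided by the size of the smaller set (assumed)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def connector_contribution(sentences: list[tuple[set[str], set[str]]]) -> float:
    """Each sentence is (lowercased tokens, content lemmas), tagged upstream."""
    total = decorative = 0
    # The first sentence has no predecessor, so scanning starts at index 1.
    for i in range(1, len(sentences)):
        tokens, lemmas = sentences[i]
        if not (tokens & CONNECTORS):
            continue
        total += 1
        previous_lemmas = sentences[i - 1][1]
        if overlap_ratio(lemmas, previous_lemmas) < DECORATIVE_THRESHOLD:
            decorative += 1
    if total == 0:
        return 0.0
    share = decorative / total
    if share > 0.40:
        return 2.0
    if share > 0.20:
        return 1.0
    return 0.0
```

In this sketch, a text with no connectors at all contributes nothing, which seems the natural reading of a ratio-driven score but is also an assumption.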

Sub-check 2, terminological consistency. All nouns in the document are grouped by lemma, and each lemma's surface forms are collected. A lemma with more than three distinct surface forms (e.g., analysis, analyses, Analysis, ANALYSES) is flagged as inconsistent. Each inconsistent term contributes +0.5 to the score, capped at +1.5 (three terms). This sub-check catches the variable terminology typical of AI-generated text, where the same concept appears under different spellings or morphological variants without explicit definition.
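A minimal sketch of the surface-form count, assuming the nouns have already been extracted as (surface form, lemma) pairs by an upstream tagger. The function name and input shape are hypothetical, and surface forms are compared case-sensitively here because the example above treats analysis and Analysis as distinct forms.

```python
from collections import defaultdict

# Hypothetical sketch of sub-check 2; constants mirror the description above.
MAX_SURFACE_FORMS = 3   # more than this many distinct forms flags the lemma
PER_TERM = 0.5
CAP = 1.5


def terminology_contribution(nouns: list[tuple[str, str]]) -> tuple[float, list[str]]:
    """nouns holds one (surface_form, lemma) pair per noun token."""
    forms: dict[str, set[str]] = defaultdict(set)
    for surface, lemma in nouns:
        forms[lemma].add(surface)  # case-sensitive on purpose (assumption)
    inconsistent = sorted(
        lemma for lemma, seen in forms.items() if len(seen) > MAX_SURFACE_FORMS
    )
    return min(CAP, PER_TERM * len(inconsistent)), inconsistent
```

Returning the flagged lemmas alongside the contribution is a design choice of the sketch, so that a report can list which terms triggered the finding.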

Sub-check 3, paragraph-to-paragraph topic jumps. For each adjacent pair of paragraphs, the lemma overlap between the last sentence of the first paragraph and the first sentence of the second is computed. An overlap below 0.10 indicates a topic jump: the new paragraph introduces new content without a bridging reference to the preceding paragraph's theme. Each topic jump contributes +0.5 to the score, capped at +1.0 (two jumps).
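The paragraph boundary check might look like the sketch below. It assumes each paragraph is given as a list of per-sentence content-lemma sets, and it assumes the overlap ratio is the number of shared lemmas divided by the size of the smaller set; neither the names nor the formula come from the toolkit itself.

```python
# Hypothetical sketch of sub-check 3; the overlap formula is an assumption.
TOPIC_JUMP_THRESHOLD = 0.10


def overlap_ratio(a: set[str], b: set[str]) -> float:
    """Shared content lemmas divided by the size of the smaller set (assumed)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def topic_jump_contribution(paragraphs: list[list[set[str]]]) -> float:
    """Compare the last sentence of each paragraph with the first of the next."""
    jumps = 0
    for current, following in zip(paragraphs, paragraphs[1:]):
        if not current or not following:
            continue  # skip empty paragraphs rather than count them as jumps
        if overlap_ratio(current[-1], following[0]) < TOPIC_JUMP_THRESHOLD:
            jumps += 1
    return min(1.0, 0.5 * jumps)  # +0.5 per jump, capped at two jumps
```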

Sub-check 4, cross-section transition coherence. The text is partitioned into IMRaD sections. For each adjacent section pair (e.g., Results to Discussion), the lemma overlap between the last sentence of the preceding section and the first sentence of the following section is computed. An overlap below 0.05, stricter than the paragraph-level threshold, flags an abrupt section transition. AI-generated text often lacks coherent bridges between major sections, starting each section as if it were an independent document. Each abrupt transition contributes +0.5 to the score, capped at +1.0 (two transitions).
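A sketch of the section-level check, under several assumptions: sections arrive as a mapping from a normalised IMRaD name to a list of per-sentence content-lemma sets, sections that were not recognised are simply skipped so that the remaining ones are treated as adjacent, and the overlap ratio is shared lemmas over the smaller set. All names are hypothetical.

```python
# Hypothetical sketch of sub-check 4; section keys and overlap formula assumed.
IMRAD_ORDER = ("introduction", "methods", "results", "discussion")
SECTION_THRESHOLD = 0.05


def overlap_ratio(a: set[str], b: set[str]) -> float:
    """Shared content lemmas divided by the size of the smaller set (assumed)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def section_transition_contribution(
    sections: dict[str, list[set[str]]],
) -> tuple[float, list[tuple[str, str]]]:
    """Flag section pairs whose boundary sentences share almost no lemmas."""
    # Keep only recognised, non-empty sections, in IMRaD order.
    present = [name for name in IMRAD_ORDER if sections.get(name)]
    abrupt = []
    for previous, following in zip(present, present[1:]):
        last_sentence = sections[previous][-1]
        first_sentence = sections[following][0]
        if overlap_ratio(last_sentence, first_sentence) < SECTION_THRESHOLD:
            abrupt.append((previous, following))
    return min(1.0, 0.5 * len(abrupt)), abrupt
```

With fewer than two recognised sections the loop never runs and the contribution is 0.0, so the sub-check is effectively skipped for documents without IMRaD headings.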

The four contributions sum to a theoretical maximum of 5.5 (2.0 + 1.5 + 1.0 + 1.0), with a hard clamp at 5.0.
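The aggregation can be written in a few lines. The function and parameter names below are hypothetical; the per-sub-check caps and the final clamp are the ones stated above.

```python
# Hypothetical sketch of the final aggregation step.
CAPS = (2.0, 1.5, 1.0, 1.0)  # connectors, terminology, paragraphs, sections
MAX_SCORE = 5.0


def c6_score(connectors: float, terminology: float,
             paragraphs: float, sections: float) -> float:
    """Sum the four capped contributions and clamp the result to 5.0."""
    parts = (connectors, terminology, paragraphs, sections)
    total = sum(min(max(part, 0.0), cap) for part, cap in zip(parts, CAPS))
    return min(MAX_SCORE, total)
```

For example, c6_score(2.0, 1.5, 1.0, 1.0) sums to 5.5 before the clamp and returns 5.0, while c6_score(1.0, 0.5, 0.5, 0.0) returns 2.0.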

Why this matters

Human writers maintain a thread of shared vocabulary and concepts across adjacent sentences, paragraphs, and sections. When a writer uses therefore, the sentences on both sides of that connector typically share key nouns or verbs. When a writer transitions from Results to Discussion, the first sentence of the Discussion typically references at least one specific finding from the Results. AI-generated text violates these expectations in a characteristic way: it uses connectors fluently but without the underlying semantic continuity that justifies them, and it treats section boundaries as reset points rather than transitions.

Holtzman and colleagues identified the mechanism by which transformer models collapse onto high-probability continuations under restrictive decoding, producing text that is locally fluent but globally incoherent [1]. The decorative connector problem can be read as a manifestation of this: the model selects furthermore because it is a high-probability continuation, not because the preceding and following content are semantically additive. C6 catches this by testing the lemma overlap that should justify the connector's presence.

Adi and colleagues demonstrated that local coherence signals, specifically the lexical and discourse relationship between a sentence and its neighbours, are among the most robust features for sentence-level AI-generated text detection, achieving strong cross-domain generalisation where token-level methods fail [2]. C6 operationalises this finding at Layer 2 using deterministic lemma-overlap ratios rather than learned embeddings, trading some precision for full transparency and zero training cost. Kim and colleagues showed that human texts exhibit significantly more structural variability at the discourse level than machine-generated texts, and that hierarchical discourse features improve detection on both in-domain and paraphrased samples [4]. The paragraph and section transition checks in C6 (sub-checks 3 and 4) target precisely this structural variability gap.

Score thresholds

Score: Meaning
0 to 1: Connectors link semantically related content, terminology is consistent, and paragraphs and sections transition naturally. Typical of well-edited human academic prose.
2 to 3: Moderate coherence issues: some decorative connectors, and one or two topic jumps between paragraphs or sections. Common in first drafts and in AI-assisted text where connectors were inserted mechanically.
4 to 5: Severe coherence breakdown: the majority of connectors are decorative, terminology is inconsistent, and both paragraph-level and section-level transitions are abrupt. Consistent with AI-generated text produced in a single pass without revision.

Limitations

The connector dictionary is finite. A text that uses connectors outside the dictionary (e.g., in light of the foregoing, be that as it may) will not contribute to sub-check 1 even if those connectors are decorative.

Lemma-based overlap is a coarse measure of semantic continuity. Two sentences that genuinely discuss related themes using different vocabulary (e.g., one using domain-specific terms and the other using common-language paraphrases) will show low lemma overlap and may be incorrectly flagged as a decorative connector, a topic jump, or an abrupt section transition. This is an inherent limitation of bag-of-lemmas approaches and is partially mitigated by the low overlap thresholds (0.15 for sentences, 0.10 for paragraphs, 0.05 for sections).

Terminological consistency (sub-check 2) flags any lemma with more than three distinct surface forms. Legitimate morphological variation (e.g., child/children, thesis/theses), combined with ordinary capitalisation differences such as sentence-initial or heading forms, can push a lemma over this threshold in well-written text. The finding severity is set to info rather than warning for this reason.

Cross-section transition coherence requires the IMRaD section classifier to recognise section headers. Documents without standard IMRaD headings will skip sub-check 4 entirely.

References

  1. Holtzman A, Buys J, Du L, Forbes M, Choi Y. The curious case of neural text degeneration. International Conference on Learning Representations (ICLR). 2020. https://arxiv.org/abs/1904.09751
  2. Adi R, Irnawan BR, Suzuki Y, Fukumoto F. GL-CLiC: global-local coherence and lexical complexity for sentence-level AI-generated text detection. Proceedings of the International Joint Conference on Natural Language Processing (IJCNLP-AACL). 2025. https://aclanthology.org/2025.ijcnlp-long.188/
  3. Sheng Z, Zhang T, Jiang C, Kang D. BBScore: a Brownian bridge based metric for assessing text coherence. Proceedings of the AAAI Conference on Artificial Intelligence. 2024. https://ojs.aaai.org/index.php/AAAI/article/view/29879
  4. Kim J, Huang Z, McKeown K. Threads of subtlety: detecting machine-generated texts through discourse motifs. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL). 2024. https://aclanthology.org/2024.acl-long.300/
  5. Tian Y, Chen Y, Kang D, Ray B. Detecting machine-generated long-form content with latent-space variables. arXiv preprint arXiv:2410.03856. 2024. https://arxiv.org/abs/2410.03856
  6. Trace is in sentences: unbiased lightweight ChatGPT-generated text detector. arXiv preprint arXiv:2509.18535. 2025. https://arxiv.org/abs/2509.18535
  7. Yin Z, Wang S. Span-level detection of AI-generated scientific text via contrastive learning and structural calibration. arXiv preprint arXiv:2510.00890. 2025. https://arxiv.org/abs/2510.00890