ResAIKit
Research Integrity Toolkit
C8 · Text analysis · Stylistic · Layer 1 (Deterministic)

Standard Hedging

Detects formulaic hedging phrases, booster absence, over-reliance on "may" as a default hedge, and cross-section hedge uniformity. AI-generated text defaults to a narrow, repetitive hedging vocabulary and avoids assertive language entirely.

Technical description

C8 operationalises the "hedging" dimension from the Anti-AI Vibe Review spec. It measures four aspects of hedging behaviour: (1) the ratio of generic formulaic hedges to specific evidence-based limitation statements, (2) the complete absence of booster expressions, (3) the dominance of "may" over other modal hedge verbs, and (4) whether IMRaD (Introduction, Methods, Results and Discussion) sections vary their hedge density as expected in human-authored scientific prose. The indicator runs at Layer 1 using only pattern and word-list matching.

How it works

Sub-check 1, generic hedge ratio. The text is scanned against a per-language dictionary of formulaic hedging phrases (13 English entries, including it is important to note, further research is needed, results should be interpreted with caution, the study has some limitations, and future studies are warranted; 40 Romanian entries). At the same time, the text is scanned for specific, evidence-based limitation patterns: explicit cause statements (due to, because of), domain constraints (limited to, restricted to), named biases (selection bias, recall bias, measurement error), and statistical qualifiers (confidence interval, sample size N=). The share of generic hedges among all hedges, generic / (generic + specific), is then computed. A ratio above 0.8 contributes +2.0 to the score; a ratio above 0.5 but no higher than 0.8 contributes +1.0; lower ratios contribute nothing.
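A minimal Python sketch of sub-check 1 follows, under stated assumptions: the phrase list holds only the five English entries quoted above (not the full 13-entry dictionary), the specific-limitation families are approximated with hypothetical regular expressions (the `N=` qualifier is left out for brevity), and matching is plain case-insensitive counting. The names `GENERIC_HEDGES`, `SPECIFIC_PATTERNS` and `generic_hedge_score` are illustrative, not ResAIKit's API.

```python
import re

# Illustrative subset of the English generic-hedge dictionary. The full
# C8 list has 13 English entries; these five are the ones quoted in the text.
GENERIC_HEDGES = [
    "it is important to note",
    "further research is needed",
    "results should be interpreted with caution",
    "the study has some limitations",
    "future studies are warranted",
]

# Hypothetical regexes approximating the four specific-limitation families.
SPECIFIC_PATTERNS = [
    r"\bdue to\b", r"\bbecause of\b",                # explicit causes
    r"\blimited to\b", r"\brestricted to\b",         # domain constraints
    r"\bselection bias\b", r"\brecall bias\b",       # named biases
    r"\bmeasurement error\b",
    r"\bconfidence interval\b", r"\bsample size\b",  # statistical qualifiers
]


def count_generic(text: str) -> int:
    """Count dictionary phrases with a case-insensitive substring match."""
    lowered = text.lower()
    return sum(lowered.count(phrase) for phrase in GENERIC_HEDGES)


def count_specific(text: str) -> int:
    """Count matches of the specific-limitation patterns."""
    return sum(len(re.findall(p, text, flags=re.IGNORECASE)) for p in SPECIFIC_PATTERNS)


def generic_hedge_score(text: str) -> float:
    """Sub-check 1: map generic / (generic + specific) onto 0.0, 1.0 or 2.0."""
    generic = count_generic(text)
    specific = count_specific(text)
    total = generic + specific
    if total == 0:
        return 0.0  # assumption: a text with no hedges of either kind scores nothing
    ratio = generic / total
    if ratio > 0.8:
        return 2.0
    if ratio > 0.5:
        return 1.0
    return 0.0
```

For example, `generic_hedge_score("It is important to note that the sample was small. Further research is needed.")` returns `2.0`: two generic hedges and no specific ones give a ratio of 1.0. Returning 0.0 when no hedges are found at all is an assumption that the description above does not cover.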

Sub-check 2, booster absence. The text is scanned for 20 booster expressions in English (clearly, definitely, certainly, undoubtedly, indeed, obviously, evidently, surely, of course, without doubt, it is clear that, etc.) or 15 in Romanian. On texts longer than 500 words that already contain generic hedges, the complete absence of any booster triggers the sub-check. Shalevska (2024) found that ChatGPT-generated essays used zero boosters across 100 analysed texts, while human writing regularly mixed hedges with measured assertive language. Contributes +1.5 to the score.
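Sub-check 2 could be sketched as below, assuming words are counted as `\w+` tokens and using an 11-item subset of the 20 English boosters. The generic-hedge count is passed in as a parameter (for instance, from a sub-check 1 routine); that wiring is a design assumption rather than documented behaviour, and all function names are hypothetical.

```python
import re

# Illustrative subset of the 20-entry English booster list.
BOOSTERS = [
    "clearly", "definitely", "certainly", "undoubtedly", "indeed",
    "obviously", "evidently", "surely", "of course", "without doubt",
    "it is clear that",
]

# Word-boundary patterns, so "clearly" does not match inside "unclearly".
BOOSTER_RES = [re.compile(r"\b" + re.escape(b) + r"\b", re.IGNORECASE) for b in BOOSTERS]


def word_count(text: str) -> int:
    """Approximate word count as the number of \\w+ tokens."""
    return len(re.findall(r"\b\w+\b", text))


def has_booster(text: str) -> bool:
    """True if any booster expression occurs anywhere in the text."""
    return any(rx.search(text) for rx in BOOSTER_RES)


def booster_absence_score(text: str, generic_hedge_count: int) -> float:
    """Sub-check 2: +1.5 when a long, hedged text contains no booster at all."""
    if word_count(text) <= 500 or generic_hedge_count == 0:
        return 0.0  # gate: short texts and texts without generic hedges are skipped
    return 0.0 if has_booster(text) else 1.5
```

The gate is checked before the booster search, mirroring the description: the absence of boosters only counts on texts longer than 500 words that already contain generic hedges.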

Sub-check 3, "may" dominance. The frequencies of four modal hedge verbs (may, might, could, would) are tallied. When at least five modal hedges are present and "may" accounts for more than 70% of them, the sub-check fires. Shalevska (2024) found that ChatGPT used "may" at 4.54 per 1000 tokens versus 1.43 for humans (a 3.2x ratio), while humans distributed hedging across "may", "might", and "could" more evenly. Contributes +1.0 to the score.
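A sketch of the "may" dominance tally, assuming a naive lowercase tokeniser. One side effect of that assumption is that the month name "May" is also counted; a production check would presumably need some context filtering. `modal_counts` and `may_dominance_score` are illustrative names.

```python
import re
from collections import Counter

MODAL_HEDGES = ("may", "might", "could", "would")


def modal_counts(text: str) -> Counter:
    """Tally the four modal hedge verbs over lowercase alphabetic tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t in MODAL_HEDGES)


def may_dominance_score(text: str) -> float:
    """Sub-check 3: +1.0 when >= 5 modal hedges occur and 'may' exceeds 70% of them."""
    counts = modal_counts(text)
    total = sum(counts.values())
    if total < 5:
        return 0.0  # too few modal hedges to judge the distribution
    return 1.0 if counts["may"] / total > 0.7 else 0.0
```

With five "may" and nothing else the check fires; with three "may", one "might" and one "could" the share is 0.6 and it does not.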

Sub-check 4, cross-section hedge uniformity. The text is partitioned into IMRaD sections. For each section, the density of generic hedges per 1000 words is computed. When at least three sections are present and the standard deviation of their hedge densities falls below 1.0, the sub-check fires. Human authors hedge more heavily in Discussion and Conclusions (where interpretation warrants caution) than in Methods (where procedures should be stated directly). Uniform hedging across all sections suggests template-driven prose where the same cautious register is applied regardless of rhetorical purpose. Contributes +0.5 to the score.
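The sketch below covers sub-check 4 end to end under several assumptions: sections are split only on lines consisting solely of a known header (a much simpler stand-in for the IMRaD section classifier), text before the first header is ignored, only three dictionary phrases are used, and the spread is measured with the population standard deviation (`statistics.pstdev`), since the description does not say whether a sample or population figure is used. `split_imrad`, `hedge_density` and `uniformity_score` are hypothetical names.

```python
import re
import statistics

# Illustrative subset of the generic-hedge dictionary.
GENERIC_HEDGES = [
    "it is important to note",
    "further research is needed",
    "results should be interpreted with caution",
]

# Hypothetical header set; a real section classifier would be more tolerant.
IMRAD_HEADERS = {"introduction", "methods", "results", "discussion", "conclusions"}


def split_imrad(text: str) -> dict[str, str]:
    """Split on lines that are exactly a known header; text before the first header is dropped."""
    sections: dict[str, str] = {}
    current = None
    for line in text.splitlines():
        header = line.strip().rstrip(":").lower()
        if header in IMRAD_HEADERS:
            current = header
            sections.setdefault(current, "")
        elif current is not None:
            sections[current] += line + "\n"
    return sections


def hedge_density(section_text: str) -> float:
    """Generic hedges per 1000 words within one section."""
    words = len(re.findall(r"\b\w+\b", section_text))
    if words == 0:
        return 0.0
    hits = sum(section_text.lower().count(p) for p in GENERIC_HEDGES)
    return hits * 1000 / words


def uniformity_score(sections: dict[str, str]) -> float:
    """Sub-check 4: +0.5 when >= 3 sections have near-identical hedge densities."""
    if len(sections) < 3:
        return 0.0
    densities = [hedge_density(t) for t in sections.values()]
    # Assumption: population standard deviation.
    return 0.5 if statistics.pstdev(densities) < 1.0 else 0.0
```

Three sections with one hedge per 100 words each (density 10.0) give a standard deviation of 0 and fire the check; concentrating the hedges in Discussion instead pushes the spread well above 1.0, and the check stays silent.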

The four contributions sum to a theoretical maximum of 5.0 (2.0 + 1.5 + 1.0 + 0.5).
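The aggregation step is a plain sum. A small sketch, using a hypothetical `C8SubScores` container, confirms the 5.0 maximum and enumerates every total the four sub-checks can produce when their outcomes are combined freely (gating between sub-checks, such as sub-check 2 requiring generic hedges, is ignored here).

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class C8SubScores:
    """Contributions of the four sub-checks (hypothetical container)."""
    generic_ratio: float = 0.0     # sub-check 1: 0.0, 1.0 or 2.0
    booster_absence: float = 0.0   # sub-check 2: 0.0 or 1.5
    may_dominance: float = 0.0     # sub-check 3: 0.0 or 1.0
    uniformity: float = 0.0        # sub-check 4: 0.0 or 0.5

    def total(self) -> float:
        return self.generic_ratio + self.booster_absence + self.may_dominance + self.uniformity


MAX_SCORE = C8SubScores(2.0, 1.5, 1.0, 0.5).total()  # 5.0

# Every total reachable by freely combining the possible sub-check outcomes.
REACHABLE = sorted(
    {sum(c) for c in product((0.0, 1.0, 2.0), (0.0, 1.5), (0.0, 1.0), (0.0, 0.5))}
)
# [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
```

Because every contribution is a multiple of 0.5, totals fall on a 0.5 grid from 0.0 to 5.0, which is useful context when reading the Score thresholds table.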

Why this matters

Shalevska's 2024 comparative analysis of 100 ChatGPT-generated and 100 human-written essays found a striking asymmetry in hedging and boosting behaviour [1]. AI-generated text used "may" at 3.2 times the human rate (4.54 vs. 1.43 per 1000 tokens) and contained no boosters at all: no "clearly", no "definitely", no "undoubtedly". Human writers, by contrast, deployed a balanced mix of hedges and boosters, with "may", "might", and "could" distributed roughly evenly. The AI hedging pattern is mechanical: a single default hedge word applied uniformly, without the assertive language that signals genuine authorial confidence in specific findings.

Foster-Fletcher's 2025 analysis of 150 Securities and Exchange Commission (SEC) 10-K filings from 50 large United States companies found that language drift accelerated by 24.5% after enterprise large language model (LLM) tools became available [2]. The drift was characterised by increased hedging, decreased specificity, fewer named referents, reduced lexical range, and less structural variation. This is the same signature predicted by controlled experiments on feedback-tuned language models. Importantly, none of the 50 companies disclosed LLM use in drafting.

The "may" dominance sub-check targets a specific, well-replicated finding: LLMs collapse the rich hedging repertoire of human academic English onto a single high-probability token. A human writer might write "these results could indicate...", "this finding might reflect...", or "the data would suggest..." depending on epistemic distance. An LLM writes "these results may indicate..." and repeats the pattern throughout the document.

Score thresholds

Score | Meaning
--- | ---
0 to 1 | Hedging is predominantly specific and evidence-based. Boosters are present where appropriate. Modal hedges are varied. Hedge density varies naturally across IMRaD sections. Typical of well-written scientific prose.
2 to 3 | Generic hedging is elevated relative to specific limitations. Boosters may be absent. "May" dominates the modal hedge distribution. Common in AI-assisted text and formulaic academic writing.
4 to 5 | Near-total reliance on generic hedging with no specific limitations stated. Zero boosters despite document length. "May" is the near-exclusive hedge word. Hedge density is flat across all IMRaD sections. Highly consistent with unprompted LLM output.

Limitations

The generic hedge dictionary is finite and language-specific. A text that uses novel hedging constructions outside the dictionary will not fire sub-check 1 even if the hedging style is formulaic. The specific-limitation patterns use a small set of pattern-matching templates that capture common evidence-based limitations but miss discipline-specific variants (e.g., "this effect was not significant after Bonferroni correction" uses a named statistical correction rather than a general limitation formula).

The booster list is also finite. A text that uses domain-specific assertive language (e.g., "the signal-to-noise ratio confirms") rather than general boosters will be incorrectly flagged as having zero boosters. The sub-check is gated on document length (>500 words) and on the presence of generic hedges, so short texts and texts without generic hedges are not affected.

The "may" dominance check is English-specific in its current form; the Romanian modal system uses different constructions (conditional verb forms rather than separate modal auxiliaries) and the sub-check may not generalise well without language-specific adaptation.

Cross-section hedge uniformity requires the IMRaD section classifier to recognise section headers. Documents without standard IMRaD headings skip this sub-check.

References

  1. Shalevska E. Hedges and boosters in AI and human writing: a comparative analysis. Knowledge, International Journal. 2024;65(5):505-510. https://eprints.uklo.edu.mk/id/eprint/10825/
  2. Foster-Fletcher R. What LLM writing patterns look like in SEC filings. 2025. https://fosterfletcher.com/llm-writing-patterns-sec-filings/
  3. Alia M, Aliia AM. How AI tools affect discourse markers when paraphrased. In: IRMA International Conference. 2025.
  4. Almulla A. The use of hedging devices and engagement markers: a comparative analysis of AI and human academic writing. 2025.