ResAIKit
Research Integrity Toolkit
C1 | Text analysis | Stylistic | Layer 1 (Deterministic)

Generality

Detects overly vague language that avoids specific claims or concrete details, a hallmark of AI-generated text that stays safe by being non-committal.

Technical description

Measures anchor density per sentence by counting concrete referents (proper nouns, numerical values, dates, locations, specific methodology terms) against total sentence count. Uses dictionary-based pattern matching against nominalizations, vague quantifiers ('various', 'numerous', 'several'), and abstract noun phrases. Computes a generality ratio as (vague_phrases / total_sentences) and maps it to a 0-5 score.
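The ratio described above can be sketched in a few lines of Python. This is a minimal illustration, not the toolkit's implementation: the three-word `VAGUE_TERMS` list only reuses the examples quoted in this entry, and the sentence splitter and the function names are assumptions made for the example.

```python
import re

# Assumed mini-dictionary: only the three quantifiers quoted in this entry.
# The toolkit's real word lists are not given here.
VAGUE_TERMS = ("various", "numerous", "several")

VAGUE_PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, VAGUE_TERMS)) + r")\b",
    re.IGNORECASE,
)


def split_sentences(text):
    """Naive splitter (an assumption): break after ., ! or ? plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def generality_ratio(text):
    """Return vague_phrases / total_sentences, or 0.0 for empty text."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    vague_hits = len(VAGUE_PATTERN.findall(text))
    return vague_hits / len(sentences)
```

For example, a three-sentence passage containing "various", "numerous", and "several" once each yields a ratio of 1.0 under these assumptions.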

How it works

Layer 1 (deterministic): Scans text for dictionary-matched vague quantifiers and nominalizations. Counts concrete anchors (numbers, proper nouns, dates, specific terms). Calculates ratio of vague-to-concrete elements per sentence. Flags sentences with zero concrete anchors.
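The anchor-counting step might look like the sketch below. It is illustrative only: the regular expressions stand in for whatever detectors the toolkit actually uses, and treating any capitalized word after the first word of a sentence as a proper noun is a rough heuristic assumed for this example. The names `count_anchors` and `flag_unanchored` are hypothetical.

```python
import re

# Integers and decimals such as 42 or 3.5; years like 2019 also match.
NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")
# Assumed heuristic: a capitalized word that is not sentence-initial
# is counted as a proper noun.
CAPITALIZED = re.compile(r"\b[A-Z][a-z]+\b")


def count_anchors(sentence):
    """Count numbers plus non-initial capitalized words in one sentence."""
    numbers = len(NUMBER.findall(sentence))
    words = sentence.split()
    rest = " ".join(words[1:])  # skip the sentence-initial word
    proper_nouns = len(CAPITALIZED.findall(rest))
    return numbers + proper_nouns


def flag_unanchored(sentences):
    """Return the sentences that contain zero concrete anchors."""
    return [s for s in sentences if count_anchors(s) == 0]
```

With this heuristic, "We sampled 42 sites near Lyon in 2019." counts three anchors (42, Lyon, 2019), while "Various factors play a role." counts none and would be flagged.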

Why this matters

AI-generated text systematically avoids making specific, falsifiable claims because the model lacks grounding in real experimental details. Human authors writing about their own research naturally include specific measurements, dates, locations, and named entities. A high generality score suggests the text was not written by someone with first-hand knowledge of the subject matter.

Score thresholds

0-1: Rich in specific anchors, measurements, and concrete details
2-3: Some zones of generality mixed with specific claims
4-5: Dominated by generic, non-committal statements throughout
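As a rough sketch, the bands above can be expressed as a lookup. The band labels come from this entry; the ratio-to-score conversion does not. This entry does not state how the generality ratio is mapped onto 0-5, so `ratio_to_score` uses a linear scale with a hypothetical `saturation` parameter purely for illustration.

```python
import math

# Upper bound of each band and its description, taken from the thresholds above.
BANDS = (
    (1, "Rich in specific anchors, measurements, and concrete details"),
    (3, "Some zones of generality mixed with specific claims"),
    (5, "Dominated by generic, non-committal statements throughout"),
)


def ratio_to_score(ratio, saturation=1.0):
    """Assumed mapping: scale linearly, clamp at `saturation`, round half up.

    The actual cut points used by the toolkit are not documented here.
    """
    if ratio < 0:
        raise ValueError("ratio must be non-negative")
    scaled = min(ratio / saturation, 1.0) * 5
    return math.floor(scaled + 0.5)


def score_band(score):
    """Return the band description for an integer score from 0 to 5."""
    if not 0 <= score <= 5:
        raise ValueError("score must be between 0 and 5")
    for upper, label in BANDS:
        if score <= upper:
            return label
```

Under this assumed mapping, a ratio of 0.4 gives a score of 2, which falls in the mixed band.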

Limitations

Short texts (under 200 words) may have naturally low anchor density. Review articles and theoretical papers may score higher without being AI-generated. Domain-specific jargon may not be recognized as concrete anchors.
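One way to act on the first limitation is to attach a caveat to results for short inputs. The sketch below assumes whitespace tokenization and uses the 200-word figure from this entry as the default; the function name is hypothetical, and whether the toolkit applies such a guard is not stated here.

```python
def short_text_caveat(text, min_words=200):
    """Return True when the text is under `min_words` words.

    Words are counted by splitting on whitespace (an assumption).
    """
    word_count = len(text.split())
    return word_count < min_words
```

A caller could use the flag to mark a score as low-confidence rather than suppressing it.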

References

  1. Gao CA, Howard FM, Markov NS, et al. (2023). Comparing scientific abstracts generated by ChatGPT to real abstracts with detectors and blinded human reviewers. npj Digital Medicine
  2. Liang W, Zhang Y, Wu Z, et al. (2024). Mapping the increasing use of LLMs in scientific papers. arXiv:2404.01268
  3. Markowitz DM, Hancock JT, Bailenson JN. (2024). Linguistic markers of inherently false AI communication and intentionally false human communication: evidence from hotel reviews. Journal of Language and Social Psychology
  4. Halliday MAK. (1985). An Introduction to Functional Grammar. Edward Arnold
  5. Hyland K. (2005). Stance and engagement: a model of interaction in academic discourse. Discourse Studies
  6. Cabanac G, Labbe C. (2021). Prevalence of nonsensical algorithmically generated papers in the scientific literature. Journal of the Association for Information Science and Technology