Terminology Analysis
Uses a large language model to assess whether technical terminology is used correctly and consistently, catching misuses that reveal a lack of genuine domain expertise.
Technical description
Sends text to an LLM with domain-aware prompts to evaluate: correct usage of technical terms in context, consistency of terminology throughout the document, appropriate use of field-specific jargon, and whether technical concepts are used in ways that demonstrate genuine understanding versus superficial pattern matching. The LLM identifies specific misuses and inconsistencies.
How it works
Layer 4 (LLM-powered): Sends the text to a language model with a rubric of four dimensions, each judged independently: misuse (a term used incorrectly or imprecisely, including overgeneralization; this dimension is confidence-gated), decorative jargon (terms that sound authoritative without doing real work, judged by meaning in context rather than by frequency), domain fit (vocabulary from the wrong field or register), and consistency (the same concept named inconsistently). For each dimension the model returns a sub-score and a list of flagged terms. It is instructed to abstain rather than guess, and low-confidence flags are dropped. The sub-scores are combined into one terminology score, and the per-dimension breakdown is kept alongside it. This layer runs only when a model is configured.
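The post-processing described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the names `Flag`, `DimensionResult`, `combine`, and `MIN_CONFIDENCE` are hypothetical, the 0.6 cut-off is an assumed value, and the unweighted mean is only one plausible way to combine sub-scores, since the combination rule is not specified here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# The four rubric dimensions; the identifiers are illustrative.
DIMENSIONS = ("misuse", "decorative_jargon", "domain_fit", "consistency")

# Assumed cut-off: flags below this confidence are dropped.
MIN_CONFIDENCE = 0.6


@dataclass
class Flag:
    term: str
    confidence: float  # model-reported confidence in [0, 1]


@dataclass
class DimensionResult:
    score: Optional[float]  # None means the model abstained on this dimension
    flags: list[Flag] = field(default_factory=list)


def drop_low_confidence(result: DimensionResult,
                        threshold: float = MIN_CONFIDENCE) -> DimensionResult:
    """Return a copy of the result with low-confidence flags removed."""
    kept = [f for f in result.flags if f.confidence >= threshold]
    return DimensionResult(result.score, kept)


def combine(results: dict[str, DimensionResult]) -> dict:
    """Combine per-dimension sub-scores into one score, keeping the breakdown.

    Assumption: an unweighted mean over the dimensions the model did not
    abstain on. If the model abstained everywhere, the score is None.
    """
    breakdown = {
        name: drop_low_confidence(results[name])
        for name in DIMENSIONS
        if name in results
    }
    scored = [r.score for r in breakdown.values() if r.score is not None]
    overall = sum(scored) / len(scored) if scored else None
    return {"score": overall, "breakdown": breakdown}
```

Treating an abstention as a missing value rather than as a zero keeps a dimension the model could not judge from pulling the overall score toward "no problems found".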
Why this matters
When language models took up scientific writing, the clearest trace was in vocabulary: a small set of style words rose abruptly, and by 2024 the excess vocabulary consisted almost entirely of style words rather than content words. Catching those by frequency belongs with the model-specific vocabulary checks. What frequency cannot tell us is whether a term does real work, whether it is the precise term the science needs, or whether it has been stretched past its meaning. Those are judgements about meaning, and a check that asks them also ages better than a fixed word list as the overused words shift from year to year.
Score thresholds
- 0-1: Terminology used correctly and consistently throughout
- 2-3: Minor inconsistencies in technical language
- 4-5: Significant terminology misuse suggesting lack of domain expertise
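As an illustration, the bands above could be applied with a small lookup like the one below. The function name `interpret_score` is hypothetical. The bands are stated for whole numbers, so the handling of fractional scores (split at 1.5 and 3.5) is an assumption made for this sketch.

```python
def interpret_score(score: float) -> str:
    """Map a terminology score on the 0-5 scale to its threshold band."""
    if not 0 <= score <= 5:
        raise ValueError(f"score must be between 0 and 5, got {score}")
    # Assumed cut points for fractional scores: midway between bands.
    if score < 1.5:
        return "Terminology used correctly and consistently throughout"
    if score < 3.5:
        return "Minor inconsistencies in technical language"
    return "Significant terminology misuse suggesting lack of domain expertise"
```

For example, `interpret_score(3)` falls in the 2-3 band and returns the "Minor inconsistencies" description.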
Limitations
Requires a configured LLM provider. The evaluating LLM's domain knowledge limits assessment quality. Interdisciplinary papers may use terminology differently than single-field conventions. Rapidly evolving fields may have terminology that the LLM's training data does not cover.
References
- Kobak D, González-Márquez R, Horvát EÁ, Lause J. (2025). Delving into LLM-assisted writing in biomedical publications through excess vocabulary. Science Advances
- Juzek TS, Ward ZB. (2024). Why does ChatGPT delve so much? Exploring the sources of lexical overrepresentation in large language models. Proceedings of COLING 2025
- Peters U, Chin-Yee B. (2025). Generalization bias in large language model summarization of scientific research. arXiv preprint arXiv:2504.00025
- Thelwall M, Kousha K. (2026). Have LLM-associated terms increased in article full texts in all fields? arXiv preprint arXiv:2604.07565
- Schroeder K, Wood-Doughty Z. (2024). Can you trust LLM judgments? Reliability of LLM-as-a-judge. arXiv preprint arXiv:2412.12509