ResAIKit
Research Integrity Toolkit
C10 | Text analysis | Stylistic | Layer 1 (Deterministic)

PR Style

Detects promotional language, unsupported superlatives, and AI-characteristic rhetorical patterns such as formulaic contrasts and rule-of-three lists. AI-generated text tends to default to an inflated marketing register, with formulaic "It's not X, it's Y" constructions that reinforcement learning from human feedback (RLHF) training amplifies.

Technical description

C10 detects four patterns that distinguish promotional and AI-generated prose from genuine academic writing: (1) the density of signal adjectives per 1000 words, (2) superlative claims made without quantitative or citation support, (3) negative-parallelism contrast templates ("It's not X, it's Y", "Not just X, but Y", the "isn't about X, it's about Y" variant, and Romanian equivalents, selected by detected language), and (4) rule-of-three (tricolon) overuse. The indicator runs at Layer 1 using only pattern and word-list matching.

How it works

Sub-check 1, signal adjective rate. The text is scanned against a per-language dictionary of promotional adjectives (54 English entries, including robust, innovative, comprehensive, holistic, cutting-edge, unique, remarkable, revolutionary, unprecedented, groundbreaking, novel, pivotal, state-of-the-art, transformative, paradigm-shifting, game-changing, seamless, unparalleled, vibrant, dynamic, world-class, industry-leading, best-in-class, next-generation, and ever-evolving; 40 Romanian entries). Multi-word phrases and hyphenated compounds are matched as substrings; single-word adjectives are matched at word boundaries. The number of matches per 1000 words is then computed: a rate above 6 contributes +2.0, a rate above 3 (up to 6) contributes +1.0, and lower rates contribute nothing.
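The matching and thresholding rules of sub-check 1 can be sketched in Python as follows. This is a minimal illustration, not the ResAIKit implementation: the lexicon is a hypothetical eleven-entry excerpt of the 54-entry English list, and the regex word tokenizer (which counts a hyphenated compound as one word) is an assumption.

```python
import re

# Hypothetical excerpt of the English lexicon (the C10 list has 54 entries).
SIGNAL_ADJECTIVES = [
    "robust", "innovative", "comprehensive", "holistic", "cutting-edge",
    "groundbreaking", "state-of-the-art", "game-changing", "world-class",
    "best-in-class", "next-generation",
]


def count_signal_adjectives(text: str, lexicon: list[str]) -> int:
    """Count lexicon hits in text, case-insensitively."""
    lowered = text.lower()
    hits = 0
    for entry in lexicon:
        if "-" in entry or " " in entry:
            # Multi-word phrases and hyphenated compounds: substring count.
            hits += lowered.count(entry)
        else:
            # Single-word adjectives: match only at word boundaries.
            hits += len(re.findall(rf"\b{re.escape(entry)}\b", lowered))
    return hits


def signal_adjective_score(text: str, lexicon: list[str] = SIGNAL_ADJECTIVES) -> float:
    """Map the hit density per 1000 words to the 0 / +1.0 / +2.0 contribution."""
    # Assumed tokenizer: hyphenated compounds and contractions count as one word.
    words = re.findall(r"\w+(?:[-']\w+)*", text)
    if not words:
        return 0.0
    rate = count_signal_adjectives(text, lexicon) * 1000 / len(words)
    if rate > 6:
        return 2.0
    if rate > 3:
        return 1.0
    return 0.0
```

Matching single words at word boundaries keeps "robust" from firing on "robustness", while substring matching finds compounds such as "state-of-the-art" regardless of how the text is tokenized.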

Sub-check 2, unsupported superlatives. Each sentence is checked for superlative constructions (the most, the best, for the first time, most important/significant/effective/promising). When a superlative is found, the containing sentence is checked for supporting evidence: Harvard citations, numeric citations, percentages, sample sizes (N=), or p-values. Superlatives without any support marker are flagged. Each unsupported superlative contributes +0.5 to the score, capped at +2.0.
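A sketch of the superlative and support-marker logic, assuming a naive punctuation-based sentence splitter and illustrative regexes for each support marker (the actual C10 patterns are not given here). It also assumes that an unsupported sentence counts once, even if it contains more than one superlative.

```python
import re

# Superlative constructions listed in the description.
SUPERLATIVE = re.compile(
    r"\b(?:the most|the best|for the first time"
    r"|most (?:important|significant|effective|promising))\b",
    re.IGNORECASE,
)

# Illustrative support-marker patterns (assumptions, not the C10 originals).
SUPPORT_MARKERS = [
    re.compile(r"\([A-Z][A-Za-z-]+(?: et al\.)?,? \d{4}[a-z]?\)"),  # Harvard: (Smith, 2020)
    re.compile(r"\[\d+(?:[,\u2013-]\s*\d+)*\]"),  # numeric: [3], [1, 2], [4-6]
    re.compile(r"\d+(?:\.\d+)?\s?%"),             # percentages: 42%, 3.5 %
    re.compile(r"\b[Nn]\s?=\s?\d+"),              # sample sizes: N=120, n = 40
    re.compile(r"\b[pP]\s?[<=>]\s?0?\.\d+"),      # p-values: p < 0.05, p=.01
]


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def unsupported_superlative_score(text: str) -> float:
    """+0.5 per sentence with an unsupported superlative, capped at +2.0."""
    unsupported = 0
    for sentence in split_sentences(text):
        has_superlative = SUPERLATIVE.search(sentence) is not None
        has_support = any(marker.search(sentence) for marker in SUPPORT_MARKERS)
        if has_superlative and not has_support:
            unsupported += 1
    return min(0.5 * unsupported, 2.0)
```

Because support must appear in the same sentence, evidence given in an adjacent sentence does not rescue a superlative in this sketch.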

Sub-check 3, contrast templates. The text is scanned for negative-parallelism contrast templates. The two core patterns are those that Lubrano [2] identified as strongly characteristic of RLHF-trained large language model (LLM) output: the "It's not X, it's Y" pattern (e.g., "it's not the technology that matters, it's the people") and the "Not just X, but Y" pattern (e.g., "not just a tool, but a paradigm shift"). In that analysis, these constructions occurred at 27 per 1000 sentences in LLM output versus 5 per 1000 in human benchmarks, a 5.4x amplification (27 / 5) attributed to RLHF training rewarding rhetorical balance over informational content. In 2025 the pattern set was expanded beyond these two to also catch the cross-sentence form ("It's not X. It's Y."), the "isn't / is not about X ... about Y" variant, "more than just X", and Romanian negative parallelism ("nu e doar X, ci Y" / "nu este vorba despre X, ci despre Y"); the English or Romanian set is chosen according to the detected document language. Each match contributes +0.3 to the score, capped at +1.0.
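The template matching can be approximated with regular expressions. The patterns below are hypothetical approximations of the forms listed for sub-check 3, with an 80-character limit on X chosen arbitrarily; language detection is assumed to happen upstream and is passed in as `lang`. Counting overlapping matches only once (for example "It's not just a tool, but..." hitting two patterns) is also an assumption.

```python
import re

# Hypothetical approximations of the templates; X is limited to 80 characters.
CONTRAST_PATTERNS = {
    "en": [
        # It's not X, it's Y  /  It's not X, but Y
        re.compile(r"\bit['’]?s not\b[^.!?]{1,80}?,\s*(?:it['’]?s|but)\b", re.I),
        # Cross-sentence form: It's not X. It's Y.
        re.compile(r"\bit['’]?s not\b[^.!?]{1,80}?[.!?]\s+it['’]?s\b", re.I),
        # Not just X, but Y
        re.compile(r"\bnot just\b[^.!?]{1,80}?(?:,\s*|\s+)but\b", re.I),
        # isn't / is not about X ... about Y
        re.compile(r"\b(?:isn['’]?t|is not)\s+about\b[^.!?]{1,80}?\babout\b", re.I),
        # more than just X
        re.compile(r"\bmore than just\b", re.I),
    ],
    "ro": [
        # nu e doar X, ci Y
        re.compile(r"\bnu (?:e|este) doar\b[^.!?]{1,80}?(?:,\s*|\s+)ci\b", re.I),
        # nu este vorba despre X, ci despre Y
        re.compile(r"\bnu este vorba despre\b[^.!?]{1,80}?(?:,\s*|\s+)ci despre\b", re.I),
    ],
}


def count_contrast_templates(text: str, lang: str = "en") -> int:
    """Count template matches, counting overlapping matches only once."""
    spans = sorted(
        m.span() for pattern in CONTRAST_PATTERNS[lang] for m in pattern.finditer(text)
    )
    count, last_end = 0, -1
    for start, end in spans:
        if start >= last_end:  # skip matches overlapping the last counted one
            count += 1
            last_end = end
    return count


def contrast_template_score(text: str, lang: str = "en") -> float:
    """+0.3 per template match, capped at +1.0."""
    return min(0.3 * count_contrast_templates(text, lang), 1.0)
```

Bounding X and forbidding sentence-final punctuation inside it (except in the explicit cross-sentence pattern) keeps a "not" early in a paragraph from pairing with an unrelated "but" much later.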

Sub-check 4, rule of three (tricolon). The text is scanned for parallel three-item constructions of the form "A, B, and C" (and the Romanian "A, B si C"). A single tricolon is unremarkable, so this sub-check scores only the rate: when tricolons exceed 4 per 1000 words, the sub-check contributes up to +1.0, scaled by how far the rate exceeds that threshold. The rule-of-three cadence is a documented RLHF rhetorical mannerism, often paired with negative parallelism, that becomes a tell only when pervasive.
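A sketch of the rate-based tricolon check. The description does not specify the scaling function, so this version assumes a linear ramp that reaches the +1.0 cap when the rate exceeds the threshold by 4 per 1000 words (`FULL_SCORE_EXCESS`, a hypothetical parameter); the item pattern (one to three words, optional Oxford comma) is also an assumption.

```python
import re

# Hypothetical item pattern: one to three words (hyphens allowed).
ITEM = r"\w+(?:[ -]\w+){0,2}"
# "A, B, and C" (Oxford comma optional, an assumption) or Romanian "A, B si/și C".
TRICOLON = re.compile(rf"\b{ITEM}, {ITEM},? (?:and|si|și) {ITEM}\b", re.IGNORECASE)

RATE_THRESHOLD = 4.0     # tricolons per 1000 words, from the description
FULL_SCORE_EXCESS = 4.0  # hypothetical: excess rate at which +1.0 is reached


def tricolon_score(text: str) -> float:
    """Linear ramp above the rate threshold, capped at +1.0 (assumed scaling)."""
    n_words = len(re.findall(r"\w+(?:[-']\w+)*", text))
    if n_words == 0:
        return 0.0
    rate = len(TRICOLON.findall(text)) * 1000 / n_words
    excess = rate - RATE_THRESHOLD
    if excess <= 0:
        return 0.0
    return min(excess / FULL_SCORE_EXCESS, 1.0)
```

Scoring only the excess over the threshold reflects the point that an occasional three-item list is normal in academic prose; under this assumed ramp, 6 tricolons per 1000 words yields +0.5.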

The four contributions sum to a theoretical maximum of 6.0 (2.0 + 2.0 + 1.0 + 1.0), which is clamped to 5.0.
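The final combination is a capped sum. The sketch below assumes each input has already been capped by its own sub-check, so the raw total can reach 6.0 before the clamp; the function name is hypothetical.

```python
def c10_score(adjective: float, superlative: float, contrast: float, tricolon: float) -> float:
    """Sum the four sub-check contributions and clamp the total to 5.0."""
    # Inputs are assumed to be pre-capped at 2.0, 2.0, 1.0 and 1.0 respectively,
    # so the unclamped sum can reach 6.0.
    return min(adjective + superlative + contrast + tricolon, 5.0)
```

Because of the clamp, a text that maxes out all four sub-checks scores the same 5.0 as one that reaches 5.0 with a single sub-check below its cap.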

Why this matters

The boundary between academic and promotional language has blurred in the era of LLM-assisted writing. Models trained with RLHF are rewarded for producing text that human raters find persuasive and well-structured. The cheapest way for a model to appear persuasive is to deploy the rhetorical patterns of marketing copy: inflated adjectives, unsupported superlatives, and balanced contrast constructions that sound profound without committing to a falsifiable claim.

Kobak and colleagues' analysis of 15 million PubMed abstracts found that the post-ChatGPT vocabulary surge was dominated by style words, of which 66% were verbs and 14% adjectives [1]. Many of these are promotional in register. Words such as groundbreaking, revolutionary, unparalleled, transformative, robust, innovative, and cutting-edge signal enthusiasm rather than precision. C10's signal adjective dictionary incorporates these findings; it was expanded from 18 to 54 English entries to cover the AI-characteristic promotional vocabulary identified in 2024-2025 research.

Wikipedia's WikiProject AI Cleanup maintains a formal catalog of AI writing patterns that lists "Promotional and Advertisement-like Language" as a distinct category [4]. Its flagged vocabulary includes boasts, vibrant, profound, showcasing, exemplifies, renowned, breathtaking, and stunning: terms that belong in travel brochures and press releases, not in academic papers.

Lubrano's 2025 analysis of rhetorical patterns in LLM output [2] identified the "It's not X, it's Y" construction (emphatic epanorthosis) as occurring at 5.4 times the human rate in RLHF-trained models. The pattern is structurally appealing because it creates a sense of depth through contrast, but in academic writing it typically signals the absence of a direct, positive argument. A finding should be stated for what it is, not for what it is not.

Bhatnagar's 2026 AI-Generated Copy Risk framework introduced the concept of a Compliance Risk Score that combines claim intensity (superlative density) with evidentiary grounding [3]. C10's sub-check 2 applies the same principle: a superlative without evidence is a risk signal, not a stylistic flourish.

Score thresholds

Score | Meaning
0 to 1 | Promotional language is minimal or absent. Superlatives, when present, are supported by evidence. No formulaic contrast constructions. Consistent with genuine academic register.
2 to 3 | Moderate promotional tone: some signal adjectives detected, one or two unsupported superlatives, possibly a contrast template. Common in AI-assisted text and grant proposals that oversell findings.
4 to 5 | Heavy promotional register: high density of marketing adjectives, multiple unsupported superlatives, and formulaic "It's not X, it's Y" patterns. The text reads like a press release or marketing brochure rather than a scientific paper.

Limitations

The signal adjective dictionary targets a specific register of promotional language. A text that uses discipline-specific grandiosity (e.g., "this represents a paradigm shift in our understanding of ribosome biogenesis") may pass because its wording falls outside the dictionary. The dictionary deliberately avoids domain-specific evaluative terms that are legitimate in context (e.g., "significant" in the statistical sense is not flagged).

The unsupported superlative check uses a narrow set of superlative patterns and a small set of support patterns. A superlative supported by evidence that does not match the support patterns (e.g., a named statistical test without a p-value format) will be incorrectly flagged.

The contrast-template check now covers the principal negative-parallelism forms in English and Romanian, including the cross-sentence form, the "about" variant, and "more than just X", but it remains pattern-based and will miss novel phrasings. The tricolon check is rate-based and deliberately conservative: legitimate three-item enumerations are common in academic prose, so only texts saturated with parallel triplets are flagged. The +1.0 cap on each rhetorical sub-check limits how far false positives from either pattern matcher can inflate the overall score.

References

  1. Kobak D, Gonzalez-Marquez R, Horvat E-A, Lause J. Delving into LLM-assisted writing in biomedical publications through excess vocabulary. Science Advances. 2025;11(27):eadt3813. DOI: 10.1126/sciadv.adt3813
  2. Lubrano F. Beyond hallucinations: linguistic and textual analysis of LLM-generated texts. Zenodo. 2025. https://zenodo.org/records/16947334
  3. Bhatnagar A. AI-generated copy risk: detecting persuasive but misleading marketing content. TDCommons. 2026. https://www.tdcommons.org/dpubs_series/9384/
  4. Wikipedia WikiProject AI Cleanup. Signs of AI writing: promotional and advertisement-like language. 2025.
  5. Wen Y, Laporte S. Experiential narratives in marketing: a comparison of generative AI and human content. Journal of Public Policy & Marketing. 2025. DOI: 10.1177/07439156241297973