Friday, August 22, 2025

List of AI Writing Tells

The internet is becoming polluted with slop: content produced by AI tools like ChatGPT or Gemini. The problem with that kind of junk content is more than just sloppy writing; large language models can and often do hallucinate, inventing facts that seem plausible but have no basis in reality.

This is a real problem for Wikipedia, whose purpose is to provide accurate information. To combat it, Wikipedia has created a list of AI tells: things that will help its editors spot content that's been produced by AI and that therefore requires closer scrutiny.
This is a list of writing and formatting conventions typical of AI chatbots such as ChatGPT, with real examples taken from Wikipedia articles and drafts. Its purpose is to act as a field guide in helping detect undisclosed AI-generated content. Note that not all text featuring the following indicators is AI-generated; large language models (LLMs), which power AI chatbots, have been trained on human writing, and some people may share a similar writing style.

The listed observations are empirical statements, not normative statements (except notes on how strong an indicator something should be taken to be). The latter are contained in Wikipedia's policies and guidelines. Any normative content about what kind of formatting or language not to use in articles is not topical here; it might belong in (and is probably already present in) the Manual of Style.

Here's just one item from the list. 

Rule of three

LLMs overuse the 'rule of three'—"the good, the bad, and the ugly". This can take different forms, from "adjective, adjective, adjective" to "short phrase, short phrase, and short phrase".

Whilst the 'rule of three', when used sparingly, is considered good writing, LLMs seem to rely heavily on it so that superficial explanations appear more comprehensive. Furthermore, this rule is generally suited to creative or argumentative writing, not purely informational texts.

Examples

The Amaze Conference brings together global SEO professionals, marketing experts, and growth hackers to discuss the latest trends in digital marketing. The event features keynote sessions, panel discussions, and networking opportunities.

When I was working at the TSX, I used Paul Beverley's wonderful FRedit Microsoft Word add-in to scan my documents for words and phrases that I would review and likely change. It could easily be adapted to catch many of the AI signatures (in poker terminology, tells) in a document. Some of Paul's other tools would also be useful in analyzing documents to spot content that has been produced by an LLM. 
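As a rough illustration of the idea, here's a minimal Python sketch (my own, not part of FRedit or Wikipedia's guide) that flags one such tell: "X, Y, and Z" triads of short phrases. The pattern is deliberately crude, so it will over- and under-match, but it shows how a simple scanner for one AI signature might work.

```python
import re

# Rough pattern for "A, B, and C" triads: three short comma-separated
# phrases (one to three words each) joined by a final "and" or "or".
# This is a heuristic only; it will catch some false positives.
TRIAD = re.compile(
    r"((?:\w[\w'-]*\s+){0,2}\w[\w'-]*),\s+"
    r"((?:\w[\w'-]*\s+){0,2}\w[\w'-]*),\s+"
    r"(?:and|or)\s+((?:\w[\w'-]*\s+){0,2}\w[\w'-]*)"
)

def rule_of_three_hits(text: str) -> list[tuple[str, str, str]]:
    """Return every 'X, Y, and Z' triad found in the text."""
    return TRIAD.findall(text)

# One of the example sentences from Wikipedia's list:
sample = ("The event features keynote sessions, panel discussions, "
          "and networking opportunities.")
for hit in rule_of_three_hits(sample):
    print(hit)
```

A density measure (triads per hundred sentences, say) would be more useful than individual hits, since occasional triads are perfectly good writing; it's the heavy reliance on them that's the tell.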

What I'd really like is a browser extension that would flag web pages that appeared to be AI-generated. I know there are such tools and may do a bit of digging to find one that would work for me, preferably one that's open source. Suggestions are welcome. 
