AI training is updated on Nigerians and Kenyans. Here's an essay from a Kenyan about AI stealing the style he was taught. The essay's full of "AI tells" - but it does not at all read like AI, because it's so clearly a human writing, with something to say.

https://marcusolang.substack.com/p/im-kenyan-i-dont-write-like-chatgpt

I'm Kenyan. I Don't Write Like ChatGPT. ChatGPT Writes Like Me.

@davidgerard I spotted a typo :).
"The machine [...] accidentally replicated the linguistic ghost of the British Empire."
Marvellous essay, poignant even, I'd say.
@davidgerard To clarify: The typo is *not* the quote above, which I genuinely like. The typo is something mundanely human.

@davidgerard

> It was the Queen's English, the language of the colonial administrator, the missionary, the headmaster. It was the language of the Bible, of Shakespeare, of the law. It was a tool of power, and we were taught to wield it with precision.

People complaining about the way LLMs write are often shit writers. LLMs write the way they do because of the texts they are trained on. The "not just A but B" pattern and em-dashes are useful for compactly and elegantly communicating complex thoughts.

But most people don't write complex thoughts – heck, before the internet, most people barely wrote at all after reaching adulthood, so LLMs are trained on books and academic sources. Those sources contain language different from what the average person uses, which therefore seems strange.

If LLMs had more Reddit in their training corpus than scientific articles, we'd dismiss any half-thought referring to "updoots" or "doggy" or "good sir" as slop. Which we probably should anyway.
@michael @davidgerard i feel like it's more about the excessive, disproportionate use of those features rather than their presence in general? LLMs also have a tendency to use puffery and weasel words when writing about the importance of a subject matter, tend to write very superficial analyses, and overuse specific words (like highlight, delve, boasts, instinct, etc) to the point of losing meaning - i think all of those are features that actively make text less meaningful and more poorly written. there's a lot more about this written at https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing, which is a very good article that i highly recommend reading

You're right that LLM analysis is superficial and often wrong. People who do not read/produce advanced texts make a connection between "tells" learned from advanced texts and claim they are the problem because they only just learned what an em-dash is and can recognize it.

LLM texts are often quite well-written – way better than most people can write without them – their content is just junk, and that's heaps harder to identify than counting em-dashes.

Also, I judge you for using a hyphen/minus in place of an en-dash ;) (jk)
@davidgerard oh, that is where it comes from. i was wondering.
@davidgerard This is consistent with my impression of the LLM writing style. It's not really a new style, it's more or less just the standard style used in formal, academic or technical written works. Someone called me an LLM twice for this reason (interestingly, once for an English post, and once for a Chinese post). The "itemized list" writing style? Pioneered by outline text processors decades ago, with a small cult following in tech, Emacs Org Mode being the most famous one. I invite all readers to check a document known as The Cyphernomicon (1994) by Tim May, a 100,000-word thesis that summarized everything in the 1990s Cypherpunk movement (written in MORE on classic Mac OS, to my best knowledge; it's likely the longest published outline document, the masterpiece of this genre). The abuse of technical terminology? I personally do it all the time.

The only difference is that an LLM can do it to an extent and with a consistency that almost no human can match. I can't find a way to abuse a new term in every paragraph; an LLM does it with ease.

@davidgerard interesting. My son's study skills tutor was trying to change his writing style by having him use connecting words, including the dreaded "furthermore". He pushed back, saying that these changes made his work sound AI generated. Reading this article, she obviously wants him to "replicate[d] the linguistic ghost of the British Empire". He is 19 and autistic, and this is not his natural communication style. Maybe AI can inadvertently bring about a change to the archaic, academic writing doctrine.