AI training is updated on Nigerians and Kenyans. Here's an essay from a Kenyan about AI stealing the style he was taught. The essay's full of "AI tells" - but it does not at all read like AI, because it's so clearly a human writing, with something to say.

https://marcusolang.substack.com/p/im-kenyan-i-dont-write-like-chatgpt

I'm Kenyan. I Don't Write Like ChatGPT. ChatGPT Writes Like Me.

I'm calm. I'm calm. I promise.

this man's mind
@davidgerard I spotted a typo :) .
"The machine [...] accidentally replicated the linguistic ghost of the British Empire."
Marvellous essay, poignant even, I'd say.

@davidgerard

> It was the Queen's English, the language of the colonial administrator, the missionary, the headmaster. It was the language of the Bible, of Shakespeare, of the law. It was a tool of power, and we were taught to wield it with precision.

People complaining about the way LLMs write are often shit writers. LLMs write the way they do because of the texts they are trained on. The "not just A but B" pattern and em-dashes are useful for compactly and elegantly communicating complex thoughts.

But most people don't write complex thoughts. Heck, before the internet, most people barely wrote at all after reaching adulthood, so LLMs are trained on books and academic sources: sources whose language differs from what the average person uses and therefore seems strange.

If LLMs had more Reddit in their training body than scientific articles, we'd dismiss any half-thought referring to "updoots" or "doggy" or "good sir" as slop. Which we probably should anyway.
@davidgerard oh, that is where it comes from. i was wondering.
@davidgerard This is consistent with my impression of the LLM writing style. It's not really a new style; it's more or less just the standard style used in formal, academic or technical written works. Someone called me an LLM twice for this reason (interestingly, once for an English post, and once for a Chinese post). The "itemized list" writing style? Pioneered by outline text processors for decades, with a small cult following in tech, Emacs Org Mode being the most famous one. I invite all readers to check a document known as The Cyphernomicon (1994) by Tim May, a 100,000-word thesis that summarized everything in the 1990s Cypherpunk movement (written in MORE on macOS 9, to the best of my knowledge; it's likely the longest published outline document and the masterpiece of this genre). The abuse of technical terminology? I personally do it all the time.

The only difference is that an LLM can do it to an extent and with a consistency that almost no human can match. I can't find a way to abuse a new term in every paragraph; an LLM does it with ease.

@davidgerard interesting. My son's study skills tutor was trying to change his writing style by having him use connecting words, including the dreaded "furthermore". He pushed back, saying that these changes made his work sound AI generated. Reading this article, she obviously wants him to "replicate[d] the linguistic ghost of the British Empire". He is 19 and autistic, and this is not his natural communication style. Maybe AI can inadvertently bring about a change to the archaic, academic writing doctrine.