AI training is updated on Nigerians and Kenyans. Here's an essay from a Kenyan about AI stealing the style he was taught. The essay's full of "AI tells" - but it does not at all read like AI, because it's so clearly a human writing, with something to say.

https://marcusolang.substack.com/p/im-kenyan-i-dont-write-like-chatgpt

I'm Kenyan. I Don't Write Like ChatGPT. ChatGPT Writes Like Me.

I'm calm. I'm calm. I promise.

People complaining about the way LLMs write are often shit writers. LLMs write the way they do because of the texts they are trained on. The "not just A but B" pattern and em-dashes are useful for compactly and elegantly communicating complex thoughts.

But most people don't write complex thoughts – heck, before the internet, most people barely wrote at all after reaching adulthood, so LLMs are trained on books and academic sources. Those sources contain different language from what the average person uses, and it therefore seems strange.

If LLMs had more Reddit in their training body than scientific articles, we'd dismiss any half-thought referring to "updoots" or "doggy" or "good sir" as slop. Which we probably should anyway.
@michael @davidgerard i feel like it's more about the excessive, disproportionate use of those features rather than their presence in general? LLMs also have a tendency to use puffery and weasel words when writing about the importance of a subject matter, tend to write very superficial analyses, and overuse specific words (like highlight, delve, boasts, instinct, etc) to the point of losing meaning - i think all of those are features that actively make text less meaningful and more poorly written. there's a lot more about this written at https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing, which is a very good article that i highly recommend reading

You're right that LLM analysis is superficial and often wrong. But people who do not read or produce advanced texts latch onto "tells" that LLMs learned from advanced texts and claim those are the problem, because they only just learned what an em-dash is and can now recognize one.

LLM texts are often quite well-written – way better than most people can write without them – but their content is just junk, and spotting that is heaps harder than counting em-dashes.

Also, I judge you for using a hyphen/minus in place of an en-dash ;) (jk)