TIL that Kenyan workers have been used so much to train AI systems, that standard writing by Kenyan people is often flagged as AI generated while it is not (which means that they can get discriminated for jobs / exams etc)
https://marcusolang.substack.com/p/im-kenyan-i-dont-write-like-chatgpt

Edit: many people raised below that the article is talking about texts written in very classically trained English detected as AI generated, which is the case for many Kenyans. It is documented that many Kenyan workers have been hired to train LLMs, but I made an assumption that it was the reason for this detection while it may not be. Sorry about that, thanks for the feedback (and feel free to continue the discussion here)

I'm Kenyan. I Don't Write Like ChatGPT. ChatGPT Writes Like Me.

I'm calm. I'm calm. I promise.

this man's mind

@tek
> "Kenyan workers have been used so much to train AI systems"

Where in the linked post does it say that?

@ki @tek it doesn’t. The linked article is still worth reading, but it’s not the synopsis that OP says it is.

It’s ambiguous how to interpret that — one extreme is “Kenyan workers were paid by AI companies to tune model output”; another is “the prose style required of ex-British-Empire students is so common in the online corpus that it shaped LLM prose”.

The latter is supported by the essay.

“the writer from Lagos, from Mumbai, from Kingston, from right here in Nairobi, […] was taught that precision was the highest form of respect for both the language and the reader”

Hear, hear.

@zed @ki @tek