With the Reddit thing going on, it'd be easy to miss what's happening on Stack Overflow. Basically:

1. Site owners allowed LLM-generated content.

2. Mods are on strike.

3. Also, it turns out that back in March SO turned off the Creative Commons data pipe, which backs up the site to the Internet Archive, in an attempt to keep SO from being used as training data.

https://www.vice.com/en/article/4a33dj/stack-overflow-moderators-are-striking-to-stop-garbage-ai-content-from-flooding-the-site

https://meta.stackexchange.com/questions/390106/moderation-strike-update-data-dumps-choosing-representatives-gpt-data-and-wh

Stack Overflow Moderators Are Striking to Stop Garbage AI Content From Flooding the Site

Volunteer moderators of the forum are striking over a policy that says AI-generated content can practically not be moderated.

GPT on the platform: Data, actions, and outcomes

In a meeting with some moderators last week, I committed to releasing the data sets from our initial studies around the efficacy and false positive rates of ChatGPT detectors to them. Tuesday after...

Meta Stack Exchange
@mogul I believe this is addressed in the post I linked, specifically the point that the methodology is suspect.
@mttaggart Oh I missed your second link! Sorry about that.