I stopped using Reddit because the company was feeding my words into a large language model, and I stopped using StackOverflow because the company was feeding my words into a large language model, and I will stop using Discord if the company starts feeding my words into a large language model.

https://www.theverge.com/apps/673208/discord-ai-forums-anniversary-gamechat

@mcc maybe stop using Mastodon because any LLM can train on your posts 🙂

@ErikJonker @mcc of course anyone can send an LLM DDoS bot brigade from a zombie subnet to harvest data, but that will likely be noticed and action will be taken. (We're in a decentralized realm, right?)

Mastodon at least does not profit directly from you being someone's training dummy.

@strlcat @ErikJonker @mcc I agree with you. If I don't profit from selling my users' data, I will take measures so that others don't earn by scraping it. I don't know how scrapers work, but they probably set off some alarm bells and get noticed.

@dark_phoenix @ErikJonker @mcc my Gitea instance was once literally DDoSed by Meta, Anthropic and Huawei at almost the same time. The last one even ignored robots.txt and changed UA strings often. All of them, except probably Meta, flooded from thousands of different address ranges. This left my experimental RISC-V server unresponsive multiple times. Two years earlier there were no such floods; my site was calm and still.

Later, after I blocked them all, bingbot also went into crazy mode around the beginning of April, and I banned it too. It also disobeyed every directive. Waiting for Google to start hitting the charts...

And I'm not alone.

https://arstechnica.com/ai/2025/03/devs-say-ai-crawlers-dominate-traffic-forcing-blocks-on-entire-countries/

Open source devs say AI crawlers dominate traffic, forcing blocks on entire countries

AI bots hungry for data are taking down FOSS sites by accident, but humans are fighting back.

Ars Technica