I stopped using Reddit because the company was feeding my words into a large language model, and I stopped using StackOverflow because the company was feeding my words into a large language model, and I will stop using Discord if the company starts feeding my words into a large language model

https://www.theverge.com/apps/673208/discord-ai-forums-anniversary-gamechat

This is for two reasons, one, because I find it violating to find my words / art / self mulched into a large language model, and I don't want to give business to a company that is profiting from doing that. But also, if someone is *using* an LLM to "summarize" my words, I do not want them to receive whatever the benefit was they were getting from my speech
@mcc One of these days I need to try out Rocket Chat, Revolt, Spacebar, etc and figure out what my contingency plan is.
@j3rn @mcc XMPP has its supporters
@fluffykittycat @mcc I'd be interested to try XMPP again and also IRCv3. The real question is what alternative platform I can convince my non-tech friends to adopt.
@j3rn @mcc we need to get better at figuring out how to get non-tech people to adopt better tech altogether. Communication tools are useless without people to talk to; if we can get better at helping people with this, we can achieve real growth.
@j3rn @mcc like, we take it for granted that non-technical people are just going to be stuck with corporate crap, but if we can figure out how to fix that, it would be the key. I don't have any answers to this at the moment, but my thought is: if they have an old laptop lying around, offer to take it, install Linux on it, preload it with a ton of software, and give it back to them. Getting them over the hump of acquiring and setting up stuff might help a little?

@fluffykittycat Getting set up is definitely part of the problem, since the hardware most folks buy comes with something proprietary.

I walked a friend through installing Fedora Silverblue on his new machine (he actually asked me to, it wasn't even my idea), and it went smoothly enough, but it's rare that he boots into it: the machine also has Windows and he tends to use that.

A big part of the problem is that Linux systems are different from Windows or Mac systems, which means learning (1/2)

@fluffykittycat new ways to use your computer.

On the one hand, I'd argue that the new ways are *better* (e.g. installing packages from a repository instead of downloading stuff from websites). On the other, I'm not sure anyone is going to invest the time to change their behavior unless they really, really want to.

Maybe distros that try to emulate Windows/Mac are (part of) the solution. Maybe Wine should come baked in so you can run arbitrary EXEs and MSIs. (2/2)

@mcc maybe stop using Mastodon because any LLM can train on your posts 🙂
@ErikJonker @mcc At least with Mastodon you have the option of limiting your posts from being visible to search engines.

@mast0d0nphan @ErikJonker @mcc

Unfortunately it doesn't stop the scrapers tho. Scrapers gonna scrape

@ErikJonker @mcc ofc anyone can send a DDoSing AI bot brigade from a zombie subnet full of them to harvest data, which will likely be noticed, and action will be taken. (We're in a decentralized realm, right?)

Mastodon at least does not receive a direct profit from you being someone's training dummy for them.

@strlcat @ErikJonker @mcc I agree with you: if I weren't profiting from selling my users' data, I would take measures so that others don't profit by scraping it. I don't know how scrapers work, but they might set off some alarm bells and get noticed.

@dark_phoenix @ErikJonker @mcc once my Gitea instance was literally DDoSed by Meta, Anthropic and Huawei at almost the same time. The last one even ignored robots.txt and changed UA strings often. All of them except probably Meta flooded in from thousands of different address spaces. This brought my experimental RISC-V server to an unresponsive state multiple times. Two years earlier there were no such floods; my site was calm and still.

Later, after I had blocked them all, bingbot also went into crazy mode around the beginning of April, and I banned it too. It disobeyed every directive as well. Waiting for Google to start hitting the charts...

And I'm not alone.

https://arstechnica.com/ai/2025/03/devs-say-ai-crawlers-dominate-traffic-forcing-blocks-on-entire-countries/

Open source devs say AI crawlers dominate traffic, forcing blocks on entire countries

AI bots hungry for data are taking down FOSS sites by accident, but humans are fighting back.

Ars Technica
@ErikJonker @mcc
Not legally, robots.txt disallows GPTBot from scraping the instance.
@Andres @mcc true, the problem is that not everyone follows the law. There may also be other methods: I could just follow a massive number of users and use all their posts as training data? 🤔

@ErikJonker @mcc
robots.txt works like passwords. Sure, anyone can guess a password and get access to private data, but it's commonly agreed that doing so constitutes "hacking". Likewise, crawling (even with follow consent) with the intent to train an LLM clearly breaks the spirit of the rules set by the server.

Anyway, posting "maybe stop using Mastodon" is not nice btw.

@Andres @mcc true, I was not serious about not posting on Mastodon, I really like this platform 😃
@mcc What, Discord are still claiming they're not doing this? I don't believe it for an instant.
@mcc Stop using Discord regardless
@daskye @mcc 👆 this so much. I can't believe the success of Discord. I know people who even install their spyware on their desktops.
But most of them didn't care about Discord's data usage before, so why would they bother now??

@mcc

They literally just said "a conversation is a shareable object." Yikes.

@jadon1 There are contexts in which this is true but I suspect that many people are using Discord specifically *because* it is less true on Discord than it is elsewhere.
@mcc Yeah, 100%. I'm in a couple servers where this is going to be a non-starter.
@jadon1 @mcc Maybe Peter Sellis is a shareable object itself

@jadon1 @mcc

I fucking hate techies who do this. It’s like marketing and product people who use “learnings”.

@mcc what's this obsession with summarising things anyway? Why on earth would I want a summary of a Discord conversation??

@Tijn @mcc I kinda understand the desire in some ways, because the open web is dead and a lot of information (like, to pick one topic that matters to me, videogame glitches) is only discussed in ephemeral Discord conversations, and wikis/websites/etc aren't really maintained.

The amount of information locked up and unsearchable in discord histories is sad, and I wish I had a tool to search everything in my own life (my discords, slacks, etc) without including a giant corporation or LLM :(

@iagox86 @Tijn @mcc

a system that converts public Discord chats into wikis would indeed be the singular magic fix to most of the "Discord is killing [...]"

but an llm is not that. being told to put glue on pizza isn't any more helpful in a wiki article than it is in a chat.

and whatever their intention here, the absolutely last thing these fine management specimens want is "allowing folks w/out discord accounts access to this information"

this "shareable integrated object" will be for sale

@maybenot @Tijn @mcc I was strictly answering the question "why would somebody want a summary of Discord?" So much information is being lost to time on Discord.

Not saying we should use LLMs for it, just that finding a way to maintain that information would be awfully nice.

I will let you know of my future plans then hehe
@mcc I heard this in a training I attended a few days ago. While the lecturer was big on using AI, he also said that a human element of reading, writing, and editing is needed. The component of language and feeling cannot be objectified. Emotion cannot be objectified. He gave the talk to mental health therapists.

@mcc I trust them not at all to make transparent decisions on this.

But at least they admit their current UI/UX for back-n-forth text threads with multiple people is bad.

This is a potential silver lining of AI: it tricks non-transparent people into transparency. How else would Discord go around announcing how bad their product is in this regard.

@mcc they've had "AI summaries" for a year and a half or so
@groxx Sucks. Then I want to opt out of being consumed by them
@mcc I'm very curious to see the Discord meme-and-shitpost-channel trained AI turn out horror after horror.
@mcc The web is indexed by design, that's part of its original purpose. Model training is a collateral effect of open protocols. Being against indexing feels at odds with the idea of an open web, doesn’t it?
@a @mcc Indexing is a very different operation from training an LLM.

@larsmb

Many are fine with web crawlers using open data for search indexing, n-gram analysis, or even NLP model building. But LLM training on it faces pushback. It's open data, so why make a difference?

@mcc

@a @mcc I've been afraid to ask this question.

@mcc The nice thing about Mastodon is that your words are being fed into a large language model, but mastodon.social isn't doing that on purpose.

... mastodon.social being open and scrapable is doing that instead. 😉

@mark My goal is to avoid driving business to any entity which is profiting from the AI scam. If Reddit/StackOverflow had simply been scrapable by AI instead of taking money from the AI companies to provide a direct feed, I would have continued using those sites.
@mcc I thought Discord was already doing this? About 1.5 years ago they updated their TOS to share data with third parties in some instances to "provide services", and they were already integrating AI bots into their platform. From my understanding they have been leveraging AI for moderation since then, so technically it's going into some system for that anyway.

they've also had automated profanity detection on images since forever, which from what I understand is provided by a 3rd party using an image based AI network too

@mcc

They already do this internally but don't show it in the official client. Third party clients like Dissent do show summaries of the conversations, so they definitely already exist.

@boo_ So do I need to start boycotting Discord now?

@mcc

If you are able to, I say go for it. I was really creeped out when I saw that they do it, personally. I just rely on it for too many things atm to be able to instantly drop it.

@mcc

I wish I could show you screenshots of it as proof but we were having some pretty personal conversations when I first discovered that it had been summarizing them.

@mcc I stopped using so many things for basically these reasons that I feel I’ve accidentally isolated myself from the world

@mcc

Love to have a "poorly structured shareable object" with my friends; it's a great way to spend a couple hours. /sarcasm

A conversation is for its participants, not for "sharing" with random strangers.

@Kathmandu @mcc how did we ever live before we had the ability to boil the soul out of every human interaction? /s
@mcc finally a way to turn Useless Thing (genuine conversation shared between friends) into Useful Thing (blob of unverified information to be sold and regurgitated as gospel)

@mcc The concerning thing is usually the companies that do this think they own your stuff because you used their platform and even if you delete your account, they're probably not going to delete the data before it goes into the LLM training data.

I had a feeling they were going to go this way, so deleted my account quite a while back when they might still actually delete it, but even so I wonder if I made it in time. Very possibly not.

@mcc
Would you mind if I feed your words into a… oh, never mind, you wouldn't want to know. 😉

@mcc and, of course, as of now, LLMs are not particularly accurate in summarisation compared to human editors.

e.g. this recent study is about summarising scientific papers:

"Artificial intelligence chatbots driven by large language models (LLMs) have the potential to increase public science literacy and support scientific research, as they can quickly summarize complex scientific information in accessible terms. However, when summarizing scientific texts, LLMs may omit details that limit the scope of research conclusions, leading to generalizations of results broader than warranted by the original study. We tested 10 prominent LLMs, including ChatGPT-4o, ChatGPT-4.5, DeepSeek, LLaMA 3.3 70B, and Claude 3.7 Sonnet, comparing 4900 LLM-generated summaries to their original scientific texts. Even when explicitly prompted for accuracy, most LLMs produced broader generalizations of scientific results than those in the original texts, with DeepSeek, ChatGPT-4o, and LLaMA 3.3 70B overgeneralizing in 26โ€“73% of cases. In a direct comparison of LLM-generated and human-authored science summaries, LLM summaries were nearly five times more likely to contain broad generalizations (odds ratio = 4.85, 95% CI [3.06, 7.70], p < 0.001). Notably, newer models tended to perform worse in generalization accuracy than earlier ones. Our results indicate a strong bias in many widely used LLMs towards overgeneralizing scientific conclusions, posing a significant risk of large-scale misinterpretations of research findings. We highlight potential mitigation strategies, including lowering LLM temperature settings and benchmarking LLMs for generalization accuracy."

https://royalsocietypublishing.org/doi/10.1098/rsos.241776

@marjolica I'm seeing people say Discord is already semi-secretly doing this for moderation purposes, which, if true, is a dizzying mistake

@mcc

I would not trust an LLM to "summarize".

The LLM can make something shorter -- but to know what should be kept versus what can be omitted requires understanding and judgment that I don't think an LLM can imitate.

@TerryHancock @mcc

Yes, this. What is the speaker's intended point, what is most relevant to the listener's interests, and what gets the most time/repetition, are three *different* questions. And distinguishing them requires human judgement.

@mcc I _believe_ they had already started doing this at some point? There's the summary notifications and such. https://support.discord.com/hc/en-us/articles/12926016807575-In-Channel-Conversation-Summaries#h_01HEJRG4MR97JBQDQYQFYWYM07 I've seen it in third party clients in servers where I don't even see it in the actual UI :/
@mcc Haven't they had AI summaries of conversations as an experiment for a while now?

@mcc Last I knew, there was no way for Discord users or admins to archive their chats that wasn’t an insecure third-party tool probably full of malware.

The idea that they’re thinking of piping our chats to an LLM when we cannot get that data ourselves is truly maddening.