@tante

None of these are true if you run your own LLMs on your own hardware, using FLOSS models.

But the #MastodonHOA has deemed all AI to be abhorrent as a blanket decision.

And frankly, if you exist in a capitalist society and you're not an owner, there is a 100% chance you are exploited. The capitalist system requires it.

@crankylinuxuser FLOSS models (which are really only freeware) tick most of those boxes: trained on stolen data, massaged by people in global majority countries, trained in environmentally harmful data centers, outsourcing skills to a freeware product a company dumped on me, using a tool that is imbued with and trained for how big tech wants to see the world, and effort that could have gone to something meaningful. So yeah, nope.

@tante

"Trained on stolen data". It's at best a copyright violation. And I view things like Anna's Archive and Libgen as internationally renowned public libraries.

"Massaged by people in global majority countries" - yes, people work under capitalism. And guess what... you're exploited too.

"Trained in environmentally harmful data centers". This assumes that training is always needed, and it's not. You can train once and run X times. Again, you're stretching to make local LLMs look horrible.

And really, the rest of these are poor excuses. I won't use poop smear (Anthropic), or OpenAI, or other SaaS token companies. I run local models, and they do not have those things you claim.

Except for the copyright issue. But again, I don't have that much respect for current US copyright.

@crankylinuxuser @tante

It's at best a copyright violation

This may be true for published and public data... but that's not the only data that goes into these things. Any data that comes from breaches, users' private cameras, and anything else stored with an expectation of privacy is much worse than a copyright violation.

@Epic_Null @tante

And yes, that is a big issue with the SaaS token vendors. Claude, OpenAI, MS, and the rest do use whatever user data they can get. I am not disputing their horrific behavior.

I'm talking about locally running Qwen, or Deepseek, or other FLOSS models.

That local LLM running on my machine only sees and uses data I provide. And a control-c in the relevant console window kills the LLM.

What folks do not realize is that this is #Leibniz's ultimate dream: being able to do #calculus with words, sentences, and more. He tried to devise single word-vectors, but even that had to wait for Word2Vec in 2013.

@Epic_Null @crankylinuxuser @tante “local” models are as reliant on illegal data acquisition, because they depend on the larger mainstream models to reach any level of tolerable performance. Whether it’s for training, fine tuning, distillation, or another method, that dependency means anything that goes into the development of the nonlocal model is also a requirement for the development of the local versions.

Deepseek and Qwen are no exception.

@Epic_Null @crankylinuxuser @tante

Data wants to be free. This argument simply doesn't work for those of us who have always been open-data, anti-copyright.

@komali_2 @crankylinuxuser @tante Every message between you and your doctor or you and your loved ones is data.

@Epic_Null @crankylinuxuser @tante

yup, so you better e2e encrypt that sort of thing

I don't care about LLMs being trained on things I want everyone to have access to, because in order for everyone to have access to those things, they have to be available in a way that LLMs also have access to.

I'd prefer the frontier LLMs companies collapse into a black hole of capitalism but that's just because I hate corpos, not LLMs.

@komali_2 @crankylinuxuser @tante I will admit that I reserve the right to be interested in AI once the bubble bursts and it's no longer being shoved into literally everything and forcefed to everyone.

Until then though, I am a hard out.

@Epic_Null @crankylinuxuser @tante I strongly recommend trying the PRC models then, since they're built from tech-on-tech violence lol. They're distilled from the frontier models. Using them represents a capitalistic harm to OpenAI etc.

@komali_2 @Epic_Null @tante

Check out https://github.com/brontoguana/krasis and Qwen3-coder-next, an 80B param model.

Also check out safetensor variants of https://huggingface.co/collections/huihui-ai/qwen35-abliterated . Gets rid of the pesky Chinese laws and moral bullshit.

@komali_2 @crankylinuxuser @tante Until the bubble bursts and it's no longer being shoved into literally everything and forcefed to everyone, I will not be taking interest in AI from any model.

@komali_2 @Epic_Null @tante

I don't think you understand.

I'm running a local LLM, 80B parameters, on my local machine. It isn't exposed to the network at all: I run OpenWebUI on the same machine and connect over a 127.0.0.1 address.

I also run an abliterated model, which means it will do anything I ask. No corpo models are abliterated.
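For readers who want to picture that setup, here is a minimal sketch. The model filename, ports, and exact flags are illustrative placeholders, not the poster's actual configuration; it assumes llama.cpp's llama-server and the Open WebUI pip package.

```shell
# Sketch of a loopback-only local LLM setup (paths, ports, and model name
# are placeholders).

# Serve a local GGUF model with llama.cpp, bound to 127.0.0.1 only,
# so nothing is reachable from the network:
llama-server -m ./qwen3-coder.gguf --host 127.0.0.1 --port 8080

# In another terminal: point Open WebUI at that loopback,
# OpenAI-compatible endpoint:
OPENAI_API_BASE_URL=http://127.0.0.1:8080/v1 open-webui serve --port 3000
```

Binding to 127.0.0.1 means prompts and outputs never leave the machine, and killing the server process (Ctrl-C) ends everything.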

@crankylinuxuser @Epic_Null @tante that sounds great, what kind of hardware?
@komali_2 @Epic_Null @crankylinuxuser @tante Data does not "want" anything. Data is incapable of having wants.

@komali_2 For some reason, nobody ever brings up the other part of the quote:

On the one hand you have—the point you’re making Woz—is that information sort of wants to be expensive because it is so valuable—the right information in the right place just changes your life. On the other hand, information almost wants to be free because the costs of getting it out is getting lower and lower all of the time. So you have these two things fighting against each other.

  • Stewart Brand, at the first Hackers Conference in 1984

@clayote "value" is an overinflated term. Instrumental value? Allocation value? Preference? Attachment?

On that note arguments around capitalistic value probably aren't interesting to anarchists. By all means, debate the number of dollarydoos that should be exchanged for, lol, bits on a disk

@komali_2 If you're not interested in the monetary value of data, then you're not interested in what Stewart Brand meant when he said it wants to be free, but, well, death of the author and all that

Do you value privacy at all? If so, then you might want to find some solidarity with people who've ended up sharing more than they intended to on the open web, such as a lot of the queers and sex workers on Tumblr, who objected to ArchiveTeam scraping their blogs. Inasmuch as inanimate data can be said to "want" things, the fact that their writing is now available to any interested fascist in power in the USA is what that writing wants, but it's not what the authors want.

If your anarchism has more loyalty to the rights of data than those of the people who produced it, it's shit.

@clayote

I sympathize with the pain of anyone facing violence at the hands of fascists, and my commitment to put my body in the way of that violence remains the same, and remains proven. I've done it before, and I'll keep doing it.

That said, this is why we've been telling people for the last 20 years that it's not a good idea to put personally identifying info next to the kinds of things christofascists target. We've been warning about *exactly this outcome*.

@clayote privacy delivered by law or contract is just privacy for corporations, not for people.

True privacy can only exist through encryption.

Encrypt things you want private, everything else should be free data available to anyone, including frontier AI companies who I am very much looking forward to the catastrophic collapse of.

@komali_2 Sure. Fine. It would be better if they hadn't put that info on the public internet in the first place. They did so, either out of naïvete, or because the blogging tools available to them didn't offer the grain of privacy control they wanted, and they made the pragmatic decision to risk exposure to the wrong people, in order to be read by the right people.

Now that their data is being abused by Anthropic, they're trying to do something to limit the harm, and are using the tools available to them, which are not necessarily the tools that they want. You, as an anarchist, should support them in that effort, and that means supporting them in getting their copyright enforced -- whether or not you think copyright should exist, in the abstract.

@komali_2 It's not that different from being against violence generally, but supporting Kurdish fighters in Rojava
@clayote there's a thread there worth exploring that I need to think about

@clayote I support them attacking corpos however they please, but I argue that using copyright to do so will be at best ineffective, at worst long term harmful.

Copyright is a tool belonging to the capital class. Its "protections" of normal people are part of its mythology, there so that we tolerate something absurd: the idea that only certain people (read: companies) are allowed to do things with information, stories, and characters.

It won't work because corpos can just ignore it, worst case.

@clayote trying to stop a corporation from using your data to train an LLM using copyright law would be like a minor version (very, very minor) of a slave suing a slaveholder. It probably won't work, but also it's an absurdity, because the entire system exists to support the slaveholder and will happily engage in hypocrisy and paradox to do so.

@komali_2 There are lots of ways to use the law other than suing your enemy under that same law. For instance, the Writer's Guild of America got it written into their standard contracts that the studios can't train AI on the scripts produced under those contracts.

Case law that results from litigating those contracts is still copyright law.

@clayote I'm happy for them, but

1. LLMs are still being trained on those scripts because there's no way to catch them doing so, and if they get caught they'll get away with it by arguing it was oopsie accident here's .00001% of our profit as a fine
2. Their rights will continue being degraded in a constant battle against exploitation
3. Corpos will still foolishly try to replace them with LLMs regardless

My point is that LLMs are a symptom of a far greater problem

@clayote none of that changes the fact that people should be allowed to create art around the characters and stories that comprise our cultural mythology, something copyright law prevents. Or should be allowed to do whatever we want with the software on our computers, something copyright law prevents.

@komali_2 That's an agreeable point

You replied to a post that says:

Any data that comes from breaches, users' private cameras, and anything else stored with an expectation of privacy is much worse than a copyright violation.

And your reply said:

Data wants to be free. This argument simply doesn't work for those of us who have always been open-data, anti-copyright.

I think practically everyone who reads your reply, including people like me who turn out to agree with you, will get the impression that you're uninterested in mitigating present harms to actual people's privacy.

I'm unhappy that the State has captured all the bodies that regulate privacy, I'd like them to function independently, and any data that comes from breaches, users' private cameras, and anything else stored with an expectation of privacy is much worse than a copyright violation.

@clayote I understand, you're right, my response makes it seem that way, when my intention was to respond more generally to data trained in a copyright violating way (books, lectures, speeches). I should have been more clear about that earlier, thanks for pointing that out. No wonder people are responding strongly!

@crankylinuxuser @tante I always thought copyright violation was a special type of mass theft:
A right is stolen: the right to determine who is allowed to get a copy, and under what conditions.
By that logic, each copy not explicitly allowed is a stolen item.
And to my knowledge, theft is not about not paying for an item but about taking it out of the kind of control that is commonly expected when possessing a thing.

But I'm not a lawyer. The world may allow stealing from the rich, but not from the inventors.

@tante @crankylinuxuser I guess some people have zero idea of how AI model training works. They have the impression that "if I run this HuggingFace model on my own hardware, it's ethical", but kinda think those models got uploaded there out of thin air, without any implications.

@tante @crankylinuxuser Example: I use Whisper for audio transcription (mostly for accessibility reasons: it's harder for me to understand audio messages than text messages), so I know that using it, even self-hosted, ticks most boxes.

I'm sure it was trained on stolen data (as it constantly returns things like "subtitles by example.com"), I'm sure training it hurt the environment, I'm sure the company behind it (OpenAI) does not have a viable business model (but, to be fair, I don't care about that, governments also don't have a viable business model, they don't have to).

But, since I'm using it for accessibility and there are no alternatives, we need to consider the trade-offs and promote research that reduces those issues ethically. Saying "bUt I Am RuNNinG iT LoCAllY so ITs eThICAl" is dumb.

@tante @crankylinuxuser So, my objective here: sure, current AI is truly unethical, and sadly we have lots of people who want to be blind to its issues, but not all of it is bad.

I can't just say to an illiterate person "can you write for me instead of speaking?" because they just can't do that. I talk with lots of illiterate people; I'm in the construction business, and lots of workers only know numbers and how to write their own name. So Whisper, despite not being ethical, is what I use.

But are there ethical alternatives? At the moment I haven't found anything as reliable as Whisper, but there's the Common Voice dataset, which is free and could be used to solve the issue of being trained on stolen data (though not the environmental issues).

@tante @crankylinuxuser
And what if I train my own model on my own computer, powered by solar, with my own data?
@paelnever @tante @crankylinuxuser then a lot of the concerns would be alleviated but the model would most likely be garbage. If AI didn't need horrendous amounts of hardware, energy and stolen data to "function", AI companies would not use horrendous amounts of hardware, energy and stolen data.

@ratsnakegames @paelnever @tante

A PID loop is "AI".

3 neurons: proportional, integral, derivative. It has a training phase and a recitation phase.

And yes, you use it. It's everywhere control systems are used.
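A PID loop really is that small. Here is a minimal sketch in Python; the gains and the toy plant below are invented for illustration, not tuned for any real system.

```python
# Minimal PID controller: three "neurons" (proportional, integral, derivative).
class PID:
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def step(self, setpoint, measurement, dt):
        error = setpoint - measurement
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Drive a toy first-order plant toward a setpoint of 1.0.
pid = PID(kp=2.0, ki=0.5, kd=0.1)
value = 0.0
for _ in range(200):
    value += pid.step(1.0, value, dt=0.05) * 0.05
print(round(value, 3))  # settles near the 1.0 setpoint
```

The "training phase" is choosing the three gains; the "recitation phase" is the loop.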

Spellcheck was OHHHH SCARRRRYYYY AAAAIIIIII in the late 1980s. Now? It's just spellcheck. Nobody panics about AI in that now.

K-Nearest-Neighbors is just majority voting over distances in a multidimensional space. Same thing: a training phase and a recitation phase.
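The whole algorithm fits in a few lines. A minimal sketch with made-up 2-D points (the data and labels are purely illustrative):

```python
import math

# Tiny k-nearest-neighbors classifier: "training" is just storing examples,
# "recitation" is a distance lookup plus a majority vote.
train = [((1.0, 1.0), "cat"), ((1.2, 0.9), "cat"),
         ((5.0, 5.0), "dog"), ((4.8, 5.2), "dog")]

def knn_predict(point, k=3):
    # Sort stored examples by Euclidean distance to the query point.
    nearest = sorted(train, key=lambda ex: math.dist(point, ex[0]))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

print(knn_predict((1.1, 1.0)))  # prints "cat"
```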

Even Word2Vec and tokenization were anticipated by Leibniz (you know, that OTHER calculus founder) as a way to do calculus with words.

"AI" only means scary unknown thing until we understand it.

@ratsnakegames @tante @crankylinuxuser
They use "horrendous amounts of hardware, energy and stolen data" because that's what they want to sell to the public, so that nobody tries to make their own models, but it's false. I made a model to detect when my cat gets close to the house door, and it took me 2 days to train on a laptop. That model is NOT "garbage"; it is completely functional for what I need. You (like almost everybody here on Mastodon) have fallen for the widespread rhetoric from the big tech corps that only they can make AI. You think you are fighting them, but you are only acting as they expect you to act.
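As a sketch of how small a useful task-specific model can be, here is a from-scratch logistic classifier on invented toy features. Everything here is a stand-in: a real cat-at-the-door detector would train on image data, not these two made-up numbers.

```python
import math
import random

# Invented features: (motion_level, object_size); label 1 = cat-sized visitor.
random.seed(0)
data = ([((random.uniform(0.5, 1.0), random.uniform(0.3, 0.6)), 1) for _ in range(50)] +
        [((random.uniform(0.0, 0.3), random.uniform(0.0, 0.2)), 0) for _ in range(50)])

# Logistic regression trained by plain stochastic gradient descent.
w = [0.0, 0.0]
b = 0.0
lr = 0.5
for _ in range(200):  # a few hundred epochs finishes in well under a second
    for (x1, x2), y in data:
        p = 1.0 / (1.0 + math.exp(-(w[0] * x1 + w[1] * x2 + b)))
        err = p - y
        w[0] -= lr * err * x1
        w[1] -= lr * err * x2
        b -= lr * err

# Count training examples the learned boundary classifies correctly.
correct = sum(
    (1.0 / (1.0 + math.exp(-(w[0] * x1 + w[1] * x2 + b))) > 0.5) == (y == 1)
    for (x1, x2), y in data
)
print(correct, "/", len(data))
```

The point is not the model class but the scale: a narrow, single-purpose classifier needs no data center.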

They want you to refuse AI completely instead of embracing it and making it democratic and ethical. They know that the vast majority of the population will buy what they sell sooner or later if their only choice is all or nothing. Watch somebody say it better than me:

https://www.youtube.com/watch?v=y85nqc2zm7M


@paelnever @tante

The #MastodonHOA will still manufacture bullshit reasons to find you in contempt.

Remember, we're not permitted to have sane discussions of LLMs and other types of machine learning on Mastodon without the HOA's ire. It's all 'evil', going alllllll the way back to Eliza and spellcheck.

I mean, fuck. Even Ned Ludd realized that technology wasn't evil, but the centralization of the means of production to the owners was evil.