https://smallsheds.garden/blog/2026/on-the-acceptance-of-genai/
None of these are true if you run your own LLMs on your own hardware, using FLOSS models.
But the #MastodonHOA has deemed all AI to be abhorrent as a blanket decision.
And frankly, if you exist in a capitalist society and you're not an owner, there is a 100% chance you are exploited. The capitalist system requires it.
"Trained on stolen data". It's at best a copyright violation. And I view things like Anna's Archive and Libgen to be internationally renowned Public Libraries.
"Massaged by people in global majority countries" - yes, people work in capitalism. And guess what... You're exploited.
"Trained in environmentally harmful data centers". This assumes that training is always needed, and it's not. You can train once and run X times. Again, you're stretching to make local LLMs look horrible.
And really, the rest of these are poor excuses. I won't use poop smear (Anthropic), OpenAI, or the other SaaS token companies. I run local, and my setup does not have those things you claim.
Except for the copyright issue. But again, I don't have that much respect for current US copyright.
"It's at best a copyright violation"
This may be true for published and public data... but that's not the only data that goes into these things. Any data that comes from breaches, users' private cameras, and anything else stored with an expectation of privacy is much worse than a copyright violation.
And yes, that is a big issue with the SaaS token vendors. Claude, OpenAI, MS, and the rest do use whatever user data they can get. I am not arguing their horrific behavior.
I'm talking about locally running Qwen, or Deepseek, or other FLOSS models.
That local LLM running on my machine only sees and uses data I provide. And a control-c in the relevant console window kills the LLM.
What folks do not realize is this is #Leibniz's ultimate dream, of being able to do #calculus with words, sentences, and more. He tried it with single word-vectors, but even that had to wait for Word2Vec in 2013.
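The "calculus with words" idea can be sketched in a few lines. This is a toy illustration only: the vectors below are hand-picked, not learned embeddings, and real Word2Vec vectors are trained from corpora in hundreds of dimensions. But the arithmetic (king - man + woman lands near queen) works the same way.

```python
import math

# Toy 3-d "word vectors", hand-picked for illustration.
# Real Word2Vec embeddings are learned from text, not written by hand.
vecs = {
    "king":   [0.9, 0.8, 0.1],
    "queen":  [0.9, 0.1, 0.8],
    "man":    [0.1, 0.9, 0.1],
    "woman":  [0.1, 0.2, 0.8],
    "person": [0.1, 0.5, 0.5],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# "Calculus with words": compute king - man + woman component-wise...
target = [k - m + w for k, m, w in zip(vecs["king"], vecs["man"], vecs["woman"])]

# ...then find the nearest remaining word by cosine similarity.
candidates = [w for w in vecs if w not in ("king", "man", "woman")]
best = max(candidates, key=lambda w: cosine(target, vecs[w]))
print(best)  # -> queen (with these toy vectors)
```

With learned embeddings the same nearest-neighbor lookup is what produces the famous king/queen analogy results.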
@Epic_Null @crankylinuxuser @tante "local" models are just as reliant on illegal data acquisition, because they depend on the larger mainstream models to reach any level of tolerable performance. Whether it's through training, fine-tuning, distillation, or another method, that dependency means anything that goes into the development of the nonlocal model is also a requirement for the development of the local versions.
Deepseek and Qwen are no exception.