Mastodawn

"…a damning new study could put #AI companies on the defensive. In it, #Stanford and #Yale researchers found compelling evidence that #AImodels are actually copying all that data, not “learning” from it. Specifically, four prominent LLMs — OpenAI’s GPT-4.1, Google’s Gemini 2.5 Pro, xAI’s Grok 3, and Anthropic’s Claude 3.7 Sonnet — happily #reproduced lengthy excerpts from #popular — and #protected — #works, with a stunning degree of #accuracy."

https://futurism.com/artificial-intelligence/ai-industry-recall-copyright-books

Researchers Just Found Something That Could Shake the AI Industry to Its Core

Researchers found compelling evidence that AI models are actually copying copyrighted data, not "learning" from it.

Futurism

Open Risk Apr 2

@josemurilo

the fact that those "models" require hundreds of billions of parameters to work is sort of a smoking gun.

While there is probably some degree of "learning" involved, in that the model size is not the same as the input data size (😂), this is orders of magnitude away from what we normally call a "model" of something: a parsimonious representation.

And because LLMs don't really learn, we also don't really learn by inspecting them (which again is the hallmark of a useful model).

Old Man in the Shoe Apr 3

@openrisk @josemurilo you'd probably not need that much training to be ready to answer questions on any topic

mau 🏳️‍🌈 #EndFossilFuels Apr 2

@josemurilo This is part of the leaked system prompt in Claude Code: "Do not produce or reproduce exact song lyrics" - which goes to show some desperate engineer had to try to hide this fact by begging the thing to stop spilling the beans.

Su_G Apr 3

@mzedp
Interesting proof & makes it crystal clear that we’re dealing with GrandTheftAutoComplete: “Claude Code: "Do not produce or reproduce exact song lyrics" - which goes to show some desperate engineer had to try to hide this fact by begging the thing to stop spilling the beans.”

#GrandTheftAutoComplete #ClaudeCode #LLMs #AI
@josemurilo

Old Man in the Shoe Apr 3

@Su_G @mzedp @josemurilo This is the best example of why it's pointless to argue but lol

Ray McCarthy Apr 2

@josemurilo
Of course they are. It's a total lie to use the phrase "learning". It's analogous to a distributed data flow database. It's why it needs so much RAM for reasonable performance.

They are plagiarism machines that only give useful results when regurgitating.

Better real search engines pointing to real sources would be honest and more useful.

Simple fines or compensation aren't enough. They need to be opt-in only for content, and all illicitly obtained content and models need to be deleted.

Toni Aittoniemi Apr 2

@josemurilo A statistical inference engine cannot be doing anything other than copying.

The model is essentially a "compression", capable of more or less reproducing its inputs. That's the whole point.

Calling this process of compression/optimisation "thinking" is the greatest scam ever pulled in IT!

Charming Malcontent Apr 3

@josemurilo
A.P. (not I)

plagiarism

TheNovemberFella ✊🏳️‍🌈 🇺🇦☸️🛰️🚀 Apr 3

@josemurilo I love this 😍

Some Guy Apr 3

@josemurilo Color me shocked.

There is very little (I am being generous) intelligence with these tools. LLMs are nothing more than expensive text generators.

El Duvelle Apr 3

@josemurilo what's surprising is that we need a study to "prove" this.. this is just how these programs work.. sometimes it looks like people actually think calling it "AI" means the program is actually intelligent? 😬
