"…a damning new study could put #AI companies on the defensive. In it, #Stanford and #Yale researchers found compelling evidence that #AImodels are actually copying all that data, not "learning" from it. Specifically, four prominent LLMs — OpenAI's GPT-4.1, Google's Gemini 2.5 Pro, xAI's Grok 3, and Anthropic's Claude 3.7 Sonnet — happily #reproduced lengthy excerpts from #popular — and #protected — #works, with a stunning degree of #accuracy."

https://futurism.com/artificial-intelligence/ai-industry-recall-copyright-books

Researchers Just Found Something That Could Shake the AI Industry to Its Core

Researchers found compelling evidence that AI models are actually copying copyrighted data, not "learning" from it.

Futurism

@josemurilo

The fact that those "models" require hundreds of billions of parameters to work is something of a smoking gun.

While there is probably some degree of "learning" involved, in that the model size is not the same as the input data size (😂 ), this is orders of magnitude away from what we normally call a "model" of something: a parsimonious representation.
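A rough back-of-envelope sketch of that size comparison — all numbers below are illustrative assumptions (a hypothetical ~175B-parameter model and a ~10-trillion-token corpus), not measurements of any of the models named above:

```python
# Illustrative comparison of model size vs. training-data size.
# Every figure here is an assumption for the sake of the sketch.

params = 175e9            # assumed parameter count (hundreds of billions)
bytes_per_param = 2       # assumed 16-bit weights
model_bytes = params * bytes_per_param

training_tokens = 10e12   # assumed ~10-trillion-token training corpus
bytes_per_token = 4       # rough average bytes of text per token
corpus_bytes = training_tokens * bytes_per_token

ratio = corpus_bytes / model_bytes
print(f"model size:  {model_bytes / 1e12:.2f} TB")
print(f"corpus size: {corpus_bytes / 1e12:.2f} TB")
print(f"corpus is ~{ratio:.0f}x larger than the weights")
```

So the weights are far smaller than the corpus — some compression is happening — but a model holding a substantial fraction of a terabyte of weights is nothing like the compact, inspectable representations we usually call "models."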

And because LLMs don't really learn, we also don't really learn by inspecting them (which, again, is the hallmark of a useful model).

@openrisk @josemurilo you'd probably not need that much training to be ready to answer questions on any topic