Mastodawn

Zac Yu 🌈 Nov 29

Viss Nov 28

i hope this email finds you well

Zac Yu 🌈 Aug 30, 2025

Coffeedate with ADHD Aug 28, 2025

Projects only have 2 states

Zac Yu 🌈 Aug 17, 2025

Tessette player Aug 16, 2025

No, I didn't "forget your name", I didn't store it in the first place, it's called being GDPR compliant.

Zac Yu 🌈 Jul 27, 2025

goldship official account Jul 25, 2025

Zac Yu 🌈 Jul 23, 2025

Heliograph Jul 22, 2025

#decency #baseline #hellyea

Zac Yu 🌈 Jun 8, 2025

uwu1ba Jun 5, 2025

Zac Yu 🌈 Jun 7, 2025

j_bertolotti Jun 6, 2025

"AI companies claim their tools couldn't exist without training on copyrighted material. It turns out, they could — it's just really hard. To prove it, AI researchers trained a new model that's less powerful but much more ethical. That's because the LLM's dataset uses only public domain and openly licensed material."

tl;dr: If you use public domain data (i.e. you don't steal from authors and creators) you can train an LLM just as good as what was cutting edge a couple of years ago. What makes it difficult is curating the data, but once the data has been curated, in principle everyone can use it without having to go through the painful part.
So the whole "we have to violate copyright and steal intellectual property" argument is (as everybody already knew) total BS.

https://www.engadget.com/ai/it-turns-out-you-can-train-ai-models-without-copyrighted-material-174016619.html?src=rss