Hypocrites.

You built an industry on scraping the internet and now you’re shocked someone scraped you. You normalized the idea that anything publicly accessible is fair game for training. That’s the precedent you set. I don’t want to hear you bitch about it now.

@simonbs I wonder if there would be much less backlash against LLMs if they were a commons and fully open rather than proprietary with massive commercial motives.

@nighthawk Probably a little less, at least from the software industry, but I think authors and artists would give the same backlash.

Just to be clear: my post isn't meant as backlash, per se. I use AI tools frequently. I just don't want Anthropic etc. to cry about someone stealing from them when that's their entire business.

@simonbs @nighthawk recursion babyyyyyyyy!! 😍
@simonbs @nighthawk Bank Robbers get mugged during getaway!!! Read all about it!!!
@simonbs @nighthawk It's somewhat comical. Part of why they are worked up about it is likely that they assess distillation will be easier to defend legally than direct rights-violating use of training data. The reality is there are many strong technical teams in China and elsewhere, and their technical progress threatens the margins of Anthropic etc.
@simonbs @nighthawk I wonder if authors and artists would have something to backlash against if it were open source. Maybe someone in the *distributed, social* decision making process might have considered that it wasn't really right to take advantage of people.

@nighthawk @simonbs

https://tante.cc/2024/10/16/does-open-source-ai-really-exist/

This article might be of interest to you then. The core idea is that AI in its current meaning is antithetical to open source as it has been understood, or free software as it was defined by the FSF. You know, because if they told you everything they trained on, someone motivated would start looking for the entire Harry Potter universe in there, just to name a prominent example that has already been in the news for Anthropic specifically.

@hiiaminfi @nighthawk @simonbs

now always closed open but minded

@simonbs "BUT WE STOLE IT FIRST" is an interesting legal strategy.
@grumble209 @simonbs
tbf. There are entire nations founded on this principle.

@Andii @grumble209 @simonbs

[carefully avoids glancing at the US for fear of the fascists]

@simonbs I love the sweet sweet tears of the AI industry in the morning.
@simonbs it’s like a thief being robbed by another thief, but demanding ethics and a fair game.
Shocked that gambling is happening - Casablanca (1942)

@simonbs if i was anthropic i'd stfu

@simonbs

#TranslatedFromTheRepublican

"Our stolen car was used by a car thief to steal more cars."

@simonbs Rofl, I'd play the world's tiniest violin for them, but apparently they're down to tardigrade size now according to the memes, so I can't even do that. Oh well.

@simonbs Let's just forget about copyright.

If it is accessible it should be free to be copied.

I hope the EU starts the wave by abolishing US copyright in the EU.

@UlrikNyman Okay, I'm a creator. I write words and music, and I'm a photographer. The only way to market my work is to make it accessible at least in some form.

So how do I get paid if everything accessible should be free to be copied? If I can't use #copyright to protect my work and claim legal ownership so I get paid for my work, then why should I create?

Something has to pay the bills for creators, authors, and artists, or else you won't have any creators, authors, or artists.

@simonbs

@DanielMReck @simonbs

If we were realistically to do this, I think fundraising before starting projects (Kickstarter) or pay-as-you-go for extra content (Patreon) would be a solution.

To be more realistic, I would actually like us to have 5 or 10 years of copyright and not close to infinite.

@DanielMReck @simonbs

And we could also do it with a version of universal basic income.

Copyright is not a law of nature. It was a concept invented for the age of the printing press.

@simonbs Kind of difficult to not root for the scrapers. Please, scrape each other even harder, anything that makes this whole thing collapse a little quicker.
@simonbs they don’t say anything one way or another. Maybe they are proud of them 😂

@imyke Haha, fair point! I was curious enough about their stance to log into Twitter and read their own replies.

They acknowledge the utility of “distillation” but point out that foreign labs should not distill American models.

I mean, I'm pretty sure Anthropic's models aren't trained on American data only 🙃

Anthropic is lying to us.

@simonbs Will someone please think of the billionaires 😭
Drug Commercial - You Alright I Learned it From Watching You - 15 Second Spot (1991)

@simonbs You greedy little bastards! Stop whining and get a life!

This whole GenAI LLM industry is built on stolen work and stolen creativity. This is organised crime, and you are the mobsters.

Now your scum colleagues scrape your models, and you feel betrayed? You're thieves like them, and all of your bunch can rot in hell!

@simonbs I bet my left nut that Anthropic, OpenAI, etc. are doing the same with all their competitors as well.

I hope it causes model collapse.

@simonbs So much use for this meme pic lately.
@simonbs Someone is stealing from the thieves? Oh no.

@simonbs No, you don't understand. Anthropic is a WESTERN company so by definition anything they do is good! It's when predominantly non-white people do it that it's bad!

Get with the program!

@simonbs isn't the standard phrase "thoughts and prayers"?
@simonbs but they point to a great problem. Alibaba uses its endpoint to simulate millions of users to scrape the internet, especially open source and free software, in a barbarian way (Vandals, to be precise), externalizing the cost of their assaults to the people of the internet. And I welcome the unethical to tell us who is even worse and the moralless. Please continue calling each other out.
@simonbs i mean it's even worse, scraping usually involves extracting data for free. Claude on the other hand is not a free service.
They got paid and are still crying about it.
@simonbs This is so breathtakingly tone deaf, I have to verify it's even legitimate. It's incredible in the most literal sense.

@simonbs

Anthropic Assholes

You built an industry on scraping the internet and now you’re shocked someone scraped you.

But But
But anyone from trumpistan is a good thief, spy, pedo, rapist, self-enriching moronic klan thug et al.

@simonbs
1. *tardigrade-smallest-violin.gif*

2. We already know, and have studied and demonstrated, how training one model off the output of another doesn't work. This is a big sad boo-hoo over nothing at all.

3. Their entire industry is founded on stealing all IP ever, from everywhere. Suck it.

4. *loughborough-university-smallest-violin.gif*

@simonbs

I can't seem to find my tiny violin.

@simonbs This is doubly ironic considering they at least got paid when their competitors accessed their content.
@simonbs I hope someone gave them this reply in their original post...

@simonbs This is just corporate BS. Publicly announcing this in case at some point in the future there’s some whackadoo law that says you can copyright AI training or some nonsense.

It’s as dumb as most patents or “you have to defend your trademark to keep it” - all just there to prop up the corporate lawyers.

@simonbs yeah that does seem like a reasonable argument!
@simonbs so ... wait ... the AIs are cannibalizing each other to learn from each other to be better cannibals?

@simonbs Seems Anthropic hasn't considered the effect that making AI companies pay for the IP they produce derivative works from would have. It would totally kill the AI industry I hear. All that profit would then just go poof.

Someone needs to go learn them some economy 101.

@simonbs "distillation attacks" lmao.