Oh, so now scraping data without permission for AI training is bad? 😂 How ironic 😉

Anthropic accuses Alibaba of using thousands of fraudulent accounts to extract Claude AI model capabilities and data. Anthropic urged Congress to penalise the companies behind scraping attacks like this and to ramp up measures to prevent US tech from being stolen. https://www.bbc.com/news/articles/cwyklykn5dwo
How about Anthropic first pays for the stolen books, and for all the other content out there it used for its shitty AI?

Anthropic accuses Chinese rival Alibaba of illicitly extracting AI capabilities

The firm alleged that Alibaba used fraudulent accounts to access data from its Claude AI model.

@nixCraft @alexmu Model distillation is not scraping, and your source understandably doesn't even mention the latter. Trying to equate them is bad reporting and muddies the waters.

Scraped content can be protected by copyright, while LLM outputs aren't. Anthropic presents distillation as an "attack" because they're worried they'll fall behind the Chinese. This is in line with their previous policy (calling their model a "cyberweapon" and so on). They're fishing for regulation and protectionism.

@lnicola @nixCraft In what way is the difference meaningful?
@alexmu @nixCraft It doesn't involve scraping copyrighted content (like downloading Harry Potter and training on it). It harms no one, because you're paying for your usage and respecting the rate limits (unlike hacking around OAuth to use your subscription with another harness, which is against their terms of use), so it's not an attack. And they can't even tell whether you're doing something "wrong" (distilling a model) or something harmless like evaluating other models.