Oh, so now scraping data without permission is bad for AI training? 😂 how ironic 😉

Anthropic accuses Alibaba of using thousands of fraudulent accounts to extract Claude AI model capabilities and data. Anthropic urged Congress to penalise the companies behind scraping attacks like this and to ramp up measures to prevent US tech from being stolen. https://www.bbc.com/news/articles/cwyklykn5dwo
How about Anthropic pays first for the stolen books, and for all the content out there it used for its shitty AI?

Anthropic accuses Chinese rival Alibaba of illicitly extracting AI capabilities

The firm alleged that Alibaba used fraudulent accounts to access data from its Claude AI model.

@nixCraft @alexmu Model distillation is not scraping, and your source understandably doesn't even mention the latter. Trying to equate them is bad reporting and muddies the waters.

Scraped content can be protected by copyright, while LLM outputs aren't. Anthropic presents distillation as an "attack" because they're worried they'll fall behind the Chinese. This is in line with their previous policy (calling their model a "cyberweapon" and so on). They're fishing for regulation and protectionism.

@nixCraft @alexmu So imagine you're a tech news tabloid using Claude to write articles. You want to save money so you ask Claude for 1000 topics from the past, then you ask GLM, DeepSeek and Kimi to write articles on them in your preferred style, and use Claude to rate which one is best.

To Anthropic, this is indistinguishable from distillation, which they've just called an attack.