We were not crazy. We were right.

Amazing work by our @robb, corroborated by extensive analysis at Wired:

Perplexity Is a Bullshit Machine https://www.wired.com/story/perplexity-is-a-bullshit-machine/

A WIRED investigation shows that the AI-powered search startup Forbes has accused of stealing its content is surreptitiously scraping—and making things up out of thin air.

Regulation in this space cannot come soon enough.

AI companies that want to scrape the web for training purposes, or use their bots to summarize webpages, should follow a strict set of guidelines with identifiable user-agents and IP addresses.

Publishers should have a right to opt out of any AI access, request details as to whether their copyrighted content is included in any model, and if so, request that it gets removed and the model re-trained.

Hopefully the EU's AI Act will help.

Most of all, we need to let go of this notion that open web = okay for commercial companies to scrape, ingest, and use to train their models.

If I wanted to open an English school, I would have opened a school to teach the English language. But I didn't.

I have a website, which is free to read, but my copyrighted material is mine and shouldn't serve as the foundation of any other commercial product.

I wish more people would understand this concept.

@viticci

Actually, that depends on your jurisdiction and what your copyright law says. AI training seems to be a new kind of usage, but it probably isn't really: scraping, processing the data, and producing output that statistically depends on the scraped data has been done for years, if not decades.

EU copyright law actually has an exemption for copying stuff for educational purposes.

That's why you nowadays usually get everything you need as a student via Moodle.

@viticci Back in the days of my first studies (1990s), my parents literally spent tons on textbooks for me. (Medicine was especially painful: adjusted for inflation, €2000-3000 per semester for books was quite realistic. Free university != free books.)

Basically, that's also why most courses on uni Moodles are behind a registration wall → the copyright exemption only covers the students you teach.