We were not crazy. We were right.
Amazing work by our @robb corroborated by extensive analysis at Wired:
Perplexity Is a Bullshit Machine https://www.wired.com/story/perplexity-is-a-bullshit-machine/
Regulation in this space cannot come soon enough.
AI companies that want to scrape the web for training purposes, or use their bots to summarize webpages, should follow a strict set of guidelines with identifiable user-agents and IP addresses.
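"Identifiable user-agents and IP addresses" is already verifiable in principle: search engines such as Google document forward-confirmed reverse DNS as the way to check that a request claiming to be their crawler really comes from their network. Here is a minimal sketch of that check; `verify_crawler_ip` is a hypothetical helper name, not any official API, and the injectable `reverse`/`forward` parameters exist only so the logic can be exercised without live DNS:

```python
import socket

def verify_crawler_ip(ip, allowed_suffixes, reverse=None, forward=None):
    """Forward-confirmed reverse DNS: reverse-resolve the IP, check the
    hostname belongs to the crawler's documented domain, then forward-resolve
    that hostname and confirm it maps back to the same IP."""
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or socket.gethostbyname
    try:
        host = reverse(ip)  # e.g. "crawl-66-249-66-1.googlebot.com"
    except OSError:
        return False
    if not host.endswith(tuple(allowed_suffixes)):
        return False  # claims to be the bot, but hostname is in the wrong domain
    try:
        return forward(host) == ip  # forward lookup must point back at the IP
    except OSError:
        return False
```

Of course, this only works for crawlers that identify themselves honestly in the first place, which is exactly what the Wired analysis says Perplexity does not reliably do.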
Publishers should have a right to opt out of any AI access, to request details as to whether their copyrighted content is included in any model, and, if so, to request that it be removed and the model retrained.
Hopefully the EU's AI Act will help.
Most of all, we need to let go of this notion that open web = okay for commercial companies to scrape, ingest, and train their models.
If I wanted to open an English school, I would have opened a school to teach the English language. But I didn't.
I have a website, which is free to read, but my copyrighted material is mine and shouldn't serve as the foundation of any other commercial product.
I wish more people would understand this concept.
@cabel @viticci I’m not a content creator, but I really do appreciate the thoughts expressed by @ismh86 and @jsnell. It’s a complicated situation, there’s no easy solution, and it’s okay to have complicated, opposing opinions.
I also respect that others, such as yourself, Federico, can feel differently. You’re a creator so you have a completely different perspective than I do.
@cabel @viticci I agree with you on explicit permissions, but we need to distinguish between "permission to use the content to train an LLM" and "permission to simply access the content (without training)".
Tools like Perplexity may be denied permission to train an LLM on the content of your page, but how do you prevent them from reading and summarising the page?
If you don't want the content to be summarised, that's fine, but that is a separate permission to request.
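The only widely deployed mechanism for this distinction today is robots.txt, and it does support it in a crude way: several AI companies publish separate user agents for training crawlers versus retrieval bots (OpenAI documents GPTBot for training and OAI-SearchBot for search, Google uses Google-Extended to control AI training separately from Googlebot). A sketch of a robots.txt that denies training but allows retrieval; note that honouring any of this is entirely voluntary on the crawler's side:

```text
# Disallow use of the site for model training
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Still allow retrieval/answer bots to read pages
User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /
```

Which is exactly the problem the thread started with: Wired's reporting alleges Perplexity fetches pages even when its bot is disallowed, so without regulation this remains an honour system.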
@cabel @viticci
If you try to refer to Rotten Fruits here, you misunderstand Apple's position on privacy.
They like privacy for PR purposes, and for being a tiny little bit better than Android. But Apple's apps are as privacy-invasive as Google's (the tiny bit of extra privacy protection does not apply to your relationship with your primary cult, Apple; you wouldn't want to keep your sins from your digital pastor, would you?).
@viticci oh but they DO understand, they are just selective when it comes to enforcement.
Try and use their music, video, software, or whatever without their permission and *then* it’s a crime.
Actually, that depends on your jurisdiction and what your copyright law says. Although AI training seems like a new kind of usage, it probably isn't: scraping data, processing it, and producing output that statistically depends on the scraped data has been done for years, if not decades.
EU copyright law actually has an exemption for copying material for educational purposes.
That's why you nowadays usually get everything you need as a student via Moodle.
@viticci Back in the days of my first studies (1990s), my parents literally spent tons on textbooks for me. Medicine was especially painful: inflation-corrected, €2000-3000 per semester for books was quite realistic. (Free university != free books.)
Basically that's also why most courses on uni moodles are behind a registration wall → the copyright exemption is only for students you teach.
@viticci they will understand, as soon as you take something they make that’s available for free and turn it into a different product … 😒
The hypocrisy! 😤
@viticci @Gargron
I want to throw a book at them and ask them if they own the book.
• They own that mass.
• They can read that mass.
• They can copy that mass for personal use.
• They cannot copy that mass for selling.
• They own a thing. They also own a copy of an idea of a thing. They don’t own the idea of that thing.
• You may visit my website.
• You may read my website.
• As it is public, you may even scrape my website.
• You can also build an AI off of my website. You really can… but for personal use.
BUT
• Your AI contains part of my website so if you want to sell it you’ve got to ask me and all the owners first.
• And WTF kind of argument is "that isn't practical"? No, it isn't practical. That just means what you did was stupid.
AI in Silicon Valley has gotten where it is on rich white male privilege expressed in the legal framework.
Now, if you find the A in #AI offensive and you declare it Electrical Intelligence (#EI) as life with human rights and learning, then we can have an ethics conversation.
But as long as you say you own it, it isn’t learning, it’s processing and you can fuck off.
@viticci It seems obvious to me that creating an LLM by training it with a bunch of inputs makes it a derived work of those inputs. Output of the LLM is then a derived work of the LLM. Distributing the LLM or that output would then violate copyright of all the inputs unless it falls under fair use, and it doesn’t seem like most LLM usage would.
I suspect the law would also find this obvious if not for the fact that it’s big businesses doing it.
@viticci The AI Act is already done: approved and published (https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:52021AE2482)
So is the Copyright Directive:
Directive (EU) 2019/790 of the European Parliament and of the Council of 17 April 2019 on copyright and related rights in the Digital Single Market and amending Directives 96/9/EC and 2001/29/EC
https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=celex%3A32019L0790