All it would take for AI to completely collapse is a ruling in the US saying these companies have to licence the content they used to train these tools.

They simply would never reach a sustainable business model if they had to fairly compensate all the people who wrote, drew, edited, sang or just created the content they use.

Simply being forced to respect attribution and licenses would kill them. Will that ruling ever happen? Maybe not. Should it? I think so.

@thelinuxEXP I firmly believe we have to start regulating AI asap.
@benmezger Absolutely agree. I’m not denying these could be insanely useful in a lot of areas, but we can’t let these tools be built on the back of content that they might not have the rights to use.
@thelinuxEXP what worries me is that we haven’t even regulated the internet properly, so regulating AI seems far from reality 😔
@benmezger @thelinuxEXP what do you mean by "haven't regulated the internet properly"? The same rules from outside the internet apply on the internet as well, and there are a lot of extra laws specific to it.
@thelinuxEXP
They would just move to other language corpora, no?
@lepapierblanc They would either have to pay the people who make the content, or use completely copyright free / license free material, which would basically render them pretty useless.
@thelinuxEXP What if they train it on a Chinese-language corpus? With some Chinese state license they would be shielded from copyright claims.
@lepapierblanc @thelinuxEXP The PRC doesn't own the Chinese language. Plenty of people outside mainland China use it. If there's any chance the corpus contains anything any of them wrote, it'd be the same problem again for the LLM companies.

@lepapierblanc @thelinuxEXP

There are plenty of English corpora that are either publicly available or for which the license holders would likely be happy to partner for limited use cases. This is not some doomsday scenario for ML, it's just a doomsday scenario for Big AI.

@thelinuxEXP @lepapierblanc I'm not sure I'd like that kind of world... I'm already dubious when they try to jail torrent users. Saying that copying is theft is not a solution.
I prefer: if you make a profit with the copy, then you owe a percentage to the original author. But even that will be difficult to apply in tech.
So my personal choice is: Universal Basic Income. Then if you want to pass your life creating, then do it!!
@egermond @thelinuxEXP @lepapierblanc It's not that copying is theft, it's that they haven't sought permission for reuse in a commercial setting. That's already the norm in *most* areas, whether physical or digital (I can't just wholesale copy a physical book and sell it under another name, but I can photocopy stuff to give excerpts to students).
@chiraag @thelinuxEXP @lepapierblanc
I used to be greeted by those kinds of videos when I bought a DVD.
So, yes, people do say that copying is theft!!
https://www.youtube.com/watch?v=HmZm8vNHBSU
@egermond @thelinuxEXP @lepapierblanc And what I'm trying to get across is that *even if you reject that premise*, what these AI companies are doing is blatantly unethical and violates all norms of reuse.
@thelinuxEXP @lepapierblanc That's a worrying part, the fact that #FreeCulture / #PublicDomain corpora are so behind the times that they can't viably replace "fair use" corpora

@thelinuxEXP @lepapierblanc wouldn't large tech companies just amend their standard terms of service to permit machine learning on uploaded material? Like what many tech companies have already done?

I'm sure it would hinder AI development, but all of FAANG seem very interested in this tech already - they'll be hurt, but anyone attempting to create a non-commercial model or compete with these huge firms would be killed.

@lepapierblanc @thelinuxEXP

They could use nearly everything that is older than 100 years (and therefore in the public domain). I think the resulting chatbots would really be interesting.

@Life_is @lepapierblanc @thelinuxEXP it's hard enough to make them less racist *now*.

@thelinuxEXP

Big companies when they see someone using their 57-year-old, 2-second-long sound effect: GO TO JAIL

Big companies stealing every bit of creative content from the internet without permission from the small creators: 

@mahbub « It’s different, we’re not copying the content, we’re creating something derivative so it’s ok », they say, as they refuse to acknowledge licenses

@thelinuxEXP Ironically, US copyright law punishes people for making derivatives, but somehow AI companies are exempt.

"First, the derivative work has protection under the copyright of the original work. Copyright protection for the owner of the original copyright extends to derivative works. This means that the copyright owner of the original work also owns the rights to derivative works." - LegalZoom (22 MAR 2023)

@thelinuxEXP what about non American or non-Western entities though? As much as I don't like the idea of American firms scraping everything to produce products using our work without paying us, I'm even less fond of the idea of China taking over and marching ahead without competition.

@sysop408 These companies are mainly US-based, and I would argue the US is the biggest repository of works they use, so this would put a stop to most efforts.

I would also love to see rulings in other areas of the world, though. I live in the EU, and I would be very happy to see the European Commission making it illegal to use EU produced content to train AIs without licensing rights.

@thelinuxEXP I work on one website that gets served into China by way of some special proxies, and the amount and kind of dodgy traffic that site receives is extremely unsettling. I'm sure it's been scraped to hell and back already. One of the most bizarre things I've seen on that site is that it gets lots of distributed referral traffic from itself... but from a version of itself that hasn't existed for 4 years.

@thelinuxEXP @sysop408

China is very active in AI. They're training video generators on all the TikTok content, and they've released large language models.

@thelinuxEXP @sysop408

China has a huge incentive to develop AI, in part because of its demographics: a male surplus and an elderly surplus.

Japan and Korea are in a similar boat.

@thelinuxEXP To play devil's advocate a bit here: people also learn in a similar way. You have to read to learn how to write. You have to listen to music to learn how to make your own, etc.

I think there are at least 2 main differences. The first one is that a human can only produce so much work on their own, while AI can mass produce.

@thelinuxEXP The other one is how derivative the work is. This is hard to tell. A lot of the work that humans produce is derivative too. It's just that we don't normally publish most of it.

With AI somebody can craft an elaborate prompt to make the AI generate very derivative work and then publish it or claim that it's bad.

I'm not on the side of big tech here, but I want to point out that the question is more complex and more nuanced than just "copying is bad".

@ivt @thelinuxEXP I think there is also the fact that, as far as I know, no one has been able to prove AI has ever been creative. It does a whole lot of remixing, but it's not creation, as it's only imitating other art. Humans do that too, but they have feelings and life experiences to add onto the art they imitate -- we're not making new songs based solely on previous songs we've heard!

@yukijoou @thelinuxEXP Creativity is hard to define and even harder to measure. I don't think it's suitable to base policy on that.

AI is one of the most dynamic fields right now. Things change in months. Basing policy on its current status (e.g. how it learns, or how it works) is also pointless.

My current thinking is that we should focus on what it produces and whether that is original, rather than on how it was trained.

@ivt @thelinuxEXP People do not learn in a similar way. People do not need to listen to music to make their own. People do not need to read thousands of books to learn how to read. People do not need to remember everything that was ever said to them to learn a language. A child does not need to see a single drawing, let alone thousands, before they can learn how to draw. A person consumes a minuscule amount of energy to function compared to "AI". Nothing about the process is remotely similar.

@TapiocaPearl @thelinuxEXP I must be living in an alternative universe. AFAICT this is a big part of how people learn new skills.

I agree about the energy, but that's another topic. Also the amount of data that people need to learn something is generally smaller than what AI needs. And the process of learning is similar, but not the same. But this doesn't invalidate my points.

@ivt @thelinuxEXP 'AI' tools aren't people and don't learn, they replicate patterns. They cannot synthesize new ideas, nor could they test any ideas in the real world anyway. Memorization isn't learning.

@mayadev @thelinuxEXP I think this is a bit of a narrow view. Memorization is certainly part of learning, even for humans. AI is doing more than just that, though. If that was all it needed to do, just saving the data would be enough, and computers can do that in one go.

For example, there's this task: develop a program that can recognize hand-written digits. It's practically impossible to write a good one the old-fashioned way, with rules and stuff. With ML it's trivial. It's the "hello world" of AI.
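To make that concrete, here is a minimal sketch of that "hello world" task using scikit-learn's built-in digits dataset; this is just one common way to do it, not a claim about how any particular company trains its models:

```python
# Recognizing hand-written digits with a learned model instead of
# hand-coded rules: the "hello world" of machine learning.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()  # 1797 labeled 8x8 grayscale images of digits 0-9

# Hold out a quarter of the examples so we can check generalization,
# i.e. whether the model applies extracted patterns to unseen inputs.
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

clf = LogisticRegression(max_iter=1000)  # a simple linear classifier
clf.fit(X_train, y_train)                # "training" = fitting weights to examples

accuracy = clf.score(X_test, y_test)     # fraction correct on unseen digits
print(f"test accuracy: {accuracy:.2f}")
```

Even this simple model typically scores well above 90% on the held-out digits, which would be very hard to match with hand-written if/else rules over pixels.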

@ivt @thelinuxEXP Where does the "it's learning like a human" come in?

@mayadev @thelinuxEXP The learning is similar, not the same. Both:
- need positive and negative examples
- need external or internal feedback
- extract patterns so they can apply the knowledge to new situations.

The main differences for me are:
- AI needs many more examples
- teaching AI is expensive, so AI is trained "in the lab"; the trained models are then used, but they don't learn new things while they are in use.

I'm sure experts are working on both.

@mayadev @thelinuxEXP I'd consider this "learning". You may disagree, of course. But then please answer: what is learning?

Creativity is another interesting topic. What if a painter just "drips" paint on the canvas? Would that be creative or just random?

True, AI still doesn't learn like a human. And maybe it never will. But you can't make policy based on its current state. Things change too fast for that.

@thelinuxEXP while I'd love to see such a ruling happen, it might be detrimental in the long run. It would just give nations like China or Russia, which don't care about licences or intellectual property, a massive advantage in this field.

Although, as a musician myself who has a lot of friends in art, I think generative AI must be regulated, but not destroyed.

@thelinuxEXP I would be very surprised if that ruling ever came.
@thelinuxEXP their CURRENT business model is unsustainable. They are all losing a lot of money
@thelinuxEXP if we're looking at it from a legal position - yes, AI developers use others' creations by downloading them and using them in their work without attribution. That directly violates copyright norms, at least legal ones. But if we consider that modern neural nets learn in a way similar to how humans do (and I propose it's getting closer and closer to that), it sounds strange: should we also pay a creator if we (we, as humans) learned something by consuming their creations from open sources?
@fedorchib @thelinuxEXP that’s an argument for free schooling. To continue the analogy, not all training material is free. Some is, but then we’re back to some sort of licensing regime.

@osi @thelinuxEXP yes. But as I understand it, AI training doesn't use materials with paid/limited access – in that case there wouldn't be the discussion Nick started – it only takes things that can be accessed without barriers. And I think that if some information is just lying on the web without paywalls or other access restrictions, it can be used freely as a basis for creating new information.

For me, the first and main sin – let's call it that – is taking someone else's material and then claiming it, without any modification, as your own creation. But if you saw some art online and then drew something similar – there's nothing wrong with that.

@thelinuxEXP I don't think it would...

In fact, it could make things worse...because those with power and money would be able to acquire (and perhaps even force exclusivity agreements) with those who own the data. Unfortunately, data ownership has become significantly centralized.

They would certainly lose access to some data, but ultimately I don't think it would stop these companies, it would simply provide them another way to limit competition.

@thelinuxEXP This is trickier than you are making it out to be. When an object is used to train a network, it isn't being copied. But information regarding that object is captured in the network 'anonymously' and 'abstractly'. So, as an analogy - you definitely own your beard. But do you also have a right to a picture of your beard that I took in the wild? Or if someone wrote an article describing a beard that looks like yours... Do you also own that article?
@vartak I do own the rights to a picture of my beard that you took, yeah ;) That’s the general rule for pictures of people and buildings

@thelinuxEXP @vartak That’s definitely not the rule, Nick. If it’s in public, it’s legal to photograph, and the photo belongs to whoever took it.

Barbra Streisand learned that rule the hard way.

@bouncing @vartak Nope. Try to sell a picture of the Eiffel Tower, or a painting displayed publicly, or to publish a video of people walking in the street without their consent, and see how fast you’ll have to pay damages ;)

@thelinuxEXP @bouncing @vartak In the US, you're generally free to take and publish pictures of anything visible in public, including people and buildings, and you own the copyright to those photos. Where things get complicated is when you want to use those photos commercially. For example, you can't go around taking photos of people's faces and selling them as stock photography without their permission.

More/better info: https://jmpeltier.com/photographing-people-in-public-legal-ethical-considerations/

@thelinuxEXP @vartak Looks like that thing about the Eiffel Tower is only at night, only in France, and completely untested in court: https://www.snopes.com/fact-check/photographs-of-eiffel-tower-at-night/

See also, from @jimvernon’s link, https://en.wikipedia.org/wiki/Nussenzweig_v._DiCorcia

I know there are some stronger privacy protections in some countries in Europe, though. Eg, it isn’t always legal in the EU to photograph someone’s domicile, IIRC.

@thelinuxEXP No you don't, unless it was a portrait. You are missing the point. You would have to prove that it was your beard from a scrambled set of pixels.
@thelinuxEXP The first problem you will have in a legal sense is proving that your work was used to train a model. There is pretty much no way to trace original individual training samples from a transformer model. So you lose right there… Even if a law existed requiring licenses to be respected, it would be unenforceable.
@vartak The NYT proves that pretty competently already, ChatGPT can just spit out entire parts of their articles ;)
@thelinuxEXP Nope. Almost all language has common phrases. And we are all using language the way someone else used it. That's how we learnt it. And this is much much more difficult with image generators like Stable Diffusion.

@thelinuxEXP
I feel like it's getting too late at this point. Many companies have started adding weird clauses saying that if you post anything on their website, they own all intellectual property in that content.

So while it may be a bit more expensive, the AI companies will still get your data by going to licensing companies for it (this will still be cheaper than fair compensation). Of course, the added expense will simply be passed on to the consumer, and all the blame will be placed on the regulations.

@thelinuxEXP model creation must be explicitly, knowingly opt-in
AI has destroyed the symbiotic relationship that existed between content creators and search engines; there's no reward loop anymore. The current state of AI is one of parasitism. Without incentives for creating new content, who is going to create new content in the future? The reward loop needs to be restored somehow.
@thelinuxEXP Creators of information or content could license the end user to use the AI as a tool (like a can opener) to open the information or content and unlock its potential for private use only. Example: the end user would be responsible for buying the content license for a book and for buying a license to use the AI. Then the user could instruct the AI to read the book and provide a summary for the sole and exclusive use of that user. All interaction catalogued/tracked by blockchain.

@thelinuxEXP

I am in complete agreement with this