“forcing OpenAI to identify its use of copyrighted data would expose the company to potential lawsuits. Generative AI systems are trained using large amounts of data scraped from the web, much of it copyright protected… [disclosing sources leaves us] open to legal challenges.”

Looking forward to new defences in court saying if they’re forced to explain exactly where they got this car boot stall full of nappies and DVDs from, they’ll be subject to “legal challenges”

https://www.theverge.com/2023/5/25/23737116/openai-ai-regulation-eu-ai-act-cease-operating

OpenAI says it could ‘cease operating’ in the EU if it can’t comply with future regulation

OpenAI CEO Sam Altman has warned that the company could pull its services from the EU if it finds upcoming regulations too onerous. The EU AI Act is currently being finalized by lawmakers and should become law next year.

The Verge

Like, how is this OK? How is it OK to say "yeah, we did crimes, and if you force us to say what we did, we'll have to say we did crimes", and then nothing else happens? This is like politicians saying "yeah, I did coke, what are you going to do about it" while all around them people are going to prison for the same thing that famous people and big companies can just laugh off.

@sil

Keep in mind that, depending on jurisdiction, copyright does not necessarily mean that it’s illegal to use the content. It’s more about effectively claiming ownership of somebody else’s content, but you’re still free to use it.

So it’s not that they did crimes.

@volkris sure. And if they thought that, they'd be arguing for "we're allowed to do this anyway". But "if you make us say what we took, we'll bail on your whole continent rather than do so" may not guarantee that they know what they did will be viewed as wrong, but it strongly implies it.

@sil

From what I read in the article it doesn’t sound like violation of any copyright is their issue but rather their not wanting to reveal trade secrets, the training that produced their AI systems.

They don’t care that we know they used copyrighted material. Obviously they did. There’s no issue with that. But they want to protect the AI model they invested in against anyone else who might follow their exact footsteps to make their own competitor.

@sil It shows that they had literally no plan, going into this. They just hoped that either nobody would notice or that they would make so much money that they could buy off the relevant lawmakers.

Granted, it's not as bad as Uber's "we're going to operate here, because we don't really care what the law says" approach.

@sil "don't impinge on our free way of making money"

If these systems made an effort to only use proper sources they'd be fine, but that clearly would be far more expensive.

@sil It’s because this is new ground. You are assuming a crime was committed. This would be civil, not criminal. They did not copy whole books in a counterfeit manner (at least as traditionally viewed). Instead, they trained (at most) a derivative from it. If you read a book and write a book review citing the plot or specific passages, did you commit some kind of crime or plagiarism? Stuff like that is why this is tough. It hasn’t really been this kind of issue before. I’m going to make an armchair-lawyer guess that it depends on where the line for “derivative work” is drawn.

There's nothing "open" about OpenAI. They are a for-profit, closed-source-software-using corporation trying to make a buck off the hard work of others for free. Fuck them and their neutered shitty LLM AI chat bot.

@sil this company is like the master of framing the narrative.

GPT-9 may end the world in the next 5 years, but they're not even working on GPT-5.

They're totally open to regulation, but only the ones they approve of.

They're totally working for "humanity's" best interest, but also you are never allowed to audit or question what they are doing, ever, or it'll ruin all the great work they say they are doing.

@sil If they are banned in Europe, someone will come along and do everything right, training only on data that they are legally entitled to use, more carefully curated so it isn't just fed any garbage that can be scraped. This new model will still probably make shit up, but it will address the permission problem.

@sil yeah. Good. How can you run a legitimate business outside of any law or regulation?

@sil Exactly. “We stole it but we built this thing with it and we want to keep this thing as our thing even though it’s made from everyone else’s things.”

@sil @lisamelton “If you force me to admit I’m committing crimes, it might have consequences” is not the ironclad defense they seem to think it is.

Yes, that’s the point.

@sil 500 million of the world's richest consumers. Sounds like an empty threat to me.