“forcing OpenAI to identify its use of copyrighted data would expose the company to potential lawsuits. Generative AI systems are trained using large amounts of data scraped from the web, much of it copyright protected… [disclosing sources leaves us] open to legal challenges.”

Looking forward to new defences in court saying that if they’re forced to explain exactly where they got this car boot stall full of nappies and DVDs from, they’ll be subject to “legal challenges”.

https://www.theverge.com/2023/5/25/23737116/openai-ai-regulation-eu-ai-act-cease-operating

OpenAI says it could ‘cease operating’ in the EU if it can’t comply with future regulation

OpenAI CEO Sam Altman has warned that the company could pull its services from the EU if it finds upcoming regulations too onerous. The EU AI Act is currently being finalized by lawmakers and should become law next year.

The Verge
Like, how is this OK? How is it OK to say "yeah, we did crimes, and if you force us to say what we did, we'll have to say we did crimes", and then nothing else happens? This is like politicians saying "yeah, I did coke, what are you going to do about it" while all around them people are going to prison for the same thing that famous people and big companies can just laugh off.
@sil It’s because this is new ground. You are assuming a crime was committed, but this would be civil, not criminal. They did not copy whole books in a counterfeit manner (at least as traditionally viewed). Instead, they trained (at most) a derivative from them. If you read a book and write a review citing the plot or specific passages, did you commit some kind of crime or plagiarism? Stuff like that is why this is tough. This kind of issue hasn’t really come up before. My armchair-lawyer guess is that it depends on where the line for “derivative work” is drawn.