2 authors say OpenAI 'ingested' their books to train ChatGPT. Now they're suing, and a 'wave' of similar court cases may follow.

https://lemmy.world/post/1246165

Two authors sued OpenAI, accusing the company of violating copyright law. They say OpenAI used their work to train ChatGPT without their consent.

If I read a book to inform myself, put my notes in a database, and then write articles, it is called "research". If I write a computer program to read a book and put the notes in my database, it is called "copyright infringement". Is the problem that there just isn't a meatware component? Or is it that the OpenAI computer isn't doing a good enough job of following the "three references" rule to avoid plagiarism?

The fear is that the books are in one way or another encoded into the machine learning model, and that the model can somehow retrieve excerpts of these books.

Part of the training process of the model is to learn how to plagiarize the text word for word. The training task is basically “guess the next word of this excerpt”. This is quite different from how humans do research.
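To make the “guess the next word” idea concrete, here is a toy sketch. It is not how OpenAI trains anything: a simple lookup table of word counts stands in for the neural network, and the names `next_word_pairs`, `train`, and `generate` are made up for this example. It only shows why a model trained on this objective can end up reciting its training text when it has seen little else.

```python
from collections import Counter, defaultdict

def next_word_pairs(text, context_size=2):
    """Turn a text into (context, next_word) training examples."""
    words = text.split()
    return [
        (tuple(words[i:i + context_size]), words[i + context_size])
        for i in range(len(words) - context_size)
    ]

def train(pairs):
    """Count which word follows each context (toy stand-in for a neural net)."""
    counts = defaultdict(Counter)
    for context, nxt in pairs:
        counts[context][nxt] += 1
    return counts

def generate(counts, prompt, max_words=20):
    """Greedily append the most frequent next word until no context matches."""
    out = list(prompt)
    n = len(prompt)
    for _ in range(max_words):
        followers = counts.get(tuple(out[-n:]))
        if not followers:
            break
        out.append(followers.most_common(1)[0][0])
    return " ".join(out)

# Placeholder "book": a made-up sentence, not text from any real work.
excerpt = "the quick brown fox jumps over the lazy dog near the quiet river"
model = train(next_word_pairs(excerpt))

# Given the first two words, the toy model recites the rest verbatim.
print(generate(model, ("the", "quick")))
```

A real model sees vastly more text and generalizes across it, so how much of any single book survives in the weights is exactly the open question.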

To what extent the books are encoded in the model is difficult to know. OpenAI isn’t exactly open about their models. Can you make ChatGPT print out entire excerpts of a book?
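One rough way to test that from the outside: paste a reply from the chatbot next to the book text and look for the longest run of words they share. The sketch below assumes you already have both as plain strings (it does not call any API), and `longest_shared_run` is a name invented for this example. A long shared run suggests memorization; a short one proves nothing either way.

```python
from difflib import SequenceMatcher

def longest_shared_run(source_text, model_output):
    """Return the longest word-for-word run found in both texts."""
    a = source_text.lower().split()
    b = model_output.lower().split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    m = matcher.find_longest_match(0, len(a), 0, len(b))
    return " ".join(a[m.a:m.a + m.size])

# Placeholder strings, not real book text or real model output.
book = "the quick brown fox jumps over the lazy dog"
reply = "Sure! It says: quick brown fox jumps over a fence"

run = longest_shared_run(book, reply)
print(len(run.split()), "shared words:", run)
```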

It’s quite a legal gray zone. I think it’s good that this is tried in court, but I’m afraid the court might have too little technical competence to make a ruling.