I've seen some people argue that the NYT's new lawsuit against OpenAI/Microsoft is the strongest such lawsuit yet, but... I don't think so. I also don't think the NY Times would actually like the world if it wins, because the NYT *itself* does what it is accusing OpenAI of doing.
https://www.techdirt.com/2023/12/28/the-ny-times-lawsuit-against-openai-would-open-up-the-ny-times-to-all-sorts-of-lawsuits-should-it-win/
The NY Times Lawsuit Against OpenAI Would Open Up The NY Times To All Sorts Of Lawsuits Should It Win

@mmasnick
The NYT isn't worth rooting for, for the simple fact that they've been constantly pushing transphobic articles for ages and doubled down on it when they were called out. Seriously, fuck 'em.

@mmasnick Summarizing someone else's published article is not a copyright violation. Facts are not protected by copyright. Downloading the article and storing a copy on your own system (beyond incidental caching), however, often is a copyright violation.

And "this is how generative AI works" is not a good defence against copyright violation when the output is a copy of a copyrighted work - even if and when the NYT set a trap by using specific prompts.

@mmasnick thanks for sharing. I don’t know enough about copyright law to comment on the merits of either argument, but to me it feels like there is a substantial difference in scale when a bot can crawl millions of pages of content in a few minutes versus a human reading at regular speed. And most importantly, it’s not just about reading to provide summaries but “reading” so that the model now “knows” language grammar and conceptual relationships between words.

@mmasnick I bet the NYT is hiding that every time they restarted their GPT session and asked the exact same prompt, they likely got slightly different answers...

OpenAI has the log of the chat sessions and can show how many prompts NYT tried until they got the answer they wanted.

@mmasnick Reading/ingesting/training the models doesn't violate copyright, but outputting the news without recompense or attribution is unfair, and copyright law ought to catch up with that? This seems similar to Google snippets just giving the answer and removing all context (at least Google had robots.txt from the beginning).
I dunno if it's copyright infringement under current law, but it sure as heck is unfair in the specific case of LLMs being used for information retrieval/search.

@mmasnick

The lawsuit might be on much stronger grounds regarding AI 'hallucinations' that misattribute quotes or info to the NYT, especially when these are false or even dangerous. E.g., "...provided incorrect information that was said to have come from The Times, including results for “the 15 most heart-healthy foods,” 12 of which were not mentioned in an article by the paper." https://www.nytimes.com/2023/12/27/business/media/new-york-times-open-ai-microsoft-lawsuit.html

New York Times Sues OpenAI and Microsoft Over Use of Copyrighted Work

@mmasnick So the NY Times is claiming that their data is stored retrievably in its entirety. They've demonstrated that their data is stored retrievably in its entirety. And you have described that, but with a condescending framing that this is the INEVITABLE result of the language modeling objective, which they would agree with.

Where does the NY Times reprint nearly the exact text for paragraphs on end from other outlets? How does this open them up to lawsuits?