Pamela Samuelson has a nice write-up in JOTWELL of my paper with my Stanford CS colleagues on copyright issues in generative AI outputs.

Write-up here:

https://ip.jotwell.com/generative-ai-meets-copyright/

Underlying paper here:
https://arxiv.org/abs/2303.15715

Generative AI Meets Copyright - Intellectual Property

Peter Henderson, Xuechen Li, Dan Jurafsky, Tatsunori Hashimoto, Mark A. Lemley & Percy Liang, Foundation Models and Fair Use, available at SSRN (Mar. 27, 2023).

Pamela Samuelson: ChatGPT, Midjourney, and Copilot are among the numerous generative AI systems launched in the last year or so. They have attracted a huge number of users as well as several lawsuits. Among the lawsuits' claims are that the makers of these systems are direct and indirect infringers of copyright because of their use of [...]
@marklemley The idea that the only way to avoid being scraped is to avoid being linked (by editing robots.txt) sounds bad to me.

@DanaBlankenhorn @marklemley robots.txt doesn't affect linking -- it only affects scraping.

The main downside is that web search engines rely on scraping to build their search index, so if you lock out all scraping then you also remove yourself from Google/Bing/etc.

Or you can allow only, say, Google's crawler and block all other scrapers... but then if you want to appear in Google's search results, you might also have to let them train on your data 😕
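That selective-allowlist setup can be sketched with a robots.txt that permits only Googlebot, checked here via Python's stdlib parser (the file contents and URLs are hypothetical, just for illustration):

```python
from urllib import robotparser

# Hypothetical robots.txt: allow Googlebot everything, block all other crawlers.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Googlebot may scrape; any other user agent falls through to the "*" block.
print(rp.can_fetch("Googlebot", "https://example.com/page"))  # True
print(rp.can_fetch("OtherBot", "https://example.com/page"))   # False
```

Note that compliance with robots.txt is voluntary on the crawler's side; the file expresses a preference, it doesn't technically enforce anything.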