Among the many things Doctorow gets wrong in That Post is this:

"It's not 'unethical' to scrape the web in order to create and analyze data-sets. That's just 'a search engine.'"

Apart from the fact that AI companies are particularly malicious in the way they scrape the web, I'd say we accept search engine scraping mostly on the premise that it's done for the benefit of the scraped sites. There's no such principle of mutual benefit in AI scraping: the AI company gets the value of the scraped data and you get bupkis at best, and possibly DDoS'd at worst.

@lrhodes plus the idea that people can establish consent and boundaries for what on their web sites should and shouldn't be scrapeable has been around since basically the beginning of the web (robots.txt). the only ethical reasons i can think of for ignoring robots.txt would be things like holding corporations and governments to account. just "creating and analyzing data sets" on its own isn't enough justification.
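
For anyone who hasn't seen how that consent mechanism works in practice, here is a minimal sketch in Python of a crawler checking robots.txt before fetching a page. It uses the standard library's `urllib.robotparser`. The robots.txt contents, the `ExampleAIBot` and `ExampleSearchBot` names, the example.org URLs, and the `may_fetch` helper are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the site owner shuts out one named crawler
# entirely and asks every other crawler to stay out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The named crawler is refused everywhere.
    print(may_fetch(ROBOTS_TXT, "ExampleAIBot", "https://example.org/blog/post"))  # False
    # Any other crawler may read public pages...
    print(may_fetch(ROBOTS_TXT, "ExampleSearchBot", "https://example.org/blog/post"))  # True
    # ...but not the paths the owner marked off-limits.
    print(may_fetch(ROBOTS_TXT, "ExampleSearchBot", "https://example.org/private/notes"))  # False
```

Honoring the file is voluntary: nothing in the protocol stops a crawler from skipping this check, which is why ignoring it is a question of ethics rather than capability.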