After spending the last few years working really hard to beef up the longer-format histories of abandoned locations on my website, I am seriously considering pulling all but, say, three paragraphs of each and putting the rest on Patreon for subscribers, because I'm disgusted that the work I've done will be used in Google's AI training and search results without my consent. It is a very frustrating situation to be placed in.

To be clear, I would really rather not have to do the extra work, and paywalling content runs against what I've wanted to do with my work from the beginning, which is to make info accessible to whoever wants to view it.

But I also really abhor the idea of my writing being absorbed and plagiarized by search summaries that encourage bypassing my site entirely, and I don't know of another way to prevent it.

I would bet a not insignificant amount of money that a good number of other small websites facing the same predicament are mulling their options for preventing data scraping of decades of their work and deciding whether paywalling is the only way. This will absolutely devastate the internet as we know it. I'm well aware of the social media debacles here, but the issue stretches far beyond corporate behemoths trying to monetize APIs.

Anyway, if you're looking for a time sink, there are tons of photos and site histories on the Abandoned America website that you can check out right now. Get 'em while they're hot:
https://www.abandonedamerica.us/

@AbandonedAmerica Depending on your setup and skill, it might be possible to implement a simple free user system, so the content is still available but sits behind a login. That will stop automated scrapers. It's a small hassle for users who have to register, but it keeps to the spirit of what you want. A WordPress site could do this easily. (There's also no problem if you decide you DO want to charge people!)
@philbetts @AbandonedAmerica It might be even simpler than they... A CAPTCHA in front of the full content plus terms of use that say the content can't be used in AI models is probably sufficient. No need for individual logins I don't think.
@jik @AbandonedAmerica CAPTCHA is owned by Google, so I don't imagine it interferes with their indexing. Terms of Use won't make a difference - almost all big AI models are trained on copyrighted material anyway - there are a few cases before the courts, but lots of the damage is already done.
@philbetts @AbandonedAmerica 1) Google does not index pages blocked by CAPTCHAs.
2) CAPTCHA is not "owned" by Google. It's a generic industry term, and there are CAPTCHA implementations from many sources other than Google.
3) Google and the other LLM vendors are basically saying, "We scrape everything that lets us." CAPTCHA plus ToS is a clear, explicit indication that you don't let them. If sites do that and the vendors ignore it, they'll have a big honking class action lawsuit or GDPR enforcement on their hands. Not to mention potential federal criminal charges for unauthorized access. They don't want that.
@jik @AbandonedAmerica yeah fair, I was thinking reCAPTCHA. ToS is absolutely meaningless though. Maybe robots.txt would work for Google, but the class actions are already happening. Was listening to a podcast about the GitHub Copilot suit today. https://www.gizmodo.com.au/2023/07/a-new-class-action-lawsuit-adds-to-openais-growing-legal-troubles/
@philbetts ToS is absolutely not meaningless. Every class action lawyer and privacy regulator in the country would salivate over a large pool of websites with anti-AI ToS plus CAPTCHA or even just robots.txt that were scraped for training despite them. It makes the case overwhelmingly stronger. And since Congress is mostly broken, lawsuits are probably the only thing that is going to do any good about this in the US.
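
For anyone wanting to try the robots.txt route mentioned above, here is a minimal sketch of how a site owner might check what a robots.txt actually tells different crawlers, using Python's standard `urllib.robotparser`. The robots.txt contents are a hypothetical example, not the site's real file. `GPTBot` and `Google-Extended` are crawler tokens that OpenAI and Google have published for AI-training opt-outs; whether they cover every use you care about is an assumption to verify against the vendors' own documentation. Keep in mind that robots.txt is advisory: it only works for crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only: blocks two AI-training
# crawler tokens while leaving the site open to everything else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def allowed_agents(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Report, per user agent, whether robots_txt permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = allowed_agents(
    ROBOTS_TXT,
    "https://www.abandonedamerica.us/",
    ["GPTBot", "Google-Extended", "Googlebot"],
)
for agent, allowed in result.items():
    print(f"{agent}: {'allowed' in str(allowed) or ('allowed' if allowed else 'blocked')}")
```

Note that in this sketch the regular search crawler (`Googlebot`) stays allowed, so the site remains indexed. Blocking training crawlers this way does not by itself keep content out of search result summaries.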

@doot

Are abandoned buildings as cute as clams? :-?

@AbandonedAmerica
Very cool website. I'm having fun digging into it. Reminds me of how when John Hillcoat was in pre-production for The Road, they didn't build sets because there are a lot of abandoned places in America to mine.