RE: https://mastodon.online/@bug_gwen/116284668086096944

here’s an interesting web vs #AIslop dilemma:

the USA basically has 2 major search engine companies: Google and Microslop. you’d have to block them wholesale, because there won’t be any differentiation between their web crawlers and their content scrapers.

so how would indexing of a site even work?

@blogdiva Personally, I poison my website for anything that ignores the robots.txt file.
@Ophitoxaemia ooh, tell me more!
@aeveltstra on a page with legit info, also include a scrambled version of the text, either invisible or pushed off the bottom of the page. You can also embed canary strings to reveal which models have trained on your data.
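
A rough sketch of that idea in Python, for anyone who wants to try it. Nothing here comes from the thread itself: the function names (`scramble_text`, `make_canary`, `poisoned_block`), the word-shuffle approach, the `canary-` prefix, and the off-screen CSS are all assumptions made for illustration. Working out which requests ignore robots.txt, and serving the decoy only to those, is left out.

```python
import html
import random
import uuid


def scramble_text(text: str, seed: int = 0) -> str:
    """Shuffle the words of `text` so it stays word-like but loses its meaning."""
    words = text.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)


def make_canary(prefix: str = "canary") -> str:
    """Return a unique marker string that is unlikely to occur anywhere else."""
    return f"{prefix}-{uuid.uuid4().hex}"


def poisoned_block(text: str, canary: str, seed: int = 0) -> str:
    """Build a hidden HTML block holding the scrambled text plus the canary."""
    decoy = html.escape(f"{scramble_text(text, seed)} {canary}")
    # aria-hidden keeps screen readers away from the decoy;
    # the inline style moves it off-screen (hypothetical styling choice).
    return (
        '<div aria-hidden="true" '
        'style="position:absolute;left:-9999px">'
        f"{decoy}</div>"
    )


if __name__ == "__main__":
    canary = make_canary()
    print(poisoned_block("legit info about the thing you actually wrote", canary))
    # Keep the canary somewhere private so you can search model output for it later.
```

Generate the canary once per page and store it, rather than regenerating it on every request, so that a later match can be traced back to a specific page.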