Someone is building a "global fediverse post indexer" that:

* scrapes the public APIs so it can't be blocked via defederation
* uses a bunch of dynamic IPs so it can't be banned at the network level (hilariously, the author redacted this part and forgot that the edit history can be viewed by anyone)
* can be blocked by server admins via robots.txt, but they're planning to publish which instances are opting out (right now this is "open for debate")
* can be blocked by users by disabling indexing in the profile settings (!) or adding a specific hashtag to their bio (!!)

There's ZERO mention of opt-in, a lot of pushback against anyone who dares to call this thing a scraper ("we're using public APIs, so we're not a scraper") and the inevitable "we got complaints only from people who have something to hide".

With this attitude, I wonder how they're going to respond to the first GDPR complaint they're inevitably going to receive; it'll be fun 🍿

@rfc1459 I'd say we should just start posting long original content (AO3-style fan fiction) on here and then, once it's in the index, sue them for copyright infringement...

@rfc1459 There's no easy answer on that one. It's an "it depends". Depends on whether it is a company or an individual. Depends on where they are based. Depends on where the scripts are running from (jurisdiction is a swamp).

Most likely, they will get away with it for a very long time: long enough to collect terabytes of data, index it and make money out of it.

@rfc1459 oh no, can we have a link to the discussion or something related to it?
Matt Cloy (@[email protected])

So I made a "pending review" decision on the fediverse full-text search engine we wrote - uses the public API, which means it can't be defederated, and it [redacted], filling out robots.txt is the solution for hosts, or as a user set your profile to do-not-index on mastodon and/or add #noindex to your bio). Available under login ONLY to *verified* instance moderators (and only searching federated instances of that mod). I.E. if the server is defederated from your instance, their mods can't search the commons for anything. *Constructive* feedback on this welcome (including thoughts on adding watch-phrases for flagging abuse patterns for review, making robots.txt-banned instances public, or anything else that improves moderation), please let me know NOW not later. Would rather a discussion before the cat is out of the bag than afterwards. #fediblock (because I know that hashtag will get me feedback) #flameproofpantstime

TechHub
@rfc1459 It will not be possible to prevent someone from offering a full text search on the fediverse.
@rfc1459 Who/what/where is it, and what's the hashtag to block it?