This person is building a tool (zeitgeist dot blue) that scrapes your ActivityPub feed to generate daily LLM-driven summaries. https://alpaca.gold/@seldo/116286295716964968

@drahardja yikes.

I've been a little concerned about someone doing this for a while. It's part of the reason my default post is quiet public, and I use block/unblock to remove any follows I am not 100% sure is a person.

People like this are why we can't have nice things, and the fact he doesn't seem to understand why people would be upset seals the deal.

@maleve “Consent” seems to be a challenging concept to AI-pilled people.
@drahardja yeah, I might argue it's the other way around, that ethically challenged people are attracted to do shitty things with AI, but either way it's bad, very very very bad.
@drahardja Wondering if the @moderators here have a policy about this (I couldn't find anything in About, Privacy, or Community Hub).
@haljor @moderators I would support discovering the IP addresses and/or user-agent of the client they use to scrape feeds, and blocking them.
@drahardja @haljor @moderators We don’t have a policy on this, but this particular tool will respect standard Mastodon opt-out tags in bios, like #noindex/#nobots/#noai and also the “don’t index” flag on user profiles. This seems pretty reasonable to me (not speaking for SFBA on that last part).
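
For anyone wondering what "respecting" those opt-outs could look like in practice, here is a minimal sketch. It is not the tool's actual code. It assumes an account object shaped like the Mastodon API's Account entity, with the bio as HTML in `note` and the "don't index" preference exposed as the boolean `noindex`. The function name `is_opted_out` and the `OPT_OUT_TAGS` set are made up for illustration.

```python
import re
from html import unescape

# Unofficial opt-out hashtags mentioned in this thread (a convention, not a standard).
OPT_OUT_TAGS = {"noindex", "nobots", "noai"}


def is_opted_out(account: dict) -> bool:
    """Return True if the account signals it does not want to be processed.

    `account` is assumed to look like a Mastodon API Account entity:
    `note` holds the bio as HTML, `noindex` holds the "don't index" flag.
    """
    # Profile-level "don't index" preference.
    if account.get("noindex"):
        return True

    # Strip HTML tags without inserting spaces, so "#<span>noai</span>"
    # collapses to "#noai", then decode entities such as "&amp;".
    bio_text = unescape(re.sub(r"<[^>]+>", "", account.get("note") or ""))

    # Collect hashtags case-insensitively and compare against the opt-out set.
    tags = {tag.lower() for tag in re.findall(r"#(\w+)", bio_text)}
    return bool(tags & OPT_OUT_TAGS)
```

A scraper following this convention would call `is_opted_out(account)` for each post author and skip the post when it returns `True`. Note that this is still opt-out: accounts that have never heard of the tags are processed by default.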

@neuralgraffiti @haljor @moderators I’m not sure many people know about those tags; I certainly didn’t.

IMO the “don’t index” preference is orthogonal to, or at least a superset of, opting out of LLM summarizing. I don’t mind my posts being indexed/searchable, but I don’t want them to be fed into the LLM meat grinder.

@drahardja @neuralgraffiti @moderators I didn't know about these tags either, and I'm not finding documentation on how to use them. I also don't see "don't index" specifically in Preferences, unless that's implied by one of the checkboxes.

Is there more information on these tags somewhere?

@haljor It’s this setting.

@haljor @drahardja @moderators The tags are a convention that’s been around for a while, but isn’t official.

The indexing profile option is described in the docs here: https://docs.joinmastodon.org/user/preferences/#misc

@neuralgraffiti @haljor @moderators I think they should be opt-in, like search. Instead of noai, it should be yesai.

Consent should be opt-in.

@drahardja @haljor @moderators That’s a fair point. I’m not sure we want to be in the position of committing to find and block all tools that come around from an operational perspective, but we are always open to community feedback, and obviously AI tools are a rapidly changing landscape. We’ve had some internal debates on how to handle them.

Frankly, and not to derail, the various social media laws being passed around the US are of much greater concern to me.

@drahardja Note: It only looks at an account's own timeline — so if an account doesn't follow you (or someone who boosts your posts), it won't show up if the account owner uses this tool.

I appreciate that Laurie added support for ways to have the tool blocked for your account (certain hashtags in your bio, the "don't index" flag) once this option was pointed out to him. I'm adding the noai hashtag to my bio!

And as much as I personally hate GenAI, I've long accepted that since my posts are public, and you don't need a Mastodon login to read them, they could be scraped by anyone without me ever knowing it. I don't like that, but it's reality.

@jeridansky I hear you, but I also try to avoid premature surrender. Just because it’s impossible to entirely prevent public scraping doesn’t mean we shouldn’t establish norms for acceptable behavior. For example, LLM scrapers routinely ignore robots.txt; that doesn’t mean we shouldn’t continue to insist that people respect it.

I think it’s good that we set clear boundaries and expectations, and insist on asking consent; then call out people who violate these norms, even when we can’t stop every bad actor.