Hello :)
At https://mastodon.help we have a crawler that, once a week, crawls Mastodon instances in search of new ones, and once a day updates info about those it has already discovered, which are searchable at https://mastodon.help/instances. I'm thinking of making the crawler optionally create and maintain a searchable "global directory of public Mastodon accounts", which would also be updated once a day, by retrieving account info *only* for accounts that are published in the opt-in public directories of Mastodon instances, and *only* for those users who have not activated "opt-out of search engine indexing" ("preferences" -> "other" -> "opt-out of search engine indexing"). Since each instance's public directory is opt-in, and users who opt in to it can still opt out of search engine indexing, I'm not sure whether the crawler should also exclude accounts that have the #nobot or #nobots tags in their bio, which seem to me to refer to bots, particularly follow-bots, and not to search engines. What do you think about it?
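
To make the two conditions above concrete, here is a minimal sketch of the filter, not the crawler's actual code. It assumes each account arrives as a JSON object with boolean `discoverable` (opted in to the directory) and `noindex` (opted out of search engine indexing) fields, as in recent versions of Mastodon's account entity; the function name `is_directory_candidate` is made up for illustration. When `noindex` is missing, the sketch skips the account to stay on the cautious side, which is also an assumption.

```python
from typing import Any, Mapping


def is_directory_candidate(account: Mapping[str, Any]) -> bool:
    """Return True only if the account opted in to the public directory
    and explicitly did not opt out of search engine indexing."""
    # Assumption: `discoverable` is True when the user opted in to the directory.
    opted_in = account.get("discoverable") is True
    # Assumption: `noindex` is True when the user opted out of indexing.
    # A missing or null value is treated as "unknown", so the account is skipped.
    indexing_allowed = account.get("noindex") is False
    return opted_in and indexing_allowed
```

For example, `is_directory_candidate({"discoverable": True, "noindex": False})` is true, while an account with `"noindex": True`, or with no `noindex` field at all, is left out.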

#nobot and #noindex should definitely be respected by bots. There's a Fediverse beyond Mastodon, after all.
Stereophonic

@Maholmire But the crawler only crawls Mastodon instances, and Mastodon gives users a specific setting to opt out of indexing. While #noindex is unambiguous, and I agree it should be honored in any case, #nobot is less clear in this situation: there are Mastodon users who opt in to the public directory, don't opt out of search engine indexing, and have the #nobot tag in their bio. Maybe some or most of them are ok with search engine indexing and not with other bots, and if I make the crawler honor #nobot as if it were #noindex, there would be no way left for users to say "I'm ok with search engine indexing, not with other bots". I'm doubtful :)
Stereophonic

@Maholmire in the end, I think I'll make it honor #nobot as if it were #noindex, and to still give users the possibility to say "I'm ok with search engine indexing, not with other bots", I'll make it honor a #yesindex in any case :))
Stereophonic

Another thing to consider is that there are instances, Stereophonic for example, which provide multiple platforms to select from. How can you be sure I'm not using Pleroma, Soapbox, or Bloat over Mastodon?

Unless you stick to indexing instances that have Mastodon as their only interface, there is no knowing whether you are mistakenly indexing a Pleroma or a Mastodon user. I feel it is for this reason that #nobot is also respected, to err on the side of caution when it comes to privacy, something that I feel most people on the Fediverse do care about.
Stereophonic

@Maholmire well, to my knowledge those you cited are only front-ends (Soapbox front-end, Mastodon front-end and Bloat front-end) and Stereophonic is definitely Pleroma. Anyway, as I wrote before, I'll make the crawler honor #nobot as if it were #noindex, and still give users the possibility to say "I'm ok with search engine indexing, not with other bots" by making it honor a possible #yesindex tag in any case :)
I also think Pleroma is better than Mastodon when it comes to this topic: it has an "Allow discovery of this account in search results and other services" setting which is opt-in, and is not active by default.
Stereophonic

I was referring to them previously as front-ends in layman's terms. But yeah, that would probably be the smartest course of action. Though you'll need to figure out a way to let people interact with the bot, so that it can tell those who have consented to indexing by that bot specifically apart from others who use #nobot as a generic catch-all.
Stereophonic

@Maholmire ok, so to summarize... #nobot will be treated as a generic catch-all, preventing the crawler from indexing the profile and making it delete the profile if it was previously indexed (people change their minds); #yesindex will be treated as a generic "I'm ok with any indexing"; and there will be a very special #noindexbutmasthelp that will be treated accordingly :))
Stereophonic

@Maholmire ...the order will also matter: if a #noindex followed a #yesindex in the same bio, #noindex would win.
Stereophonic

#noindex would take precedence over #nobot, and so would #yesindex.
Stereophonic
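
The tag rules agreed on in this thread could be sketched roughly as follows. This is one possible reading, not the crawler's actual implementation: the explicit tags (#noindex, #yesindex, #noindexbutmasthelp) override #nobot/#nobots, and when several explicit tags appear in the same bio the last one wins. The function name `may_index` is hypothetical, the bio is assumed to be plain text (HTML already stripped), #noindexbutmasthelp is assumed to mean "allow" from the point of view of the mastodon.help crawler, and an untagged bio is assumed to be indexable because the account has already passed the directory and opt-out checks.

```python
import re

ALLOW, DENY = "allow", "deny"

# Explicit indexing tags, as seen by the mastodon.help crawler (assumption:
# #noindexbutmasthelp means "no indexing, except by mastodon.help").
EXPLICIT_TAGS = {
    "noindex": DENY,
    "yesindex": ALLOW,
    "noindexbutmasthelp": ALLOW,
}

# Generic catch-all tags, honored only when no explicit tag is present.
CATCH_ALL_TAGS = {"nobot", "nobots"}

HASHTAG_RE = re.compile(r"#(\w+)")


def may_index(bio: str) -> bool:
    """Decide whether a profile may be indexed, based on the tags in its bio."""
    tags = [tag.lower() for tag in HASHTAG_RE.findall(bio)]
    explicit = [EXPLICIT_TAGS[tag] for tag in tags if tag in EXPLICIT_TAGS]
    if explicit:
        # Order matters: the last explicit tag in the bio wins.
        return explicit[-1] == ALLOW
    # No explicit tag: #nobot / #nobots act as a generic opt-out.
    return not any(tag in CATCH_ALL_TAGS for tag in tags)
```

Under these assumptions, `may_index("#yesindex #noindex")` is false while `may_index("#nobot #yesindex")` is true. A profile that was indexed earlier and now evaluates to false would be deleted on the next daily update.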