Hmm, I probably have the most ridiculous #robotstxt for a #Misskey instance right now lol. I just want to let #Mojeek and #Marginalia crawl #Makai while keeping out #Google and the AI scrapers...
If there are other independent #searchengines whose user-agents I should allow in https://makai.chaotic.ninja/robots.txt, please let me know! I'm also looking up the #useragent strings for #SauceNAO, #TinEye, and #IQDB so I can let them fetch our media for their reverse image search.
User-Agent: MojeekBot
User-Agent: FeedFetcher-Mojeek
User-Agent: search.marginalia.nu
Allow: /
Allow: /notes
Disallow: /admin
Disallow: /settings
Disallow: /my/

User-Agent: *
User-Agent: Googlebot
User-Agent: Google-Extended
User-Agent: GoogleOther
User-Agent: AdsBot-Google
User-Agent: AdsBot-Google-Mobile
User-Agent: Mediapartners-Google
User-Agent: CCBot
User-Agent: ChatGPT-User
User-Agent: GPTBot
User-Agent: Omgilibot
User-Agent: omgili
User-Agent: FacebookBot
User-Agent: Twitterbot
User-Agent: cohere-ai
User-Agent: anthropic-ai
User-Agent: Bytespider
User-Agent: Amazonbot
User-Agent: Applebot
User-Agent: PerplexityBot
User-Agent: YouBot
User-Agent: AwarioRssBot
User-Agent: AwarioSmartBot
User-Agent: ClaudeBot
User-Agent: Claude-Web
User-Agent: DataForSeoBot
User-Agent: FriendlyCrawler
User-Agent: ImagesiftBot
User-Agent: magpie-crawler
User-Agent: Meltwater
User-Agent: peer39_crawler
User-Agent: PiplBot
User-Agent: Seekr
Disallow: /
# todo: sitemap

#sysadmin #fediadmin
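Since robots.txt groups stack several User-Agent lines onto one rule block, it's worth sanity-checking the matching before deploying. A minimal sketch with Python's stdlib parser, trimmed to a few of the agents above (caveat: `urllib.robotparser` resolves Allow/Disallow by first match in file order, while Google and RFC 9309 pick the longest matching path, so keep more specific rules first when testing this way):

```python
from urllib import robotparser

# Trimmed copy of the rules above: one allow group, one catch-all deny group.
ROBOTS = """\
User-Agent: MojeekBot
User-Agent: FeedFetcher-Mojeek
User-Agent: search.marginalia.nu
Allow: /
Disallow: /admin

User-Agent: *
User-Agent: GPTBot
User-Agent: CCBot
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS.splitlines())

print(rp.can_fetch("MojeekBot", "/notes"))   # True: matched the allow group
print(rp.can_fetch("GPTBot", "/notes"))      # False: swept up by the deny group
print(rp.can_fetch("RandomBot", "/"))        # False: falls through to User-Agent: *
```

Handy to rerun whenever a new bot gets added to either group, so a typo in a User-Agent line doesn't silently let a scraper through.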
Mima-sama