Mastodawn

mesa Feb 7, 2025

FediDB has stoped crawling until they get robots.txt support

FediDB has stoped crawling until they get robots.txt support - Lemmy.World

We have paused all crawling as of Feb 6th, 2025 until we implement robots.txt support. Stats will not update during this period.

Show thread

hendrik Feb 7, 2025

Did someone complain? Or why stop?

Show thread

mesa Feb 7, 2025

No idea honestly. If anyone knows, let us know! I dont think its necessarily a bad thing, If their crawler was being too aggressive, then it can accidentally DDOS smaller servers. Im hoping that is what they are doing and respecting the robot.txt that some sites have.

Show thread

hendrik Feb 7, 2025

I think it's just one HTTP request to the nodeinfo API endpoint once a day or so. Can't really be an issue regarding load on the instances.

Show thread

jmcs Feb 7, 2025

It’s not about the impact it’s about consent.

Show thread

hendrik Feb 7, 2025

True. Question here is, if you run a federated service... Is that enough to assume you consent to federation?

Show thread

WhoLooksHere Feb 7, 2025

Why invent implied consent when complicit consent has been the standard in robots.txt for ages now?

Show thread

hendrik Feb 7, 2025

I guess because it's in the specification? Or absent from it? But I'm not sure. Reading the ActivityPub specification is complicated, because you also need to read ActivityStreams and lots of other referen es. And I frequently miss stuff that is somehow in there.

But generally we aren't Reddit where someone just says, no we prohibit third party use and everyone needs to use our app by our standards. The whole point of the Fediverse and ActivityPub is to interconnect. And to connect people across platforms. And it doen't even make lots of assumptions. The developers aren't forced to implement a Facebook clone. Or do something like Mastodon or GoToSocial does. They're relatively free to come up with new ideas and adopt things to their liking and use-cases. That's what makes us great and diverse.

I -personally- see a public API endpoint as an invitation to use it. And that's kind of opposed to the consent thing.

But with that said... We need some consensus in some areas. There are use cases where things arent obvious from the start. I'm just sad that everyone is ao agitated and seems to just escalate. I'm not sure if they tried talking to each other nicely. I suppose it's not a big deal to just implement the robots.txt and everyone can be happy. Without it needing some drama to get there.

Show thread

WhoLooksHere Feb 7, 2025

Robots.txt started I’m 1994.

It’s been a consensus for decades.

Why throw it out and replace it with imied consent to scrape?

That’s why I said legally there’s nothing they can do. If people want to scrape it they can and will.

This is strictly about consent. Just because you can doesn’t mean you should yes?

Show thread

Rimu Feb 7, 2025

It's been a consensus for decades

Let's see about that.

Wikipedia lists http://www.robotstxt.org as the official homepage of robots.txt and the "Robots Exclusion Protocol". In the FAQ at http://www.robotstxt.org/faq.html the first entry is "What is a WWW robot?" http://www.robotstxt.org/faq/what.html. It says:

A robot is a program that automatically traverses the Web's hypertext structure by retrieving a document, and recursively retrieving all documents that are referenced.

That's not FediDB. That's not even nodeinfo.

robots.txt - Wikipedia

Show thread

WhoLooksHere

From your own wiki link

robots.txt is the filename used for implementing the Robots Exclusion Protocol, a standard used by websites to indicate to visiting web crawlers and other web robots which portions of the website they are allowed to visit.

How is f3didn not an “other web robot”?

Show thread

Rimu Feb 8, 2025

Ok if you want to focus on that single phrase and ignore the whole rest of the page which documents decades of stuff to do with search engines and not a single mention of api endpoints, that's fine. You can have the win on this, here's a gold star.

Show thread

WhoLooksHere Feb 8, 2025

Okay,

So why should reinevent a standard when one that serves functionally the same purpose with one of implied consent?