🧵 So to recap something that happened yesterday, as I see it:

Yesterday Mastodon gGmbH sent out an email to mastodon.social and .online users announcing a new TOS. The TOS would take effect July 1, and would ship in the next version of the Mastodon server software as a suggested "template" TOS instances could adopt.

This TOS had multiple problems, and several people said so loudly. This morning, Mastodon announced they're backing off and taking additional legal advice:

https://mastodon.social/@Mastodon/114709820512537821

The problems I saw raised were:

Raised by me: The new intellectual property clause was permanent/irrevocable; Mastodon retained rights to your posts even after they were deleted.
https://github.com/mastodon/mastodon/issues/35086

Raised by Cory Doctorow: The new TOS forced users to give up the right to sue in court, & forced use of arbitration instead. 😬
https://mamot.fr/@pluralistic/114706885462760813

Raised by Sarah Jamie Lewis and others: It was ambiguous if *other federated servers* counted as "users" under the TOS.
https://mastodon.social/@sarahjamielewis/114699476927561899

When they announced this, Mastodon didn't seem to realize their new terms would be controversial. They also seemed to believe there had been adequate community review time because a version of the TOS had been posted in their git repo for a year. I raise my eyebrow at both these things (didn't think arbitration would be controversial, really?)… but Mastodon *is* a nonprofit that was working with a pro-bono lawyer, and their rapid backoff once the community started engaging shows good faith.

This is not over. Mastodon is, I hope, getting better legal advice now (I know Doctorow reached out offering to get the EFF to help), and that will probably by itself fix the ambiguity about who is a "user", as well as the ambiguity some people complained about over whether German or US law was controlling.

However, the new TOS is still coming, and TOSes by nature balance the admins' and the users' interests, so that balance will need to be negotiated. Arbitration, for example, very well might come back.

The big problem, as I see it, is that Mastodon is now opening the Pandora's box of making the content licensing situation of the Fediverse explicit, which in a federated environment will be *very* hard to do without stepping on someone's toes, somewhere. (Personally, I really *liked* that all this time the licensing situation of Mastodon has been implicit; I was happy just letting laches and deletion notices do all the work. But I understand if Mastodon gGmbH doesn't feel they can do that forever.)
What we need, for this to work out better next time, is actual community engagement in the TOS process. (This is especially the case if Mastodon gGmbH intends the TOS to be used not just by them but by all the downstream users of their software!) We need more than two weeks' notice (or rather, notices posted somewhere the user community will see them). Again, I think Mastodon gGmbH is showing good faith in how they're engaging with the community now, so we'll see what happens.
Postscript: Cripes, the 500 character limit really starts to chafe for this kind of writing. Maybe I'll stand up that GtS server after all lol
@mcc I wish there was a platform that didn't think that implementing a feature to randomise nodeinfo usage stats was a good idea.
@markrprior What does this mean?
@mcc if you use the nodeinfo API to query a GoToSocial instance for information, you might get random user and post counts, because there is a configuration option called "fiddle" that, rather than returning valid numbers, 0, or null, generates some garbage. It's claimed this is a reaction to bots that don't obey the instance's robots file (which defaults to disallow), but the software doesn't check that the admin hasn't changed the robots file.

@mcc for example
look at https://gunbark.dev/nodeinfo/2.1

'usage' => {
  'users' => {
    'total' => 690105,
    'activeMonth' => 220493
  },
  'localPosts' => 1591429
}
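
For anyone writing a stats crawler, the robots.txt check discussed in this thread could look roughly like the sketch below. It uses Python's standard `urllib.robotparser`; the function name, the `stats-bot` user agent, and the sample robots.txt contents are illustrative assumptions, not GoToSocial's actual defaults.

```python
from urllib.robotparser import RobotFileParser


def may_fetch_nodeinfo(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network access happens here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that disallows the nodeinfo endpoint for every bot.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /nodeinfo/
"""

if __name__ == "__main__":
    print(may_fetch_nodeinfo(SAMPLE_ROBOTS, "stats-bot", "https://example.org/nodeinfo/2.1"))
    print(may_fetch_nodeinfo("", "stats-bot", "https://example.org/nodeinfo/2.1"))
```

Note that an empty or missing robots.txt yields `True` here, so this check alone does not protect a crawler from randomized stats on an instance whose robots.txt does not match its configuration.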

@markrprior @mcc more specifically, it was in reaction to a specific database that, for ideological reasons, openly sought to gather aggregate user data across the fediverse, including from sites that attempted to prevent such use
@markrprior @mcc note that the fiddle option is fully under the control of the instance admin
@markrprior @mcc the guy running the database responded by saying, essentially, "fine, then we just won't count GTS data", but he tried pretty hard to make that sound as if it was punishing GTS in some way instead of exactly what everyone asked him to do in the first place
@ireneista @mcc really depends on whether you want the instance to be discoverable or not which I assume was their original motivation.
@ireneista @mcc except it is up to the instance admin to know and ensure that their robots.txt file is consistent with their configuration, as the software won't check. The instance I referenced (gunbark.dev) is an example where they are inconsistent. I too operate an account that counts instances and users in the Fediverse; in this instance's case I report it, but I set its user count to 1 as the data is clearly bogus. In contrast, Wildebeest returns null for data it considers private.
@markrprior @mcc we're not sure which point of ours that's intended as a rebuttal of?
@ireneista @mcc that the fiddle option is safe if you respect the robots.txt file.
@markrprior @mcc oh, we weren't trying to say it's safe. your point seems fair.
@markrprior @mcc we're kind of against measuring people without affirmative, individual consent, anyway, just as a general science ethics thing
@ireneista @mcc well, assuming it's a single-user instance, then Wildebeest's approach works. If the instance is open to more than one user then it is tricky, but it still comes down to the instance admin rather than the API user as to whether they make a count available or not. People are interested in these counts, so a realistic underestimate is better than something completely without foundation, which is why I started counting, as the existing one seemed wildly optimistic.

@markrprior @ireneista @mcc you're very wrong, by default GTS manages the robots.txt file, see https://gts.tbh.gay/robots.txt for example

Anyways, how is this at all relevant, other than you just wanting to complain that a piece of software has options to hurt scrapers?

@Ember @ireneista @mcc "by default" is key. The software assumes that the file exists and does what it expects, but it doesn't check, leading to the gunbark.dev case.
@markrprior and? Have you considered just... Not scraping information?
@Ember and what about those instances and users that are happy to make basic statistics available?

@markrprior @Ember @ireneista @mcc so fucking weird that the instance I'm on is apparently your use case for "they're fediversing wrong"

especially in this day and age, I'm pretty comfortable using and preferring software that doesn't accurately report usage stats to strangers

@cold @Ember @ireneista @mcc I'm not saying you are Fediversing wrong. I am saying that it was explained to me that the fiddle configuration was safe because if a bot respected the robots.txt file then it wouldn't query a server for statistics if fiddle was enabled. Your configuration, whether intentional or not, illustrates that that is not the case. I had suggested that checking the robots.txt file before enabling fiddle would be a good idea but that was too hard.
@markrprior @cold @Ember @mcc we want to reiterate that we have at no point made any claims about anything being "safe", nor do we see "safe" as an important distinction in this situation.
@markrprior @cold @Ember @mcc we also don't know who said that to you about the scenario where a bot respects robots.txt, but it wasn't anyone in this thread

@ireneista @cold @Ember @mcc maybe "safe" is the wrong word. In the case of an API perhaps "the data is reliable/accurate" is better.

You are correct, the advice didn't come from this thread; it came from an issue I opened on the gotosocial source repository.

@markrprior @cold @Ember @mcc oh, fair. we don't necessarily endorse (or know about) the views of the GTS people, we just like and use their work. at any rate it's nice to have that cleared up, thank you.
@ireneista @cold @Ember @mcc from my point of view an API is a source of truth. If I'm an app and I want to know how many characters I can include in the post I ask the API and use that number. If I get back 2000 but the instance barfs when I try to send a message with that number then the user will probably blame the app. If I ask what languages are in use and only EN is returned I might not offer translation. If NULL is returned I might have defaults but bogus data is unhelpful.
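
A minimal Python sketch of the client behaviour described here: trust the limit the instance reports, and fall back to a default when the value is missing or null. The field path `configuration.statuses.max_characters` follows the shape of Mastodon's instance API response and is an assumption for other server software; the function name is hypothetical, and the default of 500 is Mastodon's stock limit.

```python
from collections.abc import Mapping
from typing import Any

DEFAULT_MAX_CHARS = 500  # Mastodon's stock limit, used only as a fallback


def effective_char_limit(instance_info: Mapping[str, Any],
                         default: int = DEFAULT_MAX_CHARS) -> int:
    """Return the reported post length limit, or default if it is absent or null."""
    node: Any = instance_info
    # Walk the assumed path; stop early if any level is missing or not an object.
    for key in ("configuration", "statuses", "max_characters"):
        if not isinstance(node, Mapping):
            return default
        node = node.get(key)
    # Accept only positive integers (bool is excluded because it subclasses int).
    if isinstance(node, int) and not isinstance(node, bool) and node > 0:
        return node
    return default
```

A null can be detected and replaced with a default like this, whereas a plausible-looking but wrong number passes straight through, which is the distinction being drawn in this post.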

@markrprior @ireneista @Ember @mcc I care more about my ecosystem being safe for my user(s) than I care about your hypotheticals that "other people's APIs should return truth to me because I want it"

returning bogus numbers helps obfuscate usage and does not harm actual communication functionality. your given examples demonstrate that when an API is wrong, something breaks. if my chosen ecosystem lies about its usage, what of yours breaks? your ability to know how many people are on my server? good.

@cold @ireneista @Ember @mcc so why not return 0 or NULL?

@markrprior @ireneista @cold @mcc a legitimate client that needs correct info to function correctly is a completely different thing to a crawler that grabs info for "woo, number go up!"

And I do believe that tobi gave you a good solution to avoid scraping incorrect stats even without a valid robots.txt (um, twice), which you ignored: https://codeberg.org/superseriousbusiness/gotosocial/issues/3723#issuecomment-4018562

@Ember @ireneista @cold @mcc except that good solution doesn’t seem to exist in the wild.
@markrprior @ireneista @cold @mcc ...Yes it does? I checked on two GTS instances and both very clearly add "x-robots-tag: noindex" as a header

@Ember @ireneista @cold @mcc Hmm, my software was looking for "noaccess" rather than "noindex", no idea why as I would have thought that I would have cut'n'pasted from the notes. Anyway fixed now.

I'll also note that gunbark.dev has a robots.txt file now.
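
The header check described above can be sketched as follows. This is a minimal illustration in Python, assuming the crawler already has the response headers as a mapping; the function name is hypothetical, and only the `noindex` directive mentioned in the thread is checked.

```python
from collections.abc import Mapping


def opts_out_of_indexing(headers: Mapping[str, str]) -> bool:
    """Return True if an X-Robots-Tag header carries the noindex directive."""
    for name, value in headers.items():
        # Header names are case-insensitive.
        if name.lower() != "x-robots-tag":
            continue
        # The value may hold several comma-separated directives.
        directives = {part.strip().lower() for part in value.split(",")}
        if "noindex" in directives:
            return True
    return False
```

A crawler using this would skip recording stats whenever the function returns `True`. Matching on `noaccess` instead, as in the bug described above, would never trigger on a `noindex` header.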

@markrprior @Ember @ireneista @mcc you're a bizarre, stalker-y sort of fellow, aren't you?