I put together my own system for tracking the total number of Mastodon users over time, as reported for the instances tracked by https://instances.social/

It's a delightful (to me) combination of different tricks - git scraping, my git-history and s3-credentials tools, Datasette Lite and an Observable notebook to plot the chart at the end.

I describe how it all works in detail here: https://simonwillison.net/2022/Nov/20/tracking-mastodon/

Or you can jump straight in to play with my notebook: https://observablehq.com/@simonw/mastodon-users-and-statuses-over-time


This is a good example of me forcing myself to live the "If you do a project, you should write about it" rule - I was SO tempted to get this thing working and then go to bed, but I made myself do the extra 45 minutes of work to turn it into a blog post.

https://simonwillison.net/2022/Nov/6/what-to-blog-about/

What to blog about

You should start a blog. Having your own little corner of the internet is good for the soul! But what should you write about? It’s easy to get hung up …

Added a disclaimer to the notebook at https://observablehq.com/@simonw/mastodon-users-and-statuses-over-time - just in case people start taking those numbers as the gospel truth as to the human population of Mastodon
Mastodon users and statuses over time

Gathered by scraping the JSON from https://instances.social/ every 20 minutes using this repository: https://github.com/simonw/scrape-instances-social

For full details about how this works, see "Tracking Mastodon user numbers over time with a bucket of tricks" on my blog.

How much should you trust these numbers? The user number here is calculated by adding up the number of registered users reported for every server in the https://instances.social/instances.json file published by https://instances.social/ This

Observable
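
As a rough illustration of the "adding up the number of registered users reported for every server" step, here is a minimal Python sketch. It is not the code the notebook or the scraper actually runs. The sample data is made up, and the `name` and `users` field names are assumptions about the shape of instances.json.

```python
import json

# Made-up sample shaped like a small slice of instances.json.
# The "name" and "users" field names are assumptions for illustration.
SAMPLE = """
[
  {"name": "example.social", "users": "1200"},
  {"name": "another.example", "users": 350},
  {"name": "broken.example", "users": null}
]
"""


def total_users(instances):
    """Sum the reported user counts, skipping missing or non-numeric values."""
    total = 0
    for instance in instances:
        try:
            total += int(instance.get("users"))
        except (TypeError, ValueError):
            # A missing or malformed count contributes nothing to the total.
            continue
    return total


instances = json.loads(SAMPLE)
print(total_users(instances))
```

Because the total is just a sum of self-reported numbers, a single server reporting a bogus count can move it by any amount, which is why the figure should not be read as a count of real people.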

Had a couple of complaints that my chart is misleading because it doesn't start the y axis from zero

I don't see that myself - I was VERY careful to make the y axis values as prominent as possible to avoid any potential for confusion there

But since people asked, I've added a from-zero chart to the notebook too

https://observablehq.com/@simonw/mastodon-users-and-statuses-over-time

Unsurprisingly it's not very interesting - it's effectively a horizontal line at ~4.7m!

If you have ideas for better ways to present the data I've collected (I'm certain there's huge room for improvement here) you can fork my notebook on Observable and try them out!

After struggling for a while to figure out the best way to add a "new users per hour" chart I spotted there was an incoming change suggestion implementing exactly that... from Observable/D3 author Mike Bostock!

So that's now available in the notebook too: https://observablehq.com/@simonw/mastodon-users-and-statuses-over-time
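
For anyone curious what a "new users per hour" series involves, here is a minimal Python sketch of one way to derive it from cumulative totals. This is not the notebook's implementation (that lives in Observable). The `users_per_hour` helper and the snapshot values are hypothetical, with timestamps spaced 20 minutes apart to match the scraping interval.

```python
from datetime import datetime


def users_per_hour(snapshots):
    """Turn (timestamp, total_users) snapshots into (timestamp, rate) pairs.

    The rate is the change in total users since the previous snapshot,
    scaled to a per-hour figure.
    """
    rates = []
    for (t0, n0), (t1, n1) in zip(snapshots, snapshots[1:]):
        seconds = (t1 - t0).total_seconds()
        if seconds <= 0:
            # Skip duplicate or out-of-order snapshots.
            continue
        rates.append((t1, (n1 - n0) * 3600 / seconds))
    return rates


# Made-up totals, 20 minutes apart.
snapshots = [
    (datetime(2022, 11, 20, 12, 0), 4_700_000),
    (datetime(2022, 11, 20, 12, 20), 4_700_500),
    (datetime(2022, 11, 20, 12, 40), 4_701_500),
]

for timestamp, rate in users_per_hour(snapshots):
    print(timestamp.isoformat(), round(rate))
```

A rate computed this way goes negative whenever the total drops, for example when instances are removed from the list, so a chart built on it needs to cope with values below zero.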


Some spam instances just showed up with fake user numbers that completely broke my charts (leaping the number of users from ~4.5m to 80m+)

Issue about that here - I'll fix my pipeline to avoid these in the morning https://github.com/simonw/scrape-instances-social/issues/4

Ignore angelfire glitch instances · Issue #4 · simonw/scrape-instances-social

https://lite.datasette.io/?json=https%3A%2F%2Fraw.githubusercontent.com%2Fsimonw%2Fscrape-instances-social%2Fmain%2Finstances.json#/data/instances?_filter_column=&_filter_op=exact&_filter_v...

GitHub

@simon the author has been tweaking which instances to count for a few hours: https://github.com/TheKinrar/instances/commits/master. Some obvious things I've had to filter out are duplicates, after normalizing names, and instances that deliberately publish wrong numbers. Many other cases aren't that clear and I still got a sudden jump of about a million users 🤷‍♂️.

Let's hope for things to settle down soon.
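
The two filters mentioned above (dropping duplicates after normalizing names, and dropping instances that publish obviously wrong numbers) could be sketched in Python roughly like this. It is an illustration under assumptions, not the rule either project applies: the `name` and `users` fields, the normalization steps and the `max_users` cutoff are all hypothetical.

```python
def normalize_name(name):
    """Lowercase a hostname and strip whitespace and any trailing dot."""
    return name.strip().lower().rstrip(".")


def clean_instances(instances, max_users=5_000_000):
    """Drop implausibly large instances and duplicate names.

    max_users is a hypothetical cutoff picked for illustration only.
    """
    seen = {}
    for instance in instances:
        name = normalize_name(instance["name"])
        users = int(instance.get("users") or 0)
        if users > max_users:
            # Treat counts above the cutoff as fake.
            continue
        # Keep the first entry seen for each normalized name.
        seen.setdefault(name, {"name": name, "users": users})
    return list(seen.values())


# Made-up sample data.
raw = [
    {"name": "Example.Social", "users": "1200"},
    {"name": "example.social.", "users": 1200},
    {"name": "spam.example", "users": 60_000_000},
    {"name": "small.example", "users": None},
]

for instance in clean_instances(raw):
    print(instance["name"], instance["users"])
```

A fixed cutoff only catches the blatant cases. An instance that inflates its numbers by a smaller amount would pass straight through, which matches the point that many cases aren't that clear.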


@simon ... the data now looks a LOT better. It was actually a long overdue fix for autodiscovery, and now you can find instances like yours or sigmoid.social there, among 10,000 others that were previously missing.

https://mastodon.xyz/@TheKinrar/109381846167480060

TheKinrar (@TheKinrar@mastodon.xyz)

I pushed a few fixes and improvements to instances.social this night and it is now tracking about six times more instances than it was before (2200 => 12800). Autodiscovery of instances had been broken for some time now, and obviously with all the new users from the last weeks, came many new instances. See the full list on https://instances.social/list/advanced and https://instances.social/list/old (the latter being the "legacy" list, a plain html table, which is quite... heavy for browsers).

Mastodon
@mauforonda @simon I'm downloading the new instances.social JSON file and I don't see the instances publishing (obviously) wrong numbers. Do you have any idea what changed?
@estebanmoro @mauforonda @simon I found this (see image) when I downloaded the data a few days ago:

@estebanmoro @simon

These were the really problematic ones. They would amount to 60 million new users, but luckily were filtered out about 6 hours ago.