This morning while walking to work I had a thought: is it possible that the #replyGuy problem is perceived as worse on #Mastodon (and the #Fediverse in general) simply because of sheer (lack of) numbers?

So I came up with a simplified model to check if this was indeed sufficient to explain the phenomenon.

Let's assume that, similarly to Carlo M. Cipolla's Basic Laws of Human Stupidity, the fraction of reply guys in a population is a given constant ρ.

1/n

#maths

Let's also assume for simplicity that reply guys operate manually, i.e. do not use some kind of automation to find posts to reply to. This puts an upper limit on the total number of posts a reply guy can read (and reply to) in the course of a single day. Let's call this L.

We can also assume that on average a single non-reply-guy will generally post D < L posts per day.

Ignoring for a moment larger timeframes than a single day (but the discussion is similar if we extend the observation period) …

2/n

 … and assuming replyguys select randomly the posts to reply to (i.e. they don't specifically target a person for any particular reason), can we calculate what is the probability that a given person will see at least one reply from a reply guy?

Before doing the actual math, let's look at a practical example. Let's say that 1 in 10 users is a reply guy, a reply guy goes through 100 posts per day, and a regular user posts 10 messages per day.

3/n

If there are 10 users in the network, 9 are regular users and 1 is a reply guy. There are 90 regular posts per day, and the single reply guy will see ALL of them (because they can read 100 posts per day, so they see all the posts made on that network on that day).

However, if there are 100 users in the network, 10 of which are reply guys, the network will produce 900 posts in a day, but reply guys will NOT necessarily see all of them!

4/n

In the second scenario, each reply guy will see 100 random posts of the 900. Even if there are 10 reply guys, this does NOT mean that each post will be found by a reply guy! There may be posts that are seen by multiple reply guys, and posts that will not be seen by any reply guy!

So how do we compute the probability for a post to be seen by a reply guy in this case?

One way to do this is to compute the probability of NOT being seen by ANY reply guy, and taking the complement.

5/n

Now, since each reply guy sees 100 of the 900 posts, each post has a 1/9 probability of being seen by any particular reply guy, i.e. an 8/9 probability of NOT being seen by that particular reply guy. Assuming that the probabilities are independent across reply guys, the probability of not being seen by ANY of the 10 reply guys is (8/9)^10, which is 0.30794…

6/n

This means that the probability for each post to be seen by AT LEAST ONE reply guy is 1-(8/9)^10 or approximately 69%.
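As a sanity check on that 69%, here is a rough Monte Carlo sketch of the 100-user scenario. It assumes each reply guy reads 100 distinct posts picked uniformly at random out of the day's 900; the function name, the number of simulated days and the seed are arbitrary choices of mine, not part of the model.

```python
import random

def simulate_seen_fraction(n_posts=900, n_reply_guys=10, reads_per_day=100,
                           n_days=500, seed=42):
    """Estimate the fraction of posts seen by at least one reply guy."""
    rng = random.Random(seed)
    seen_total = 0
    for _ in range(n_days):
        seen = set()
        for _ in range(n_reply_guys):
            # Assumption: each reply guy reads a uniform random sample
            # of distinct posts, independently of the other reply guys.
            seen.update(rng.sample(range(n_posts), reads_per_day))
        seen_total += len(seen)
    return seen_total / (n_days * n_posts)

estimate = simulate_seen_fraction()
exact = 1 - (8 / 9) ** 10
print(f"simulated: {estimate:.4f}, formula: {exact:.4f}")
```

The simulated fraction should land close to the value from the formula, with a bit of noise that shrinks as the number of simulated days grows.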

Of course, if someone posts 10 posts per day, then the probability for them to come across a reply guy is higher than that, because it will be the probability of at least one of their posts being seen by at least one reply guy!

Again assuming that for each post the probabilities are independent, the math is only slightly more complex:

6/n

Each post has a (8/9)^10 probability of not being seen, but to get the probability of ALL posts not being seen, that has to be further raised to the number of posts made by the same person, which is 10, so (8/9)^100. The probability of AT LEAST ONE POST being seen by AT LEAST ONE reply guy then becomes 1 - (8/9)^100 = 0.99999233… which is not 100% but very close to it.

This was the case of a network of 100 people, 10 of which are reply guys. Let's do one more and then go general.

7/n

Let's now assume the network has 100K users, still with the same 10% of reply guys (90K regular users, 10K reply guys).
Each reply guy still reads 100 posts per day, and the regular users still produce 10 posts per day.

This means that in a single day the regular users produce 900K posts, and each reply guy reads 100 of those 900K posts!

Still under the randomness assumption, this means that a single reply guy will only see a 100/900K fraction of the posts, i.e. any given post has a 1 in 9000 chance of being seen by that reply guy!

8/n

So now the probability of AT LEAST ONE of the 10 posts by a regular user being seen by AT LEAST ONE of the reply guys is:

1 - (1 - 100/900K)^(10*10K) = 99.9985%

Note that compared to the 99.9992% of the other case, this value is indeed smaller! We went from something with five 9s to something with four 9s (and an 8).
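For anyone who wants to re-run the two worked examples, here is a minimal sketch. The helper name `p_at_least_one` and its parameter names are mine; it relies on the same independence assumption as the thread, treating every (post, reply guy) pair as a separate trial.

```python
def p_at_least_one(reads, total_posts, n_reply_guys, own_posts):
    """Chance that at least one of your posts is seen by at least one reply guy."""
    # Probability that one given reply guy misses one given post.
    miss_one = 1 - reads / total_posts
    # Independence assumption: every (post, reply guy) pair is a separate trial.
    miss_all = miss_one ** (own_posts * n_reply_guys)
    return 1 - miss_all

small = p_at_least_one(reads=100, total_posts=900, n_reply_guys=10, own_posts=10)
large = p_at_least_one(reads=100, total_posts=900_000, n_reply_guys=10_000, own_posts=10)

print(f"100 users:  {small:.4%}  (reply-guy-free day: {1 - small:.2e})")
print(f"100K users: {large:.4%}  (reply-guy-free day: {1 - large:.2e})")
```

Looking at the complement makes the difference easier to read: the chance of a reply-guy-free day is tiny in both networks, but it roughly doubles in the larger one.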

9/n

If we go to the general case of a fraction ρ of reply guys over the given population P, each reading L posts per day, with D posts per person being produced on average, we can then compute the probability of coming across a reply guy in your mentions when posting D0 messages as

1 - (1 - L/(D*P*(1-ρ)))^(D0*P*ρ)
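The general expression can be written as a small function. This is only a sketch: the function name is mine, and the `min(1.0, ...)` clamp is my own addition to cover tiny networks (like the 10-user example) where a reply guy can read more posts than exist in a day.

```python
def reply_guy_probability(rho, P, L, D, D0):
    """Probability that at least one of your D0 posts is seen by a reply guy.

    rho: fraction of reply guys, P: population, L: posts read per reply guy
    per day, D: posts per regular user per day, D0: your own posts that day.
    """
    regular_posts = D * P * (1 - rho)   # posts produced by regular users
    reply_guys = P * rho                # number of reply guys
    # Clamp (my addition): a reply guy cannot see more than all the posts.
    seen_by_one = min(1.0, L / regular_posts)
    return 1 - (1 - seen_by_one) ** (D0 * reply_guys)

print(reply_guy_probability(rho=0.1, P=10, L=100, D=10, D0=10))
print(reply_guy_probability(rho=0.1, P=100, L=100, D=10, D0=10))
print(reply_guy_probability(rho=0.1, P=100_000, L=100, D=10, D0=10))
```

With the numbers used so far (ρ = 0.1, L = 100, D = D0 = 10) this should reproduce the three networks discussed above: 10, 100 and 100K users.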

It is possible to show that this expression is decreasing in P, i.e. increasing the population *will* decrease the probability of coming across reply guys,

10/n

… even if the “density” of reply guys in the population remains constant. Also, a network with more (regular) messages per person will see *statistically* lower interactions with reply guys, and (obviously) people posting less will see statistically lower interactions with reply guys.
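A quick numerical check of these trends, as a sketch under the same assumptions as the general expression (the function name is mine). The limiting value for very large P, 1 - exp(-L*D0*ρ/(D*(1-ρ))), is not part of the argument above: it is the standard limit of (1 - a/P)^(b*P), added here only to show where the decrease levels off.

```python
import math

def reply_guy_probability(rho, P, L, D, D0):
    """General expression; assumes L <= D*P*(1-rho), i.e. not a tiny network."""
    return 1 - (1 - L / (D * P * (1 - rho))) ** (D0 * P * rho)

rho, L, D, D0 = 0.1, 100, 10, 10

# Growing the population with everything else fixed.
populations = (100, 1_000, 10_000, 100_000, 1_000_000)
by_population = [reply_guy_probability(rho, P, L, D, D0) for P in populations]
for P, prob in zip(populations, by_population):
    print(f"P = {P:>9,}: {prob:.6%}")

# Value the probability approaches as P grows without bound (my addition).
limit = 1 - math.exp(-L * D0 * rho / (D * (1 - rho)))
print(f"limit for large P: {limit:.6%}")

# More regular messages per person (D), or fewer posts of your own (D0).
chattier_network = reply_guy_probability(rho, 1_000, L, 2 * D, D0)
quieter_poster = reply_guy_probability(rho, 1_000, L, D, 1)
print(f"D doubled: {chattier_network:.4%}, D0 = 1: {quieter_poster:.4%}")
```

In this model the probability keeps decreasing as P grows, but it flattens out towards the limit rather than dropping to zero; posting less or being in a chattier network moves the number much more.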

11/n

Of course, this is a gross simplification that does not take into account the effects of boosts (which increase the visibility of individual posts) or the fact that reply guys may specifically target some demographic within the network at large (which will lead to specific individuals seeing more reply guy interactions than others).

12/n

So yeah, this alone does not explain why you may experience more reply guys here on the Fediverse than in other networks. BUT, it does show that this is at least in part due to the Fediverse being a smaller network.

Cheers, and block those asshats,

13/13