@Bumblefish

Which one is random?
(data sets are 100 numbers 1 to 6)

listA=[2,3,5,1,2,2,4,2,4,5,2,3,3,4,5,6,4,2,6,2,2,1,3,4,5,5,6,3,3,6,1,4,2,1,4,5,2,2,3,3,3,5,6,3,2,4,5,5,1,1,1,6,1,4,3,5,5,3,1,1,1,6,1,4,6,6,3,6,6,2,4,4,4,5,1,5,6,2,6,1,1,2,4,2,2,3,4,4,5,6,1,3,3,3,5,4,6,5,1,6]

listB=[4,2,5,6,3,5,3,1,3,4,2,3,4,3,4,5,5,1,3,3,2,1,1,6,1,3,2,2,2,6,1,5,6,3,6,3,2,3,2,4,6,1,1,6,3,2,4,1,6,1,3,1,5,6,2,3,3,5,1,6,4,5,2,5,1,1,5,3,6,2,3,3,6,5,2,3,3,1,6,3,2,3,2,1,6,6,4,4,6,2,4,5,4,5,3,4,6,5,3,2]

@futurebird @Bumblefish There's literally no way to say whether a list of numbers is random or not (1, 1, 1, 1, etc. can plausibly be a random sequence for all we know), though you can establish likelihoods by looking at the distribution.

@zalasur @Bumblefish

You *can* make an argument for one of these lists being random like a dice roll and the other being much less likely to be generated in that way.

@futurebird @Bumblefish Yes, you can estimate the likelihood. But given any list of items, it is impossible to prove or disprove that the list is random.
@futurebird @Bumblefish The only way you could determine that something’s not random is if a pattern emerges in the data set. Even then, it is statistically possible for a CSPRNG with good entropy to produce a random data set that looks like it’s not random: unlikely, but possible.

@ramsey @Bumblefish

Only one of these lists could *plausibly* be from rolling dice.

@futurebird @Bumblefish Based on the statistical distribution of the dice rolls?

@futurebird @ramsey @Bumblefish this is not remotely my area of expertise but I am interested in the answer. My guess would be that the list that looks more evenly distributed is the fake one, and therefore List A is the "actually random" one because it has more seemingly outlying subsets, like a whole bunch of 1s in rapid succession.

There are tons of ways to unevenly distribute but relatively few ways to evenly distribute, so the one that seems less even is more likely to be true

@futurebird @ramsey @Bumblefish also I suspect maybe a Monty Hall kind of thing where you generated a bunch of random lists, and then selected the one that looked least random to you to trick your students.

I'd love to know what the actual answer is and what you were hoping to teach your students!

@ldpm @ramsey @Bumblefish

I put the answer in the original thread with a CW. This was about frequency.

@futurebird @Bumblefish I have a UUID-generating library that, under certain conditions, could generate identical UUIDs because the CSPRNG it used ended up reusing the same entropy seed, unless the server was restarted. That was a *fun* bug to investigate and fix. 😉
@futurebird @Bumblefish I like list A for random and list B for “planned random”.

@futurebird
just to clarify, she means random as in rolls of an unbiased six-sided die.

@Bumblefish

@futurebird
things I would check are first the frequency of each number... they should be somewhat uniform but not TOO close to equal as all exactly equal is unlikely... next I'd look at the length of repeat sequences and compare to expected values.
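dlakelan's second check (run lengths against expected values) can be sketched in Python, the language already used for the Counter() output in this thread. This is a minimal sketch assuming independent, fair rolls; `run_lengths` and `expected_run_count` are hypothetical helper names. Each of the 99 rolls after the first starts a new run with probability 5/6, so 100 fair rolls should give about 1 + 99 × 5/6 = 83.5 runs on average.

```python
from collections import Counter
from itertools import groupby

# Data copied from the original post.
listA = [2,3,5,1,2,2,4,2,4,5,2,3,3,4,5,6,4,2,6,2,2,1,3,4,5,5,6,3,3,6,1,4,2,1,4,5,2,2,3,3,3,5,6,3,2,4,5,5,1,1,1,6,1,4,3,5,5,3,1,1,1,6,1,4,6,6,3,6,6,2,4,4,4,5,1,5,6,2,6,1,1,2,4,2,2,3,4,4,5,6,1,3,3,3,5,4,6,5,1,6]
listB = [4,2,5,6,3,5,3,1,3,4,2,3,4,3,4,5,5,1,3,3,2,1,1,6,1,3,2,2,2,6,1,5,6,3,6,3,2,3,2,4,6,1,1,6,3,2,4,1,6,1,3,1,5,6,2,3,3,5,1,6,4,5,2,5,1,1,5,3,6,2,3,3,6,5,2,3,3,1,6,3,2,3,2,1,6,6,4,4,6,2,4,5,4,5,3,4,6,5,3,2]

def run_lengths(seq):
    """Lengths of the maximal runs of identical consecutive values."""
    return [len(list(group)) for _, group in groupby(seq)]

def expected_run_count(n, sides=6):
    """Expected number of runs in n independent fair rolls.

    The first roll always starts a run; each later roll starts a new run
    when it differs from the previous one, with probability (sides - 1) / sides.
    """
    return 1 + (n - 1) * (sides - 1) / sides

for name, seq in (("listA", listA), ("listB", listB)):
    runs = run_lengths(seq)
    print(name, "runs:", len(runs), "lengths:", dict(sorted(Counter(runs).items())))

print("expected runs for 100 fair rolls:", expected_run_count(100))
```

On these lists it counts 77 runs for listA (five of them of length 3) and 88 for listB (one of length 3).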

the actual definition of a random sequence (per Martin-Löf) is in terms of passing statistical tests
@Bumblefish

@dlakelan @futurebird

The values in each Counter() object are the number of times each integer appears.

In [18]: Counter(listA)
Out[18]: Counter(
{2: 17, 3: 17, 5: 16, 1: 17, 4: 17, 6: 16}
)

In [19]: Counter(listB)
Out[19]: Counter(
{4: 12, 2: 17, 5: 14, 6: 17, 3: 24, 1: 16}
)
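One standard way to put a number on "somewhat uniform but not TOO close to equal" is Pearson's chi-square statistic. It is a technique swapped in here, not one anyone in the thread used. The sketch assumes a fair die as the null hypothesis and takes the counts from the Counter() output above; `chi_square_uniform` is a hypothetical helper. For a fair six-sided die the statistic averages 6 - 1 = 5, so a value far below 5 means the counts are more even than chance usually produces.

```python
# Observed counts per face, copied from the Counter() output above.
counts_A = {1: 17, 2: 17, 3: 17, 4: 17, 5: 16, 6: 16}
counts_B = {1: 16, 2: 17, 3: 24, 4: 12, 5: 14, 6: 17}

def chi_square_uniform(counts, sides=6):
    """Pearson's chi-square statistic of the counts against a fair die."""
    n = sum(counts.values())
    expected = n / sides
    return sum((counts.get(face, 0) - expected) ** 2 / expected
               for face in range(1, sides + 1))

for name, counts in (("listA", counts_A), ("listB", counts_B)):
    print(name, round(chi_square_uniform(counts), 3))
```

With these counts it gives 0.08 for listA and 5.0 for listB.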

@alienghic
I'm on my phone at a volleyball game, but what's the likelihood for each? (The probability of seeing that vector of counts under a multinomial distribution with probability 1/6 for each value.)

should be pretty easy in R or Julia or Python, though offhand I would need to look at the docs for any of them. Julia would be something like
using Distributions
pdf(Multinomial(100, fill(1/6, 6)), [17, 17, 17, 17, 16, 16])
@futurebird
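A plain-Python version of the likelihood described above, as a sketch. `multinomial_pmf` is a hypothetical helper, not a library function. It works in log space via `math.lgamma` so the factorials never overflow, and it assumes every probability is nonzero. The counts are taken from the Counter() output posted earlier in the thread.

```python
import math

def multinomial_pmf(counts, probs):
    """P(observing exactly these counts) under a multinomial distribution.

    Computes log(n!) - sum(log(k_i!)) + sum(k_i * log(p_i)), using
    lgamma(k + 1) == log(k!). All probs must be > 0.
    """
    n = sum(counts)
    log_p = math.lgamma(n + 1)
    for k, p in zip(counts, probs):
        log_p += k * math.log(p) - math.lgamma(k + 1)
    return math.exp(log_p)

fair = [1 / 6] * 6
counts_A = [17, 17, 17, 17, 16, 16]  # faces 1..6, from Counter(listA)
counts_B = [16, 17, 24, 12, 14, 17]  # faces 1..6, from Counter(listB)
p_A = multinomial_pmf(counts_A, fair)
p_B = multinomial_pmf(counts_B, fair)
print("listA:", p_A, "listB:", p_B, "ratio:", p_A / p_B)
```

Note that listA's count vector comes out roughly ten times more likely than listB's, because near-equal counts sit at the mode of the multinomial. The probability of one exact count vector does not flag "too uniform" on its own. Comparing the spread of the counts against what chance produces does.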

@dlakelan @futurebird @Bumblefish Based on this description, A looks too uniform. B could be random.
@danpmoore
agreed, intuitively the frequencies in the first one seem too uniform.
@futurebird @Bumblefish

@dlakelan @futurebird @Bumblefish another thing to look for could be the frequency of pairs of numbers. for an unbiased die rolled independently, each ordered pair of consecutive numbers should appear with about a 1/36 chance.

unfortunately you'd need quite a large number of randomly generated samples to get this chance exactly, but i guess you could do some fancy statistics to analyze these distributions and try to guess which one is "more random looking"
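Counting the 36 ordered pairs takes only a few lines. This minimal sketch (`pair_counts` is a hypothetical helper) uses overlapping windows, so the 99 pairs in a 100-roll list are not independent of each other. With about 99/36 ≈ 2.75 expected per pair, individual pair counts are very noisy, which is the small-sample problem mentioned above. The diagonal (doubles like 4,4) is a more stable summary, with about 99/6 = 16.5 expected.

```python
from collections import Counter

# Data copied from the original post.
listA = [2,3,5,1,2,2,4,2,4,5,2,3,3,4,5,6,4,2,6,2,2,1,3,4,5,5,6,3,3,6,1,4,2,1,4,5,2,2,3,3,3,5,6,3,2,4,5,5,1,1,1,6,1,4,3,5,5,3,1,1,1,6,1,4,6,6,3,6,6,2,4,4,4,5,1,5,6,2,6,1,1,2,4,2,2,3,4,4,5,6,1,3,3,3,5,4,6,5,1,6]
listB = [4,2,5,6,3,5,3,1,3,4,2,3,4,3,4,5,5,1,3,3,2,1,1,6,1,3,2,2,2,6,1,5,6,3,6,3,2,3,2,4,6,1,1,6,3,2,4,1,6,1,3,1,5,6,2,3,3,5,1,6,4,5,2,5,1,1,5,3,6,2,3,3,6,5,2,3,3,1,6,3,2,3,2,1,6,6,4,4,6,2,4,5,4,5,3,4,6,5,3,2]

def pair_counts(seq):
    """Count each ordered pair of consecutive values (overlapping windows)."""
    return Counter(zip(seq, seq[1:]))

for name, seq in (("listA", listA), ("listB", listB)):
    pairs = pair_counts(seq)
    total = sum(pairs.values())  # 99 adjacent pairs in 100 rolls
    doubles = sum(c for (a, b), c in pairs.items() if a == b)
    print(f"{name}: {len(pairs)} of 36 pairs seen, ~{total / 36:.2f} expected per pair; "
          f"doubles {doubles} vs ~{total / 6:.1f} expected")
```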

@futurebird @Bumblefish listA has 17 occurrences of 1-4 and 16 of 5-6, where listB has different frequencies for each. I would guess that listB is actually random, listA is too nice.

@madjohnroberts @futurebird @Bumblefish

If List A has nearly equal occurrences of each number then that’s the one most likely to have been produced by the equivalent of rolling a die 100 times.

@sabrina I think the frequency being within floor/ceil of 100/6, with the first four being ceil(100/6) and the last two floor(100/6), shows intentionality. I agree the frequency should be close but not exact! It's harder to say for certain, though: 100 samples isn't that many, and I think with a larger N the difference would be more apparent, with listB showing less volatility
@futurebird @Bumblefish

@futurebird

The mean and standard deviations for both lists are about the same.

3.46 mean 1.7 stddev for listA
3.42 mean 1.69 stddev for listB

However, for listA, the counts of how often each value appears are all 17 or 16, so it appears to be a uniform distribution, while for listB, 3 shows up 24 times, and 4 and 5 are less frequent at 12 and 14 times respectively.

My conclusion is listA was generated from a uniform random distribution and listB was not.

I can't tell if listB was made by some other more advanced random distribution, but honestly it looks like someone took a uniform distribution and turned some of the 4s and 5s into 3s.
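The summary statistics quoted above can be reproduced with Python's standard `statistics` module. The post doesn't say whether a population or a sample standard deviation was used. This sketch assumes population (`pstdev`), which matches the quoted 1.7 and 1.69. For comparison, a fair die has mean 3.5 and standard deviation √(35/12) ≈ 1.71, so neither number separates the two lists.

```python
import statistics

# Data copied from the original post.
listA = [2,3,5,1,2,2,4,2,4,5,2,3,3,4,5,6,4,2,6,2,2,1,3,4,5,5,6,3,3,6,1,4,2,1,4,5,2,2,3,3,3,5,6,3,2,4,5,5,1,1,1,6,1,4,3,5,5,3,1,1,1,6,1,4,6,6,3,6,6,2,4,4,4,5,1,5,6,2,6,1,1,2,4,2,2,3,4,4,5,6,1,3,3,3,5,4,6,5,1,6]
listB = [4,2,5,6,3,5,3,1,3,4,2,3,4,3,4,5,5,1,3,3,2,1,1,6,1,3,2,2,2,6,1,5,6,3,6,3,2,3,2,4,6,1,1,6,3,2,4,1,6,1,3,1,5,6,2,3,3,5,1,6,4,5,2,5,1,1,5,3,6,2,3,3,6,5,2,3,3,1,6,3,2,3,2,1,6,6,4,4,6,2,4,5,4,5,3,4,6,5,3,2]

for name, seq in (("listA", listA), ("listB", listB)):
    mean = statistics.mean(seq)
    # pstdev is the population standard deviation; statistics.stdev would
    # give the sample (n - 1) version, which is slightly larger.
    sd = statistics.pstdev(seq)
    print(f"{name}: mean {mean:.2f}, stddev {sd:.2f}")
```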

@futurebird @Bumblefish I think B is real. A looks suspicious to me - it has 17 occurrences of 1-4 and 16 occurrences of 5-6.

Exhibits A and B: histograms of both.

@futurebird Can you settle the question?

(My vote is that the many 3x repeated sequences in listA are not random, but I'm not dedicated enough to pull out a die and record 100 rolls to see if that is likely to happen a bunch of times.)

ListA was created by making a list of 16 or 17 of each number. The Stdev **of the frequencies** is much lower than what you will find on random lists of similar size.

ListB was made by rolling dice.
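The claim about the standard deviation of the frequencies can be checked with a quick Monte Carlo sketch. It assumes Python's `random` module as a stand-in for a fair die, a fixed seed for reproducibility, and 10,000 simulated lists. `freq_stdev` and `simulate` are hypothetical helpers. For 100 fair rolls, each face count has standard deviation √(100 · 1/6 · 5/6) ≈ 3.7, so a spread of about 0.47 like listA's should be very rare.

```python
import random
import statistics
from collections import Counter

counts_A = [17, 17, 17, 17, 16, 16]  # from Counter(listA), faces 1..6
counts_B = [16, 17, 24, 12, 14, 17]  # from Counter(listB), faces 1..6

def freq_stdev(rolls, sides=6):
    """Population standard deviation of the per-face counts."""
    counts = Counter(rolls)
    return statistics.pstdev([counts.get(face, 0) for face in range(1, sides + 1)])

def simulate(trials=10_000, n=100, seed=1):
    """Frequency stdevs of `trials` simulated lists of n fair rolls."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    return [freq_stdev(rng.choices(range(1, 7), k=n)) for _ in range(trials)]

sims = simulate()
sd_A = statistics.pstdev(counts_A)
sd_B = statistics.pstdev(counts_B)
frac_A = sum(s <= sd_A for s in sims) / len(sims)
frac_B = sum(s <= sd_B for s in sims) / len(sims)
print(f"listA stdev {sd_A:.2f}: {frac_A:.2%} of simulated lists are at least this even")
print(f"listB stdev {sd_B:.2f}: {frac_B:.2%} of simulated lists are at least this even")
```

Expect listA's share to be well under 1%, while listB's spread falls in the middle of the simulated range.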

@futurebird listA has the subsequence 1,1,1,6,1,4 repeated twice a very short distance apart, which, while plausible, is extremely improbable. That's how I figured out it was crafted.
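Here is a sketch for finding repeated subsequences like this one. `repeated_ngrams` is a hypothetical helper, and indices are 0-based. Keep in mind the caveat raised later in this thread: searching for *any* repeated pattern after seeing the data will usually turn something up, so a hit is a lead rather than proof.

```python
from collections import defaultdict

# Data copied from the original post.
listA = [2,3,5,1,2,2,4,2,4,5,2,3,3,4,5,6,4,2,6,2,2,1,3,4,5,5,6,3,3,6,1,4,2,1,4,5,2,2,3,3,3,5,6,3,2,4,5,5,1,1,1,6,1,4,3,5,5,3,1,1,1,6,1,4,6,6,3,6,6,2,4,4,4,5,1,5,6,2,6,1,1,2,4,2,2,3,4,4,5,6,1,3,3,3,5,4,6,5,1,6]

def repeated_ngrams(seq, n):
    """Map every length-n window that occurs more than once to its 0-based start indices."""
    starts = defaultdict(list)
    for i in range(len(seq) - n + 1):
        starts[tuple(seq[i:i + n])].append(i)
    return {gram: idx for gram, idx in starts.items() if len(idx) > 1}

for gram, idx in repeated_ngrams(listA, 6).items():
    print(gram, "starts at", idx)
```

For n = 6 it reports (1, 1, 1, 6, 1, 4) starting at indices 48 and 58 of listA.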

@futurebird @Bumblefish Heh, this reminds me of something from school where... Evan? Somebody? made a plot of outputs from the system's (pseudo-)random number generator, and it turned out there were some _very visible_ patterns. Like, obvious visible stripes in the number selection density plot.

#maths

@moira @futurebird @Bumblefish RANDU!

That's a blast from the past (already obsolete by the time I started fiddling with computers many years ago).

https://en.wikipedia.org/wiki/RANDU

I never used a system with RANDU installed, but I did discover that the PRNGs in old BASICs from the 1980s had the same basic flaw, and I found it in the nerdiest way possible: trying to draw artificial star charts with plausible distributions of star brightnesses, noticing there were some *really funky* patterns in the resulting "constellations", and eventually discovering they had the same mathematical properties that RANDU had (in some cases, worse).
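RANDU's flaw is easy to demonstrate. RANDU computes x_{k+1} = 65539 · x_k mod 2^31 from an odd seed. Because 65539 = 2^16 + 3, its square is 2^32 + 6 · 65539 - 9, so every three consecutive outputs satisfy x_{k+2} ≡ 6·x_{k+1} - 9·x_k (mod 2^31). That relation is why triples of outputs fall on only 15 planes. The sketch below is a minimal Python check of the relation; `randu` is a hypothetical helper, and the 1980s BASIC generators mentioned above are not modeled.

```python
MODULUS = 2 ** 31
MULTIPLIER = 65539  # 2**16 + 3

def randu(seed, count):
    """Return `count` successive RANDU outputs, x_{k+1} = 65539 * x_k mod 2**31.

    RANDU expects an odd seed.
    """
    x = seed
    out = []
    for _ in range(count):
        x = (MULTIPLIER * x) % MODULUS
        out.append(x)
    return out

xs = randu(seed=1, count=10_000)

# 65539**2 == 2**32 + 6 * 65539 - 9, and 2**32 is 0 mod 2**31, so every
# three consecutive outputs satisfy x[k+2] == 6*x[k+1] - 9*x[k] (mod 2**31).
violations = sum((xs[k + 2] - 6 * xs[k + 1] + 9 * xs[k]) % MODULUS != 0
                 for k in range(len(xs) - 2))
print("triples checked:", len(xs) - 2, "violations:", violations)
```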


@dpnash @futurebird @Bumblefish omg

that's it

tilted to the right instead of the left

that's what he found :D

@dpnash @futurebird @Bumblefish (and this is also when we all got into rolling our own random() implementations. based on proper principles, of course, we weren't inventing any. but!)

@moira @futurebird @Bumblefish

Some months before I found the RNG patterns in the fake star charts (I was around 15 or so), I had the really bright idea of “hey, let’s take the RNG output for a chosen seed as a key stream for a cipher! That’ll be really hard to break, and it’ll only be about 10 lines of code!”

That was the first time I rolled my own crypto, and thanks to serendipitously strange-looking artificial star maps, it was also the last.

@dpnash @futurebird @Bumblefish o noes xD

S'funny, none of us ever got into cryptography, at least not that I remember. Way more interested in _finding_ things than _hiding_ things, I think

@futurebird @Bumblefish
I think list B is random.

As others have noted, A has 17 each of 1, 2, 3 and 4, and 16 each of 5 and 6, while B is "lumpier". Also, looking at the difference between consecutive numbers, list A has 23 0s (number N+1 = number N) and 21 +1s (number N+1 is 1 greater than number N), so it is very clustered around repeated numbers and increments by 1. In list B the difference between consecutive numbers is much more evenly distributed, suggesting number N+1 really was independent of number N.
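The consecutive-difference tally can be sketched as follows. `step_counts` and `expected_steps` are hypothetical helpers. For two independent fair rolls, the difference (next - current) equals d with probability (6 - |d|)/36. Across 99 steps, that means about 16.5 zeros and 13.75 +1 steps are expected.

```python
from collections import Counter

# Data copied from the original post.
listA = [2,3,5,1,2,2,4,2,4,5,2,3,3,4,5,6,4,2,6,2,2,1,3,4,5,5,6,3,3,6,1,4,2,1,4,5,2,2,3,3,3,5,6,3,2,4,5,5,1,1,1,6,1,4,3,5,5,3,1,1,1,6,1,4,6,6,3,6,6,2,4,4,4,5,1,5,6,2,6,1,1,2,4,2,2,3,4,4,5,6,1,3,3,3,5,4,6,5,1,6]
listB = [4,2,5,6,3,5,3,1,3,4,2,3,4,3,4,5,5,1,3,3,2,1,1,6,1,3,2,2,2,6,1,5,6,3,6,3,2,3,2,4,6,1,1,6,3,2,4,1,6,1,3,1,5,6,2,3,3,5,1,6,4,5,2,5,1,1,5,3,6,2,3,3,6,5,2,3,3,1,6,3,2,3,2,1,6,6,4,4,6,2,4,5,4,5,3,4,6,5,3,2]

def step_counts(seq):
    """Tally the differences seq[i+1] - seq[i] between consecutive rolls."""
    return Counter(b - a for a, b in zip(seq, seq[1:]))

def expected_steps(n_steps, d, sides=6):
    """Expected count of steps equal to d, using P(next - current == d) = (sides - |d|) / sides**2."""
    return n_steps * (sides - abs(d)) / sides ** 2

for name, seq in (("listA", listA), ("listB", listB)):
    steps = step_counts(seq)
    print(f"{name}: 0 -> {steps[0]}, +1 -> {steps[1]}, -1 -> {steps[-1]}")

print("expected over 99 steps: 0 ->", expected_steps(99, 0), ", +1 ->", expected_steps(99, 1))
```

On listA this reproduces the 23 zeros and 21 +1 steps from the post, along with only 6 -1 steps.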

@futurebird @Bumblefish

Replacing a bad analysis where I forgot we are dealing with dice, not decimal digits.

The first has 23/99 runs of two matching digits and 5/98 runs of three.

The second has 12/99 and 1/98.

The expected mean fractions would be 1/6 and 1/36.

The latter series is a little closer to the expected values, but each of the two series is some distance from the mean, on opposite sides of it.

These are only a couple of the possible information signals that could be checked, but they seem prima facie to suggest the second is a slightly more plausibly random-adjacent series.
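The run fractions above can be recomputed with overlapping windows. `matching_windows` is a hypothetical helper. This sketch assumes the same definitions as the post: a window of length k counts when all k values are equal, with expected fraction 1/6 for k = 2 and 1/36 for k = 3.

```python
# Data copied from the original post.
listA = [2,3,5,1,2,2,4,2,4,5,2,3,3,4,5,6,4,2,6,2,2,1,3,4,5,5,6,3,3,6,1,4,2,1,4,5,2,2,3,3,3,5,6,3,2,4,5,5,1,1,1,6,1,4,3,5,5,3,1,1,1,6,1,4,6,6,3,6,6,2,4,4,4,5,1,5,6,2,6,1,1,2,4,2,2,3,4,4,5,6,1,3,3,3,5,4,6,5,1,6]
listB = [4,2,5,6,3,5,3,1,3,4,2,3,4,3,4,5,5,1,3,3,2,1,1,6,1,3,2,2,2,6,1,5,6,3,6,3,2,3,2,4,6,1,1,6,3,2,4,1,6,1,3,1,5,6,2,3,3,5,1,6,4,5,2,5,1,1,5,3,6,2,3,3,6,5,2,3,3,1,6,3,2,3,2,1,6,6,4,4,6,2,4,5,4,5,3,4,6,5,3,2]

def matching_windows(seq, k):
    """Return (hits, windows): how many overlapping length-k windows hold identical values."""
    windows = len(seq) - k + 1
    hits = sum(len(set(seq[i:i + k])) == 1 for i in range(windows))
    return hits, windows

for name, seq in (("listA", listA), ("listB", listB)):
    for k, expected in ((2, 1 / 6), (3, 1 / 36)):
        hits, windows = matching_windows(seq, k)
        print(f"{name} k={k}: {hits}/{windows} = {hits / windows:.3f} (expected {expected:.3f})")
```

It returns 23/99 and 5/98 for listA, and 12/99 and 1/98 for listB, matching the post.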

@futurebird Before I look at where the answer shows up, my guess would be that List A is random.

The odds of both dice being the same number when you roll 2 dice is 1/6 (36 possibilities, 6 desired results). For 3, that becomes 1/36. (6*6*6 possibilities, 6 desired).

What we have here is 98 consecutive possible places for a 3-of-a-kind to start. The odds that you would only draw the 1/36 chance ONCE (The 3 2's near the beginning of B) is something like....8%?

@futurebird The point is, having it appear once is something like a 94% chance. Seeing a 3-of-a-kind appear more than once is very much expected in a random distribution.

But it's NOT what we EXPECT a random distribution to look like, from a human perspective. When people see things like that appear, they get nervous. If they're making a list to LOOK random, having 3 of the same number in a row starts to feel NOT random, like it's some kind of pattern, and so they won't do it much.
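The "how likely is a 3-of-a-kind" question can also be checked by simulation instead of by treating the 98 windows as independent. This sketch assumes Python's `random` as a fair die, 20,000 simulated sequences of 100 rolls, and a fixed seed. `count_triples` and `triple_stats` are hypothetical helpers. The simulated share of sequences with no triple should land near the exact 8.93% worked out later in this thread.

```python
import random

def count_triples(seq):
    """Number of overlapping windows of three identical consecutive values."""
    return sum(seq[i] == seq[i + 1] == seq[i + 2] for i in range(len(seq) - 2))

def triple_stats(trials=20_000, n=100, seed=2):
    """Estimate P(no triple) and P(exactly one triple window) for n fair rolls."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    none = exactly_one = 0
    for _ in range(trials):
        t = count_triples(rng.choices(range(1, 7), k=n))
        none += t == 0
        exactly_one += t == 1
    return none / trials, exactly_one / trials

p_none, p_one = triple_stats()
print(f"no triple: {p_none:.3f}, at least one: {1 - p_none:.3f}, exactly one: {p_one:.3f}")
```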

@futurebird Also somehow I was wrong. Either I did my calculation wrong or that 8% chance really slipped through and I picked the absolutely wrong metric to judge this.

Alternately, I didn't consider HOW the non-random list was made and just assumed it was just someone with a pencil picking numbers based purely on vibes, when there was just a different, non-random methodology.

@AbyssalRook @futurebird I see two mistakes in your reasoning.
One is technical: events "numbers with position N, N+1 and N+2 are the same" for different values of N are _not_ independent of each other. (For example, if we know that this statement is true for N=10, then the likelihood of it being true for N=11 is 1/6, not 1/36.)
Another symbolizes a deeper problem with a lot of modern research that relies heavily on p-values: consider how many statements of this kind, containing the same amount of information, you could make. Unless you commit to a specific statement beforehand, before seeing the data, "this statement would only be true in 8% of cases for truly random data" does not really mean anything if it's just one out of 20 equally "interesting" statements one could make about the data (e.g. "how many triplets of incrementing numbers (modulo six) are there", "how many decrementing triplets are there", etc.), each only 8% likely. Because, of course, for most random sequences a few of these individually unlikely statements are expected to be true.
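Both points can be made concrete. The first part of this sketch enumerates all 6^4 outcomes of four fair dice. It shows that once a triple at positions 1-3 is known, a triple at positions 2-4 has probability 1/6, not 1/36. The second part is the multiple-testing arithmetic under a stated assumption: if there were 20 independent tests, each with an 8% chance of firing on truly random data, at least one would fire about 81% of the time. The 20 independent tests are hypothetical; real pattern tests on the same data are usually correlated.

```python
from fractions import Fraction
from itertools import product

# Every outcome of 4 fair dice: 6**4 = 1296 equally likely sequences.
outcomes = list(product(range(1, 7), repeat=4))

def triple_at(seq, start):
    """True if seq[start], seq[start+1], seq[start+2] are all equal."""
    return seq[start] == seq[start + 1] == seq[start + 2]

p_first = Fraction(sum(triple_at(s, 0) for s in outcomes), len(outcomes))
p_both = Fraction(sum(triple_at(s, 0) and triple_at(s, 1) for s in outcomes), len(outcomes))
print("P(triple at 1-3) =", p_first)                       # 1/36
print("P(triple at 2-4 | triple at 1-3) =", p_both / p_first)  # 1/6

# Hypothetical: 20 independent tests, each firing 8% of the time on random data.
p_any = 1 - 0.92 ** 20
print(f"P(at least one of 20 fires) = {p_any:.1%}")
```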

@IngaLovinde @AbyssalRook

It's been really helpful for me to see how many people focused on the order of the numbers in the list, which I didn't think very important since the list is so short that that type of analysis might not be that useful.

I used the random list to scramble the fake numbers twice. I should have scrambled them more.
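Here is a sketch of how a list like listA could be produced, following the description earlier in the thread (a list of 16 or 17 of each number, then scrambled). This is a hypothetical reconstruction, not futurebird's exact method. She scrambled using the dice list itself, while this sketch swaps in `random.shuffle`, a Fisher-Yates shuffle for which one pass already makes every ordering equally likely. No amount of shuffling changes the counts, so the all-16s-and-17s frequency fingerprint survives regardless. `balanced_fake` is a hypothetical helper.

```python
import random
from collections import Counter

def balanced_fake(counts, seed=None):
    """Hypothetical: build a list with exactly the given per-face counts, then shuffle it.

    counts maps each face to the number of times it should appear.
    """
    values = [face for face, k in counts.items() for _ in range(k)]
    # random.shuffle is a Fisher-Yates shuffle: one pass already makes every
    # ordering equally likely, and shuffling never changes the counts.
    random.Random(seed).shuffle(values)
    return values

fake = balanced_fake({1: 17, 2: 17, 3: 17, 4: 17, 5: 16, 6: 16}, seed=0)
print(len(fake), dict(sorted(Counter(fake).items())))
```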

@IngaLovinde I'm not following the first problem in the logic. The situation you're describing might be important if we're looking at more and more instances of it happening, but looking at it happening at least once (~94%) doesn't change at all, and it happening ONLY once might jiggle the ~8% estimate I had, but not significantly move it.

@IngaLovinde As for the latter, that is entirely true from a research perspective, but I picked the 3-of-a-kind pattern because I assumed the non-random list was entirely human constructed, and that particular pattern is one that sticks out to us the most. Someone making a list by hand is more likely to see "6-6-6" as less random than "6-1-2" or "3-4-5".

I did not clock 'Which is random?' as one being a dice roll and the other being a shuffled deck of prescribed cards.

@AbyssalRook but the same goes for numbers repeating twice, or four times, or ascending (2, 3, 4), or descending (4, 3, 2), or repeated pairs (5, 1, 5, 1), etc.
One can come up with many patterns or tests like that, each with similarly low probability; but that one of them matches the data doesn't mean anything, because it is expected that some of them will match. _Especially_ if you only come up with a specific pattern _after_ seeing the data.

@AbyssalRook okay let's calculate it:
Let a_n be the probability that a sequence of length n contains no triplet of identical numbers and does not end with two identical numbers; let b_n be the same, but ending with two identical numbers.
Then a_1 = 1, a_2 = 5/6, b_2 = 1/6; a_(n+1) = a_n * 5/6 + b_n * 5/6; b_(n+1) = a_n * 1/6.
Or, expanding b_n, we get a_(n+2) = a_(n+1) * 5/6 + a_n * 5/36.
Plugging these numbers into Wolfram Alpha (`LinearRecurrence[{5/6, 5/36}, {1, 5/6}, 100]`), we obtain a_100 ~= 0.0762866, a_99 ~= 0.0781878, and therefore the probability that a sequence of 100 random numbers contains no triplet of identical numbers is a_100 + a_99/6 ~= 0.0893 = 8.93%.

By contrast, the probability that out of 98 random (and independent) triplets none consists of three identical numbers is (35/36)^98 ~= 6.32%.

That's a pretty large difference, and not just a jiggle.

(I understand that this is not the number you were looking at, but it's the easiest way to illustrate that there is a significant difference between answering questions about triplets of repeating number among 98 independent random triplets and among 98 sub-triplets of the sequence with 100 independent random numbers.)
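The recurrence above translates directly into Python with exact fractions. In this sketch, `p_no_triple` tracks the post's a_n and b_n, and the hypothetical `p_no_triple_bruteforce` cross-checks it by enumeration for small n.

```python
from fractions import Fraction
from itertools import product

def p_no_triple(n, sides=6):
    """Exact P(no three identical consecutive values) in n fair rolls, n >= 1.

    a: no triple so far and the last two values differ (a_1 = 1 for a single roll).
    b: no triple so far and the last two values are equal.
    """
    p_same = Fraction(1, sides)
    a, b = Fraction(1), Fraction(0)  # n = 1
    for _ in range(n - 1):
        # a_(n+1) = (a_n + b_n) * 5/6 and b_(n+1) = a_n * 1/6, as in the post
        a, b = (a + b) * (1 - p_same), a * p_same
    return a + b

def p_no_triple_bruteforce(n, sides=6):
    """Same probability by enumerating all sides**n sequences (small n only)."""
    outcomes = list(product(range(1, sides + 1), repeat=n))
    good = sum(not any(s[i] == s[i + 1] == s[i + 2] for i in range(n - 2))
               for s in outcomes)
    return Fraction(good, len(outcomes))

assert p_no_triple(5) == p_no_triple_bruteforce(5)
print("recurrence, n=100:", float(p_no_triple(100)))
print("98 independent windows:", (35 / 36) ** 98)
```

p_no_triple(100) comes out at about 0.0893 and (35/36)**98 at about 0.0632, matching the figures in the post.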

@futurebird @Bumblefish Without any careful analysis, just winging it here, but the double occurrence of 1 6 1 4 in List A makes it sus to me. Especially since John Napier published his outline of logarithms in 1614. Coincidence? I think not!!

@futurebird @Bumblefish

It’s a trick question. Neither list is random because 7 is the most random number and does not appear in either list. A six-sided die is not able to produce a 7 and cannot therefore produce a random number.

- ChatGPT, probably.

@futurebird @Bumblefish I vote for listB: I counted the times that two consecutive numbers are equal (1,1 or 4,4). In listA this occurs ~23 times, so almost 1/4 of the time, which seems too many (it should be around 1/6). In listB it is ~9 times, unless I missed some. That seems fewer than expected, but anyway. If I spent more time, I’d go for higher-order n-grams
@futurebird @Bumblefish I once had to create a “random” PIN for a voicemail system. The machine rejected my first choice, saying, “That is not a random number.”
Evidently there had been a revolution in mathematics somewhere in their office, but I never found out any more about it.
@futurebird @Bumblefish I'm no stats student, so maybe I haven't the bases (for lack of a better term, English is not my main language), but I think listA is the random one. The fact that there are almost no triplets in listB seems too good to be true.

@lamecarlate @Bumblefish

I've got some bad news. I've posted the solution with a CW on the original thread.

@futurebird @Bumblefish Yep, I read it… My bad. I used instinct, guts, not mathematics like the other answers. I should have 😅