Had a lot of fun with my stats students today. I gave them two data sets: one from a random number generator, the other one I made up by hand, not random but designed to look random. They were able to figure out which one was fake.

Then we had ChatGPT make the same kind of data set (100 random numbers from 1 to 6), and it had the same problems as my fake set, but in a different way.

We talked about the study on AI-generated passwords.

There is something very creepy about the way LLMs will cheerfully give you lists of "random" numbers. But they aren't random in frequency, and as my students pointed out, "it's probably from some webpage about how to generate random numbers."

But even then, why are the frequencies so unnaturally regular? Is that an artifact of mixing lists of real random numbers together?
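That regularity is measurable, and averaging would produce it: each genuine list is noisy, but a blend of many lists regresses toward the expected count for every face. A chi-squared goodness-of-fit test against a fair die catches both failure modes: counts that are too lumpy give an improbably high statistic, while counts that hug 100/6 per face give an improbably low one, the classic tell of faked or over-averaged data. Here's a minimal sketch in Python; the two count lists and the `uniformity_check` name are invented for illustration:

```python
# Minimal sketch: chi-squared goodness-of-fit for 100 rolls of a die.
# Both count lists below are invented for illustration.
from scipy.stats import chi2, chisquare

def uniformity_check(counts, faces=6):
    """Test observed face counts against a fair die.

    A very high statistic means the frequencies are too lumpy to be fair;
    a very low one means they are suspiciously regular, the usual tell
    of human-faked or averaged data.
    """
    stat, p_lumpy = chisquare(counts)       # expected counts default to uniform
    p_neat = chi2.cdf(stat, df=faces - 1)   # chance of a fit this good or better
    return stat, p_lumpy, p_neat

plausible_rng = [14, 21, 13, 18, 15, 19]   # noisy, like a real generator
too_neat      = [17, 16, 17, 17, 16, 17]   # every face almost exactly 100/6

for label, counts in [("plausible RNG", plausible_rng), ("too neat", too_neat)]:
    stat, p_lumpy, p_neat = uniformity_check(counts)
    print(f"{label}: chi2={stat:.2f}  P(lumpier)={p_lumpy:.3f}  P(fit this neat)={p_neat:.4f}")
```

On the too-neat counts the low-tail probability comes out around one in ten thousand: hardly any genuinely random run of 100 rolls fits the expectation that closely.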

@futurebird Haven't tried it, but maybe it's also all mixed up with non-random numbers in the training content, e.g. the next digit after '20' is likely 0, 1, or 2, the start of a 21st-century year so far. Or Benford's law: https://en.wikipedia.org/wiki/Benford%27s_law
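For the curious: Benford's law predicts that the leading digit d of naturally occurring multiplicative data shows up with probability log10(1 + 1/d), so about 30% of values start with 1. A tiny Python sketch, using powers of 2 as a stand-in data set (a textbook Benford example); `leading_digit` and `benford_expected` are just illustrative names:

```python
# Minimal sketch of Benford's law: leading digit d appears with
# probability log10(1 + 1/d), heavily skewed toward 1.
import math
from collections import Counter

def leading_digit(x):
    """First significant digit of a positive number."""
    return int(f"{abs(x):.15e}"[0])   # scientific notation starts with the digit

def benford_expected():
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Powers of 2 are a classic data set that follows Benford's law.
data = [2 ** n for n in range(1, 200)]
counts = Counter(leading_digit(x) for x in data)
total = sum(counts.values())

for d, p in benford_expected().items():
    print(f"digit {d}: expected {p:.3f}  observed {counts[d] / total:.3f}")
```

Note that a fair d6 shouldn't follow Benford at all (faces 1 through 6 are equally likely), which is part of why training data soaked in Benford-distributed numbers could skew an LLM's idea of "random".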

@okohll @futurebird I was about to suggest Benford's Law too!
@cstross @futurebird God does play dice, but there’s a big lead weight in one side