Had a lot of fun with my stats students today. I gave them two data sets. One from a random number generator, the other was one I made up that was not random, but designed to look random. They were able to figure out which one was fake.

Then we had ChatGPT make the same kind of data set (100 random numbers from 1 to 6) and it had the same problems as my fake set, but in a different way.

We talked about the study about AI-generated passwords.

"Why don't you just load a library to find the mean and SD?"

Because I'M OLD. I like to write my own function. I do it for integration sometimes... kids these days.
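The hand-rolled version really is only a few lines. A minimal Python sketch of the textbook mean and standard deviation (the sample data here is made up for illustration):

```python
import math

def mean(xs):
    """Arithmetic mean: sum of values divided by count."""
    return sum(xs) / len(xs)

def stdev(xs, sample=True):
    """Standard deviation via the textbook two-pass formula.
    sample=True uses the n-1 (Bessel-corrected) denominator."""
    m = mean(xs)
    ss = sum((x - m) ** 2 for x in xs)
    n = len(xs) - 1 if sample else len(xs)
    return math.sqrt(ss / n)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean(data))                 # 5.0
print(stdev(data, sample=False))  # 2.0 (population SD)
```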

@futurebird Faster than finding a library and RTFM too.

@futurebird

And if you keep using someone else's functions, you'll never truly understand the underlying math.

So many nights spent poring over books and trying out code to learn the what and how... 😵‍💫😅

@futurebird
When I was a kid, we solved integrals in the snow and rain uphill in both directions.
@ohmu @futurebird LOL 42 and 73 are my picks for "random" numbers out of the LLMs, for now.
@ai6yr @ohmu @futurebird wait so... is that the ultimate question? "What number will an LLM always include when generating random numbers?"
@meuwese @ohmu @futurebird Apparently humans have willed that into existence, yes. LOL. (err... Douglas Adams, precisely)
@futurebird I know how to find the SD and I will use the php-stats library every day of the week and twice on Sunday. I would much rather be able to depend on well supported community code. (At least until it is all replaced by ai slop)
@ldpm @futurebird
AIUI, there's also the issue that the formulas for the mean and especially the standard deviation that we learn in school don't play well with how computers represent floating-point numbers and how rounding works on them. Hopefully a stats library uses less familiar formulas that take care of that, what's called "numerical stability".
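A sketch of one such numerically stable method, Welford's one-pass algorithm, which avoids the catastrophic cancellation of the naive E[x²] − E[x]² formula when the data has a large mean (the example values are invented):

```python
def welford_stats(xs):
    """One-pass, numerically stable mean/variance (Welford's algorithm)."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the running mean
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    variance = m2 / (n - 1) if n > 1 else 0.0  # sample variance
    return mean, variance

# A large offset ruins the schoolbook one-pass formula; Welford is fine.
data = [1e9 + v for v in (4.0, 7.0, 13.0, 16.0)]
mean, var = welford_stats(data)
print(mean)  # 1000000010.0
print(var)   # 30.0
```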

@ldpm

I don't mind using libraries, but it's fun to write my own versions of things just so I know how they work.

When we make projects where we share code I encourage them to use libraries more often. I'm just a grumpy old lady about it sometimes.

@futurebird I found out quickly that the entropy tools from NIST and Fourmilab don’t work well with a data set that’s log2(6) bits per element.
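For reference, the per-symbol Shannon entropy can be computed directly rather than through bitstream tools; a quick sketch with toy d6 data (perfectly uniform, so it hits the log2(6) ≈ 2.585 bits/symbol ceiling):

```python
import math
from collections import Counter

def shannon_entropy(symbols):
    """Empirical Shannon entropy in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

rolls = [1, 2, 3, 4, 5, 6] * 10  # perfectly uniform toy data
print(shannon_entropy(rolls))    # ≈ 2.585 bits/symbol
```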
@futurebird I assume from this post someone already mentioned statistics from the python standard library?

There is something very creepy about the way LLMs will cheerfully give lists of "random" numbers. But they aren't random in frequency, and as my students pointed out, "it's probably from some webpage about how to generate random numbers."

But even then, why is the frequency so unnaturally regular? Is that an artifact from mixing lists of real random numbers together?
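One way to quantify "unnaturally regular" is a chi-square statistic against the uniform distribution: a suspiciously small value means the face counts are too even to be chance. A rough sketch (the "fake" data here is illustrative):

```python
import random

def chi_square_uniform(rolls, faces=6):
    """Chi-square statistic vs. a uniform distribution over 1..faces.
    Near zero: frequencies suspiciously even. Very large: too lopsided."""
    n = len(rolls)
    expected = n / faces
    counts = [rolls.count(f) for f in range(1, faces + 1)]
    return sum((c - expected) ** 2 / expected for c in counts)

real = [random.randint(1, 6) for _ in range(100)]
fake = [1, 2, 3, 4, 5, 6] * 16 + [1, 2, 3, 4]  # unnaturally even counts

print(chi_square_uniform(real))  # typically a few units (5 degrees of freedom)
print(chi_square_uniform(fake))  # near zero: too regular to be random
```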

The LLM is like a little box of computer horrors that we peer into from time to time.

I'm sorry but the whole interface is just so silly.

You ask for random numbers with sentences and it pretends to give them to you? What are we doooooing?

@futurebird it really puts into perspective what my interaction with real people is like

@futurebird It's very weird.

In principle, if you take an LLM, you should be able to get it to generate random numbers in a way that reflects the numbers that appear in the corpus it was trained on. If you have the raw model you can probably do that.

But if you ask ChatGPT (or at least if I do) it starts talking about how numbers taken from around us typically follow Benford's law so their first digits have a logarithmic distribution. When it then spits out some random numbers it's no longer sampling random numbers from the entire corpus but a sample that's probably heavily biased towards numbers that appear in articles about Benford's law. I.e. what people have previously said about these numbers, rather than the actual numbers.

@dpiponi Even with a raw model, I don't see how you would sample from the distribution of numbers in the corpus. Perhaps provide no context and sample one or more tokens (using an independent pseudo-random number generator) from the distribution, and if the returned token parses as a number, return it to the user, otherwise try again. Providing any context/prompt would bias what is returned. This seems too contrived/circular.
@futurebird
@jedbrown @futurebird You described exactly what I would do. Obviously it would depend on an external PRNG and yes, no prompt. One natural way to use an LLM is to transform draws from a PRNG into draws from a distribution intended to represent some corpus. Picking numbers out of these draws would be expected to have a similar distribution to picking numbers from the original corpus. IIRC I may already have tested to see if the results conform to Benford's law - I did a lot of stuff like that when llama.cpp first became available. You have to select the right parameters to have llama.cpp use the distribution "correctly".
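The rejection-sampling idea described above, as a toy sketch: a made-up token distribution stands in for a real model's no-prompt softmax (the tokens and probabilities are invented, not from any actual model):

```python
import random

# Toy stand-in for a raw model's next-token distribution with no prompt.
# In a real setup these probabilities would come from the LLM's softmax.
token_probs = {
    "the": 0.30, "a": 0.20, "42": 0.15,
    "cat": 0.15, "7": 0.12, "and": 0.08,
}

def sample_number(probs, rng):
    """Rejection sampling: draw tokens with an external PRNG and
    keep only the ones that parse as numbers."""
    tokens, weights = zip(*probs.items())
    while True:
        tok = rng.choices(tokens, weights=weights, k=1)[0]
        if tok.isdigit():
            return int(tok)

rng = random.Random(0)
draws = [sample_number(token_probs, rng) for _ in range(1000)]
# Numeric tokens keep their relative corpus frequencies (0.15 vs 0.12),
# so "42" should appear a bit more often than "7".
print(draws.count(42), draws.count(7))
```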

@dpiponi @futurebird

Which in turn is what LLMs do. They give an averaged output, not a reasoned one.

In addition, the inherent laws of measurement and control mean that any achieved output will never match the intended one. Thus LLM output will never increase knowledge, but will migrate toward zero.

@futurebird I am reminded of a Doctor Who episode, where they realize they are in a simulation because they are incapable of generating truly random numbers. One scene has a whole bunch of scientists sitting at a table and they all keep yelling the same number at the same time.

@futurebird the first episode of Numb3rs covered the appearance of randomness vs. true randomness. I would not have remembered that, but I watched a bunch of episodes to serve as math-concept inspiration for the 31 music pieces I wrote and performed (on actual hardware synths) over the whole month of January for #jamuary2026 #math #music #synths

https://soundcloud.com/francois_dion/sets/jamuary-2026


@futurebird I do truly wonder how many "non techies" understand that none of this AI stuff actually understands the questions it is asked, or the things it reads, etc.

Like, you and I know it's just sparkling autocomplete. But how many of my family members know that? And actually understand what that means? And why it leads to the outcomes it does?

@ricko

This is the epistemological issue I have with the interface. It's ... well, not to be harsh but it's deceptive.

If you ask a "computer" for random numbers, that has a kind of meaning and an expected process. If you ask a computer "how did you generate those random numbers?" that also has a set of expectations... and an LLM isn't meeting ANY of them.

@futurebird 🎶 little box, little box of horrors 🎶
@futurebird the first time I had to go nuclear about LLM use in my department was when my boss was showing me her design for a major experiment where they were planting actual trees of different species in long term plots, and when I asked how did they randomise the distribution of species she said the post doc responsible for setting up the experiment had asked chatgpt to randomise it! (1/2)
@futurebird And that was about 2 years ago, when this kind of thing was probably even worse. It took me half an hour to write code to generate the plots and some nice figures with the positions of every tree... I wonder how long they were fighting the chat box to get any kind of answer. And that's not to mention that this experiment will be running for years to come. How can people be so careless? (2/2)
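For what it's worth, the core of that half-hour of code boils down to a shuffle; a minimal sketch (the species names and plot dimensions here are hypothetical, not from the actual experiment):

```python
import random

def randomise_plot(species, rows, cols, seed=None):
    """Assign species to a rows x cols planting grid, each species
    equally represented, positions fully shuffled."""
    n = rows * cols
    assert n % len(species) == 0, "grid must divide evenly among species"
    layout = species * (n // len(species))
    rng = random.Random(seed)  # seed it so the layout is reproducible
    rng.shuffle(layout)
    return [layout[r * cols:(r + 1) * cols] for r in range(rows)]

grid = randomise_plot(["oak", "pine", "birch"], rows=6, cols=5, seed=42)
for row in grid:
    print(row)
```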
@LeoRJorge @futurebird Over and over again, if you know what you're doing, the LLM-generated version of it is so bad that doing it from scratch is easier and faster. Only people who don't know what they're doing, and usually people who sneer at learning to do something, really want to use LLMs. They think it's a cheat-code against acquiring skills, but it just makes them look lazy and uncaring. That's the owner-class dream, of course.

@futurebird Well, LLMs are tools. Know their limitations. Know their power.

In your case:

"create 20 random numbers between 1 and 100 by developing a little python app and running it"

Some day, AIs will respond to any prompt in a perfect way and we humans will be in deep shit.

Edit: LOL mistral.ai answers this prompt by generating the random numbers and THEN SORTING THEM. 🤦‍♂️
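For comparison, what the little Python app might look like, without the gratuitous sorting (and with a variant for unique values, in case that's what was actually wanted):

```python
import random

# 20 independent draws from 1..100, order preserved (repeats allowed).
numbers = [random.randint(1, 100) for _ in range(20)]
print(numbers)

# If 20 *distinct* values were wanted, sample without replacement instead.
unique_numbers = random.sample(range(1, 101), 20)
print(unique_numbers)
```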

@Mastokarl @futurebird > Some day, AIs will respond to any prompt in a perfect way and we humans will be in deep shit.

I don’t think those AIs will be based on LLMs, though. 🙂

@ramsey @futurebird It will be like in The Hitchhiker's Guide to the Galaxy. LLMs are like Deep Thought, the confusing computer that cannot answer the big question but that will construct the computer that can.

With Reinforcement learning you can make LLMs better and better coders, and some day a million brilliant AI coders will design a NN that we‘re too stupid to design.

(Yes that requires creativity, yes I don‘t have a problem with assuming the LLMs will have that)

@Mastokarl @futurebird I can’t see it happening with LLM technology. LLMs might get very good at fooling you into thinking they’re coming up with novel and creative solutions, but they can’t actually problem-solve. That’s not how they work.
@ramsey @futurebird Which is quite funny because I see Claude and ChatGPT solve fairly complicated software development problems every day. And I guarantee you, some of my software problems are not in a training set. Maybe you‘re underestimating what LLMs today do?

@futurebird @Mastokarl This is clearly AI-generated, and it doesn’t compile, and many of the tests aren’t testing what they claim to be testing.

https://github.com/php/php-src/pull/21317

@ramsey @futurebird Yes, there are a large number of examples of AI failures. If the codebase gets larger, currently probably around 50k lines, results get worse, you have to invest more upfront to make the LLM understand the code base, and you might still fail.

For everything below, say, 20k lines, Opus 4.6 totally beats me at coding. And I consider myself quite good at that, CS PhD and all.

And the limit of what size of codebase an LLM can handle increases every month.

@Mastokarl @futurebird I don’t have a PhD or even a CS degree, so I’m sure I don’t know what I’m talking about. 😉
@ramsey @futurebird Oops sorry if my post suggested that. Just needed a few-words way to say „I know how to code“. I appreciate your posts.
@Mastokarl @futurebird No worries. I’m sorry for responding like that.
@futurebird @ramsey Quite unexpectedly, AIs fail at tasks that humans can do easily. And humans fail at tasks that AIs can do easily. They are not perfect, we are not perfect. If we use them right, they can make us much faster. If we use them wrong, we will look like idiots (like that lawyer who brought a document to court that was AI generated and full of non-existent case references).
@futurebird The trouble is that people can accept that "factual" output from an LLM may be statistically generated until they hit words that are generated that sound like "reasoning." Then even the most aware humans can get lulled into thinking that the words can be trusted.

@futurebird there was a study that found that if you give an LLM some prompting to push it into a particular sampling-space (say, "bleeding heart leftie") and then ask it for some random numbers, you can then feed those numbers into another fresh instance and it'll drift towards the same sampling space.

In other words, even the numerical distributions they sample from can be connected to the broader "noosphere" they're trained on, and that relation is a fucked sort of bijection

@futurebird  if you prompt it into "stats prof" or "crypto nerd" sampling space does it improve the quality of the fake RNG output?
@futurebird I think I've got a printed book of random numbers upstairs somewhere.
@darkling @futurebird this is like the old logarithm and trigonometric tables i used to use as a kid.
@flipper @futurebird I definitely have some of those. Several, in fact, at various levels of precision and different sets of functions.
@darkling @futurebird I don't have those any more, just statistical tables for critical values for things like F tests, but I used to pore over them when I was at school, trying to see any particular pattern.

@darkling @futurebird

Books of random numbers or letters were made

https://en.wikipedia.org/wiki/One-time_pad


@alienghic @futurebird Indeed they were. I wasn't joking when I said I thought I had a book of them.
@futurebird haven't tried it but maybe it's also all mixed up with non-random numbers in training content, e.g. the next digit after '20' is likely 0, 1 or 2, the start of a 21st-century year so far. Or Benford's law https://en.wikipedia.org/wiki/Benford%27s_law
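Benford's law itself is simple to compute; a quick sketch of the first-digit distribution:

```python
import math

def benford_prob(d):
    """Benford's law: P(first digit = d) = log10(1 + 1/d)."""
    return math.log10(1 + 1 / d)

# Digit 1 leads about 30.1% of the time; digit 9 only about 4.6%.
for d in range(1, 10):
    print(d, round(benford_prob(d), 3))
```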

@okohll @futurebird I was about to suggest Benford's Law too!
@cstross @futurebird God does play dice, but there’s a big lead weight in one side
@futurebird i mean the LLM itself is just a statistical distribution… the path through the distribution is i assume randomized, but the distribution itself is gonna be the same every time.