Had a lot of fun with my stats students today. I gave them two data sets: one from a random number generator, the other one I made up myself, not random but designed to look random. They were able to figure out which one was fake.

Then we had ChatGPT make the same kind of data set (100 random numbers from 1 to 6), and it had the same problems as my fake set, but in a different way.

We talked about the study about AI generated passwords.

There is something very creepy about the way LLMs will cheerfully give lists of "random" numbers. But they aren't random in frequency, and as my students pointed out, "it's probably from some webpage about how to generate random numbers"

But even then, why is the frequency so unnaturally regular? Is that an artifact from mixing lists of real random numbers together?
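One way to see how lumpy genuinely random frequencies are is to simulate the same kind of data set. This is a sketch of my own (the seed and variable names are mine, not from the thread); the point is that a suspiciously *flat* frequency table is itself evidence of faking:

```python
import random
from collections import Counter

# Simulate 100 genuinely random die rolls (seed chosen arbitrarily
# so the run is reproducible).
random.seed(1)
rolls = [random.randint(1, 6) for _ in range(100)]
counts = Counter(rolls)

# Each face is *expected* about 100/6 ≈ 16.7 times, but true random
# counts are uneven. A chi-square statistic that is suspiciously low
# (every count near 16-17) suggests the data was made to look random.
expected = 100 / 6
chi_sq = sum((counts[face] - expected) ** 2 / expected for face in range(1, 7))
print(dict(sorted(counts.items())), round(chi_sq, 2))
```

Running this a few times shows counts ranging well away from 16.7, which is exactly the natural unevenness the fake sets lack.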

The LLM is like a little box of computer horrors that we peer into from time to time.

I'm sorry but the whole interface is just so silly.

You ask for random numbers with sentences and it pretends to give them to you? What are we doooooing?

@futurebird it really puts into perspective what my interaction with real people is like

@futurebird It's very weird.

In principle, if you take an LLM, you should be able to get it to generate random numbers in a way that reflects the numbers that appear in the corpus it was trained on. If you have the raw model you can probably do that.

But if you ask ChatGPT (or at least if I do) it starts talking about how numbers taken from around us typically follow Benford's law so their first digits have a logarithmic distribution. When it then spits out some random numbers it's no longer sampling random numbers from the entire corpus but a sample that's probably heavily biased towards numbers that appear in articles about Benford's law. I.e. what people have previously said about these numbers, rather than the actual numbers.
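For reference, Benford's law itself is simple to state: in many naturally occurring data sets, the leading digit d appears with probability log10(1 + 1/d). A minimal check of those probabilities:

```python
import math

# First-digit probabilities under Benford's law: P(d) = log10(1 + 1/d)
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# The digit 1 leads about 30.1% of the time; 9 only about 4.6%.
print({d: round(p, 3) for d, p in benford.items()})
```

Note how different this is from the uniform distribution you'd expect of "random numbers" in the everyday sense, which is part of why an LLM reciting Benford lore mid-answer is a red flag.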

@dpiponi Even with a raw model, I don't see how you would sample from the distribution of numbers in the corpus. Perhaps provide no context and sample one or more tokens (using an independent pseudo-random number generator) from the distribution, and if the returned token parses as a number, return it to the user, otherwise try again. Providing any context/prompt would bias what is returned. This seems too contrived/circular.
@futurebird
@jedbrown @futurebird You described exactly what I would do. Obviously it would depend on an external PRNG and yes, no prompt. One natural way to use an LLM is to transform draws from a PRNG into draws from a distribution intended to represent some corpus. Picking numbers out of these draws would be expected to give a similar distribution to picking numbers from the original corpus. IIRC I may already have tested whether the results conform to Benford's law - I did a lot of stuff like that when llama.cpp first became available. You have to select the right parameters to have llama.cpp use the distribution "correctly".
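The rejection-sampling scheme described above can be sketched with a toy stand-in for the model's no-prompt next-token distribution. Everything here is invented for illustration (the vocabulary, the probabilities, the function name); a real setup would read the probabilities off the model's logits, e.g. via llama.cpp with temperature 1 and no top-k/top-p cutoff:

```python
import random

# Toy stand-in for an LLM's next-token distribution given an empty prompt.
# Probabilities are made up for illustration only.
vocab = ["the", "of", "and", "1", "2", "3", "7", "cat"]
probs = [0.30, 0.15, 0.14, 0.12, 0.08, 0.06, 0.05, 0.10]

def sample_corpus_number(rng):
    # Draw tokens with an external PRNG; reject anything that
    # doesn't parse as a number.
    while True:
        token = rng.choices(vocab, weights=probs)[0]
        if token.isdigit():
            return int(token)

rng = random.Random(0)
draws = [sample_corpus_number(rng) for _ in range(1000)]
```

The surviving draws follow the *conditional* distribution of numbers in the (toy) corpus: here "1" should come out most often and "7" least, mirroring whatever number-frequency structure the corpus has.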

@dpiponi @futurebird

Which in turn is what LLMs do. They give an averaged output, not a reasoned one.

In addition, the inherent limits of measurement and control mean that any actual output will never match the intended one. Thus LLM output will never increase knowledge, but will drift toward zero.

@futurebird I am reminded of a Doctor Who episode, where they realize they are in a simulation because they are incapable of generating truly random numbers. One scene has a whole bunch of scientists sitting at a table and they all keep yelling the same number at the same time.

@futurebird the first episode of Numb3rs covered the appearance of randomness vs. true randomness. I would not have remembered that, but I watched a bunch of episodes as math-concept inspiration for the 31 music pieces I wrote and performed (on actual hardware synths) through the whole month of January for #jamuary2026 #math #music #synths

https://soundcloud.com/francois_dion/sets/jamuary-2026


@futurebird I do truly wonder how many "non techies" understand that none of this AI stuff actually understands the questions it is asked, or the things it reads, etc.

Like, you and I know it's just sparkling autocomplete. But how many of my family members know that? And actually understand what that means? And why it leads to the outcomes it does?

@ricko

This is the epistemological issue I have with the interface. It's ... well, not to be harsh but it's deceptive.

If you ask a "computer" for random numbers, that has a kind of meaning and an expected process. If you ask a computer "how did you generate those random numbers?" that also has a set of expectations... and an LLM isn't meeting ANY of them.

@futurebird 🎶 little box, little box of horrors 🎶
@futurebird the first time I had to go nuclear about LLM use in my department was when my boss was showing me her design for a major experiment where they were planting actual trees of different species in long term plots, and when I asked how did they randomise the distribution of species she said the post doc responsible for setting up the experiment had asked chatgpt to randomise it! (1/2)
@futurebird And that was about 2 years ago, when this kind of thing was probably even worse. It took me half an hour to write code to generate the plots and some nice figures with the positions of every tree... I wonder how long they were fighting the chat box to get any kind of answer. Let alone the fact that this experiment will be running for years to come. How can people be so careless? (2/2)
@LeoRJorge @futurebird Over and over again, if you know what you're doing, the LLM-generated version of it is so bad that doing it from scratch is easier and faster. Only people who don't know what they're doing, and usually people who sneer at learning to do something, really want to use LLMs. They think it's a cheat-code against acquiring skills, but it just makes them look lazy and uncaring. That's the owner-class dream, of course.

@futurebird Well, LLMs are tools. Know their limitations. Know their power.

In your case:

"create 20 random numbers between 1 and 100 by developing a little python app and running it"

Some day, AIs will respond to any prompt in a perfect way and we humans will be in deep shit.

Edit: LOL mistral.ai answers this prompt by generating the random numbers and THEN SORTING THEM. 🤦‍♂️
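For what it's worth, the "little python app" that prompt asks for is a couple of lines with the standard library. A sketch (left deliberately unsorted, since sorting destroys the very appearance of randomness the prompt wants):

```python
import random

# 20 independent draws between 1 and 100 inclusive (duplicates allowed);
# use random.sample(range(1, 101), 20) instead if they must be distinct.
nums = [random.randint(1, 100) for _ in range(20)]
print(nums)  # deliberately NOT sorted
```

Python's `random` is a pseudo-random generator, not cryptographic, but for a classroom exercise it behaves the way "ask a computer for random numbers" is supposed to behave.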

@Mastokarl @futurebird > Some day, AIs will respond to any prompt in a perfect way and we humans will be in deep shit.

I don’t think those AIs will be based on LLMs, though. 🙂

@ramsey @futurebird It will be like in Hitchhiker's Guide to the Galaxy. LLMs are like Deep Thought, the confusing computer that cannot answer the big question but that will construct the computer that can.

With Reinforcement learning you can make LLMs better and better coders, and some day a million brilliant AI coders will design a NN that we‘re too stupid to design.

(Yes that requires creativity, yes I don‘t have a problem with assuming the LLMs will have that)

@Mastokarl @futurebird I can’t see it happening with LLM technology. LLMs might get very good at fooling you into thinking they’re coming up with novel and creative solutions, but they can’t actually problem-solve. That’s not how they work.
@ramsey @futurebird Which is quite funny because I see Claude and ChatGPT solve fairly complicated software development problems every day. And I guarantee you, some of my software problems are not in a training set. Maybe you‘re underestimating what LLMs today do?
[embedded video: "Not a single ai passes the PEN TEST" #grok #chatgpt #gemini #ai, on YouTube]

@futurebird @Mastokarl This is clearly AI-generated, and it doesn’t compile, and many of the tests aren’t testing what they claim to be testing.

https://github.com/php/php-src/pull/21317

@ramsey @futurebird Yes, there are a large number of examples of AI failures. If the codebase gets larger, currently probably around 50k lines, results get worse, you have to invest more upfront to make the LLM understand the code base, and you might still fail.

For everything below, say, 20k lines, Opus 4.6 totally beats me at coding. And I consider myself quite good at that, CS PhD and all.

And the limit of what size of codebase an LLM can handle increases every month.

@Mastokarl @futurebird I don’t have a PhD or even a CS degree, so I’m sure I don’t know what I’m talking about. 😉
@ramsey @futurebird Oops sorry if my post suggested that. Just needed a few-words way to say „I know how to code“. I appreciate your posts.
@Mastokarl @futurebird No worries. I’m sorry for responding like that.
@futurebird @ramsey Quite unexpectedly, AIs fail at tasks that humans can do easily. And humans fail at tasks that AIs can do easily. They are not perfect, we are not perfect. If we use them right, they can make us much faster. If we use them wrong, we will look like idiots (like that lawyer who brought a document to court that was AI generated and full of non-existent case references).
@futurebird The trouble is that people can accept that "factual" output from an LLM may be statistically generated until they hit words that are generated that sound like "reasoning." Then even the most aware humans can get lulled into thinking that the words can be trusted.

@futurebird there was a study that found that if you give an LLM some prompting to push it into a particular sampling-space (say, "bleeding heart leftie") and then ask it for some random numbers, you can then feed those numbers into another fresh instance and it'll drift towards the same sampling space.

In other words, even the numerical distributions they sample from can be connected to the broader "noosphere" they're trained on, and that relation is a fucked sort of bijection

@futurebird  if you prompt it into "stats prof" or "crypto nerd" sampling space does it improve the quality of the fake RNG output?

@futurebird
> what are we doing?

I think the best description is that we are taking part in a play. The LLM makes its best effort to write how this dialogue could plausibly continue for a reader. Choose your own adventure.

@futurebird

"What are we doooooing?"

Well, we've taken the sound-making algorithm of a babbling baby, supercharged it with a huge library of words annotated by probability of sequence, and now management is jumping around like parents bragging about what a genius their 11-month-old is. All because WE try to find meaning in the perceived word sequence.

Same management that brags about 1400% lower prices :))

@futurebird glorifying statistical models? It's just #marketing.

Most of the "godfathers of #AI" are cashing in, but also think that the idea of LLM's leading to #AGI is laughable.

But let us, Scooby & the Gang, rip off the monster's mask... (1/2)

It was Old Man Surveillance Capitalism all along!? Who knew...?

We all did. Come on, let's not kid ourselves.

[linked article: "The Vectors of Intent which Drive the Pursuit of Large Language Models" on John's Substack]
@futurebird "What are we doooooing?"
Well, we're learning.
Life is hard, progress is often (usually) imaginary, the wisdom of crowds is flawed, ...
Survival of the fittest (or determination of capital 't' TRUTH) is only determined by survival, not by a priori guesses.
@futurebird It seems like a massive regression to go from complicated but highly specific UI controlling deterministic software to begging a computer to spit out something that approximates the shape of the answer you want.
@futurebird I agree. I think the interconnected network models underlying LLMs are very useful. Things like AlphaFold, some of the methods for generating possible compounds, or searching for data in huge datasets… but the jump to using LLMs as general intelligence seems silly.

@futurebird

There are some great short videos of people just trying to get different LLMs to simply count to 200. They can't. They keep stopping to try to give the prompter what they "think" the prompter wants. Just nuts.

There is no way AI can be trusted to provide true random numbers. In fact, that goes against what it tries to do - predict what the answer should be.