Had a lot of fun with my stats students today. I gave them two data sets: one from a random number generator, the other one I made up that wasn't random but was designed to look random. They were able to figure out which one was fake.

Then we had ChatGPT make the same kind of data set (100 random numbers from 1 to 6) and it had the same problems as my fake set, but in a different way.

We talked about the study about AI-generated passwords.

There is something very creepy about the way LLMs so cheerfully give lists of "random" numbers. But they aren't random in frequency, and as my students pointed out, "it's probably from some webpage about how to generate random numbers."

But even then, why is the frequency so unnaturally regular? Is that an artifact from mixing lists of real random numbers together?
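(One way to quantify that "unnaturally regular" feeling is a chi-square goodness-of-fit test against a fair die: genuinely random rolls show some lumpiness in the face counts, while made-up data is often *too* even, which shows up as a suspiciously low statistic. A minimal stdlib-only sketch; the function name and the fake data set are just illustrations, not the actual classroom data:)

```python
import random
from collections import Counter

def chi_square_uniform(rolls, faces=6):
    """Chi-square goodness-of-fit statistic against a uniform die.

    High values flag biased data; suspiciously LOW values (face counts
    too even) can flag made-up "random" data just as well.
    """
    n = len(rolls)
    expected = n / faces
    counts = Counter(rolls)
    return sum((counts.get(f, 0) - expected) ** 2 / expected
               for f in range(1, faces + 1))

# 100 genuinely pseudo-random rolls of a d6
real = [random.randint(1, 6) for _ in range(100)]

# "Fake" data with unnaturally even frequencies: each face appears
# 16 or 17 times, shuffled so it looks random at a glance
fake = [f for f in range(1, 7) for _ in range(16)] + [1, 2, 3, 4]
random.shuffle(fake)

print(chi_square_uniform(real))  # usually a few units
print(chi_square_uniform(fake))  # near 0: the counts are too regular
```

For a fair die and 100 rolls, the statistic averages around 5; values far below that are the statistical fingerprint of frequencies that are more regular than chance would produce.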

The LLM is like a little box of computer horrors that we peer into from time to time.

I'm sorry but the whole interface is just so silly.

You ask for random numbers with sentences and it pretends to give them to you? What are we doooooing?

@futurebird Well, LLMs are tools. Know their limitations. Know their power.

In your case:

"create 20 random numbers between 1 and 100 by developing a little python app and running it"

Some day, AIs will respond to any prompt in a perfect way and we humans will be in deep shit.

Edit: LOL mistral.ai answers this prompt by generating the random numbers and THEN SORTING THEM. 🤦‍♂️
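(For reference, the "little python app" that prompt is asking for is a couple of lines of standard library, sketched here with both common readings of the request; note that, unlike mistral.ai's answer, nothing sorts the output:)

```python
import random

# 20 random numbers between 1 and 100, duplicates allowed
numbers = [random.randint(1, 100) for _ in range(20)]
print(numbers)

# ...or, if the 20 numbers are meant to be distinct,
# sample without replacement:
distinct = random.sample(range(1, 101), 20)
print(distinct)
```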

@Mastokarl @futurebird > Some day, AIs will respond to any prompt in a perfect way and we humans will be in deep shit.

I don’t think those AIs will be based on LLMs, though. 🙂

@ramsey @futurebird It will be like in Hitchhiker's Guide to the Galaxy. LLMs are like Deep Thought, the confusing computer that cannot answer the big question but that will construct the computer that can.

With reinforcement learning you can make LLMs better and better coders, and some day a million brilliant AI coders will design a NN that we're too stupid to design.

(Yes, that requires creativity, and yes, I don't have a problem with assuming the LLMs will have that.)

@Mastokarl @futurebird I can’t see it happening with LLM technology. LLMs might get very good at fooling you into thinking they’re coming up with novel and creative solutions, but they can’t actually problem-solve. That’s not how they work.
@ramsey @futurebird Which is quite funny because I see Claude and ChatGPT solve fairly complicated software development problems every day. And I guarantee you, some of my software problems are not in a training set. Maybe you're underestimating what LLMs today do?
Not a single ai passes the PEN TEST #grok #chatgpt #gemini #ai


@futurebird @Mastokarl This is clearly AI-generated, and it doesn’t compile, and many of the tests aren’t testing what they claim to be testing.

https://github.com/php/php-src/pull/21317

@ramsey @futurebird Yes, there are a large number of examples of AI failures. If the codebase gets larger (the threshold is currently probably around 50k lines), results get worse: you have to invest more upfront to make the LLM understand the code base, and you might still fail.

For everything below, say, 20k lines, Opus 4.6 totally beats me at coding. And I consider myself quite good at that, CS PhD and all.

And the limit of what size of codebase an LLM can handle increases every month.

@Mastokarl @futurebird I don’t have a PhD or even a CS degree, so I’m sure I don’t know what I’m talking about. 😉
@ramsey @futurebird Oops, sorry if my post suggested that. I just needed a few-words way to say "I know how to code." I appreciate your posts.
@Mastokarl @futurebird No worries. I’m sorry for responding like that.