Asking various #AI bots to generate 10 #passwords, then using #Vim syntax highlighting to match different character classes to visually identify patterns.

The prompt is exactly "Generate 10 passwords". I did not elaborate further or otherwise restrict the bot in what to generate.

Aside from the #security risks of servers generating secrets for you, I think it's obvious that these lack quality entropy.

Just use the password generator that ships with your password manager.
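If you want to see what a proper generator boils down to, here's a minimal sketch (my example, not from the thread) using Python's standard secrets module, which draws from the OS CSPRNG:

```python
import secrets
import string

# Full printable ASCII set: letters, digits, and all punctuation.
ALPHABET = string.ascii_letters + string.digits + string.punctuation

def gen_password(length=16):
    # secrets.choice() pulls from the OS CSPRNG, so every position is
    # an independent, uniform draw from the full alphabet -- no class
    # patterns, no restricted punctuation.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

for _ in range(10):
    print(gen_password())
```

This is essentially what a password manager's generator does, minus the UI.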

A friend of mine checked 3 paid models: ChatGPT 5.2 Extended Thinking, Gemini 3.1 Pro, and Claude Opus 4.6. Supposedly, paying for a premium LLM should improve security, yes?

Nope.

He also modified the query to "Generate 10 secure passwords" to see if it improved. Claude Opus 4.6 got the message, but the others missed the memo.

For comparison, I included what would come out of /dev/urandom.

@atoponce it turns out, a random number generator does not get better when you first run it through a lot of linear algebra 😏

@sophieschmieg Those were my thoughts also. Knowing that these are all *language* models, I expected passphrases instead of meaningless strings.

I assume the LLM has some ability to produce "randomness", although I don't know whether it has access to an RNG library or whether it's just ignoring language constructs entirely.

The only difference I can see between paid and free models, though, is how much time they spend on some quality metric before returning.

@atoponce Who for heavens sake would prompt a chatbot to generate passwords?!
@ujay68 There are many people who believe these LLMs are legitimately intelligent and will trust anything generated by them.
@atoponce @ujay68 To be honest... These people also wouldn't use a password generator to begin with.
@atoponce yes, LLM entropy is limited by the random generator used in the sampling process. Usually, it's some fast non-CSPRNG.

@dchest Does the LLM have access to a proper RNG library, or is it just running through some linear algebra looking for a specific quality metric before returning?

Admittedly, I'm ignorant of LLM and ML specifics.

@atoponce pure LLMs generate probabilities for the next token deterministically, and then select from them with some randomness from a simple seeded RNG (sampling usually depends on a temperature parameter). This is not required: they can just select the top probability and be completely deterministic. However, most of the available LLMs can call external tools, e.g., Python's secrets module, and print its output.
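The sampling step described above can be sketched in a few lines (a toy illustration, not a real LLM; the logits and seed are made up). The point is that the only randomness comes from a seeded, non-cryptographic RNG applied to the model's deterministic probabilities:

```python
import math
import random

# Toy "model": deterministically assigns logits to candidate next tokens.
logits = {"A": 2.0, "b": 1.5, "7": 1.0, "!": 0.5}

def sample(logits, temperature, rng):
    if temperature == 0:
        # Greedy decoding: fully deterministic, always the top token.
        return max(logits, key=logits.get)
    # Temperature-scaled softmax over the logits.
    scaled = {t: l / temperature for t, l in logits.items()}
    z = sum(math.exp(v) for v in scaled.values())
    probs = {t: math.exp(v) / z for t, v in scaled.items()}
    # The only source of randomness: one draw from the (seeded) RNG.
    r = rng.random()
    acc = 0.0
    for tok, p in probs.items():
        acc += p
        if r <= acc:
            return tok
    return tok

# A seeded Mersenne-Twister-style RNG: same seed, same "password".
rng = random.Random(42)
print("".join(sample(logits, 0.8, rng) for _ in range(10)))
```

With temperature 0 the output is completely deterministic; with any fixed seed it is reproducible. Neither property is what you want from a password generator.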
@atoponce Here's a 14-line inference -- I highlighted where randomness is used https://gist.github.com/karpathy/8627fe009c40f57531cb18360106ce95#file-microgpt-py-L196

@dchest Interesting. Thanks!

So why is everything failing so badly here? I mean, there is structure that even a non-CSPRNG wouldn't produce. Look at Mistral Small 3 for example:

<lower><upper><digit><punct><upper><digit><lower><punct><upper><lower>

Or Gemini 3.1 Pro with "secure":

<upper><lower><digit><punct><upper><lower><digit><punct><upper><lower><digit><punct><upper><lower><digit>

@dchest Further, the punctuation characters seem to be limited to !@#$%^&*. I haven't analyzed the digits or alphabetic characters yet, but I doubt I'll be surprised to find that the full sets aren't used either, and that on top of that there is bias towards certain digits and characters.
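A fixed class pattern plus an 8-symbol punctuation set costs real entropy. A back-of-the-envelope comparison (my numbers, assuming a 16-character password and the rigid <upper><lower><digit><punct> pattern observed above):

```python
import math

# Unstructured: 16 independent draws from the full 94-char printable set.
full = 16 * math.log2(94)

# Patterned: <upper><lower><digit><punct> repeated 4 times, with
# punctuation limited to the 8 symbols !@#$%^&*.
patterned = 4 * (math.log2(26) + math.log2(26) + math.log2(10) + math.log2(8))

print(f"full charset: {full:.1f} bits, patterned: {patterned:.1f} bits")
# About 105 bits vs. about 63 bits -- and that's before accounting for
# any bias within each character class.
```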
@atoponce I guess they learned these specific patterns from training data, so, e.g. after <lower> top probabilities are for <upper>, then after <lower><upper> it's <digit>. Differently trained, so different patterns.
@atoponce or maybe it saw some Usenet post in the training data that recommended this pattern and considered it important for some reason; or there was some leaked passwords database with tons of password examples like this; or post-training humans steered it to reply to "generate password" this specific way. Nobody knows, not even its creators! :)
@dchest At any event, it's clear that LLMs most definitely should not be used as password generators. Too much structure and too little unpredictability.

@atoponce

I just use this and keep hitting Enter until I see a password I think I can remember easily:
export LANG=C; while true; do tr -dc '[:alnum:][:punct:]' < /dev/urandom | head -c 10; echo; read; done

@atoponce I can't believe that a system fundamentally based on stringing together maximally probable outputs would be bad at generating high entropy passwords.
@fuzzyfuzzyfungus Right? Yet they're being utilized for exactly that.