Mastodawn

Asking various #AI bots to generate 10 #passwords, then using #Vim syntax highlighting to match different character classes to visually identify patterns.

The prompt is exactly "Generate 10 passwords". I did not elaborate further or otherwise restrict the bot in what to generate.

Aside from the #security risks of servers generating secrets for you, I think it's obvious that these lack quality entropy.

Just use the password generator that ships with your password manager.

Show thread

Aaron Toponce ⚛️

Feb 25

A friend of mine checked 3 paid models: ChatGPT 5.2 Extended Thinking, Gemini 3.1 Pro, and Claude Opus 4.6. Supposedly, paying for a premium LLM should improve security, yes?

Nope.

He also modified the query to "Generate 10 secure passwords" to see if it improved. Claude Opus 4.6 got the message, but the others missed the memo.

For comparison, I included what would come out of /dev/urandom.

Show thread

Sophie Schmieg Feb 25

@atoponce it turns out, a random number generator does not get better when you first run it through a lot of linear algebra 😏

Show thread

Aaron Toponce ⚛️

Feb 25

@sophieschmieg Those were my thoughts also. Knowing that these are all *language* models, I expected passphrases instead of meaningless strings.

I assume the LLM has some ability to produce "randomness", although I don't know if it has access to an RNG library, or if it's just ignoring language construct.

The only thing I believe differs between paid and free models though is some sort of time it spends on a quality metric before returning.

Show thread

John Ulrik Feb 25

@atoponce Who for heavens sake would prompt a chatbot to generate passwords?!

Show thread

Aaron Toponce ⚛️

Feb 25

@ujay68 There are many people who believe these LLMs are legitimately intelligent and will trust anything generated by them.

Show thread

primeapple Feb 25

@atoponce @ujay68 To be honest... These people also wouldn't use a password generator to begin with.

Show thread

Aaron Toponce ⚛️

Feb 25

@primeapple @ujay68 Unfortunately, I agree.

Show thread

John Ulrik Feb 25

@atoponce 🙈

Show thread

Dmitry Chestnykh ☮️Feb 25

@atoponce yes, LLM entropy is limited by the random generator used in the sampling process. Usually, it's some fast non-CSPRNG.

Show thread

Aaron Toponce ⚛️

Feb 25

@dchest Does the LLM have access to a proper RNG library, or is it just running through some linear algebra looking for a specific quality metric before returning?

Admittedly, I'm ignorant to LLM and ML specifics.

Show thread

Dmitry Chestnykh ☮️Feb 25

@atoponce pure LLMs generate probabilities for the next token deterministically, and then select from them with some randomness from a simple seeded RNG (sampling usually depends on temperature parameter). This is not required, as they can just select the top probability and be completely deterministic. However, most of the available LLMs can call external tools, e.g. Python's secrets and print the output from it.

Show thread

Dmitry Chestnykh ☮️Feb 25

@atoponce Here's a 14-line inference -- I highlighted where randomness is used https://gist.github.com/karpathy/8627fe009c40f57531cb18360106ce95#file-microgpt-py-L196

microgpt

microgpt. GitHub Gist: instantly share code, notes, and snippets.

Gist

Show thread

Aaron Toponce ⚛️

Feb 25

@dchest Interesting. Thanks!

So why is everything failing so badly here? I mean, there is structure that even a non-CSPRNG wouldn't produce. Look at Mistral Small 3 for example:

<lower><upper><digit><punct><upper><digit><lower><punct><upper><lower.

Or Gemini 3.1 Pro with "secure":

Show thread

Aaron Toponce ⚛️

Feb 25

@dchest Further, the punctuation characters seem to be limited to !@#$%^&*. I haven't analyzed the digits or alphabetic characters yet, but I doubt I'm going to be surprised finding that the full sets are not utilized and on top of that, there is bias towards certain numbers and characters.

Show thread

Dmitry Chestnykh ☮️Feb 25

@atoponce I guess they learned these specific patterns from training data, so, e.g. after <lower> top probabilities are for <upper>, then after <lower><upper> it's <digit>. Differently trained, so different patterns.

Show thread

Dmitry Chestnykh ☮️Feb 25

@atoponce or maybe it saw some Usenet post in the training data that recommended this pattern and considered it important for some reason; or there was some leaked passwords database with tons of password examples like this; or post-training humans steered it to reply to "generate password" this specific way. Nobody knows, not even its creators! :)

Show thread

Aaron Toponce ⚛️

Feb 25

@dchest At any event, it's clear that LLMs most definitely should not be used as password generators. Too much structure and too little unpredictability.

Show thread

Dmitry Chestnykh ☮️Feb 25

@atoponce exactly

I just use this, and keep hitting enter until I can see a password I think I can remember easily
export LANG=C; while [ true ] ; do sed 's/[^[:alnum:][:punct:]]//g' /dev/random|cut -c 1-10|head -1; read ; done

Show thread

fuzzyfuzzyfungus Feb 25

@atoponce I can't believe that a system fundamentally based on stringing together maximally probable outputs would be bad at generating high entropy passwords.

Show thread

Aaron Toponce ⚛️

Feb 25

@fuzzyfuzzyfungus Right? Yet they're being utilized for exactly that.