Somebody asked whether dictionary-word passphrases (“correct horse battery staple”, like the ones generated by 1Password) are any good. Short answer: good means different things. Shorter answer: yes!

I’ll talk about why in a thread below.

The basic idea of these passphrases is that you have a dictionary of D words. You pick N words at random. That’s the whole idea. Example: “overlook-hooey-valance-flood-useless-ladyship”.

Cryptocurrency BIP39 mnemonic phrases use a 2048-word (2^11) list, and use 12-24 words per phrase. 1Password seems to use a larger list, between 18000-18500 words (2^14.15) and you can pick your length (6-8 is common.) https://github.com/1Password/spg/blob/master/agilewords.go

Someone in my timeline asked for papers saying these were good passwords. From a purely mathematical perspective we don’t need a paper, just a toot. But there’s more than math here.

Password quality is about three things: strength (how long til Mallory guesses it, perhaps with a powerful computer), memorability (can you keep it in your head) and usability (can you enter it into a website or device.) Only the first one involves any math.

The math for dictionary passphrases is pretty simple. Assuming you choose words uniformly at random: if your dictionary has D words and your passphrase is N words long, then there are D^N total passphrases.

The total matters because for a random passphrase the best strategy for guessing is to try all (or most) of them. This D^N determines password cracking time.

A simpler way to do this math is with powers of 2. The 1Password dictionary is about 2^14 in size, so for a 6-word password we get 2^{14*6} = 2^84.
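The same arithmetic as a short Python sketch, assuming words are picked uniformly and independently. The function name is made up for illustration, and 18000 is just the low end of the 1Password estimate above.

```python
import math

def passphrase_bits(dictionary_size: int, num_words: int) -> float:
    """Bits of entropy for num_words drawn uniformly (repeats allowed) from dictionary_size words."""
    return num_words * math.log2(dictionary_size)

# Dictionary sizes quoted in the thread (the 1Password figure is an estimate).
print(round(passphrase_bits(2048, 12), 1))   # 12 words from a 2048-word list -> 132.0
print(round(passphrase_bits(18000, 6), 1))   # 6 words from ~18000 words -> 84.8
```

Rounding the per-word figure down to 14 bits gives the 2^84 used here; the unrounded figure is slightly higher.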

Cryptographers tend to treat anything over 2^80 as “probably good enough to secure your Bank of America account” and anything over 2^128 as “probably good enough to secure really important stuff”. I told you there’d be science.

For comparison, last I checked the Bitcoin network was computing about 2^64 hashes every 10 minutes and using as much electricity as Argentina.

Bitcoin doesn’t crack passwords, but if it could & the entire Bitcoin network was cracking your 6-word 1Password phrase, it would take about 9.5 years on average.
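A rough way to reproduce that kind of estimate, taking the figures in this thread at face value: 2^64 guesses every 10 minutes, a 2^84 search space, and an attacker who on average finds the phrase after searching half of it. The function name and the 365.25-day year are assumptions of this sketch.

```python
MINUTES_PER_YEAR = 60 * 24 * 365.25

def average_crack_years(bits: float, guesses_per_10_min: float = 2.0 ** 64) -> float:
    """Expected years to guess a uniformly random secret, searching half the space on average."""
    expected_guesses = 2.0 ** bits / 2
    intervals = expected_guesses / guesses_per_10_min  # number of 10-minute intervals needed
    return intervals * 10 / MINUTES_PER_YEAR

print(f"{average_crack_years(84):.1f} years")
```

With these inputs the result lands at roughly a decade, the same ballpark as the figure above. Each extra bit doubles it.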

But what about human memorability? Can people memorize such complex passwords? The answer is “yes”, because I just memorized one.

If you don’t accept N=1 studies, then there are a few studies. This one looks at 3-4 word passphrases: https://cups.cs.cmu.edu/soups/2012/proceedings/a7_Shay.pdf

Here is another more recent study that focuses on 56-bit (2^56 strength) 6-word passphrases and discusses strategies that help people memorize them. It turns out that “spaced repetition” (making people learn the password over a period of time) works well enough that many don’t have to write it down. https://www.usenix.org/system/files/conference/usenixsecurity14/sec14-paper-bonneau.pdf

The final barrier is usability: can people actually use passphrases on the Internet? Sadly here the answer is “it depends.” The problem is that many website designers have decided on cargo-cult security procedures like “letters, numbers, special characters required”. Some even institute character limits.

If you’re looking for a recommendation here, I would urge you to do the following:

1. Use a good password manager with a strong random 6-8 word master passphrase.
2. Write it down (one safe place) and practice entering it from memory on a regular basis. You will eventually remember it.
3. Let the password manager generate passwords for individual sites.

There are no guarantees, but this is probably the safest way to keep passwords online.

And finally, as someone reminds me in replies (need quote toot here!): use 2FA/MFA wherever possible. Preferably 2FA/MFA based on an app/YubiKey rather than SMS codes.

(Do not ask me about backing up app 2FA, I don’t have a great answer.)

Also re-reading the early part of this thread I was a little fuzzy on how passphrases are generated, argh.

You have a dictionary of D words. You pick one word at random. And then you repeat this process N times. The clarification is: specific words can repeat more than once within a single passphrase. In practice this rarely happens (for large dictionaries) but it would certainly change the math.
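A minimal sketch of that procedure in Python, not how 1Password or any other tool actually does it. It assumes you already have a word list in memory; `generate_passphrase` and `demo_words` are hypothetical names, and the six-word list is a toy stand-in for a real dictionary.

```python
import secrets

def generate_passphrase(words: list[str], num_words: int, separator: str = "-") -> str:
    """Pick num_words entries independently and uniformly; the same word may appear twice."""
    if not words:
        raise ValueError("word list must not be empty")
    # secrets.choice draws from the OS's cryptographic random source.
    return separator.join(secrets.choice(words) for _ in range(num_words))

# Toy list for illustration only; a real list would have thousands of words.
demo_words = ["overlook", "hooey", "valance", "flood", "useless", "ladyship"]
print(generate_passphrase(demo_words, 6))
```

The important design choice is using `secrets` rather than `random`: the math only holds if the picks are unpredictable and uniform.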

@matthew_d_green Are there any studies on whether you can increase memorability by imposing some structure on the sequence of words but increasing the dictionary size to compensate?
@dalias I thought I saw one with natural language in the title but now I can’t find it.
@dalias In principle, you can do this with algorithms like GPT-x. These algorithms work by sampling tokens output by the model, so each phrase has a known entropy above and beyond the model output.
@dalias @matthew_d_green you only gain a small amount of extra entropy for each doubling of the word list (from, say, 32k to 64k goes from 15 to 16 bits of entropy per word) whereas each adding of an extra word to the passphrase increases entropy by 16 bits (for a 64k wordlist) for example.
@matthew_d_green for Italian speaking people it's even easier because we have a lot of slightly different verb forms and we almost never have doubt about how to spell them
@matthew_d_green Might be worth pointing out that "bunch of seemingly unconnected words" is not the same thing as "uniformly at random." It is extremely hard to get humans to do the latter; esp when choosing from a dictionary where some entries are orders of magnitude more familiar/likely than others.
@matthew_d_green Authy has a really nice 2FA backup solution. The private keys are encrypted locally using a user-supplied password (cached in device secure storage), and the whole bundle is stored on their servers. Since you aren't entering the backup password as part of auth flows, it's very safe to randomly generate something very strong and stick it in your password manager.
@djspiewak What I don't like about Authy is that it's hard to get the tokens out. OTOH, it's cool that you can recover with your phone number. I always worry about all my devices getting lost or burning in a fire.

@matthew_d_green Thank you. This is very interesting, and clear.

But I'm manually picking words at 'random' (from my Fediverse timeline, actually). I wonder how I can tell how big my D is?

@matthew_d_green You can use diceware for a low-tech way to generate truly random, word-based passphrases with pen, paper and six-sided dice.

See https://theworld.com/~reinhold/diceware.html for the original implementation. There are many more pages on that topic on the Web nowadays.
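For the curious, a small sketch of how dice rolls map to a word, assuming the usual Diceware setup of five rolls per word and a 7776-entry (6^5) list, which works out to about 12.9 bits per word. The function names are made up here, and the software die only stands in for a physical one.

```python
import secrets

def roll_die() -> int:
    """Simulate one fair six-sided die (a physical die is the low-tech original)."""
    return secrets.randbelow(6) + 1

def rolls_to_index(rolls: list[int]) -> int:
    """Map five rolls (each 1-6) to an index in 0..7775 by reading them as a base-6 number."""
    if len(rolls) != 5 or any(r < 1 or r > 6 for r in rolls):
        raise ValueError("expected five rolls, each between 1 and 6")
    index = 0
    for r in rolls:
        index = index * 6 + (r - 1)
    return index

rolls = [roll_die() for _ in range(5)]
print(rolls, "->", rolls_to_index(rolls))
```

Repeat once per word and look each index up in the list.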

@matthew_d_green

One point worth noting:
In situations where it's possible to do so, adding MFA is likely going to be dramatically more impactful than worrying about the difference between words and random characters.

@matthew_d_green I don’t think that a word appearing multiple times in a generated password “changes the math” unless you are somehow disallowing such cases. If a word appears multiple times in a list, then you have problems.
@jpgoldberg @matthew_d_green Any entropy/crackability calculations must assume replacement no? Otherwise entropy would drop after each selection and odds of selecting a given word would change, violating the assumption that all words have equal chance of being selected.
@scottlougheed @matthew_d_green @jpgoldberg Hmm, but the difference is negligible, given a reasonably large dictionary 🤷‍♂️
6 words out of 18k words means 2^(6*14.135) possible outcomes, i.e. about 2^84, with or without repetitions.
For a 10-words dictionary, however, there would be an order of magnitude of difference between the repetitions/no-repetition variants.
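A quick way to check that comparison, counting ordered picks in both cases: D^N with repeats allowed versus D·(D-1)·…·(D-N+1) without. The function names are illustrative only.

```python
import math

def bits_with_repeats(d: int, n: int) -> float:
    """log2 of d**n: each word is drawn independently."""
    return n * math.log2(d)

def bits_without_repeats(d: int, n: int) -> float:
    """log2 of d*(d-1)*...*(d-n+1): no word may appear twice."""
    return math.log2(math.perm(d, n))

for d in (18000, 10):
    ratio = d ** 6 / math.perm(d, 6)
    print(d, round(bits_with_repeats(d, 6), 3), round(bits_without_repeats(d, 6), 3), round(ratio, 3))
```

For the 18000-word list the two counts differ by a tiny fraction of a bit; for the 10-word list the with-repeats count is several times larger.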
@scottlougheed @matthew_d_green @odony true, and it would be easy enough to design a generator to do that, but I don’t see much motivation to do so.
@odony @matthew_d_green @jpgoldberg and to your point, the odds of seeing a repeated word given a large enough dictionary is also likely very limited. I suppose we’re ultimately nibbling at the margins of password strength here.

@jpgoldberg @matthew_d_green Speaking of usability, I find typing those long passwords pretty painful, especially on smartphones. Using spaces as separator rather than hyphens/underscore helps, as you don’t have to switch keyboard mode. Maybe that should be the default for 1Password? 😇
I guess it looks less passwordly though.

No separator at all is nice too, even if it costs a little entropy (in-put-clammy = input-clam-my). Intuitively, I don’t think those collisions put a significant dent in the entropy 🤔

@jpgoldberg What I meant is sampling with replacement (D^N), i.e. each word is sampled independently of the other words and a given word can be chosen twice in the same passphrase. What I originally wrote sounds like sampling without replacement (D choose N), that is: pick N unique words out of the dictionary, one at a time so that a given word only appears once in a passphrase. The number of passphrases is slightly smaller in the second case.
@matthew_d_green, ah. I didn’t read the original as (D choose N) so I misunderstood your correction.
@matthew_d_green is there a good (as in brief and understandable for non security people) comparison of SMS vs HOTP/TOTP vs U2F/FIDO 2FA? Any is obviously better than none, and SMS is the weakest, but otherwise?

@zhenech @matthew_d_green Biggest account takeover threat is phishing. Biggest mistake is password abuse: reused or guessable.

U2F/FIDO protects you from takeover via phishing, SIM swap, and password abuse.

HOTP/TOTP protects you from takeover via SIM swap and password abuse.

SMS protects you from takeover via password abuse. Vulnerable to SIM swap if enabled at all, even if unused.

(Password manager also protects you from account takeover via SIM swap and password abuse, without 2FA.)

@zhenech @matthew_d_green Footnote: 1FA vs 2FA is a bit of a red herring.

Passkey is 1FA but protects against the same threats as U2F/FIDO. Likewise ssh keys or TLS client certificates, in their domains (with privacy caveats).

Password from manager is 1FA but protects against largely the same threats as HOTP/TOTP 2FA (with minor exceptions).

SMS is 2FA, but having it enabled at all renders you vulnerable to phishing, if the attacker can pull off a SIM swap—even if you always use U2F/FIDO!

@matthew_d_green The Duo MFA app is free for personal use and is backed up / recoverable / able to be used on multiple devices
@matthew_d_green Does that imply that adding a "!" to the end of your phrase will foil most of the attackers?
@benfulton @matthew_d_green Now that you've said that, no. 🤪
@benfulton @matthew_d_green Model it as: there are maybe 1000 or 10000 reasonably likely "haha look how clever I am" transformations of a given password pattern, so figure doing stuff like this adds at most 10-14 bits of entropy - and that's only if you pick one truly randomly.
@dalias @matthew_d_green But still, an attack program might start with the word dictionary before bothering with the transformations, so it might put those out on the far end of the attempt list.
@benfulton @matthew_d_green All entropy modeling with brute force attacks is assuming on average the attacker tries half of the combinations before succeeding. So bits already encompass that.
@benfulton @matthew_d_green Or randomly misspelling one word in an uncommon way 🙌
@benfulton @matthew_d_green
No but if you add one somewhere in the middle it will.
@matthew_d_green I thought banks considered 10⁴ strong enough to secure your account. 🤪 🤣

@matthew_d_green

(runs in with the obligatory)

https://xkcd.com/936/


@That_AC @matthew_d_green

It should be noted, though, that this recommendation as cited is now out of date. Another response here included the table of bits in passwords and how long it would take to crack them; the XKCD says four words, for ~44 bits, and that is now in the low rows -- it would take only hours for an attacker with significant, but not excessive, computational power. To be reasonably secure, you need six words now (as noted in original thread).

https://infosec.exchange/@davep/109727386234680841

@shaib @That_AC @matthew_d_green surely this number depends on the language. German would probably need fewer words than French if the number of characters is the concern.
@alan @That_AC @matthew_d_green
The number of characters is of almost no concern at all. The important number is the size of the set of words to be chosen from (the XKCD used an estimate of ~2K common words in English; according to OP, the 1Password set is about an order of magnitude larger).
@shaib @That_AC @matthew_d_green Doesn't this assume the attacker knows the password is a pass phrase (and what characters are used as separators, if any, and whether capitalization is used in any way)? E.g. in German this effectively doubles the attempts for any pass phrase containing an umlaut or Eszett, plus also doubling any pass phrase containing a noun.

@shaib
E.g. "Wirsing gekrönt Hof blau Kinder" could be any of
wirsing-gekroent-hof-blau-kinder
Wirsing-gekroent-Hof-blau-Kinder
Wirsing-gekrönt-Hof-blau-Kinder
wirsing-gekrönt-hof-blau-kinder
Plus the same for any other separator.

Not to mention that password managers like Bitwarden offer appending a random 3-4 digit number, which further complicates brute forcing.

I understand that the entropy in theory is less than generating an arbitrary string of the same length, but using word count and pool size alone also doesn't sound right.

@alan
Yes, spelling variations should be useful in terms of entropy. It's a trade-off, though: if you don't always use the most straightforward spelling, you pay in memorability.
@That_AC @matthew_d_green Also https://xkcd.com/538/ but that's only if they specifically care about one account, as opposed to "okay we stole a whole database, which ones are easiest to crack and then do something useful with".
@matthew_d_green I like this
@matthew_d_green never come across this before. I like it! Nice share.
@matthew_d_green I stole it ages ago for my server-side passphrase generator https://penfold.fr/ which you shouldn't use unless you trust me totally. And even then you shouldn't use it.
Seven Word Passphrase Generator from 58000 words


@davep @matthew_d_green
The number of guesses per second depends not only on the attacker's hardware, but also on the algorithm used for hashing the password before storing it. Bitcoin mining hardware is optimised for SHA-256; it would be useless against passwords hashed with scrypt or bcrypt.
Keep in mind that the password is a secret shared between you and the computer: do you trust the way the computer stores it? If not, use a high-entropy password.
@matthieu @matthew_d_green of course. It depends on the attacker's rate of attack

@davep @matthew_d_green which is hard to evaluate. For example, if your password is one in a million leaked together, how much time will the attacker be willing to spend on it?

I'm as lazy as the next person, is there a way to balance the convenience of a low-entropy password and the security of a high-entropy one by estimating the probability of my password being cracked if leaked? E.g. what rate of attack should be expected today?