Somebody asked whether dictionary-word passphrases (“correct horse battery staple”, like the ones generated by 1Password) are any good. Short answer: good means different things. Shorter answer: yes!

I’ll talk about why in a thread below.

The basic idea of these passphrases is that you have a dictionary of D words. You pick N words at random. That’s the whole idea. Example: “overlook-hooey-valance-flood-useless-ladyship”.
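
For the curious, here is a minimal sketch of that idea in Python. The ten-word list is a hypothetical toy stand-in (a real generator would load thousands of words), and `make_passphrase` is just an illustrative name, not anything 1Password ships. The one thing that matters is using a cryptographic random source (`secrets`) rather than `random`.

```python
import secrets

# Hypothetical toy word list for illustration only; a real generator
# would load a list with thousands of entries.
WORDS = ["overlook", "hooey", "valance", "flood", "useless",
         "ladyship", "correct", "horse", "battery", "staple"]

def make_passphrase(words, n, sep="-"):
    """Pick n words independently and uniformly at random (repeats allowed)."""
    return sep.join(secrets.choice(words) for _ in range(n))

print(make_passphrase(WORDS, 6))
```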

Cryptocurrency BIP39 seed phrases use a 2048-word (2^11) list, with 12-24 words per passphrase. 1Password seems to use a larger list of 18000-18500 words (about 2^14.15), and you can pick your length (6-8 is common). https://github.com/1Password/spg/blob/master/agilewords.go

Someone in my timeline asked for papers saying these were good passwords. From a purely mathematical perspective we don’t need a paper, just a toot. But there’s more than math here.

Password quality is about three things: strength (how long til Mallory guesses it, perhaps with a powerful computer), memorability (can you keep it in your head) and usability (can you enter it into a website or device.) Only the first one involves any math.

The math for dictionary passphrases is pretty simple. Assuming you choose words uniformly at random: if your dictionary has D words and your passphrase is N words long, then there are D^N total passphrases.

The total matters because for a random passphrase the best strategy for guessing is to try all (or most) of them. This D^N determines password cracking time.

A simpler way to do this math is with powers of 2. The 1Password dictionary is about 2^14 in size, so for a 6-word passphrase we get 2^(14*6) = 2^84.
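
If you want to check the exponents yourself, a few lines of Python will do it. This is only a sketch: `passphrase_bits` is a made-up helper name, and it assumes words are chosen uniformly at random with repeats allowed, so the strength in bits is N * log2(D).

```python
import math

def passphrase_bits(dictionary_size, num_words):
    """Bits of strength for num_words uniform picks: log2(D^N) = N * log2(D)."""
    return num_words * math.log2(dictionary_size)

print(round(passphrase_bits(2048, 12)))      # 12 words from a 2^11 list
print(round(passphrase_bits(2**14, 6)))      # 6 words from a 2^14 list
print(round(passphrase_bits(18000, 6), 1))   # 6 words from an 18000-word list
```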

Cryptographers tend to treat anything over 2^80 as “probably good enough to secure your Bank of America account” and anything over 2^128 as “probably good enough to secure really important stuff”. I told you there’d be science.

For comparison, last I checked the Bitcoin network was computing about 2^64 hashes every 10 minutes and using as much electricity as Argentina.

Bitcoin doesn’t crack passwords, but if it could & the entire Bitcoin network was cracking your 6-word 1Password phrase, it would take about 10 years on average (roughly 2^83 guesses at 2^64 per 10 minutes).
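
Here is that back-of-the-envelope calculation as a sketch. It takes the figures quoted above at face value (2^84 possible passphrases, 2^64 guesses per 10 minutes) and assumes the attacker finds the right one after trying half the space on average. The function name is made up for illustration.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60

def average_years_to_crack(passphrase_bits, guess_bits_per_10_min):
    """Average time to guess, assuming success after half the search space."""
    average_guess_bits = passphrase_bits - 1          # half of 2^bits
    intervals = 2 ** (average_guess_bits - guess_bits_per_10_min)
    return intervals * 10 / MINUTES_PER_YEAR

# 6-word phrase (~2^84) vs. a hypothetical 2^64-guesses-per-10-minutes attacker
print(round(average_years_to_crack(84, 64), 1))
```

With those inputs it comes out to roughly a decade. Each extra word from a 2^14 list multiplies that by about 16,000.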

But what about human memorability? Can people memorize such complex passwords? The answer is “yes”, because I just memorized one.

If you don’t accept N=1 studies, then there are a few studies. This one looks at 3-4 word passphrases: https://cups.cs.cmu.edu/soups/2012/proceedings/a7_Shay.pdf

Here is another more recent study that focuses on 56-bit (2^56 strength) 6-word passphrases and discusses strategies that help people memorize them. It turns out that “spaced repetition” (making people learn the password over a period of time) works well enough that many don’t have to write it down. https://www.usenix.org/system/files/conference/usenixsecurity14/sec14-paper-bonneau.pdf

The final barrier is usability: can people actually use passphrases on the Internet? Sadly here the answer is “it depends.” The problem is that many website designers have decided on cargo-cult security procedures like “letters, numbers, special characters required”. Some even institute character limits.

If you’re looking for a recommendation here, I would urge you to do the following:

1. Use a good password manager with a strong random 6-8 word master passphrase.
2. Write it down (one safe place) and practice entering it from memory on a regular basis. You will eventually remember it.
3. Let the password manager generate passwords for individual sites.

There are no guarantees, but this is probably the safest way to keep passwords online.

And finally, as someone reminds me in replies (need quote toot here!): use 2FA/MFA wherever possible. Preferably 2FA/MFA based on an app/YubiKey rather than SMS codes.

(Do not ask me about backing up app 2FA, I don’t have a great answer.)

Also re-reading the early part of this thread I was a little fuzzy on how passphrases are generated, argh.

You have a dictionary of D words. You pick one word at random. And then you repeat this process N times. The clarification is: a specific word can appear more than once within a single passphrase. In practice this rarely happens (for large dictionaries) but it would certainly change the math.
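
To put a number on "rarely": the chance that at least one word shows up twice is a small birthday-problem calculation. A quick sketch, assuming uniform independent picks (`repeat_probability` is an illustrative name):

```python
def repeat_probability(d, n):
    """Chance that n uniform picks from d words contain at least one repeat."""
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (d - i) / d   # i words are already "taken"
    return 1.0 - p_all_distinct

print(repeat_probability(18000, 6))   # large dictionary, 6 words
print(repeat_probability(10, 6))      # tiny dictionary, 6 words
```

For an 18000-word list and 6 words this is well under one in a thousand. For a 10-word list it is the common case.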

@matthew_d_green Are there any studies on whether you can increase memorability by imposing some structure on the sequence of words but increasing the dictionary size to compensate?
@dalias I thought I saw one with natural language in the title but now I can’t find it.
@dalias In principle, you can do this with algorithms like GPT-x. These algorithms work by sampling tokens output by the model, so each phrase has a known entropy above and beyond the model output.
@dalias @matthew_d_green you only gain a small amount of extra entropy for each doubling of the word list (going from, say, 32k to 64k takes you from 15 to 16 bits of entropy per word), whereas each extra word added to the passphrase increases entropy by 16 bits (for a 64k wordlist), for example.
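
A small sketch of that trade-off, assuming uniform random picks: how many words you need to reach a target strength for a given list size. The 84-bit target and the helper name `words_needed` are just illustrative choices.

```python
import math

def words_needed(target_bits, dictionary_size):
    """Smallest number of uniform picks giving at least target_bits."""
    return math.ceil(target_bits / math.log2(dictionary_size))

for size in (2048, 32768, 65536):
    print(size, words_needed(84, size))
```

Doubling the list from 32k to 64k does not even save a word at this target, which is the point being made above.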
@matthew_d_green for Italian-speaking people it's even easier because we have a lot of slightly different verb forms and we almost never have doubts about how to spell them
@matthew_d_green Might be worth pointing out that "bunch of seemingly unconnected words" is not the same thing as "uniformly at random." It is extremely hard to get humans to do the latter; esp when choosing from a dictionary where some entries are orders of magnitude more familiar/likely than others.
@matthew_d_green Authy has a really nice 2FA backup solution. The private keys are encrypted locally using a user-supplied password (cached in device secure storage), and the whole bundle is stored on their servers. Since you aren't entering the backup password as part of auth flows, it's very safe to randomly generate something very strong and stick it in your password manager.
@djspiewak What I don't like about Authy is that it's hard to get the tokens out. OTOH, it's cool that you can recover with your phone number. I always worry about all my devices getting lost or burning in a fire.

@matthew_d_green Thank you. This is very interesting, and clear.

But I'm manually picking words at 'random' (from my Fediverse timeline, actually). I wonder how I can tell how big my D is?

@matthew_d_green You can use diceware for a low-tech way to generate truly random, word-based passphrases with pen, paper and 6-side dice.

See https://theworld.com/~reinhold/diceware.html for the original implementation. There are many more pages on that topic on the Web nowadays.
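
For anyone who wants to see the mechanics: Diceware uses five dice rolls per word, which gives 6^5 = 7776 possible words. Below is a sketch of the roll-to-word lookup. The word list here is a hypothetical placeholder (`word0` ... `word7775`), not the real Diceware list, and it assumes the list is ordered by roll key from 11111 to 66666. With real dice you would type in your rolls instead of calling `roll_die`.

```python
import secrets

# Placeholder list standing in for a real 7776-entry Diceware word list.
WORDLIST = [f"word{i}" for i in range(6 ** 5)]

def roll_die():
    """Simulate one fair six-sided die (use physical dice for the real thing)."""
    return secrets.randbelow(6) + 1

def rolls_to_index(rolls):
    """Map five rolls (each 1-6) to an index 0..7775, reading them as base 6."""
    index = 0
    for r in rolls:
        index = index * 6 + (r - 1)
    return index

rolls = [roll_die() for _ in range(5)]
print(rolls, WORDLIST[rolls_to_index(rolls)])
```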

@matthew_d_green

One point worth noting:
In situations where it's possible to do so, adding MFA is likely going to be dramatically more impactful than worrying about the difference between words and random characters.

@matthew_d_green I don’t think that a word appearing multiple times in a generated password “changes the math” unless you are somehow disallowing such cases. If a word appears multiple times in a list, then you have problems.
@jpgoldberg @matthew_d_green Any entropy/crackability calculations must assume replacement no? Otherwise entropy would drop after each selection and odds of selecting a given word would change, violating the assumption that all words have equal chance of being selected.
@scottlougheed @matthew_d_green @jpgoldberg Hmm, but the difference is negligible, given a reasonably large dictionary 🤷‍♂️
6 words out of 18k words means 2^(6*14.135) possible outcomes, i.e. about 2^85, with or without repetitions.
For a 10-word dictionary, however, there would be an order of magnitude of difference between the repetition/no-repetition variants.
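
A quick way to check this comparison, as a sketch: count ordered passphrases with repeats (D^N) and without (D!/(D-N)!, which is `math.perm` in Python 3.8+) and compare them in bits. The function names are illustrative.

```python
import math

def bits_with_repeats(d, n):
    """log2 of D^N: every position can be any word."""
    return n * math.log2(d)

def bits_without_repeats(d, n):
    """log2 of D!/(D-N)!: ordered picks with no word used twice."""
    return math.log2(math.perm(d, n))

for d in (18000, 10):
    print(d, round(bits_with_repeats(d, 6), 2), round(bits_without_repeats(d, 6), 2))
```

For 18000 words the two differ by about a thousandth of a bit. For 10 words the gap is nearly 3 bits, i.e. a factor of about 6.6.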
@scottlougheed @matthew_d_green @odony true, and it would be easy enough to design a generator to do that, but I don’t see much motivation to do so.
@odony @matthew_d_green @jpgoldberg and to your point, the odds of seeing a repeated word given a large enough dictionary are also likely very small. I suppose we’re ultimately nibbling at the margins of password strength here.

@jpgoldberg @matthew_d_green Speaking of usability, I find typing those long passwords pretty painful, especially on smartphones. Using spaces as the separator rather than hyphens/underscores helps, as you don’t have to switch keyboard mode. Maybe that should be the default for 1Password? 😇
I guess it looks less passwordly though.

No separator at all is nice too, even if it costs a little entropy (in-put-clammy = input-clam-my). Intuitively, I don’t think those collisions put a significant dent in the entropy 🤔
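
One way to test that intuition is to count collisions directly. This is a toy sketch with a hypothetical six-word list built around the example above. A real check would run over the actual word list, where brute-force enumeration like this gets expensive quickly.

```python
from itertools import product

# Hypothetical toy list chosen so that some concatenations collide.
WORDS = ["in", "put", "clammy", "input", "clam", "my"]

def count_outcomes(words, n, sep):
    """Return (number of word sequences, number of distinct resulting strings)."""
    phrases = [sep.join(combo) for combo in product(words, repeat=n)]
    return len(phrases), len(set(phrases))

print(count_outcomes(WORDS, 3, "-"))   # with a separator
print(count_outcomes(WORDS, 3, ""))    # without a separator
```

With the hyphen every sequence maps to a distinct string. Without it, a few sequences collapse into the same string, so the effective count is slightly below D^N.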

@jpgoldberg What I meant is sampling with replacement (D^N), i.e. each word is sampled independently of the other words and a given word can be chosen twice in the same passphrase. What I originally wrote sounds like sampling without replacement (D choose N, or strictly D!/(D-N)! if you count word order), that is: pick N unique words out of the dictionary, one at a time, so that a given word only appears once in a passphrase. The number of passphrases is slightly smaller in the second case.
@matthew_d_green, ah. I didn’t read the original as (D choose N) so I misunderstood your correction.