TryHackme Walkthrough — Overpass
The vulnerability discovered was an authentication bypass rooted in weak password hashing. The application hashed passwords with MD5, a fast, unsalted algorithm that is insecure for this purpose (not literally reversible, but cheap to crack with brute force and precomputed lookup tables). By analyzing passwords leaked in a previous data breach, the researcher identified weak credentials and matched the hash of the target user account (e.g., 'admin' => '5eb6fb193f2cc04e9bf03a8971b5955d'). Using Burp Suite's Intruder tool, they brute-forced the login with payloads built from the known hashes. The application accepted the precomputed hash in place of the password without any server-side verification, leading to unauthorized access. This flaw could have resulted in sensitive data exposure and account takeover. The researcher received a reward for reporting the issue. Proper remediation involves using a strong, slow password hashing algorithm such as bcrypt or Argon2. Key lesson: always hash credentials with a modern password hashing algorithm (e.g., bcrypt, Argon2) instead of a fast general-purpose hash like MD5. #BugBounty #Cybersecurity #WebSecurity #AuthenticationBypass #PasswordHashing
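To make the weakness concrete, here is a minimal, illustrative Python sketch (not from the original writeup) of a dictionary attack against an unsalted MD5 hash; the function and wordlist are hypothetical:

```python
import hashlib

def crack_md5(target_hex, wordlist):
    """Try each candidate password against an unsalted MD5 hash.

    Fast, unsalted hashes like MD5 can be tested at billions of
    guesses per second on GPUs; this loop is the same idea in miniature.
    """
    for candidate in wordlist:
        if hashlib.md5(candidate.encode()).hexdigest() == target_hex:
            return candidate
    return None

# Example: recover a weak password from a "leaked" hash.
leaked = hashlib.md5(b"secret123").hexdigest()
print(crack_md5(leaked, ["password", "letmein", "secret123"]))  # secret123
```

A salted, slow hash (bcrypt, Argon2) defeats exactly this kind of precomputed-wordlist attack by making each guess expensive.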

https://seclak07.medium.com/tryhackme-walkthrough-overpass-838c1e204334?source=rss------bug_bounty-5

TryHackme Walkthrough — Overpass

Introduction

Medium
🚀 Ah, the riveting saga of Argon2: the password hashing algorithm that's as widely adopted as a fax machine in 2023 📠. This academic snoozefest boldly explores what no one asked—just how effective is #Argon2 in the "real world" where nobody uses it anyway. 🤦‍♂️
https://arxiv.org/abs/2504.17121 #PasswordHashing #CyberSecurity #TechTrends #AcademicDiscussion #HackerNews #ngated
Evaluating Argon2 Adoption and Effectiveness in Real-World Software

Modern password hashing remains a critical defense against credential cracking, yet the transition from theoretically secure algorithms to robust real-world implementations remains fraught with challenges. This paper presents a dual analysis of Argon2, the Password Hashing Competition winner, combining attack simulations quantifying how parameter configurations impact guessing costs under realistic budgets, with the first large-scale empirical study of Argon2 adoption across public GitHub software repositories. Our economic model, validated against cryptocurrency mining benchmarks, demonstrates that OWASP's recommended 46 MiB configuration reduces compromise rates by 42.5% compared to SHA-256 at $1/account attack budgets for strong user passwords. However, memory-hardness exhibits diminishing returns: increasing allocations to RFC 9106's 2048 MiB provides just 23.3% ($1) and 17.7% ($20) additional protection despite 44.5 times greater memory demands. Crucially, both configurations fail to mitigate risks from weak passwords, with 96.9-99.8% compromise rates for RockYou-like credentials regardless of algorithm choice. Our repository analysis shows accelerating Argon2 adoption but weak configuration practices: 46.6% of deployments use weaker-than-OWASP parameters. Surprisingly, sensitive applications (password managers, encryption tools) show no stronger configurations than general software. Our findings highlight that a secure algorithm alone cannot ensure security; effective parameter guidance and developer education remain essential for realizing Argon2's theoretical advantages.

arXiv.org

Beyond Bcrypt

In 2010, Coda Hale wrote How To Safely Store A Password, which began with the repeated phrase, “Use bcrypt”, with each repetition of the word bcrypt linked to a bcrypt implementation for a different programming language.

This had two effects on the technology blogosphere at the time:

  • It convinced a lot of people that bcrypt was the right answer for storing a password.
  • It created a meme for how technology bloggers recommend specific cryptographic algorithms when they want attention from Hacker News.

    At the time, it was great advice!

    Credit: CMYKat

    In 2010, bcrypt was the only clearly good answer for password hashing in most programming languages.

    In the intervening almost fifteen years, we’ve learned a lot more about passwords, password cracking, authentication mechanisms beyond passwords, and password-based cryptography.

    If you haven’t already read my previous post about password-based cryptography, you may want to give that one a once-over before you continue.

    But we’ve also learned a lot more about bcrypt, its limitations, the various footguns involved with using it in practice, and even some cool shit you can build with it.

    In light of a recent discussion about switching WordPress’s password hashing algorithm from PHPass (which is based on MD5) to bcrypt, I feel now is the perfect time to dive into this algorithm and its implications on real-world cryptography.

    Understanding Bcrypt in 2024

    Bcrypt is a password hashing function, but not a password KDF or general-purpose cryptographic hash function.

    If you’re using a sane password storage API, such as PHP’s password API, you don’t even need to think about salting your passwords, securely verifying passwords, or handling weird error conditions. Instead, you only need concern yourself with the “cost” factor, which exponentially increases the runtime of the algorithm.

    There’s just one problem: bcrypt silently truncates after 72 characters (or rather, bytes, if you’re pedantic and assume non-ASCII passwords, such as emoji).

    Here’s a quick script you can run yourself to test this:

    <?php
    $example1 = str_repeat('A', 72);
    $example2 = $example1 . 'B';
    $hash = password_hash($example1, PASSWORD_BCRYPT);
    // Prints bool(true): the 73rd character is silently ignored.
    var_dump(password_verify($example2, $hash));

    This may sound ludicrous (“who uses 72 character passwords anyway?”) until you see security advisories like this recent one from Okta.

    The Bcrypt algorithm was used to generate the cache key where we hash a combined string of userId + username + password. Under a specific set of conditions, listed below, this could allow users to authenticate by providing the username with the stored cache key of a previous successful authentication.

    (…)

    • The username is 52 characters or longer

    The other thing to consider is that many people use passphrases, such as those generated from Diceware, which produce longer strings with less entropy per character.

    If you use bcrypt as-is, you will inevitably run into this truncation at some point.

    “Let’s pre-hash passwords!”

    In response to this limitation, many developers will suggest pre-hashing the password with a general purpose cryptographic hash function, such as SHA-256.

    And so, in pursuit of a way to avoid one footgun, developers introduced two more.

    AJ

    Truncation on NUL Bytes

    If you use the raw binary output of a hash function as your password hash, be aware that bcrypt will truncate on NUL (0x00) bytes.

    With respect to the WordPress issue linked above, the default for PHP’s hashing API is to output hexadecimal characters.

    This is a bit wasteful. Base64 is preferable, although any isomorphism of the raw hash output that doesn’t include a 0x00 byte is safe from truncation.
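To see why raw binary output is dangerous here, consider this stdlib-only Python sketch (mine, purely illustrative): raw digests routinely contain 0x00 bytes, while a hex- or base64-encoded digest cannot.

```python
import base64
import hashlib

def first_nul_digest(limit=100_000):
    """Find an input whose raw SHA-256 digest contains a NUL (0x00) byte.

    bcrypt treats its input like a C string, so everything after the
    first 0x00 byte would be silently ignored.
    """
    for i in range(limit):
        if b"\x00" in hashlib.sha256(str(i).encode()).digest():
            return i
    return None

# Roughly 1 in 9 random 32-byte digests contains a NUL, so this hits quickly.
print("raw digest with embedded NUL found for input:", first_nul_digest())

# An encoded digest can never contain 0x00, so it is safe to pass to bcrypt.
encoded = base64.b64encode(hashlib.sha256(b"hunter2").digest())
assert b"\x00" not in encoded
```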

    Hash Shucking

    When a system performs a migration from a cryptographic hash function (e.g., MD5) to bcrypt, they typically choose to re-hash the existing hash with bcrypt.

    Because users typically reuse passwords, you can often take the fast, unsalted hashes from another breach and use them as your password dictionary for bcrypt.

    If you then succeed in verifying the bcrypt password for a fast hash, you can plug the fast hash into software like Hashcat and crack the actual password at a much faster rate (tens of billions of candidates per second, versus thousands per second).

    This technique is called hash shucking (YouTube link).

    You can avoid hash shucking by using HMAC with a static key–either universal for all deployments of your software, or unique per application.

    It doesn’t really matter which you choose; all you really need from it is domain separation from naked hashes.
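A minimal sketch of that pre-hashing step, written here in Python with only the standard library (the key value and function name are my own illustrative choices, not from the post):

```python
import base64
import hashlib
import hmac

# Hypothetical per-application secret; in production, load it from
# configuration or a KMS rather than hardcoding it.
APP_SECRET = b"example-static-key-do-not-hardcode"

def prehash(password):
    """Domain-separated pre-hash: HMAC-SHA-512, then base64.

    The HMAC key separates these digests from 'naked' SHA-512 hashes,
    defeating hash shucking; base64 avoids NUL-byte truncation in bcrypt.
    """
    mac = hmac.new(APP_SECRET, password.encode("utf-8"), hashlib.sha512).digest()
    return base64.b64encode(mac)

# The result is what you would feed to bcrypt (e.g. PHP's password_hash()).
digest = prehash("correct horse battery staple")
assert len(digest) == 88          # 64 bytes -> 88 base64 characters
assert b"\x00" not in digest      # safe from NUL truncation
```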

    I frequently see this referred to as “peppering”, but the term “pepper” isn’t rigidly defined anywhere.

    One benefit of using a per-application HMAC secret is that it makes your hashes harder to crack for anyone who doesn’t know the secret.

    For balance, one downside is that your hashes are no longer portable across applications without managing this static key.

    Disarming Bcrypt’s Footguns

    Altogether, it’s quite straightforward to avoid bcrypt’s footguns, as I had recommended to WordPress last week.

  • Pre-hash with HMAC-SHA512.
  • Ensure the output of step 1 is base64-encoded.
  • Pass the output of step 2 to PHP’s password API.

    Easy, straightforward, and uncontroversial. Right?

    Objections to Bcrypt Disarmament

    The linked discussion was tedious, so I will briefly describe the objections raised to my suggestion.

  • This is “rolling our own crypto”.
    • Answer: No, it’s a well-understood pattern that’s been discussed in the PHP community for well over a decade.
  • Passwords over 72 characters are rare and not worthy of our consideration.
    • Answer: No, this has bit people in unexpected ways before (see: Okta).

      When you develop a popular CMS, library, or framework, you cannot possibly know all the ways that your software will be used by others. It’s almost always better to be misuse-resistant.

  • Pre-hashing introduces a Denial-of-Service attack risk.
    • Answer: No. Bcrypt with a cost factor of 10 is about 100,000 times as expensive as SHA2.
  • This introduces a risk of hash shucking.
    • As demonstrated above, HMAC doesn’t suffer this problem (assuming the key is reasonably selected).
  • Base64 encoding reduces entropy.
    • Answer: No, it’s isomorphic.
  • Base64 with the 72 character truncation reduces entropy.
    • Answer: We’re still truncating SHA-512 to more than 256 bits of its output, so this doesn’t actually matter for any practical security reason.
  • This would necessitate a special prefix (e.g. $2w$) to distinguish disarmed bcrypt from vanilla bcrypt that PHP’s password API wouldn’t know what to do with.
    • This is a trivial concern, for which the fix is also trivial:
      After password_hash(), modify the prefix with a marker to indicate pre-hashing.
      Before password_verify(), swap the original prefix back in.
  • There were some other weird arguments (such as “Bcrypt is approved by NIST for FIPS”, which is just plain false).

    Why Bcrypt Truncating SHA-512 Doesn’t Matter

    If you have a random secret key, HMAC-SHA-512 is a secure pseudorandom function that you can treat as a Random Oracle.

    Because it’s HMAC, you don’t have to worry about Length Extension Attacks at all. Therefore, the best known attack strategy is to produce a collision.

    The raw binary output of SHA-512 is 64 bytes, but may contain NUL characters (which would truncate the hash). To avoid this, we base64-encode the output.

    When you base64-encode a SHA-512 hash, the output is 88 characters (due to base64 padding). This is longer than the 72 characters supported by bcrypt, so it will truncate silently after 72 characters.
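You can verify the arithmetic with a few lines of Python (my own sanity check, not from the post):

```python
import base64
import hashlib

digest = hashlib.sha512(b"example").digest()
encoded = base64.b64encode(digest)

print(len(digest))    # 64 bytes of raw SHA-512 output
print(len(encoded))   # 88 base64 characters (including padding)

# bcrypt keeps only the first 72 characters; each base64 character
# carries 6 bits, so the truncated input still holds 72 * 6 = 432 bits.
print(72 * 6)         # 432
```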

    This is still secure, but to prove this, I need to use math.

    First, let’s assume you’re working with an extremely secure, high-entropy password, and might be negatively impacted by this truncation. How bad is the damage in this extreme case?

    There are 64 possible characters in the base64 alphabet. That’s a tautology, after all.

    If you have a string of length 72, for which each character can be one of 64 values, you can represent the total probability space of possible strings as 64^72.

    If you know that 64 = 2^6, you can do a little bit of arithmetic and discover this quantity is equal to 2^432.

    As I discussed in my deep dive on the birthday bound, you can take the cube root of this number to find what I call the Optimal Birthday Bound.

    This works out to 2^144 samples in order to find a 2^-144 probability of a single collision.

    This simply isn’t going to happen in our lifetimes.

    2^-144 is about 17 trillion times less likely than 2^-100.
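Python's arbitrary-precision integers make these figures easy to double-check (again, my own sanity check, not part of the original argument):

```python
# The truncated input space: 72 base64 characters, 64 symbols each.
space = 64 ** 72
assert space == 2 ** 432

# Optimal birthday bound: the cube root of the space size, i.e. 2^(432/3).
samples = 2 ** (432 // 3)
assert samples == 2 ** 144

# 2^-144 vs 2^-100: a factor of 2^44, roughly 17.6 trillion.
print(2 ** 44)  # 17592186044416
```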

    The real concern is the entropy of the actual password, not losing a few bits from a truncated hash.

    After all, even though the outputs of HMAC-SHA512 are indistinguishable from random when you don’t know the HMAC key, the input selection is entirely based on the (probably relatively easy-to-guess) password.

    “Why not just use Argon2 or Scrypt?”

    Argon2 and scrypt don’t have the bcrypt footguns. You can hash passwords of arbitrary length and not care about NUL characters. They’re great algorithms.

    Several people involved in the Password Hashing Competition (that selected Argon2 as its winner) have since lamented the emphasis on memory-hardness over cache-hardness. Cache-hardness is more important for short run-times (i.e., password-based authentication), while memory-hardness is more important for longer run-times (i.e., key derivation).

    As Sc00bz explains in the GitHub readme for his bscrypt project:

    Cache hard algorithms are better than memory hard algorithms at shorter run times. Basically cache hard algorithms forces GPUs to use 1/4 to 1/16 of the memory bandwidth because of the large bus width (commonly 256 to 1024 bits). Another way to look at it is memory transactions vs bandwidth. Also the low latency of L2 cache on CPUs and the 8 parallel look ups let’s us make a lot of random reads. With memory hard algorithms, there is a point where doubling the memory quarters a GPU attacker’s speed. There then is a point at which a memory hard algorithm will overtake a cache hard algorithm. Cache hard algorithms don’t care that GPUs will get ~100% utilization of memory transactions because it’s already very limiting.

    Ironically, bcrypt is cache-hard, while scrypt and the flavors of Argon2 that most people use are not.

    Most of my peers just care that you use a password hashing algorithm at all. They don’t particularly care which. The bigger, and more common, vulnerability is not using one of them in the first place.

    I’m mostly in agreement with them, but I would prefer that anyone that chooses bcrypt takes steps to disarm its footguns.

    Turning Bcrypt Into a KDF

    Earlier, I noted that bcrypt is not a password KDF. That doesn’t mean you can’t make one out of bcrypt. Ryan Castellucci is an amazing hacker; they managed to do just that.

    To understand why this is difficult, and why Ryan’s hack works, you need to understand what bcrypt actually is.

    Bcrypt is a relatively simple algorithm at its heart:

  • Run the Blowfish key schedule, several times, over both the password and salt.
  • Encrypt the string "OrpheanBeholderScryDoubt" 64 times in ECB mode using the expanded key from step 1.

    Most of the heavy work in bcrypt is actually done in the key schedule; the encryption of three blocks (remember, Blowfish is a 64-bit block cipher) just ensures you need the correct resultant key from the key schedule.

    “So how do you get an encryption key out of bcrypt?”

    It’s simple: we, uh, hash the S-box.

    static void BF_kwk(struct BF_data *data, uint8_t kwk[BLAKE2B_KEYBYTES])
    {
        BF_word *S = (BF_word *)data->ctx.S;
        BF_htobe(S, 4*256);
        // it should not be possible for this to fail...
        int ret = blake2b_simple(kwk, BLAKE2B_KEYBYTES, S, sizeof(BF_word)*4*256);
        assert(ret == 0);
        BF_betoh(S, 4*256);
    }

    Using BLAKE2b to hash the S-box from the final Blowfish key expansion yields a key-wrapping key that can be used to encrypt whatever data is being protected.

    The only feasible way to recover this key is to provide the correct password and salt to arrive at the same key schedule.

    Any attack against the selection of S implies a cryptographic weakness in bcrypt, too. (I’ve already recommended domain separation in a GitHub issue.)

    CMYKat

    It’s worth remembering that Ryan’s design is a proof-of-concept, not a peer-reviewed design ready for production. Still, it’s a cool hack.

    It’s also not the first of its kind (thanks, Damien Miller).

    If anyone was actually considering using this design, first, they should wait until it’s been adequately studied. Do not pass Go, do not collect $200.

    Additionally, the output of the BLAKE2b hash should be used as the input keying material for, e.g., HKDF. This lets you split the password-based key into multiple application-specific sub-keys without running the password KDF again for each derived key.
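As an illustration of that splitting step, here is a minimal HKDF (RFC 5869) sketch in Python using only the standard library; the root key and info labels are hypothetical stand-ins, not part of Ryan's design:

```python
import hashlib
import hmac

def hkdf(ikm, info, length=32, salt=b""):
    """Minimal HKDF (RFC 5869) with HMAC-SHA-256: extract, then expand."""
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

# Stand-in for the expensive password-KDF output, so the sketch is
# self-contained; in the real construction this would come from bcrypt.
root_key = hashlib.blake2b(b"stand-in for bcrypt-derived state").digest()

# Derive independent sub-keys without re-running the password KDF.
enc_key = hkdf(root_key, b"fileenc-v1")
mac_key = hkdf(root_key, b"filemac-v1")
assert enc_key != mac_key
```

Because each sub-key is bound to a distinct info label, adding a new application-specific key later costs only one cheap HKDF call, not another bcrypt run.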

    Wrapping Up

    Although bcrypt is still an excellent cache-hard password hashing function suitable for interactive logins, it does have corner cases that sometimes cause vulnerabilities in applications that misuse it.

    If you’re going to use bcrypt, make sure you use bcrypt in line with my recommendations to WordPress: HMAC-SHA-512, base64 encode, then bcrypt.

    Here’s a quick proof-of-concept for PHP software:

    <?phpdeclare(strict_types=1);class SafeBcryptWrapperPoC{ private $staticKey; private $cost = 12; public function __construct( #[\SensitiveParameter] string $staticKey, int $cost = 12 ) { $this->staticKey = $staticKey; $this->cost = $cost; } /** * Generate password hashes here */ public function hash( #[\SensitiveParameter] string $password ): string { return \password_hash( $this->prehash($password), PASSWORD_BCRYPT, ['cost' => $this->cost] ); } /** * Verify password here */ public function verify( #[\SensitiveParameter] string $password, #[\SensitiveParameter] string $hash ): bool { return \password_verify( $this->prehash($password), $hash ); } /** * Pre-hashing with HMAC-SHA-512 here * * Note that this prefers the libsodium base64 code, since * it's implemented in constant-time */ private function prehash( #[\SensitiveParameter] string $password ): string { return \sodium_bin2base64( \hash_hmac('sha512', $password, $this->staticKey, true), \SODIUM_BASE64_VARIANT_ORIGINAL_NO_PADDING ); }}

    You can see a modified version of this proof-of-concept on 3v4l, which includes the same demo from the top of this blog post to demonstrate the 72-character truncation bug.

    If you’re already using bcrypt in production, you should be cautious with adding this pre-hashing alternative. Having vanilla bcrypt and non-vanilla bcrypt side-by-side could introduce problems that need to be thoroughly considered.

    I can safely recommend it to WordPress because they weren’t using bcrypt before. Most of the people reading this are probably not working on the WordPress core.

    Addendum (2024-11-28)

    More of the WordPress team has chimed in to signal support for vanilla bcrypt, rather than disarming the bcrypt footgun.

    The reason?

    That would result in maximum compatibility for existing WordPress users who use the Password hashes outside of WordPress, while also not introducing yet-another-custom-hash into the web where it’s not overly obviously necessary, but while still gaining the bcrypt advantages for where it’s possible.

    dd32

    The hesitance to introduce a custom hash construction is understandable, but the goal I emphasized with bold text is weird and not a reasonable goal for any password storage system.

    It’s true that the overwhelming non-WordPress code written in PHP is just using the password hashing API. But that means they aren’t compatible with WordPress today. PHP’s password hashing API doesn’t implement phpass, after all.

    In addition to being scope creep for a secure password storage strategy, it’s kind of a bonkers design constraint to expect password hashes be portable. Why are you intentionally exposing hashes unnecessarily?

    CMYKat

    At this point, it’s overwhelmingly likely that WordPress will choose to not disarm the bcrypt footguns, and will just ship it.

    That’s certainly not the worst outcome, but I do object to arriving there for stupid reasons, and that GitHub thread is full of stupid reasons and misinformation.

    The most potent source of misinformation also barked orders at me and then tried to dismiss my technical arguments as the concerns of “the hobbyist community”, which was a great addition to my LinkedIn profile.

    If WordPress’s choice turns out to be a mistake (that is to say, if their decision for vanilla bcrypt introduces a vulnerability in a plugin or theme that uses their password hashing API for, I dunno, API keys?), at least I can say I tried.

    Additionally, WordPress cannot say they didn’t know the risk existed, especially in a courtroom, since me informing them of it is so thoroughly documented (and archived).

    CMYKat

    Here’s to hoping the risk never actually manifests. Saying “I told you so” is more bitter than sweet in security. Happy Thanksgiving.

    Header image: Art by Jim and CMYKat; a collage of some DEFCON photos, as well as Creative Commons photos of Bruce Schneier (inventor of the Blowfish block cipher) and Niels Provos (co-designer of bcrypt, which is based on Blowfish).

    #bcrypt #cryptography #passwordHashing #passwords #SecurityGuidance

    How To Safely Store A Password

    In which I recommend bcrypt.

    codahale.com

    Key Transparency and the Right to be Forgotten

    This post is the first in a new series covering some of the reasoning behind decisions made in my project to build end-to-end encryption for direct messages on the Fediverse.

    (Collectively, Fedi-E2EE.)

    Although the reasons for specific design decisions should be immediately obvious from reading the relevant specification (and if not, I consider that a bug in the specification), I believe writing about it less formally will improve the clarity behind the specific design decisions taken.

    In the inaugural post for this series, I’d like to focus on how the Fedi-E2EE Public Key Directory specification aims to provide Key Transparency and an Authority-free PKI for the Fediverse without making GDPR compliance logically impossible.

    CMYKat‘s art, edited by me.

    Background

    Key Transparency

    For a clearer background, I recommend reading my blog post announcing the focused effort on a Public Key Directory, and then my update from August 2024.

    If you’re in a hurry, I’ll be brief:

    The goal of Key Transparency is to ensure everyone in a network sees the same view of who has which public key.

    How it accomplishes this is a little complicated: It involves Merkle trees, digital signatures, and a higher-level protocol of distinct actions that affect the state machine.

    If you’re thinking “blockchain”, you’re in the right ballpark, but we aren’t propping up a cryptocurrency. Instead, we’re using a centralized publisher model (per Public Key Directory instance) with decentralized verification.

    Add a bit of cross-signing and replication, and you can stitch together a robust network of Public Key Directories that can be queried to obtain the currently-trusted list of public keys (or other auxiliary data) for a given Fediverse user. This can then be used to build application-layer protocols (i.e., end-to-end encryption with an identity key more robust than “trust on first use” due to the built-in audit trail to Merkle trees).

    I’m handwaving a lot of details here. The Architecture and Specification documents are both worth a read if you’re curious to learn more.
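For a taste of how a Merkle tree commits to every record at once, here is a toy Python sketch (mine, not from the specification); real transparency logs like Sigsum are far more involved:

```python
import hashlib

def h(data):
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Fold a list of records up to a single root hash."""
    level = [h(b"\x00" + leaf) for leaf in leaves]   # domain-separated leaves
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])                  # duplicate the odd tail
        level = [h(b"\x01" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

# Four (user, public key) records; tampering with any one of them
# changes the root, which every independent verifier can detect.
records = [b"alice:pk1", b"bob:pk2", b"carol:pk3", b"dave:pk4"]
root = merkle_root(records)
tampered = merkle_root([b"alice:EVIL", b"bob:pk2", b"carol:pk3", b"dave:pk4"])
assert root != tampered
```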

    Harubaki

    Right To Be Forgotten

    I am not a lawyer, nor do I play one on TV. This is not legal advice. Other standard disclaimers go here.

    Okay, now that we’ve got that out of the way, Article 17 of the GDPR establishes a “Right to erasure” for Personal Data.

    What this actually means in practice has not been consistently decided by the courts yet. However, a publicly readable, immutable ledger that maps public keys (which may be considered Personal Data) with Actor IDs (which includes usernames, which are definitely Personal Data) goes against the grain when it comes to GDPR.

    It remains an open question whether there is public interest in this data persisting in a read-only ledger ad infinitum, which could override the right to be forgotten. If there is, that’s for the courts to decide, not furry tech bloggers.

    I know it can be tempting, especially as an American with no presence in the European Union, to shrug and say, “That seems like a them problem.” However, if other folks want to be able to use my designs within the EU, I would be remiss not to at least consider this potential pitfall and try to mitigate it in my designs.

    So that’s exactly what I did.

    AJ

    Almost Contradictory

    At first glance, the privacy goals of both Key Transparency and the GDPR’s Right To Erasure are at odds.

    • One creates an immutable, append-only history.
    • The other establishes a right for EU citizens’ history to be selectively censored, which means history has to be mutable.

    However, they’re not totally impossible to reconcile.

    An untested legal theory circulating around large American tech companies is that “crypto shredding” is legally equivalent to erasure.

    Crypto shredding is the act of storing encrypted data, and then when given a legal takedown request from an EU citizen, deleting the key instead of the data.

    AJ

    This works from a purely technical perspective: If the data is encrypted, and you don’t know the key, to you it’s indistinguishable from someone who encrypted the same number of NUL bytes.

    In fact, many security proofs for encryption schemes are satisfied by reaching this conclusion, so this isn’t a crazy notion.

    Is Crypto Shredding Plausible?

    In 2019, the European Parliamentary Research Service published a lengthy report titled Blockchain and the General Data Protection Regulation which states the following:

    Before any examination of whether blockchain technology is capable of complying with Article 17 GDPR; it must be underscored that the precise meaning of the term ‘erasure’ remains unclear.

    Article 17 GDPR does not define erasure, and the Regulation’s recitals are equally mum on how this term should be understood. It might be assumed that a common-sense understanding of this terminology ought to be embraced. According to the Oxford English Dictionary, erasure means ‘the removal of writing, recorded material, or data’ or ‘the removal of all traces of something: obliteration’.

    From this perspective, erasure could be taken to equal destruction. It has, however, already been stressed that the destruction of data on blockchains, particularly these of a public and permissionless nature, is far from straightforward.

    There are, however, indications that the obligation inherent to Article 17 GDPR does not have to be interpreted as requiring the outright destruction of data. In Google Spain, the delisting of information from research results was considered to amount to erasure. It is important to note, however, that in this case, this is all that was requested of Google by the claimant, who did not have control over the original data source (an online newspaper publication). Had the claimant wished to obtain the outright destruction of the relevant data it would have had to address the newspaper, not Google. This may be taken as an indication that what the GDPR requires is that the obligation resting on data controllers is to do all they can to secure a result as close as possible to the destruction of their data within the limits of [their] own factual possibilities.

    Dr Michèle Finck, Blockchain and the General Data Protection Regulation, pp. 75-76

    From this, we can kind of intuit that the courts aren’t pedantic: The cited Google Spain case was satisfied by merely delisting the content, not the erasure of the newspaper’s archives.

    The report goes on to say:

    As awareness regarding the tricky reconciliation between Article 17 GDPR and distributed ledgers grows, a number of technical alternatives to the outright destruction of data have been considered by various actors. An often-mentioned solution is that of the destruction of the private key, which would have the effect of making data encrypted with a public key inaccessible. This is indeed the solution that has been put forward by the French data protection authority CNIL in its guidance on blockchains and the GDPR. The CNIL has suggested that erasure could be obtained where the keyed hash function’s secret key is deleted together with information from other systems where it was stored for processing.

    Dr Michèle Finck, Blockchain and the General Data Protection Regulation, pp. 76-77

    That said, I cannot locate a specific court decision that affirms that crypto erasure is legally sufficient for complying with data erasure requests (nor any that affirm that it’s necessary).

    I don’t have a crystal ball that can read the future on what government compliance will decide, nor am I an expert in legal matters.

    Given the absence of a clear legal framework, I do think it’s totally reasonable to consider crypto-shredding equivalent to data erasure. Most experts would probably agree with this. But it’s also possible that the courts could rule totally stupidly on this one day.

    Therefore, I must caution anyone that follows a similar path: Do not claim GDPR compliance just because you implement crypto-shredding in a distributed ledger. All you can realistically promise is that you’re not going out of your way to make compliance logically impossible. All we have to go by are untested legal hypotheses, and very little clarity (even if the technologists are near-unanimous on the topic!).

    Towards A Solution

    With all that in mind, let’s start with “crypto shredding” as the answer to the GDPR + transparency log conundrum.

    This is only the start of our complications.

    CMYKat

    Protocol Risks Introduced by Crypto Shredding

    Before the introduction of crypto shredding, the job of the Public Key Directory was simple:

  • Receive a protocol message.
  • Validate the protocol message.
  • Commit the protocol message to a transparency log (in this case, Sigsum).
  • Retrieve the protocol message whenever someone requests it to independently verify its inclusion.
  • Miscellaneous other protocol things (cross-directory checkpoint commitment, replication, etc.).

    Point being: there was very little that the directory could do to be dishonest. If they lied about the contents of a record, it would invalidate the inclusion proofs of every successive record in the ledger.

    In order to make a given record crypto-shreddable without breaking the inclusion proofs for every record that follows, we need to commit to the ciphertext, not the plaintext. (And then, when a takedown request comes in, wipe the key.)
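A toy Python sketch of that flow (entirely my own illustration; the "cipher" is a throwaway SHA-256 keystream so the example stays dependency-free, where a real deployment would use a committing AEAD per the assumptions below):

```python
import hashlib
import hmac
import os

def keystream_xor(key, nonce, data):
    """Toy CTR-style stream cipher built from SHA-256. Illustration only."""
    out = bytearray()
    for block in range(0, len(data), 32):
        pad = hashlib.sha256(key + nonce + block.to_bytes(8, "big")).digest()
        chunk = data[block:block + 32]
        out += bytes(a ^ b for a, b in zip(chunk, pad))
    return bytes(out)

# 1. Encrypt the record and commit the *ciphertext* to the ledger.
key, nonce = os.urandom(32), os.urandom(16)
record = b"example-actor-id -> example-public-key"
ciphertext = keystream_xor(key, nonce, record)
ledger_commitment = hashlib.sha256(ciphertext).hexdigest()

# 2. While the key exists, the directory can serve (and prove) the plaintext.
assert keystream_xor(key, nonce, ciphertext) == record

# 3. Takedown: shred the key. The committed ciphertext stays in the ledger,
#    so every later inclusion proof still verifies, but the plaintext is gone.
key = None
```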

    Now, things are quite a bit more interesting.

    Do you…

    • …Distribute the encryption key alongside the ciphertext and let independent third parties decrypt it on demand?

      …OR…

    • Decrypt the ciphertext and serve plaintext through the public API, keeping the encryption key private so that it may be shredded later?

    The first option seems simple, but runs into governance issues: How do you claim the data was crypto-shredded if countless individuals have a copy of the encryption key, and can therefore recover the plaintext from the ciphertext?

    I don’t think that would stand up in court.

    CMYKat

    Clearly, your best option is the second one.

    Okay, so how does an end user know that the ciphertext that was committed to the transparency ledger decrypts to the specific plaintext value served by the Public Key Directory? How do users know it’s not lying?

    Quick aside: This question is also relevant if you went with the first option and used a non-committing AEAD mode for the actual encryption scheme.

    In that scenario, a hostile nation state adversary could pressure a Public Key Directory to selectively give one decryption key to targeted users, and another to the rest of the Internet, in order to perform a targeted attack against citizens they’d rather didn’t have civil rights.

    My entire goal with introducing key transparency to my end-to-end encryption proposal is to prevent these sorts of attacks, not enable them.

    There are a lot of avenues we could explore here, but it’s always worth outlining the specific assumptions and security goals of any design before you start perusing the literature.

    AJ

    Assumptions

    This is just a list of things we assume are true, and do not need to prove for the sake of our discussion here today. The first two are legal assumptions; the remainder are cryptographic.

    Ask your lawyer if you want advice about the first two assumptions. Ask your cryptographer if you suspect any of the remaining assumptions are false.

  • Crypto-shredding is a legally valid way to provide data erasure (as discussed above).
  • EU courts will consider public keys to be Personal Data.
  • The SHA-2 family of hash functions is secure (ignoring length-extension attacks, which won’t matter for how we’re using them).
  • HMAC is a secure way to build a MAC algorithm out of a secure hash function.
  • HKDF is a secure KDF if used correctly.
  • AES is a secure 128-bit block cipher.
  • Counter Mode (CTR) is a secure way to turn a block cipher into a stream cipher.
  • AES-CTR + HMAC-SHA2 can be turned into a secure AEAD mode, if done carefully.
  • Ed25519 is a digital signature algorithm that provides strong unforgeability under chosen-message attacks (SUF-CMA).
  • Argon2id is a secure, memory-hard password KDF, when used with reasonable parameters. (You’ll see why in a moment.)
  • Sigsum is a secure mechanism for building a transparency log.

    This list isn’t exhaustive or formal, but it should be sufficient for our purposes.

    Security Goals

  • The protocol messages stored in the Public Key Directory are accompanied by a Merkle tree proof of inclusion. This makes it append-only with an immutable history.
  • The Public Key Directory cannot behave dishonestly about the decrypted plaintext for a given ciphertext without clients detecting the deception.
  • Whatever strategy we use to solve this should be resistant to economic precomputation and brute-force attacks.

    Can We Use Zero-Knowledge Proofs?

    At first, this seems like an ideal situation for a succinct, non-interactive zero-knowledge proof.

    After all, you’ve got some secret data that you hold, and you want to prove that a calculation is correct without revealing the data to the end user. This seems like the ideal setup for Schnorr’s identification protocol.

    CMYKat

    Unfortunately, the second assumption (public keys being considered Personal Data by courts, even though they’re derived from random secret keys) makes implementing a Zero-Knowledge Proof here very challenging.

    First, if you look at Ed25519 carefully, you’ll realize that it’s just a digital signature algorithm built atop a Schnorr proof, which requires some sort of public key (even an ephemeral one) to be managed.

    Worse, if you try to derive this value solely from public inputs (rather than creating a key management catch-22), the secret scalar your system arrives at will have been calculated from the user’s Personal Data–which only strengthens a court’s argument that the public key is therefore personally identifiable.

    CMYKat

    There may be a more exotic zero-knowledge proof scheme that might be appropriate for our needs, but I’m generally wary of fancy new cryptography.

    Here are two rules I live by in this context:

  • If I can’t get the algorithms out of the crypto module for whatever programming language I find myself working with, it may as well not even exist.
    • Corollary: If libsodium bindings are available, that counts as “the crypto module” too.
  • If a developer needs to reach for a generic Big Integer library (e.g., GMP) for any reason in the course of implementing a protocol, I do not trust their implementation.

    Unfortunately, a lot of zero-knowledge proof designs fail one or both of these rules in practice.

    (Sorry not sorry, homomorphic encryption enthusiasts! The real world hasn’t caught up to your ideas yet.)

    What About Verifiable Random Functions (VRFs)?

    It may be tempting to use VRFs (i.e., RFC 9381), but this runs into the same problem as zero-knowledge proofs: we’re assuming that an EU court would deem public keys Personal Data.

    But even if that assumption turns out false, the lifecycle of a protocol message looks like this:

  • User wants to perform an action (e.g., AddKey).
  • Their client software creates a plaintext protocol message.
  • Their client software generates a random 256-bit key for each potentially-sensitive attribute, so it can be shredded later.
  • Their client software encrypts each attribute of the protocol message.
  • The ciphertext and keys are sent to the Public Key Directory.
  • For each attribute, the Public Key Directory decrypts the ciphertext with the key, verifies the contents, and then stores both. The ciphertext is used to generate a commitment on Sigsum (signed by the Public Key Directory’s keypair).
  • The Public Key Directory serves plaintext to requestors, but does not disclose the key.
  • In the future, the end user can demand a legal takedown, which just wipes the key.

    Let’s assume I wanted to build a VRF out of Ed25519 (similar to what Signal does with VXEdDSA). Now I have a key management problem, which is pretty much what this project was meant to address in the first place.

    VRFs are really cool, and more projects should use them, but I don’t think they will help me.

    CMYKat

    Soatok’s Proposed Solution

    If you want to fully understand the nitty-gritty implementation details, I encourage you to read the current draft specification, plus the section describing the encryption algorithm, and finally the plaintext commitment algorithm.

    Now that we’ve established all that, I can begin to describe my approach to solving this problem.

    First, we will encrypt each attribute of a protocol message, as follows:

    • For subkey derivation, we use HKDF-HMAC-SHA512.
    • For encrypting the actual plaintext, we use AES-256-CTR.
    • For message authentication, we use HMAC-SHA512.
    • Additional associated data (AAD) is accepted and handled securely; i.e., we don’t use YOLO as a hash construction.

    This prevents an Invisible Salamander attack from being possible.
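
    As a rough sketch of the composition above: derive independent encryption and MAC subkeys with HKDF-HMAC-SHA512, encrypt, then MAC the length-prefixed AAD, nonce, and ciphertext. The subkey labels, field layout, and tag length here are illustrative assumptions, and since Python’s standard library has no AES, an HMAC-SHA512 counter keystream stands in for AES-256-CTR (a real implementation should use actual AES-CTR, e.g. via libsodium).

```python
import hashlib, hmac, os, struct

def hkdf_sha512(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    # RFC 5869 extract-then-expand, instantiated with HMAC-SHA512
    prk = hmac.new(salt, ikm, hashlib.sha512).digest()
    okm, t, counter = b"", b"", 1
    while len(okm) < length:
        t = hmac.new(prk, t + info + bytes([counter]), hashlib.sha512).digest()
        okm += t
        counter += 1
    return okm[:length]

def xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # STAND-IN keystream for illustration only; the design calls for AES-256-CTR
    out, block = b"", 0
    while len(out) < len(data):
        out += hmac.new(key, nonce + struct.pack(">Q", block), hashlib.sha512).digest()
        block += 1
    return bytes(a ^ b for a, b in zip(data, out))

def aead_encrypt(key: bytes, plaintext: bytes, aad: bytes) -> bytes:
    kdf_salt = os.urandom(32)  # per-message randomness for key-splitting
    enc_key = hkdf_sha512(key, kdf_salt, b"encrypt", 32)
    mac_key = hkdf_sha512(key, kdf_salt, b"mac", 64)
    nonce = os.urandom(16)
    ct = xor_stream(enc_key, nonce, plaintext)
    # Length-prefixing keeps distinct (aad, nonce, ct) tuples from colliding
    mac_input = struct.pack(">Q", len(aad)) + aad + struct.pack(">Q", len(nonce)) + nonce + ct
    tag = hmac.new(mac_key, mac_input, hashlib.sha512).digest()[:32]
    return kdf_salt + nonce + tag + ct

def aead_decrypt(key: bytes, blob: bytes, aad: bytes) -> bytes:
    kdf_salt, nonce, tag, ct = blob[:32], blob[32:48], blob[48:80], blob[80:]
    enc_key = hkdf_sha512(key, kdf_salt, b"encrypt", 32)
    mac_key = hkdf_sha512(key, kdf_salt, b"mac", 64)
    mac_input = struct.pack(">Q", len(aad)) + aad + struct.pack(">Q", len(nonce)) + nonce + ct
    if not hmac.compare_digest(tag, hmac.new(mac_key, mac_input, hashlib.sha512).digest()[:32]):
        raise ValueError("invalid authentication tag")
    return xor_stream(enc_key, nonce, ct)
```

    Verifying the HMAC tag before decrypting, with a MAC key derived from the same input key, is what makes the mode committing rather than a YOLO hash of whatever happens to be nearby.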

    This encryption is performed client-side, by each user, and the symmetric key for each attribute is shared with the Public Key Directory when publishing protocol messages.

    If they later issue a legal request for erasure, they can be sure that the key used to encrypt the data they previously published isn’t secretly the same key used by every other user’s records.

    They always know this because they selected the key, not the server. Furthermore, everyone can verify that the hash published to the Merkle tree matches a locally generated hash of the ciphertext they just emitted.

    This provides a mechanism to keep everyone honest. If anything goes wrong, it will be detected.
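
    Concretely, that client-side check is just a hash comparison. SHA-256 is an assumed leaf hash here; the draft specification defines the actual commitment format.

```python
import hashlib, hmac

def ciphertext_matches_commitment(local_ciphertext: bytes, published_hash: bytes) -> bool:
    # Recompute the hash of the ciphertext we emitted and compare it,
    # in constant time, against what the directory published to the tree
    return hmac.compare_digest(hashlib.sha256(local_ciphertext).digest(), published_hash)

# What an honest directory would have published for this blob:
published = hashlib.sha256(b"example-ciphertext-blob").digest()
```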

    Next, to prevent the server from being dishonest, we include a plaintext commitment hash, which is included as part of the AAD (alongside the attribute name).

    (Implementing crypto-shredding is straightforward: simply wipe the encryption keys for the attributes of the records in scope for the request.)

    If you’ve read this far, you’re probably wondering, “What exactly do you mean by plaintext commitment?”

    Art by Scruff.

    Plaintext Commitments

    The security of a plaintext commitment is attained by the Argon2id password hashing function.

    By using the Argon2id KDF, you can make an effective trapdoor that is easy to calculate if you know the plaintext, but economically infeasible to brute-force attack if you do not.

    However, you need to do a little more work to make it safe.

    Harubaki

    The details here matter a lot, so this section is unavoidably going to be a little dense.

    Pass the Salt?

    Argon2id expects both a password and a salt.

    If you eschew the salt (i.e., zero it out), you open the door to precomputation attacks (see also: rainbow tables) that would greatly weaken the security of this plaintext commitment scheme.

    You need a salt.

    If you generate the salt randomly, this commitment property isn’t guaranteed by the algorithm. It would be difficult, but probably not impossible, to find two salts (s1, s2) such that Argon2id(m1, s1) = Argon2id(m2, s2) for two distinct plaintexts (m1, m2).

    Deriving the salt from public inputs eliminates this flexibility.

    By itself, this reintroduces the risk of making salts totally deterministic, which reintroduces the risk of precomputation attacks (which motivated the salt in the first place).

    If you include the plaintext in this calculation, it could also create a crib that gives attackers a shortcut for bypassing the cost of password hashing.

    Furthermore, any two encryption operations that act over the same plaintext would, without any additional design considerations, produce an identical value for the plaintext commitment.

    CMYKat

    Public Inputs for Salt Derivation

    The initial proposal included the plaintext value for Argon2 salt derivation, and published the salt and Argon2 output next to each other.

    Hacker News comex pointed out a flaw with this technique, so I’ve since revised how salts are selected to make them independent of the plaintext.

    The public inputs for the Argon2 salt are now:

  • The version identifier prefix for the ciphertext blob.
  • The 256-bit random value used as a KDF salt (also stored in the ciphertext blob).
  • A recent Merkle tree root.
  • The attribute name (prefixed by its length).

    These values are all hashed together with SHA-512, and then truncated to 128 bits (the length required by libsodium for Argon2 salts).

    This salt is not stored, but can deterministically be calculated from public information.
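
    A sketch of that derivation, assuming SHA-512 over the listed fields in this order (the domain-separation constant and exact field encoding are my assumptions; the draft specification pins them down):

```python
import hashlib, struct

def derive_commitment_salt(version_prefix: bytes, kdf_random: bytes,
                           merkle_root: bytes, attribute_name: bytes) -> bytes:
    h = hashlib.sha512()
    h.update(b"pkd-commitment-salt-v1")               # hypothetical domain constant
    h.update(version_prefix)                          # ciphertext version identifier
    h.update(kdf_random)                              # 256-bit random KDF salt from the blob
    h.update(merkle_root)                             # a recent Merkle tree root
    h.update(struct.pack(">Q", len(attribute_name)))  # length-prefixed attribute name
    h.update(attribute_name)
    return h.digest()[-16:]  # truncate to 128 bits, as libsodium requires for Argon2 salts
```

    Anyone holding the ciphertext blob and the referenced Merkle root can recompute this salt, so nothing about it needs to be stored.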

    Crisis Averted?

    This sure sounds like we’ve arrived at a solution, but let’s also consider another situation before we declare our job done.

    High-traffic Public Key Directories may have multiple users push a protocol message with the same recent Merkle root.

    This may happen if two or more users query the directory to obtain the latest Merkle root before either of them publish their updates.

    Later, if both of these users issue a legal takedown, someone might observe that the recent-merkle-root is the same for two messages, but their commitments differ.

    Is this enough leakage to distinguish plaintext records?

    In my earlier design, we needed to truncate the salt and rely on understanding the birthday bound to reason about its security. This is no longer the case, since each salt is randomized by the same random value used in key derivation.

    Choosing Other Parameters

    As mentioned a second ago, we set the output length of the Argon2id KDF to 32 bytes (256 bits). We expect the security of this KDF to exceed 2^128, which to most users might as well be infinity.

    With apologies to Filippo.

    The other Argon2id parameters are a bit hand-wavey. Although the general recommendation for Argon2id is to use as much memory as possible, this code will inevitably run in some low-memory environments, so asking for several gigabytes isn’t reasonable.

    For the first draft, I settled on 16 MiB of memory, 3 iterations, and a parallelism degree of 1 (for widespread platform support).

    Plaintext Commitment Algorithm

    With all that figured out, our plaintext commitment algorithm looks something like this:

  • Calculate the SHA512 hash of:
    • A domain separation constant
    • The header prefix (stored in the ciphertext)
    • The randomness used for key-splitting in encryption (stored in the ciphertext)
    • Recent Merkle Root
    • Attribute Name Length (64-bit unsigned integer)
    • Attribute Name
  • Truncate this hash to the rightmost 16 bytes (128 bits). This is the salt.
  • Calculate Argon2id over the following inputs concatenated in this order, with an output length of 32 bytes (256 bits), using the salt from step 2:
    • Recent Merkle Root Length (64-bit unsigned integer)
    • Recent Merkle Root
    • Attribute Name Length (64-bit unsigned integer)
    • Attribute Name
    • Plaintext Length (64-bit unsigned integer)
    • Plaintext
  • The output (step 3) is included as the AAD in the attribute encryption step, so the authentication tag is calculated over both the randomness and the commitment.

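    Steps 3 and 4 look like this in practice. Python’s standard library has no Argon2id, so scrypt stands in as the memory-hard KDF below; that is purely a substitution for the sake of a runnable sketch, and real code should use Argon2id (16 MiB, 3 iterations, parallelism 1) via libsodium or similar.

```python
import hashlib, hmac, struct

def _lp(b: bytes) -> bytes:
    # 64-bit unsigned big-endian length prefix, as in the algorithm above
    return struct.pack(">Q", len(b)) + b

def plaintext_commitment(merkle_root: bytes, attribute_name: bytes,
                         plaintext: bytes, salt: bytes) -> bytes:
    password = _lp(merkle_root) + _lp(attribute_name) + _lp(plaintext)
    # SUBSTITUTION: scrypt (~16 MiB at n=2**14, r=8) in place of Argon2id,
    # which is not in the standard library. Use Argon2id in real code.
    return hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1, dklen=32)

def verify_commitment(commitment: bytes, merkle_root: bytes, attribute_name: bytes,
                      plaintext: bytes, salt: bytes) -> bool:
    expected = plaintext_commitment(merkle_root, attribute_name, plaintext, salt)
    return hmac.compare_digest(expected, commitment)  # constant-time comparison
```

    A requestor recomputes the salt from public inputs, runs the KDF over the plaintext the directory served, and compares the result against the commitment extracted from the ciphertext’s AAD.
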
    To verify a commitment (which is extractable from the ciphertext), simply recalculate the commitment you expect (using the recent Merkle root specified by the record), and compare the two in constant-time.

    If they match, then you know the plaintext you’re seeing is the correct value for the ciphertext value that was committed to the Merkle tree.

    If the encryption key is shredded in the future, an attacker without knowledge of the plaintext will have an enormous uphill battle recovering it from the KDF output (and the salt will prove to be somewhat useless as a crib).

    AJ

    Caveats and Limitations

    Although this design does satisfy the specific criteria we’ve established, an attacker that already knows the correct plaintext can confirm that a specific record matches it via the plaintext commitment.

    This cannot be avoided: If we are to publish a commitment of the plaintext, someone with the plaintext can always confirm the commitment after the fact.

    CMYKat

    Whether this matters at all to the courts is a question for which I cannot offer any insight.

    Remember, we don’t even know if any of this is actually necessary, or if “moderation and platform safety” is a sufficient reason to sidestep the right to erasure.

    If the courts ever clarify this adequately, we can simply publish the mapping of Actor IDs to public keys and auxiliary data without any crypto-shredding at all.

    Trying to attack it from the other direction (downloading a crypto-shredded record and trying to recover the plaintext without knowing it ahead of time) is the attack angle we’re interested in.

    Herd Immunity for the Forgotten

    Another interesting implication that might not be obvious: The more Fediverse servers and users publish to a single Public Key Directory, the greater the anonymity pool available to each of them.

    Consider the case where a user has erased their previous Fediverse account and used the GDPR to also crypto-shred the Public Key Directory entries containing their old Actor ID.

    To guess the correct plaintext, you must not only brute-force possible usernames, but also permute your guesses across all of the instances in scope.

    The more instances there are, the higher the cost of the attack.
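
    Back-of-the-envelope, with entirely hypothetical numbers (the multiplicative scaling is the point, not the figures):

```python
# All figures below are made up for illustration.
candidate_usernames = 10_000_000  # size of the attacker's username wordlist
instances_in_scope = 5_000        # Fediverse instances publishing to this directory
cost_per_guess_usd = 0.0001       # assumed cost of one Argon2id evaluation

guesses = candidate_usernames * instances_in_scope
total_cost_usd = guesses * cost_per_guess_usd

# Every additional instance multiplies the work needed to recover
# a crypto-shredded Actor ID that must be guessed blind.
```

    Doubling the number of participating instances doubles the attacker’s total cost, which is the herd-immunity effect described above.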

    CMYKat

    Recap

    I tasked myself with designing a Key Transparency solution that doesn’t make complying with Article 17 of the GDPR nigh-impossible. To that end, crypto-shredding seemed like the only viable way forward.

    A serialized record containing ciphertext for each sensitive attribute would be committed to the Merkle tree. The directory would store the key locally and serve plaintext until a legal takedown was requested by the user who owns the data. Afterwards, the stored ciphertext committed to the Merkle tree is indistinguishable from random for any party that doesn’t already know the plaintext value.

    I didn’t want to allow Public Key Directories to lie about the plaintext for a given ciphertext, given that they know the key and the requestor doesn’t.

    After considering zero-knowledge proofs and finding them to not be a perfect fit, I settled on designing a plaintext commitment scheme based on the Argon2id password KDF. The KDF salts can be calculated from public inputs.

    Altogether, this meets the requirements of enabling crypto-shredding while keeping the Public Key Directory honest. All known attacks for this design are prohibitively expensive for any terrestrial threat actors.

    As an added bonus, I didn’t introduce anything fancy. You can build all of this with the cryptography available to your favorite programming language today.

    CMYKat

    Closing Thoughts

    If you’ve made it this far without being horribly confused, you’ve successfully followed my thought process for developing message attribute shreddability in my Public Key Directory specification.

    This is just one component of the overall design proposal, but one that I thought my readers would enjoy exploring in greater detail than the specification needed to capture.

    Header art: Harubaki, CMYKat.

    (This post was updated on 2024-11-22 to replace the incorrect term “PII” with “personal data”. Apologies for the confusion!)

    #Argon2 #crypto #cryptography #E2EE #encryption #FederatedPKI #fediverse #passwordHashing #symmetricCryptography
