Two Attacks on Naive Tree Hashes, https://jacko.io/tree_hashing.html. Nice little article. While this boils down to https://en.wikipedia.org/wiki/Domain_separation, which you always need to do when hashing data types, I've definitely seen people mess this up with ad-hoc tree hashes in particular. I find the easiest way of avoiding this class of problems is to think of it as streaming serialization. Non-injective serialization is not reversible! Only the primitive compression function is allowed to be non-injective.
Two Attacks on Naive Tree Hashes

@pervognsen Streaming serialization is pretty close to how I think about it too. Although if I were asked to describe it, I'd probably first say something unclear about automata and the theory of computation.

I spent a good part of the aughts thinking about the connections between streaming and continuations, which I also find extremely helpful for thinking about this.

I agree that prefixed domain separation is almost indisputably better when it comes to distinguishing root nodes from branches, but there's a subtle advantage to suffixed domain separation: prefixed domain separation can potentially be partially evaluated away, whereas I'm reasonably confident in my conjecture that suffixed domain separation constants cannot be efficiently removed from the computation except by homomorphic encryption.

Thus, you can use suffixed domain separation parameters as a means of communication. Salt that parameter with say, a URI link to your website's security.txt means that your hashing protocol becomes self-documenting, and anybody wishing to execute it must be "honest" about what is actually going on unless they are okay with paying the substantial overhead of homomorphic encryption.

I've written about this, but it's taken lot of effort to make it relatively comprehensible. There's a couple of minor problems with this section, but I still agree with the overall argument:

https://github.com/auth-global/self-documenting-cryptography/blob/prerelease/design-documents/g3pb2.md#combinatorics-of-cryptographic-continuations

Basically, what this section is attempting to do is argue there's enough "bandwidth" in this virtual transmission "medium" that it can plausibly support relatively "direct" communication of arbitrary plaintext messages. Prefixed salts clearly do not have enough "bandwidth" in this sense. Suffixed salts almost certainly have effectively unlimited "bandwidth".

self-documenting-cryptography/design-documents/g3pb2.md at prerelease · auth-global/self-documenting-cryptography

Contribute to auth-global/self-documenting-cryptography development by creating an account on GitHub.

GitHub
@leon_p_smith Regarding prefix vs suffix tagging, it reminds me of something like ANS compression where the communication channel is LIFO instead of FIFO. The producer has to serialize data that can be deserialized unambiguously by consumers in the reverse order.
@leon_p_smith If you're not familiar with ANS, it's worth checking out. It's a relatively recent (by compression standards) innovation that has led to significant progress in a previously somewhat stale field.