So the first thing the researcher does is go figure out how OOXML works in the first place. Surprise, it's a nightmare! 40 years of stratified features dating back to Multi-Tool Word for Xenix. The first 3 weeks of your project are probably just reading the spec.
You come to learn that an OOXML .DOCX is a ZIP archive with what appear to be three different manifests inside, because why not. They are:
1. [Content_Types.xml], mapping filenames to content types; any file in the bundle not in this manifest will trigger an error.
2. document.xml.rels, mapping filenames to symbolic IDs, which are referenced from documents themselves.
3. The "Package Info" section of "sig1.xml", the signature block, which maps files to hashes of those files.
The "index.html" of a .DOCX file is a file in the bundle called "document.xml", and that's where all rendering starts.
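If you want to poke at those three manifests yourself, here is a rough Python sketch using only the standard library. The part paths (word/_rels/document.xml.rels, _xmlsignatures/) are where Word typically puts these files, which is an assumption on my part rather than something stated above. The names read_manifests and make_toy_docx are made up for illustration, and the toy package is nowhere near a valid document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# OPC namespaces used by the content-types and relationships parts.
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"

# Assumed locations: typical for Word output, not guaranteed for every package.
RELS_PATH = "word/_rels/document.xml.rels"
SIG_DIR = "_xmlsignatures/"


def read_manifests(docx):
    """Collect the three 'manifests' from a .docx (path or file-like object)."""
    with zipfile.ZipFile(docx) as zf:
        names = zf.namelist()

        # 1. [Content_Types.xml]: part names / extensions -> content types.
        ct_root = ET.fromstring(zf.read("[Content_Types.xml]"))
        defaults = {
            d.get("Extension"): d.get("ContentType")
            for d in ct_root.findall(f"{{{CT_NS}}}Default")
        }
        overrides = {
            o.get("PartName"): o.get("ContentType")
            for o in ct_root.findall(f"{{{CT_NS}}}Override")
        }

        # 2. document.xml.rels: symbolic IDs -> target files.
        relationships = {}
        if RELS_PATH in names:
            rel_root = ET.fromstring(zf.read(RELS_PATH))
            relationships = {
                r.get("Id"): r.get("Target")
                for r in rel_root.findall(f"{{{REL_NS}}}Relationship")
            }

        # 3. Signature parts (sig1.xml and friends), if the document is signed.
        signature_parts = [
            n for n in names if n.startswith(SIG_DIR) and n.endswith(".xml")
        ]

    return {
        "defaults": defaults,
        "overrides": overrides,
        "relationships": relationships,
        "signature_parts": signature_parts,
    }


def make_toy_docx():
    """Build a tiny, NOT spec-complete package in memory, just for the demo."""
    content_types = (
        f'<Types xmlns="{CT_NS}">'
        '<Default Extension="xml" ContentType="application/xml"/>'
        '<Override PartName="/word/document.xml" ContentType="application/'
        'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
        "</Types>"
    )
    rels = (
        f'<Relationships xmlns="{REL_NS}">'
        '<Relationship Id="rId1" Target="media/image1.png" Type="http://'
        'schemas.openxmlformats.org/officeDocument/2006/relationships/image"/>'
        "</Relationships>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types.xml]", content_types)
        zf.writestr("word/document.xml", "<document/>")
        zf.writestr(RELS_PATH, rels)
    return buf.getvalue()
```

Calling read_manifests(io.BytesIO(make_toy_docx())) gives you the content-type map and the ID-to-target map for the toy package, with an empty signature list since nothing is signed. Pointing it at a real signed .docx path should list the signature parts too, assuming they live under _xmlsignatures/.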