Last night thanks to @orman I had the epiphany of why:

XML parsers don't use or even need regex to parse XML in the first place.

XML parsers go through the text one char at a time. When they hit a <, >, </, or />, those chars act as flags telling the parser whether it's entering or leaving a tag, and whether that tag is a closing or self-closing one. Each flag changes the parsing rules, and the parser builds a node tree on the fly.

This will be useful for the UTC.

Although DOMParser() doesn't provide the functionality I want, I can build my own rudimentary parser that does, one that builds a DOM with the structure I need for turning raw input into formatted output.

@orman

This approach is actually going to be A LOT more flexible than attempting to develop The One Regex To Rule Them All for the UTC, and it'll also be A LOT less computationally expensive, since parsing a string by iterating straight through it is just a for loop with a switch statement inside.
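To make the "for loop with a switch statement inside" idea concrete, here is a minimal sketch. The function name `scanTokens` and the token shapes are made up for illustration, and it assumes tags carry no attributes, comments, or quoted `>` characters.

```javascript
// Hypothetical helper: splits tag-like markup into tokens in one pass.
// Assumption: tags have no attributes, comments, or ">" inside them.
function scanTokens(input) {
  const tokens = [];
  let inTag = false; // flipped by the angle brackets
  let buffer = "";

  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    switch (ch) {
      case "<":
        // Entering a tag: flush any literal text collected so far.
        if (buffer !== "") tokens.push({ type: "text", value: buffer });
        buffer = "";
        inTag = true;
        break;
      case ">":
        if (!inTag) {
          // A stray ">" outside a tag is kept as literal text.
          buffer += ch;
          break;
        }
        // Leaving a tag: classify it by its leading or trailing slash.
        if (buffer.startsWith("/")) {
          tokens.push({ type: "close", name: buffer.slice(1).trim() });
        } else if (buffer.endsWith("/")) {
          tokens.push({ type: "selfClose", name: buffer.slice(0, -1).trim() });
        } else {
          tokens.push({ type: "open", name: buffer.trim() });
        }
        buffer = "";
        inTag = false;
        break;
      default:
        buffer += ch;
    }
  }
  // Flush what is left; an unterminated tag is kept as text, "<" included.
  if (buffer !== "" || inTag) {
    tokens.push({ type: "text", value: (inTag ? "<" : "") + buffer });
  }
  return tokens;
}
```

Calling `scanTokens("Hi <b>there</b><br/>")` yields a text token, an open `b`, another text token, a close `b`, and a self-closing `br`, in that order.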

I'm pretty sure String.prototype.matchAll() does the same thing, but with the added overhead of a regex engine trying to match a pattern starting from where each previous match ended.

@orman

@dragonarchitect @orman What about a SAX based parser? One pass, event driven.
@rubber_nero_bln @dragonarchitect a one-pass approach was kinda what I was suggesting in the Discord conversation, because outright constructing a parse tree is kinda overly elaborate, and a top-down approach involves heavy recursion, which I'm not sure would be approachable
@orman @rubber_nero_bln Yeah I was thinking of a one-pass as well and just using the angle brackets as literal flags in the input to switch the parsing logic. Then going word by word and processing the input accordingly.
@dragonarchitect @rubber_nero_bln you can technically do it that way, but IMO it'll be a lot more ergonomic to do one and a half passes: do a matchAll to find where all the tags are, then use the indices you get to chop up the input into a sequence of tags and literal text fragments. Otherwise you'll effectively have two parsers inside each other, because you need to parse each tag one character at a time and then, on top of that, parse the tree based on each tag as you complete the inner parse loop
@dragonarchitect @rubber_nero_bln the latter (the single character-by-character pass) is technically the more efficient method, but the regex involved should be simple enough and the inputs short enough that it won't matter
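A rough sketch of the "one and a half passes" idea, for illustration only. The name `splitOnTags` and the tag regex are assumptions; the pattern treats anything between `<` and `>` that contains no other angle bracket as a tag.

```javascript
// Hypothetical helper: find every tag with matchAll, then use the match
// indices to chop the input into tags and literal text fragments.
function splitOnTags(input) {
  // Assumed tag shape: "<", anything except angle brackets, ">".
  const tagPattern = /<[^<>]*>/g;
  const pieces = [];
  let cursor = 0; // index just past the previous tag

  for (const match of input.matchAll(tagPattern)) {
    // Text between the previous tag and this one, if any.
    if (match.index > cursor) {
      pieces.push({ kind: "text", value: input.slice(cursor, match.index) });
    }
    pieces.push({ kind: "tag", value: match[0] });
    cursor = match.index + match[0].length;
  }
  // Trailing text after the last tag.
  if (cursor < input.length) {
    pieces.push({ kind: "text", value: input.slice(cursor) });
  }
  return pieces;
}
```

The result is a flat sequence, so whatever walks it afterwards only has to look at whole tags and whole text fragments, never at individual characters.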

@dragonarchitect @orman regular grammar vs context-free grammar

:3

@ShadowJonathan @orman I am actually too sleep-deprived this morning to figure out which is which. Can you clarify please?

@dragonarchitect @orman regular expressions = regular grammar, while HTML can only be parsed with a context-free grammar, or something more powerful than that :3

https://en.m.wikipedia.org/wiki/Chomsky_hierarchy


@ShadowJonathan @dragonarchitect well-formed trees with matching end tags are actually context-sensitive IIRC, but since that's even more hairy, you just cheat and check whether the end tag matches the start tag after parsing
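The "cheat" can be sketched with a stack: parse the tags without caring about their names, then compare each end tag against the most recently opened one. This is only an illustration. `buildTree`, the token shapes (`open`, `close`, `selfClose`, `text`), and the `#root` node are all hypothetical.

```javascript
// Hypothetical helper: builds a node tree from already-split tokens and
// checks after the fact that each end tag matches the open start tag.
function buildTree(tokens) {
  const root = { name: "#root", children: [] };
  const stack = [root]; // innermost open element is on top

  for (const token of tokens) {
    const parent = stack[stack.length - 1];
    switch (token.type) {
      case "text":
        parent.children.push(token.value);
        break;
      case "selfClose":
        parent.children.push({ name: token.name, children: [] });
        break;
      case "open": {
        const node = { name: token.name, children: [] };
        parent.children.push(node);
        stack.push(node);
        break;
      }
      case "close":
        // The check itself: the end tag must name the innermost open tag.
        if (stack.length === 1 || parent.name !== token.name) {
          throw new Error(`Unexpected </${token.name}>`);
        }
        stack.pop();
        break;
      default:
        throw new Error(`Unknown token type: ${token.type}`);
    }
  }
  if (stack.length > 1) {
    throw new Error(`Unclosed <${stack[stack.length - 1].name}>`);
  }
  return root;
}
```

A mismatched pair such as an open `b` followed by a close `i` throws instead of producing a tree, which is all the name check needs to do.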