Mastodawn

Although DOMParser() does not provide the functionality I desire, I can build my own rudimentary parser that does, one that builds a DOM with the structure I need for turning raw input into formatted output.

@orman

This approach is actually going to be A LOT more flexible than attempting to develop The One Regex To Rule Them All for the UTC, and it'll also be A LOT less computationally expensive, since parsing a string by iterating straight through it is just a for loop with a switch statement inside.
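A minimal sketch of the "for loop with a switch statement inside" idea, assuming tags are delimited by angle brackets like `<b>` and `</b>` (the actual tag syntax isn't spelled out in this thread). The function name `scan` and the token shape are hypothetical, chosen only for illustration.

```javascript
// Hypothetical single-pass scanner: splits input into text and tag tokens.
// Assumes tags look like <name> or </name>; the real syntax may differ.
function scan(input) {
  const tokens = [];
  let buffer = "";
  let inTag = false;

  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    switch (ch) {
      case "<":
        // A previous "<" that never closed is treated as literal text.
        if (inTag) buffer = "<" + buffer;
        // Flush pending text, then switch to tag mode.
        if (buffer) tokens.push({ type: "text", value: buffer });
        buffer = "";
        inTag = true;
        break;
      case ">":
        if (inTag) {
          tokens.push({ type: "tag", value: buffer });
          buffer = "";
          inTag = false;
        } else {
          buffer += ch; // a stray ">" outside a tag is plain text
        }
        break;
      default:
        buffer += ch;
    }
  }

  // Leftovers become text, including an unclosed "<" at the end.
  if (inTag) buffer = "<" + buffer;
  if (buffer) tokens.push({ type: "text", value: buffer });
  return tokens;
}
```

Each character is visited once, and the only state carried between iterations is the buffer and the `inTag` flag.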

I'm pretty sure String.prototype.matchAll() does the same thing, but with the added overhead of a regex engine trying to match patterns starting from where each previous match ends.

@orman

Rubber Nero BLN Aug 11, 2025

@dragonarchitect @orman What about a SAX based parser? One pass, event driven.
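For readers unfamiliar with the SAX style, here is a rough sketch of what "one pass, event driven" could look like: no tree is built, and the caller just receives callbacks as tags and text go by. The handler names (`onOpen`, `onClose`, `onText`) and the angle-bracket tag syntax are assumptions made for this example, not part of any real library.

```javascript
// SAX-style sketch: one pass, no tree; events are reported as they occur.
// Handler names are hypothetical and all of them are optional.
function parseEvents(input, handlers) {
  let pos = 0;
  while (pos < input.length) {
    const open = input.indexOf("<", pos);
    const close = open === -1 ? -1 : input.indexOf(">", open + 1);
    if (close === -1) {
      // No complete tag remains: the rest is literal text.
      handlers.onText?.(input.slice(pos));
      break;
    }
    // Report any literal text that precedes the tag.
    if (open > pos) handlers.onText?.(input.slice(pos, open));
    const name = input.slice(open + 1, close);
    if (name.startsWith("/")) handlers.onClose?.(name.slice(1));
    else handlers.onOpen?.(name);
    pos = close + 1;
  }
}
```

The caller decides what to do with each event, for example keeping its own stack of open tags or writing formatted output directly.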

@rubber_nero_bln @dragonarchitect A one-pass approach was kinda what I was suggesting in the Discord conversation, because outright constructing a parse tree is kinda overly elaborate, and a top-down approach involves heavy recursion, which I'm not sure would be approachable.

@orman @rubber_nero_bln Yeah, I was thinking of a one-pass approach as well, just using the angle brackets as literal flags in the input to switch the parsing logic, then going word by word and processing the input accordingly.

@dragonarchitect @rubber_nero_bln You can technically do it that way, but IMO it'll be a lot more ergonomic to do one and a half passes: do a matchAll to find where all the tags are, then use the indices you get to chop up the input into a sequence of tags and literal text fragments. Otherwise you'll effectively have two parsers inside each other, because you need to parse each tag one character at a time and then, on top of that, parse the tree tag by tag as each inner parse loop completes.

@dragonarchitect @rubber_nero_bln The character-by-character method is technically the more efficient one, but the regex involved should be simple enough and the inputs short enough that it won't matter.
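The "one and a half passes" idea could be sketched roughly like this: `String.prototype.matchAll()` locates the tags, and each match's `index` is used to slice the input into tags and literal text. The pattern `TAG_PATTERN` and the helper name `splitOnTags` are assumptions for illustration; the real tag syntax may need a different regex.

```javascript
// Assumed tag syntax: "<", then anything except angle brackets, then ">".
const TAG_PATTERN = /<[^<>]*>/g;

// Chops the input into an ordered sequence of tag and text pieces.
function splitOnTags(input) {
  const pieces = [];
  let last = 0; // end of the previous match

  for (const match of input.matchAll(TAG_PATTERN)) {
    // Literal text between the previous tag and this one.
    if (match.index > last) {
      pieces.push({ type: "text", value: input.slice(last, match.index) });
    }
    // Store the tag without its surrounding angle brackets.
    pieces.push({ type: "tag", value: match[0].slice(1, -1) });
    last = match.index + match[0].length;
  }

  // Trailing text after the final tag, if any.
  if (last < input.length) {
    pieces.push({ type: "text", value: input.slice(last) });
  }
  return pieces;
}
```

The resulting flat sequence can then be walked once to build whatever structure the formatter needs, which keeps tag recognition and tree handling in separate steps.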

Spring Jo 🥚

🍀Aug 11, 2025

@dragonarchitect @orman regular grammar vs context-free grammar

@ShadowJonathan @orman I am actually too sleep-deprived this morning to figure out which is which. Can you clarify please?

https://en.m.wikipedia.org/wiki/Chomsky_hierarchy

Spring Jo 🥚

🍀Aug 11, 2025

@dragonarchitect @orman regular expressions = regular grammar, while HTML can only be parsed with a context-free grammar, or something more complex than that :3

Chomsky hierarchy - Wikipedia
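
To make the regular versus context-free distinction concrete: checking that tags are properly nested needs memory that grows with the nesting depth, which is the job a stack does and a classic regular expression cannot. A small sketch, assuming tags arrive as plain names with a leading `/` for closing tags (the helper name `isWellNested` is hypothetical):

```javascript
// Returns true when every closing tag matches the most recent open tag.
// The stack is the unbounded memory that a regular grammar lacks.
function isWellNested(tags) {
  const stack = [];
  for (const tag of tags) {
    if (tag.startsWith("/")) {
      // A closing tag must match the tag on top of the stack.
      if (stack.pop() !== tag.slice(1)) return false;
    } else {
      stack.push(tag);
    }
  }
  // Anything still on the stack was opened but never closed.
  return stack.length === 0;
}
```

A regex can still find individual tags; it is only the matching of arbitrarily deep open and close pairs that needs the extra machinery.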