Mastodawn

JdeBP Jun 6, 2025

I remember when MediaWiki's parser underwent a major overhaul, some years ago.

At the time I thought how awful it must be to parse (MediaWiki) wikitext, given the terrible things that it does in the name of accommodating humans.

I had been musing at the time on the idea of a MediaWiki wikitext to #HTML converter for static WWW sites, so that one could author in #MediaWiki wikitext and run make. But I concluded that one would learn that only through the Second System Effect, and would not begin there at all.

Allowing embedded HTML in #wikitext brings its own problems. One is the question of whether one first converts the other markup to its HTML equivalent and parses that. That in turn means that one has to lex the HTML twice: once before transclusion, just to find the transclusion markup correctly, and then again afterwards.
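
To make the two-pass point concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the `TEMPLATES` table, the `{{name}}` syntax reduced to bare names, and the two regular expressions, none of which reflect how MediaWiki itself works. The first pass has to recognise HTML comments only so that it does not expand transclusion markup inside them; the second pass sees tags that did not exist until after expansion.

```python
import re

# Hypothetical template store; real MediaWiki transclusion is far richer.
TEMPLATES = {"greet": "<i>hello</i> {{who}}", "who": "world"}

# Pass 1 tokens: HTML comments (left alone) or {{name}} transclusions.
PASS1 = re.compile(r"<!--.*?-->|\{\{(\w+)\}\}", re.DOTALL)
# Pass 2 tokens: HTML comments or simple tags, in the expanded text.
PASS2 = re.compile(r"<!--.*?-->|</?\w+[^>]*>", re.DOTALL)

def expand(text):
    """Expand transclusions recursively, skipping anything inside comments."""
    def repl(m):
        name = m.group(1)
        if name is None or name not in TEMPLATES:
            return m.group(0)  # a comment, or an unknown template: unchanged
        return expand(TEMPLATES[name])
    return PASS1.sub(repl, text)

def tags(text):
    """List the tags that a second lexing pass would see, ignoring comments."""
    return [t for t in PASS2.findall(text) if not t.startswith("<!--")]

source = "<!-- {{greet}} --><p>{{greet}}</p>"
expanded = expand(source)
print(expanded)        # <!-- {{greet}} --><p><i>hello</i> world</p>
print(tags(source))    # ['<p>', '</p>']
print(tags(expanded))  # ['<p>', '<i>', '</i>', '</p>']
```

The tag list before expansion is not the tag list after it, which is why one pass is not enough in this toy model.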

And of course, because of embedded HTML, the rule about it not being lexable by regular expressions holds.

Unlike the centre. (-:

https://stackoverflow.com/a/1732454/340790
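
A minimal Python sketch of the usual reason given for that rule, assuming the problem in question is nesting: a pattern that pairs an opening tag with the nearest closing tag gets nested elements wrong, whereas something that counts depth does not. The function names `naive_inner` and `counted_inner` are made up for this example.

```python
import re

OPEN, CLOSE = "<div>", "</div>"

# Naive pattern: an opening tag, as little as possible, then a closing tag.
PAIR = re.compile(r"<div>(.*?)</div>", re.DOTALL)

def naive_inner(text):
    """Return what the regex believes is the content of the first <div>."""
    m = PAIR.search(text)
    return m.group(1) if m else None

def counted_inner(text):
    """Return the content of the first <div> by tracking nesting depth."""
    start = text.find(OPEN)
    if start < 0:
        return None
    depth = 0
    for tok in re.finditer(r"</?div>", text[start:]):
        depth += 1 if tok.group() == OPEN else -1
        if depth == 0:
            return text[start + len(OPEN):start + tok.start()]
    return None  # unbalanced input

sample = "<div>outer <div>inner</div> tail</div>"
print(naive_inner(sample))    # outer <div>inner
print(counted_inner(sample))  # outer <div>inner</div> tail
```

The counter is the part that a regular expression, in the formal sense, has no way to express for arbitrary depth.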

#DocBookXML
