I built a pure Swift XML parser (no DTD, external entity, or transcoding support) at https://github.com/compnerd/xylem. Some informal testing indicated that it is faster than libxml2 and xmloxide, a Rust port of libxml2.
@compnerd Sigh, a tad too late for me, I would have just used that :-) (I just did a similar Expat variant for my own needs. I did some very heavy perf testing, and in many things it's almost twice as fast as Expat.)
@compnerd In the SAX parser, if you make the ResolvedAttributes "borrowing", you can keep one shared instance in the parser, reusing it for any start call (and the delegate can extract what it actually needs). The XML.Byte is a little weird, are there actually systems where a byte is not UInt8? 🙂
@helge I don’t think that Swift really needs to worry about those platforms. I do however prefer the Byte spelling over UInt8.
@helge did I understand your suggestion properly in https://github.com/compnerd/xylem/commit/84b0f1386abfc4b2100c5b68b1f5b7d8658cb891 ? Either way, turns out to be beneficial.
XMLCore,SAXParser: convert ResolvedAttributes to a View · compnerd/xylem@84b0f13

Convert to borrowing rather than consuming as suggested by @helje5. This avoids some unnecessary copies, which improves performance further and reduces memory pressure.
@compnerd Can't really review that, sorry, too many own things :-) But I have one shared buffer for all attributes of a tag and then keep a list of ranges into that buffer. The shared buffer is reused between all calls to the delegate.startTag and involves no ARC/copying at all, because it is borrowing (a quick glimpse showed that you had that already though?).
I'm probably going to release my version later this month, you can then have a look and steal stuff you find useful, or not if not 🙂
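
For readers following along, the shared-buffer idea described above can be sketched roughly like this. It is only an illustration under stated assumptions: `AttributeTable`, `IDCollector`, and `collectIDs` are hypothetical names, not the API of xylem or of the Expat-style parser mentioned here. It assumes Swift 5.9 or later for the `borrowing` parameter modifier.

```swift
// Hypothetical sketch; none of these names come from xylem or any released parser.
struct AttributeTable {
    // One byte buffer shared by all attributes of the current tag.
    private var storage: [UInt8] = []
    // Each attribute is just a pair of ranges into `storage`.
    private var entries: [(name: Range<Int>, value: Range<Int>)] = []

    var count: Int { entries.count }

    // Drop the contents but keep the allocations, so the next tag reuses them.
    mutating func reset() {
        storage.removeAll(keepingCapacity: true)
        entries.removeAll(keepingCapacity: true)
    }

    mutating func append(name: String, value: String) {
        let nameStart = storage.count
        storage.append(contentsOf: name.utf8)
        let valueStart = storage.count
        storage.append(contentsOf: value.utf8)
        entries.append((name: nameStart..<valueStart, value: valueStart..<storage.count))
    }

    func name(at index: Int) -> ArraySlice<UInt8> { storage[entries[index].name] }
    func value(at index: Int) -> ArraySlice<UInt8> { storage[entries[index].value] }
}

// A delegate-like type that extracts only what it needs (here: `id` values).
struct IDCollector {
    var ids: [String] = []

    // `borrowing` means the callee may read the table but cannot implicitly copy it.
    mutating func startTag(attributes: borrowing AttributeTable) {
        for index in 0..<attributes.count {
            if attributes.name(at: index).elementsEqual("id".utf8) {
                ids.append(String(decoding: attributes.value(at: index), as: UTF8.self))
            }
        }
    }
}

// Stand-in for the parser loop: one table instance is reused for every start tag.
func collectIDs(from tags: [[(name: String, value: String)]]) -> [String] {
    var table = AttributeTable()
    var collector = IDCollector()
    for tag in tags {
        table.reset()
        for attribute in tag {
            table.append(name: attribute.name, value: attribute.value)
        }
        collector.startTag(attributes: table)
    }
    return collector.ids
}
```

The point of the sketch is that only the delegate decides what gets copied out (the `String` for each `id`); the table itself is filled, lent out, and reset without a fresh allocation per tag once its capacity has grown.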

@helge totally get it (and it was split up to make the change smaller).

The index technique is definitely what I ended up with as well. Except I have a few different representations, since there are the unresolved (raw) attributes and then the resolved attributes (namespace URIs bound and entities expanded).

Looking forward to comparing notes!

@compnerd Your package does a lot more (DOM, XPath etc), mine is strictly SAX (Expat like). I use it for protocols (like WebDAV), not for loading DOMs or such. In the protocol parsers, I don't even copy in the delegate, but match up the spans to InlineArray<UInt8>'s for the known elements/attributes.
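
As a rough illustration of matching name bytes against known elements without building a `String`, here is a minimal sketch. It swaps in plain `[UInt8]` arrays rather than `InlineArray<UInt8>` so it builds on older toolchains, and `DAVElement`, `knownElements`, and `classify` are hypothetical names; only the element names themselves are taken from WebDAV.

```swift
// Hypothetical sketch; a real protocol parser would likely use fixed-size storage.
enum DAVElement: String, CaseIterable {
    case multistatus, response, href, propstat, prop, status
}

// Precomputed once: the UTF-8 bytes of every known element name.
let knownElements: [(bytes: [UInt8], element: DAVElement)] =
    DAVElement.allCases.map { (bytes: Array($0.rawValue.utf8), element: $0) }

// Classify a slice of the input buffer by comparing bytes directly.
// No String is created and nothing is copied out of the input.
func classify(_ name: ArraySlice<UInt8>) -> DAVElement? {
    for known in knownElements where known.bytes.count == name.count {
        if name.elementsEqual(known.bytes) {
            return known.element
        }
    }
    return nil
}
```

With an input buffer such as `Array("<href>/dav/</href>".utf8)`, calling `classify(input[1..<5])` looks only at the four bytes of `href` inside the original buffer.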
@helge I should add that, depending on the workload, I did see similar results. libxml2 has had a long time to be tweaked, and there are places where it is already entirely optimal.
@compnerd One would have to check the asm but I suspect the reason why Swift is actually faster (which I didn’t anticipate given the maturity of both Expat and libxml2) is that the optimizer can’t be as aggressive in C.
@helge yes, a concrete example is alias analysis. Swift is just helping the compiler. But Expat and libxml2 have a long history of people improving performance, so it's a huge deal to meet or beat them on a first pass.
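
To make the alias-analysis point a bit more concrete, here is a minimal sketch (the function names are made up for illustration). In C, an equivalent `void accumulate(int *total, const int *values, size_t n)` has to allow for `total` pointing into `values` unless `restrict` is used. In Swift, the exclusivity rule for `inout` and the value semantics of `Array` rule that out in the language itself. Whether the optimizer exploits this in any particular parser loop is not something this sketch demonstrates; one would have to check the generated code.

```swift
// Exclusivity: while this call runs, nothing else may access `total`.
// `values` is a value-type array, so writing `total` cannot change it.
// The compiler is therefore free to keep the running sum in a register.
func accumulate(into total: inout Int, from values: [Int]) {
    for value in values {
        total += value
    }
}

func demoAccumulate() -> Int {
    var total = 1
    accumulate(into: &total, from: [1, 2, 3])
    return total
}
```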
@compnerd What a great project, thanks for making it!
@numist thanks :) I hope you find it useful!