Looking for some good, clean C++ fun? [1] Here's a vectorised HTML parser based on @lemire's incredible work in simdjson [2]:

https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/core/html/parser/html_document_parser_fastpath.cc

[1]: yes, I understand. That's the joke.
[2]: https://arxiv.org/pdf/1902.08318

@slightlyoff @lemire Daniel has a follow-up blog post (https://lemire.me/blog/2024/07/20/scan-html-even-faster-with-simd-instructions-c-and-c/). I asked Anton Bikineev (who implemented Daniel's original approach in Chromium) whether he had tried that follow-up in Chromium; it yields no further improvement: https://chromium-review.googlesource.com/c/chromium/src/+/7246251. Nevertheless, all very interesting.
Scan HTML even faster with SIMD instructions (C++ and C#)

Earlier this year, both major Web engines (WebKit/Safari and Chromium/Chrome/Edge/Brave) accelerated HTML parsing using SIMD instructions. These 'SIMD' instructions are special instructions that are present in all our processors that can process multiple bytes at once (e.g., 16 bytes). The problem that WebKit and Chromium solve is to jump to the next target character as …

Daniel Lemire's blog
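
For anyone who wants a feel for the "jump to the next target character" idea, here is a minimal sketch. It is not the code from the blog post or from Chromium: it uses plain SSE2 byte compares (with a scalar fallback on other targets) rather than whatever the engines actually ship, and the target set ('<', '&', '\r', NUL), the function name `next_target`, and the macro `SKETCH_HAVE_SSE2` are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics
#define SKETCH_HAVE_SSE2 1
#endif

// Hypothetical target set for this sketch: '<', '&', '\r' and NUL.
inline bool is_target(char c) {
  return c == '<' || c == '&' || c == '\r' || c == '\0';
}

// Returns the index of the first target character at or after `pos`,
// or input.size() if there is none.
std::size_t next_target(std::string_view input, std::size_t pos = 0) {
  assert(pos <= input.size());
  const char* data = input.data();
  const std::size_t size = input.size();
  std::size_t i = pos;

#ifdef SKETCH_HAVE_SSE2
  const __m128i lt = _mm_set1_epi8('<');
  const __m128i amp = _mm_set1_epi8('&');
  const __m128i cr = _mm_set1_epi8('\r');
  const __m128i nul = _mm_setzero_si128();

  // Examine 16 bytes per iteration while a full block remains.
  for (; i + 16 <= size; i += 16) {
    const __m128i block =
        _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + i));
    // Each compare yields 0xFF in the lanes that match one target byte.
    const __m128i hits = _mm_or_si128(
        _mm_or_si128(_mm_cmpeq_epi8(block, lt), _mm_cmpeq_epi8(block, amp)),
        _mm_or_si128(_mm_cmpeq_epi8(block, cr), _mm_cmpeq_epi8(block, nul)));
    // Collapse the 16 lanes into a 16-bit mask, one bit per byte.
    unsigned mask = static_cast<unsigned>(_mm_movemask_epi8(hits));
    if (mask != 0) {
      // The lowest set bit marks the first matching byte in the block.
      std::size_t offset = 0;
      while ((mask & 1u) == 0) {
        mask >>= 1;
        ++offset;
      }
      return i + offset;
    }
  }
#endif

  // Scalar tail (and the whole scan when SSE2 is unavailable).
  for (; i < size; ++i) {
    if (is_target(data[i])) return i;
  }
  return size;
}
```

A tokenizer would call `next_target` in a loop, copying the plain text between hits in bulk and handling each hit (tag open, character reference, and so on) with ordinary scalar code. With more target characters, four separate compares stop being attractive and a table-lookup instruction becomes the more natural tool.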