sxpp 0.6 has been released.

This major release features improved error diagnostics, a new API function to stacklessly walk SX trees, as well as small optimizations and fixes.

https://git.sr.ht/~duangle/sxpp

#programming #cpp #parsing #sxpp

0xMarioNawfal (@RoundtableSpace)

Firecrawl has released a Rust-based PDF parser. It converts PDFs to markdown 5x faster, extracts tables, preserves formulas, and works without any configuration, which could greatly ease PDF processing, a key bottleneck in AI pipelines.

https://x.com/RoundtableSpace/status/2048413036772483307

#firecrawl #rust #pdf #parsing #aipipeline

0xMarioNawfal (@RoundtableSpace) on X

Firecrawl just shipped a Rust-based PDF parser & it's not close.
- 5x faster PDF to markdown conversion
- Extracts full tables and preserves formulas
- Zero config required

PDF parsing has been a pain point for AI pipelines. This might actually fix it.

X (formerly Twitter)

Parse, Don’t Validate—in a Language That Doesn’t Want You To, by (not on Mastodon or Bluesky):

https://cekrem.github.io/posts/parse-dont-validate-typescript/

#parsing #validation #programming #typescript #typesafety

Parse, Don't Validate — In a Language That Doesn't Want You To

Applying Alexis King's parse-don't-validate principle in TypeScript, where the type system fights back just enough to be annoying.

cekrem.github.io
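The principle carries over to other languages. Below is a minimal Python sketch of the idea, not the article's TypeScript code: untrusted text is parsed once at the boundary into a dedicated type, and later code accepts only that type. The `Email` class, `parse_email`, `send_welcome`, and the deliberately simplistic address rule are all hypothetical names and assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Email:
    """An address that already went through parse_email (hypothetical type)."""
    value: str


def parse_email(raw: str) -> Email:
    """Turn untrusted text into an Email, or raise ValueError."""
    candidate = raw.strip()
    local, sep, domain = candidate.partition("@")
    # Simplistic rule for the sketch: exactly one "@", non-empty local part,
    # and a dot somewhere in the domain.
    if not sep or not local or "@" in domain or "." not in domain:
        raise ValueError(f"not an email address: {raw!r}")
    return Email(candidate.lower())


def send_welcome(to: Email) -> str:
    # No re-validation here: the parameter type records that parsing happened.
    return f"Welcome, {to.value}!"
```

Python does not stop callers from constructing `Email` directly, so the guarantee rests on convention plus a type checker rather than on the runtime.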

Gecko: a fast GLR parser with automatic syntax error recovery

Gecko: A Fast, Standalone GLR Parser Library in C

Vladimir Makarov

Somi AI (@somi_ai)

The post argues that structured outputs are underrated and that a major cause of agent instability is the parsing of free-form text. It is an important technical insight for making AI agents and developer tools more reliable.

https://x.com/somi_ai/status/2046767315774279854

#structuredoutputs #agents #llm #parsing #ai

Somi AI (@somi_ai) on X

@googledevs structured outputs pulls more weight than people give it credit for. half of agent brittleness is downstream parsing of freeform text

X (formerly Twitter)
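A minimal sketch of the point, assuming the model has been asked to reply with a JSON object of a fixed shape. The `ToolCall` type and the field names `tool` and `query` are hypothetical and not tied to any particular vendor API. The reply is either parsed into a typed value or rejected, so nothing downstream has to guess at free-form text.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    tool: str
    query: str


def parse_tool_call(model_output: str) -> ToolCall:
    """Parse a JSON reply with a fixed shape; reject anything else."""
    # json.loads raises json.JSONDecodeError (a ValueError subclass) on non-JSON.
    data = json.loads(model_output)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    tool, query = data.get("tool"), data.get("query")
    if not isinstance(tool, str) or not isinstance(query, str):
        raise ValueError("expected string fields 'tool' and 'query'")
    return ToolCall(tool, query)
```

A caller can catch `ValueError` in one place and retry or fall back, instead of scattering regular expressions over prose replies.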

The fastest way to match characters on ARM processors?, https://lemire.me/blog/2026/04/19/the-fastest-way-to-match-characters-on-arm-processors/.

In this article, Lemire discusses two ARM SVE/SVE2 SIMD instructions, `match` and `nmatch`, which fit nicely into the _vectorized classification_ step of `simdjson`. These instructions improve the performance of `simdjson` from 11.4 GB/s to 14.4 GB/s.

#performance #simd #arm #json #parsing

The fastest way to match characters on ARM processors?

Consider the following problem. Given a string, you must match all of the ASCII white-space characters (\t, \n, \r, and the space) and some characters important in JSON (:, ,, [, ], {, }). JSON is a text-based data format used for web services. A toy JSON document looks as follows. { "name": "Alice", "age": …

Daniel Lemire's blog
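The classification problem stated in the excerpt can be sketched in scalar form with a 256-entry lookup table. This is an illustration of the problem only, not the SVE2 code from the article and not how `simdjson` implements it; the names `WHITESPACE`, `STRUCTURAL`, `TABLE`, and `classify` are made up for the example.

```python
from __future__ import annotations

WHITESPACE = 1
STRUCTURAL = 2

# 256-entry lookup table indexed by byte value; 0 means "neither class".
TABLE = bytearray(256)
for ch in b"\t\n\r ":
    TABLE[ch] = WHITESPACE
for ch in b":,[]{}":
    TABLE[ch] = STRUCTURAL


def classify(data: bytes) -> list[tuple[int, int]]:
    """Return (index, class) for every whitespace or structural byte."""
    return [(i, TABLE[b]) for i, b in enumerate(data) if TABLE[b]]
```

A vectorized version answers the same question for many bytes per instruction, which is where set-matching instructions come in.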

How do you build a complex parser without it becoming a mess? The swift-parsing library from Point-Free lets you compose smaller, focused parsers into powerful pipelines using the Parse block and map modifiers.

🔗: https://swiftdevjournal.com/posts/composing-parsers/ by Mark Szymczyk

#Swift #Parsing #iOSDev

Composing Parsers with the swift-parsing Library · Swift Dev Journal
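The composition idea can be sketched outside Swift as well. The following Python sketch is not the swift-parsing API; `literal`, `digits`, `pmap`, and `sequence` are hypothetical helpers that only mirror the general shape: small parsers, a way to run them in order, and a map step that turns raw matches into values.

```python
from __future__ import annotations
from typing import Any, Callable, Optional, Tuple

# A parser takes (text, position) and returns (value, new_position), or None on failure.
Parser = Callable[[str, int], Optional[Tuple[Any, int]]]


def literal(s: str) -> Parser:
    def run(text, pos):
        return (s, pos + len(s)) if text.startswith(s, pos) else None
    return run


def digits() -> Parser:
    def run(text, pos):
        end = pos
        while end < len(text) and "0" <= text[end] <= "9":
            end += 1
        return (text[pos:end], end) if end > pos else None
    return run


def pmap(parser: Parser, fn: Callable[[Any], Any]) -> Parser:
    def run(text, pos):
        result = parser(text, pos)
        if result is None:
            return None
        value, nxt = result
        return fn(value), nxt
    return run


def sequence(*parsers: Parser) -> Parser:
    def run(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None  # any failing step fails the whole sequence
            value, pos = result
            values.append(value)
        return values, pos
    return run


integer = pmap(digits(), int)
# Three small parsers run in order; the map step drops the separator.
size = pmap(sequence(integer, literal("x"), integer), lambda v: (v[0], v[2]))
```

Calling `size("12x34", 0)` yields the pair of integers together with the position where parsing stopped.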

I do not see why a recursive-descent parser cannot have numeric precedences like a Pratt parser. You just move the order of productions freely, and let any production call any other.

Am I missing something?
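One way to read the idea is a recursive-descent parser whose expression function takes a numeric minimum precedence, a technique usually called precedence climbing and closely related to Pratt parsing. A minimal sketch, assuming integer operands, four left-associative operators, and a hypothetical `PRECEDENCE` table:

```python
from __future__ import annotations
import operator
import re

# Numeric precedence per binary operator; higher binds tighter.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}
TOKEN = re.compile(r"\s*([0-9]+|[-+*/()])")


def tokenize(src: str) -> list[str]:
    tokens, pos, src = [], 0, src.rstrip()
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, src: str):
        self.tokens, self.pos = tokenize(src), 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def primary(self) -> float:
        tok = self.advance()
        if tok == "(":
            value = self.expression(1)
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"unexpected token {tok!r}")
        return float(tok)

    def expression(self, min_prec: int) -> float:
        left = self.primary()
        # Consume operators that bind at least as tightly as min_prec.
        while (op := self.peek()) in PRECEDENCE and PRECEDENCE[op] >= min_prec:
            self.advance()
            # Asking for prec + 1 on the right makes the operator left-associative.
            right = self.expression(PRECEDENCE[op] + 1)
            left = OPS[op](left, right)
        return left


def evaluate(src: str) -> float:
    parser = Parser(src)
    value = parser.expression(1)
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

Reordering or adding operators only touches the table, which is the flexibility the post is asking about.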

#parsing #programming

Rewriting Our Rust Wasm Parser in TypeScript, by (not on Mastodon or Bluesky):

https://www.openui.com/blog/rust-wasm-parser

#migrating #parsing #rust #typescript

Rewriting our Rust WASM Parser in TypeScript | OpenUI

We rewrote our Rust WASM Parser in TypeScript - and it got 3x Faster

Intuiting Pratt parsing

You already know that a + b * c + d is calculated as a + (b * c) + d. But how do you encode that knowledge precisely enough for a machine to act on it?

louisb0
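One common way to encode that knowledge is a table of left and right binding powers per operator. The sketch below is not taken from the linked article; the `INFIX` table, the `^` operator, and the whitespace-separated input format are assumptions made for illustration. It returns the fully parenthesized form of an expression.

```python
from __future__ import annotations

# (left, right) binding powers. right > left gives left associativity;
# right < left (as for "^") gives right associativity.
INFIX = {"+": (1, 2), "-": (1, 2), "*": (3, 4), "/": (3, 4), "^": (6, 5)}


def parse(tokens: list[str], pos: int = 0, min_bp: int = 0) -> tuple[str, int]:
    """Return (parenthesized_expression, next_position)."""
    if pos >= len(tokens) or tokens[pos] in INFIX:
        raise SyntaxError("expected an operand")
    left, pos = tokens[pos], pos + 1
    while pos < len(tokens):
        op = tokens[pos]
        if op not in INFIX:
            raise SyntaxError(f"expected an operator, got {op!r}")
        left_bp, right_bp = INFIX[op]
        if left_bp < min_bp:
            break  # the operator binds too loosely; let the caller take it
        right, pos = parse(tokens, pos + 1, right_bp)
        left = f"({left} {op} {right})"
    return left, pos


def parenthesize(src: str) -> str:
    # Assumes tokens are separated by whitespace, to keep the sketch short.
    tree, _ = parse(src.split())
    return tree


print(parenthesize("a + b * c + d"))  # prints ((a + (b * c)) + d)
```

The grouping of `b * c` falls out of the numbers alone: when the inner call sees `+` with left binding power 1 while holding a minimum of 4, it stops and hands the operator back to its caller.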