💡 Docling + Ollama + Qdrant: build a private knowledge base with Qwen3.6 running locally
https://gomoot.com/docling-la-libreria-open-source-per-il-pdf-parsing-nelle-pipeline-rag-con-ollama-e-qdrant/

#docling #ollama #parsing #pdf #qdrant #rag

sxpp 0.7 has been released.

This major release features an experimental LSP server, simple JSON parsing and generation with `--json`, binary output with `--cat`, a new API function to specify search paths, as well as small optimizations and fixes.

https://git.sr.ht/~duangle/sxpp

#programming #cpp #parsing #sxpp

LLM generated parsers and compliance checkers for Sparrow DSL

Sparrow DSL is a domain-specific language for text parsing and automation that provides SDKs for a range of programming languages. The post presents a case in which a Deep Seek LLM-based system uses Sparrow DSL to generate parsers and compliance checkers for several configuration files, including sudoers, sshd, redis, and forgejo. Prompt examples for the Python and Raku SDKs are also published, so developers can freely experiment with LLM-generated parsers for their own configuration files.

https://news.ycombinator.com/item?id=48075633

#llm #dsl #parsing #automation #sdk

LLM generated parsers and compliance checkers for Sparrow DSL | Hacker News

sxpp 0.6 has been released.

This major release features improved error diagnostics, a new API function to stacklessly walk SX trees, as well as small optimizations and fixes.

https://git.sr.ht/~duangle/sxpp

#programming #cpp #parsing #sxpp

0xMarioNawfal (@RoundtableSpace)

Firecrawl has released a Rust-based PDF parser. It converts PDF to markdown 5x faster, extracts tables, preserves formulas, and works without any configuration, so it could substantially ease PDF processing, a key bottleneck in AI pipelines.

https://x.com/RoundtableSpace/status/2048413036772483307

#firecrawl #rust #pdf #parsing #aipipeline

0xMarioNawfal (@RoundtableSpace) on X

Firecrawl just shipped a Rust-based PDF parser & it's not close.
- 5x faster PDF to markdown conversion
- Extracts full tables and preserves formulas
- Zero config required

PDF parsing has been a pain point for AI pipelines. This might actually fix it.

X (formerly Twitter)

Parse, Don’t Validate - in a Language That Doesn’t Want You To, by (not on Mastodon or Bluesky):

https://cekrem.github.io/posts/parse-dont-validate-typescript/

#parsing #validation #programming #typescript #typesafety

Parse, Don't Validate - In a Language That Doesn't Want You To

Applying Alexis King's parse-don't-validate principle in TypeScript, where the type system fights back just enough to be annoying.
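A minimal TypeScript sketch of the idea, not taken from the linked post: a validator returns a boolean and leaves the caller holding a plain `string`, while a parser returns a branded type that records the check. The names `Email`, `ParseResult`, `isValidEmail`, `parseEmail`, and `greetingFor` are hypothetical, and the regex is a deliberately loose stand-in for a real address check.

```typescript
// Hypothetical branded type: a string that has passed parseEmail.
type Email = string & { readonly __brand: "Email" };

type ParseResult<T> =
  | { ok: true; value: T }
  | { ok: false; error: string };

// Validation style: answers yes/no, but the caller still holds a plain string.
function isValidEmail(input: string): boolean {
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(input);
}

// Parsing style: success yields a value whose type records that the check happened.
function parseEmail(input: string): ParseResult<Email> {
  const trimmed = input.trim();
  return isValidEmail(trimmed)
    ? { ok: true, value: trimmed as Email }
    : { ok: false, error: `not an email address: ${JSON.stringify(input)}` };
}

// Downstream code demands the parsed type and never re-checks.
function greetingFor(to: Email): string {
  return `Hello, ${to}`;
}

const parsed = parseEmail("  alice@example.com ");
const greeting = parsed.ok ? greetingFor(parsed.value) : parsed.error;
```

Because `greetingFor` accepts only `Email`, passing an unchecked `string` fails to compile, which pushes the check to the boundary where input enters the program.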

cekrem.github.io

Gecko: a fast GLR parser with automatic syntax error recovery

Gecko: A Fast, Standalone GLR Parser Library in C

Vladimir Makarov

Somi AI (@somi_ai)

The post argues that structured outputs are underrated and that a major cause of agent brittleness is the parsing of freeform text. It is a useful technical insight for making AI agents and developer tools more reliable.

https://x.com/somi_ai/status/2046767315774279854

#structuredoutputs #agents #llm #parsing #ai

Somi AI (@somi_ai) on X

@googledevs structured outputs pulls more weight than people give it credit for. half of agent brittleness is downstream parsing of freeform text
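One way to picture the claim, as a sketch under assumptions: if the model is asked to reply with JSON in an agreed shape, the agent can parse once at the boundary and reject anything else, instead of scraping intent out of prose. The `ToolCall` shape and the `parseToolCall` helper are hypothetical and do not correspond to any vendor's structured-output API.

```typescript
// Hypothetical shape of an agent's tool call; the field names are assumptions.
interface ToolCall {
  tool: string;
  args: Record<string, unknown>;
}

// Returns a ToolCall when the reply is JSON of the expected shape, otherwise null.
function parseToolCall(reply: string): ToolCall | null {
  let data: unknown;
  try {
    data = JSON.parse(reply);
  } catch {
    return null; // the reply is not JSON at all
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) return null;
  const record = data as Record<string, unknown>;
  const tool = record["tool"];
  const args = record["args"];
  if (typeof tool !== "string") return null;
  if (typeof args !== "object" || args === null || Array.isArray(args)) return null;
  return { tool, args: args as Record<string, unknown> };
}

// A structured reply parses cleanly; a freeform reply is rejected rather than guessed at.
const structured = parseToolCall('{"tool": "search", "args": {"query": "GLR parsers"}}');
const freeform = parseToolCall("Sure! I'll call the search tool with query GLR parsers.");
```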

X (formerly Twitter)

The fastest way to match characters on ARM processors? https://lemire.me/blog/2026/04/19/the-fastest-way-to-match-characters-on-arm-processors/

In this article, Lemire discusses two ARM SVE/SVE2 SIMD instructions, `match` and `nmatch`, which fit nicely into the _vectorized classification_ step of `simdjson`. Using them raises `simdjson` throughput from 11.4 Gb/s to 14.4 Gb/s.

#performance #simd #arm #json #parsing

The fastest way to match characters on ARM processors?

Consider the following problem. Given a string, you must match all of the ASCII white-space characters (\t, \n, \r, and the space) and some characters important in JSON (:, ,, [, ], {, }). JSON is a text-based data format used for web services. A toy JSON document looks as follows. { "name": "Alice", "age": …
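For reference, the problem stated in the excerpt can be written as plain scalar code with a 256-entry lookup table. This is only a baseline sketch in TypeScript: it is not the SVE2 `match` approach from the article and makes no performance claim. The names `CLASS_TABLE`, `markClass`, `toBytes`, and `matchPositions` are hypothetical.

```typescript
// 256-entry table: 1 = ASCII white-space, 2 = JSON structural character, 0 = anything else.
const CLASS_TABLE = new Uint8Array(256);

function markClass(chars: string, label: number): void {
  for (let i = 0; i < chars.length; i++) {
    CLASS_TABLE[chars.charCodeAt(i)] = label;
  }
}

markClass("\t\n\r ", 1); // the four white-space characters from the problem statement
markClass(":,[]{}", 2);  // the six JSON characters from the problem statement

// Converts ASCII text to bytes (code units above 255 are truncated; fine for this demo).
function toBytes(text: string): Uint8Array {
  const bytes = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) bytes[i] = text.charCodeAt(i) & 0xff;
  return bytes;
}

// Returns the offset of every byte that is white-space or a JSON structural character.
function matchPositions(bytes: Uint8Array): number[] {
  const positions: number[] = [];
  bytes.forEach((byte, index) => {
    if (CLASS_TABLE[byte] !== 0) positions.push(index);
  });
  return positions;
}

// Offsets of '{', ':', ' ', and '}' in the input: [0, 4, 5, 7]
const hits = matchPositions(toBytes('{"a": 1}'));
```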

Daniel Lemire's blog

How do you build a complex parser without it becoming a mess? The swift-parsing library from Point-Free lets you compose smaller, focused parsers into powerful pipelines using the Parse block and map modifiers.

🔗: https://swiftdevjournal.com/posts/composing-parsers/ by Mark Szymczyk

#Swift #Parsing #iOSDev

Composing Parsers with the swift-parsing Library · Swift Dev Journal
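The linked post uses Swift and the swift-parsing API, which is not reproduced here. As a rough illustration of the same composition idea in TypeScript, this sketch builds a coordinate parser out of smaller ones. `Parser`, `digits`, `literal`, `map`, `seq3`, `integer`, and `point` are hypothetical helpers written for this example.

```typescript
// A parser consumes a prefix of the input and returns a value plus the remaining text, or null.
type Parser<T> = (input: string) => { value: T; rest: string } | null;

// Matches one or more ASCII digits.
const digits: Parser<string> = (input) => {
  let end = 0;
  while (end < input.length && input.charAt(end) >= "0" && input.charAt(end) <= "9") end++;
  return end > 0 ? { value: input.slice(0, end), rest: input.slice(end) } : null;
};

// Matches an exact piece of text.
const literal = (text: string): Parser<string> => (input) =>
  input.slice(0, text.length) === text
    ? { value: text, rest: input.slice(text.length) }
    : null;

// map: transform a parser's output without changing what it consumes.
function map<A, B>(p: Parser<A>, f: (a: A) => B): Parser<B> {
  return (input) => {
    const r = p(input);
    return r === null ? null : { value: f(r.value), rest: r.rest };
  };
}

// seq3: run three parsers in order and combine their three values.
function seq3<A, B, C, R>(
  pa: Parser<A>,
  pb: Parser<B>,
  pc: Parser<C>,
  combine: (a: A, b: B, c: C) => R,
): Parser<R> {
  return (input) => {
    const ra = pa(input);
    if (ra === null) return null;
    const rb = pb(ra.rest);
    if (rb === null) return null;
    const rc = pc(rb.rest);
    if (rc === null) return null;
    return { value: combine(ra.value, rb.value, rc.value), rest: rc.rest };
  };
}

// Small parsers composed into a larger one: "12,34" becomes { x: 12, y: 34 }.
const integer = map(digits, (s) => Number(s));
const point = seq3(integer, literal(","), integer, (x, _comma, y) => ({ x, y }));
```

Each piece stays small and testable on its own, and `point` fails as a whole if any step fails.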