Mastodawn

ParseBench is here!📊

We’ve just released ParseBench, an open benchmark + dataset for evaluating document parsing at scale.

It includes:
• 2,000+ human-reviewed enterprise documents
• 167,000 evaluation rules
• Coverage across 5 key areas: tables, charts, content faithfulness, semantic formatting, and visual grounding

Clelia Bertelli 5d ago

I'm building 𝘀𝘂𝗻𝗯𝗲𝗮𝗿𝘀, a CSV data loader library for TS written in Rust🦀
In Node, it reads a file with 1,000,000 rows in 0.3s and writes the same number of rows in 0.15s, which is 4x and 2x faster, respectively, than the `csv` package⚡
sunbears uses 𝘋𝘢𝘵𝘢𝘍𝘳𝘢𝘮𝘦 as its primary data structure, a columnar format with strict typing, null and NaN filtering, and convenient column-to-array transformations📄
📦 Get started now: `npm install @cle-does-things/sunbears`
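
For readers new to columnar data, here is a rough sketch of what typed columns with null and NaN filtering can look like. It is not the sunbears API: the `Column` and `Frame` types and the `numericColumnToArray` helper are hypothetical names made up for illustration.

```typescript
// Hypothetical illustration only: this is NOT the sunbears API.
// Each column stores its values together and carries its own type tag.
type Column =
  | { kind: "number"; values: (number | null)[] }
  | { kind: "string"; values: (string | null)[] };

type Frame = Record<string, Column>;

// Return a numeric column as a plain array, dropping null and NaN entries.
function numericColumnToArray(frame: Frame, name: string): number[] {
  const column = frame[name];
  if (column === undefined) {
    throw new Error(`unknown column: ${name}`);
  }
  if (column.kind !== "number") {
    throw new Error(`column ${name} is not numeric`);
  }
  return column.values.filter(
    (v): v is number => v !== null && !Number.isNaN(v),
  );
}

const frame: Frame = {
  city: { kind: "string", values: ["Rome", "Turin", null] },
  temperature: { kind: "number", values: [21.5, NaN, null] },
};

// Only the valid reading survives the filter.
const temperatures = numericColumnToArray(frame, "temperature");
console.log(temperatures);
```

Keeping the type tag on the column (rather than on each cell) is what lets the helper reject a string column before touching any value.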

Clelia Bertelli 6d ago

How can you improve your agentic search pipeline?
I just wrote a blog post in collaboration with LanceDB to answer exactly that.
TLDR:
- Parse files and take page-level screenshots with LiteParse, the parser we just open-sourced at LlamaIndex
- Chunk and embed text, and store everything (text, image bytes, vector data) in a local LanceDB instance
- Expose text and image retrieval tools to a Claude agent, and let it reason on both data types
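
The retrieval step in the list above can be sketched as a top-k cosine similarity search. This is an in-memory stand-in, not the LanceDB API: `StoredChunk`, `cosineSimilarity`, and `searchText` are hypothetical names, and the vectors are toy values rather than real embeddings.

```typescript
// Hypothetical in-memory stand-in for a vector store; NOT the LanceDB API.
interface StoredChunk {
  text: string;
  page: number;
  vector: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vector length mismatch");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks whose vectors are closest to the query vector.
function searchText(
  store: StoredChunk[],
  queryVector: number[],
  k: number,
): StoredChunk[] {
  return store
    .map((chunk) => ({ chunk, score: cosineSimilarity(chunk.vector, queryVector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.chunk);
}

const store: StoredChunk[] = [
  { text: "revenue table", page: 1, vector: [1, 0] },
  { text: "legal notice", page: 2, vector: [0, 1] },
  { text: "revenue summary", page: 3, vector: [1, 1] },
];

const hits = searchText(store, [1, 0], 2);
console.log(hits.map((h) => h.text));
```

A function shaped like `searchText` is the kind of thing one could expose to an agent as a text retrieval tool, with a second tool returning the stored image bytes for the same page.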

Clelia Bertelli Apr 6

At @llamaindex, we're committed to building the most capable document agents.

That starts with powerful document processing building blocks like LlamaParse and LlamaExtract, but great agents also need the right access controls, as they should only see the documents they’re authorized to use.
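
As a minimal sketch of that idea, assuming a simple group-based permission model (not something described in the post), documents can be filtered by the caller's groups before the agent ever sees them. `AgentDocument`, `Caller`, and `visibleDocuments` are hypothetical names and do not come from LlamaParse or LlamaExtract.

```typescript
// Hypothetical permission model for illustration; not a LlamaIndex API.
interface AgentDocument {
  id: string;
  allowedGroups: string[];
  text: string;
}

interface Caller {
  userId: string;
  groups: string[];
}

// Keep only the documents that share at least one group with the caller.
function visibleDocuments(docs: AgentDocument[], caller: Caller): AgentDocument[] {
  const groups = new Set(caller.groups);
  return docs.filter((doc) => doc.allowedGroups.some((g) => groups.has(g)));
}

const docs: AgentDocument[] = [
  { id: "q3-report", allowedGroups: ["finance"], text: "Q3 revenue..." },
  { id: "handbook", allowedGroups: ["finance", "engineering"], text: "Welcome..." },
  { id: "payroll", allowedGroups: ["hr"], text: "Salaries..." },
];

const caller: Caller = { userId: "u1", groups: ["engineering"] };

// The agent's retrieval tools would only ever receive this filtered list.
const allowed = visibleDocuments(docs, caller);
console.log(allowed.map((d) => d.id));
```

Filtering before retrieval, rather than after the agent has read the content, keeps unauthorized text out of the model's context entirely.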

Clelia Bertelli Apr 3

I just published a TypeScript library for loading CSV data, with an API inspired by Pandas and Polars, but fully written in Rust 🦀

𝘀𝘂𝗻𝗯𝗲𝗮𝗿𝘀 converts CSV files into a DataFrame, a tabular data structure with strictly typed columns whose values can be easily extracted as arrays and used with familiar operations like 𝘮𝘢𝘱 and 𝘧𝘪𝘭𝘵𝘦𝘳📊
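
To make the "typed column to array, then map and filter" flow concrete, here is a small sketch in plain TypeScript. It does not use sunbears: `readNumericColumn` is a hypothetical helper built on a deliberately naive CSV split (no quoted fields, no escaping).

```typescript
// Naive CSV reader for illustration: no quoted fields, no escaping.
// Hypothetical helper; this is NOT the sunbears API.
function readNumericColumn(csv: string, columnName: string): number[] {
  const lines = csv.trim().split(/\r?\n/);
  const header = (lines[0] ?? "").split(",");
  const index = header.indexOf(columnName);
  if (index === -1) {
    throw new Error(`unknown column: ${columnName}`);
  }
  return lines.slice(1).map((line) => {
    const cell = line.split(",")[index] ?? "";
    const value = Number(cell);
    // Strict typing: refuse anything that is not a number.
    if (cell.trim() === "" || Number.isNaN(value)) {
      throw new Error(`non-numeric value in column ${columnName}: "${cell}"`);
    }
    return value;
  });
}

const csv = "name,price\napple,1.5\npear,2\nfig,4.25";

// Once the column is a plain array, the usual array operations apply.
const prices = readNumericColumn(csv, "price");
const discounted = prices.filter((p) => p > 1.75).map((p) => p * 0.5);
console.log(discounted);
```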

Clelia Bertelli Mar 29

Hey there 👋, I built 𝗹𝗶𝘁𝗲𝘀𝗲𝗮𝗿𝗰𝗵, a fully local document ingestion and retrieval CLI and TUI app, powered by LiteParse⚡
- Parse your unstructured documents with LiteParse, the lightning-fast parser that we just open-sourced at @llamaindex
- Chunk with Chonkie
- Embed with a local model through transformers.js
- Store embeddings in a local Qdrant edge shard (custom-built in Rust and compiled as a native add-on🦀)
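
The chunking step in the list above can be sketched as fixed-size chunks with overlap. This is a generic technique shown for illustration, not how Chonkie or litesearch implement it: `chunkText` and its parameters are hypothetical.

```typescript
// Hypothetical fixed-size chunker with overlap; NOT the Chonkie API.
function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  if (chunkSize <= 0) {
    throw new Error("chunkSize must be positive");
  }
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be in the range [0, chunkSize)");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap; // how far the window moves each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the current window already reaches the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

// Each chunk repeats the last character of the previous one.
const chunks = chunkText("abcdefghij", 4, 1);
console.log(chunks);
```

The overlap keeps a sentence that straddles a boundary partly visible in both neighboring chunks, which tends to help retrieval at the cost of some duplicated storage.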

Clelia Bertelli Mar 26

The Google DeepMind team really cooked with Gemini 3.1 in the Live API: it's fast and the output quality is great🔥
That's why at @llamaindex we decided to test it out with our bread and butter: document processing📄
The voice agent we built:
- Takes voice commands from the terminal
- Calls tools to explore available files and parse them, powered by LiteParse, our fully-local parser
- Live-updates you on its task🔊
Take a look at the demo👇
Repo: https://github.com/run-llama/voice-document-assistant

Clelia Bertelli Mar 26

I created 𝘀𝗸𝗶𝗹𝗹𝘇𝘆, a simple CLI written in Rust for your agent skills📝
→ `skillzy init` creates the frontmatter for a skill and generates the required folders and skill file
→ `skillzy check` validates your existing skills and ensures they comply with the https://agentskills.io specification
skillzy also comes with a GitHub Action you can run in your skills repositories, `AstraBert/run-skillzy`, and with its own agent skill
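
To give a feel for what a frontmatter check involves, here is a minimal sketch. It is not skillzy's implementation (skillzy is written in Rust) and it does not reproduce the agentskills.io specification: the `---` delimiters, the simple `key: value` layout, and the choice of `name` and `description` as required keys are all assumptions made for illustration, as are the names `checkSkillFile` and `REQUIRED_KEYS`.

```typescript
interface CheckResult {
  ok: boolean;
  errors: string[];
}

// Assumption: these are the required keys. The real specification may differ.
const REQUIRED_KEYS = ["name", "description"];

// Assumption: the file opens with a frontmatter block delimited by `---` lines
// that holds simple `key: value` pairs (no nested YAML).
function checkSkillFile(content: string): CheckResult {
  const lines = content.split(/\r?\n/);
  if (lines[0]?.trim() !== "---") {
    return { ok: false, errors: ["missing opening --- delimiter"] };
  }
  const end = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
  if (end === -1) {
    return { ok: false, errors: ["missing closing --- delimiter"] };
  }
  const fields = new Map<string, string>();
  for (const line of lines.slice(1, end)) {
    const sep = line.indexOf(":");
    if (sep === -1) continue; // skip lines that are not key: value pairs
    fields.set(line.slice(0, sep).trim(), line.slice(sep + 1).trim());
  }
  const errors = REQUIRED_KEYS
    .filter((key) => (fields.get(key) ?? "") === "")
    .map((key) => `missing or empty field: ${key}`);
  return { ok: errors.length === 0, errors };
}

const skill = "---\nname: pdf-summary\ndescription: Summarize PDF files\n---\n# Steps\n";
const result = checkSkillFile(skill);
console.log(result);
```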

Clelia Bertelli Mar 24

notion-cli, the app I built in Golang to interact with Notion pages from your terminal and with your agents, got to v0.3.0🚀
This version adds a search command, which makes your (or your agent's) workflow smoother: search → read → modify.
Install 👉 `brew install AstraBert/notion-cli/notion-cli`
Star on GitHub 👉 https://github.com/AstraBert/notion-cli

Clelia Bertelli Mar 23

So excited to see the article Vishal and I wrote on LlamaParse x Google Gemini go live!
In this guide, we show how you can leverage LlamaParse's advanced agentic OCR to extract text and tables from complex financial documents, and then use Gemini 3's state-of-the-art context understanding capabilities to turn the parsed content into human-friendly insights.
Read the blog: https://developers.googleblog.com/build-a-smart-financial-assistant-with-llamaparse-and-gemini-31/