Clelia Bertelli

@itsclelia
10 Followers
8 Following
239 Posts
Girly dev 💅👩‍💻
Open Source Engineer at @llamaindex
I do things with Python and AI🐍

ParseBench is here!📊

We've just released ParseBench, an open benchmark + dataset for evaluating document parsing at scale.

It includes:
• 2,000+ human-reviewed enterprise documents
• 167,000 evaluation rules
• Coverage across 5 key areas: tables, charts, content faithfulness, semantic formatting, and visual grounding

I'm building sunbears, a CSV data loader library for TS written in Rust🦀
In Node, it reads a file with 1,000,000 rows in 0.3s and writes the same number of rows in 0.15s, respectively 4x and 2x faster than the `csv` package⚡
sunbears uses DataFrame as its primary data structure, a columnar format with strict typing, null and NaN filtering, and convenient column-to-array transformations📄
📦 Get started now: `npm install @cle-does-things/sunbears`
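To make the columnar idea concrete, here is a minimal sketch of a strictly typed column store with null and NaN filtering and a column-to-array step. It is not the sunbears API: `Column`, `Frame`, `readCsv`, and `numericColumn` are hypothetical names made up for illustration, and the naive comma split ignores quoted fields.

```typescript
// Hypothetical sketch of a columnar frame; NOT the sunbears API.
type Column =
  | { kind: "number"; values: (number | null)[] }
  | { kind: "string"; values: (string | null)[] };

type Frame = Record<string, Column>;

// Parse a small CSV string into typed columns, using an explicit schema.
// Assumption: no quoted fields or embedded commas.
function readCsv(text: string, schema: Record<string, "number" | "string">): Frame {
  const [headerLine, ...rows] = text.trim().split("\n");
  const headers = headerLine.split(",");
  const frame: Frame = {};
  for (const h of headers) {
    frame[h] =
      schema[h] === "number"
        ? { kind: "number", values: [] }
        : { kind: "string", values: [] };
  }
  for (const row of rows) {
    const cells = row.split(",");
    headers.forEach((h, i) => {
      const raw = cells[i] ?? "";
      const col = frame[h];
      if (col.kind === "number") {
        // Empty cells become null; unparseable cells become NaN.
        col.values.push(raw === "" ? null : Number(raw));
      } else {
        col.values.push(raw === "" ? null : raw);
      }
    });
  }
  return frame;
}

// Extract a numeric column as a plain array, dropping nulls and NaNs.
function numericColumn(frame: Frame, name: string): number[] {
  const col = frame[name];
  if (!col || col.kind !== "number") {
    throw new TypeError(`"${name}" is not a numeric column`);
  }
  return col.values.filter((v): v is number => v !== null && !Number.isNaN(v));
}

const csv = "name,price\napple,1.5\npear,\nplum,abc\nfig,4";
const frame = readCsv(csv, { name: "string", price: "number" });
const prices = numericColumn(frame, "price");
// Once a column is a plain array, the usual array operations apply.
const doubled = prices.map((p) => p * 2).filter((p) => p > 3);
```

Keeping each column in its own typed array is what makes the filtering step cheap: the check runs over one homogeneous array instead of over every row object.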

How can you improve your agentic search pipeline?
I just wrote a blog post in collaboration with LanceDB to answer exactly that.
TLDR:
- Parse files and take page-level screenshots with LiteParse, the parser we just open sourced at LlamaIndex
- Chunk and embed text, and store everything (text, image bytes, vector data) in a local LanceDB instance
- Expose text and image retrieval tools to a Claude agent, and let it reason on both data types
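The storage and retrieval steps above can be sketched roughly as follows. This is only an illustration under loud assumptions: `InMemoryStore` stands in for a LanceDB table, `embed` is a toy hashed bag-of-words instead of a real embedding model, and `retrieveText` / `retrieveImages` are hypothetical tool names, not functions from LiteParse, LanceDB, or the blog post.

```typescript
// Hypothetical record: text, page screenshot bytes, and vector live in one row.
interface PageRecord {
  page: number;
  text: string;
  image: Uint8Array; // page-level screenshot bytes
  vector: number[];
}

// Toy embedding (hashed bag of words). A real pipeline would call a model here.
function embed(text: string, dims = 64): number[] {
  const v = new Array<number>(dims).fill(0);
  for (const word of text.toLowerCase().split(/\W+/).filter(Boolean)) {
    let h = 0;
    for (const ch of word) h = (h * 31 + ch.charCodeAt(0)) % dims;
    v[h] += 1;
  }
  return v;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// In-memory stand-in for a local vector table.
class InMemoryStore {
  private rows: PageRecord[] = [];

  add(page: number, text: string, image: Uint8Array): void {
    this.rows.push({ page, text, image, vector: embed(text) });
  }

  // Return the k rows most similar to the query, best match first.
  search(query: string, k: number): PageRecord[] {
    const q = embed(query);
    return [...this.rows]
      .sort((x, y) => cosine(q, y.vector) - cosine(q, x.vector))
      .slice(0, k);
  }
}

// Two tools an agent could be given: one returns text, one returns images.
function retrieveText(store: InMemoryStore, query: string, k = 1): string[] {
  return store.search(query, k).map((r) => r.text);
}

function retrieveImages(store: InMemoryStore, query: string, k = 1): Uint8Array[] {
  return store.search(query, k).map((r) => r.image);
}

const store = new InMemoryStore();
store.add(1, "quarterly revenue grew in europe", new Uint8Array([1]));
store.add(2, "the cat sat on the mat", new Uint8Array([2]));
const topText = retrieveText(store, "revenue europe");
const topImages = retrieveImages(store, "revenue europe");
```

Because both tools read from the same rows, the agent can ask for the text of a page first and fetch the matching screenshot only when the text alone is not enough.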

At @llamaindex, we're committed to building the most capable document agents.

That starts with powerful document processing building blocks like LlamaParse and LlamaExtract, but great agents also need the right access controls, as they should only see the documents they're authorized to use.

I just published a TypeScript library for loading CSV data, with an API inspired by Pandas and Polars, but fully written in Rust 🦀

sunbears converts CSV files into a DataFrame, a tabular data structure with strictly typed columns whose values can be easily extracted as arrays and used with familiar operations like `map` and `filter`📊

Hey there 👋, I built litesearch, a fully local document ingestion and retrieval CLI and TUI app, powered by LiteParse⚡
- Parse your unstructured documents with LiteParse, the lightning-fast parser that we just open sourced at @llamaindex
- Chunk with Chonkie
- Embed with a local model through transformers.js
- Store embeddings in a local Qdrant edge shard (custom-built in Rust and compiled as a native add-on🦀)
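For readers unfamiliar with the chunking step, a bare-bones version looks something like the sketch below. It is a generic fixed-size word chunker with overlap, written from scratch: `chunkWords` is a hypothetical helper and does not reflect how Chonkie or litesearch actually chunk.

```typescript
// Hypothetical helper: split text into chunks of `size` words,
// where consecutive chunks share `overlap` words.
function chunkWords(text: string, size: number, overlap: number): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new RangeError("require size > 0 and 0 <= overlap < size");
  }
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += size - overlap) {
    chunks.push(words.slice(start, start + size).join(" "));
    // Stop once the current chunk already reaches the end of the text.
    if (start + size >= words.length) break;
  }
  return chunks;
}

const chunks = chunkWords("a b c d e f g", 3, 1);
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from either side, at the cost of embedding a few words twice.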

The Google DeepMind team really cooked with Gemini 3.1 in the Live API: it's fast and the output quality is great🔥
That's why at @llamaindex we decided to test it out with our bread and butter: document processing📄
The voice agent we built:
- Takes voice commands from the terminal
- Calls tools to explore available files and parse them, powered by LiteParse, our fully-local parser
- Live-updates you on its task🔊
Take a look at the demo👇
Repo: https://github.com/run-llama/voice-document-assistant

I created skillzy, a simple CLI written in Rust for your agent skills
→ `skillzy init` creates the frontmatter for a skill and generates the required folders and skill file
→ `skillzy check` validates your existing skills and ensures they comply with the https://agentskills.io specification
skillzy also comes with a GitHub Action you can run in your skills repositories, `AstraBert/run-skillzy`, and with its own agent skill
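As a rough idea of what a frontmatter check involves, here is a small sketch. It is not skillzy's implementation (skillzy is written in Rust), and the rule set is an assumption: it only checks that a frontmatter block exists and that `name` and `description` fields are present, which may be narrower than what the specification requires. `parseFrontmatter` and `checkSkill` are hypothetical names.

```typescript
interface CheckResult {
  ok: boolean;
  errors: string[];
}

// Read flat `key: value` pairs between the opening and closing `---` lines.
// Assumption: no nested YAML, no multi-line values.
function parseFrontmatter(doc: string): Record<string, string> | null {
  const lines = doc.split(/\r?\n/);
  if (lines[0]?.trim() !== "---") return null;
  const end = lines.indexOf("---", 1);
  if (end === -1) return null;
  const fields: Record<string, string> = {};
  for (const line of lines.slice(1, end)) {
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    fields[line.slice(0, sep).trim()] = line.slice(sep + 1).trim();
  }
  return fields;
}

// Assumed rules for illustration: `name` and `description` must be non-empty.
function checkSkill(doc: string): CheckResult {
  const fields = parseFrontmatter(doc);
  if (fields === null) {
    return { ok: false, errors: ["missing frontmatter block"] };
  }
  const errors: string[] = [];
  for (const key of ["name", "description"]) {
    if (!fields[key]) errors.push(`missing required field: ${key}`);
  }
  return { ok: errors.length === 0, errors };
}

const valid = checkSkill("---\nname: pdf-tools\ndescription: Parse PDFs\n---\n# Body");
const invalid = checkSkill("---\nname: pdf-tools\n---\n# Body");
```

Returning a list of errors instead of throwing on the first one makes the same check usable in a CI job, where seeing every problem in one run is more convenient.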

notion-cli, the app I built in Golang to interact with Notion pages from your terminal and with your agents, got to v0.3.0🚀
This version adds a search command, which makes your (or your agent's) flow smoother: search → read → modify.
Install 👉 `brew install AstraBert/notion-cli/notion-cli`
Star on GitHub 👉 https://github.com/AstraBert/notion-cli

So excited to see the article Vishal and I wrote on LlamaParse x Google Gemini go live!
In this guide, we show you how to leverage LlamaParse's advanced agentic OCR to extract text and tables from complex financial documents, and then use Gemini 3's state-of-the-art context understanding capabilities to turn the parsed content into human-friendly insights.
Read the blog: https://developers.googleblog.com/build-a-smart-financial-assistant-with-llamaparse-and-gemini-31/