Clelia Bertelli

@itsclelia
Girly dev 💅👩‍💻
Open Source Engineer at @llamaindex
I do things with python and AI🐍
What makes it different?
ParseBench optimizes for semantic correctness, not exact text matching.
Explore more:
📖 Blog: https://www.llamaindex.ai/blog/parsebench
💻 Code: https://github.com/run-llama/ParseBench
🤗 Dataset: https://huggingface.co/datasets/llamaindex/ParseBench
ParseBench: The First Document Parsing Benchmark for AI Agents

Introducing ParseBench: 2,000+ human-verified pages and 167K test rules to evaluate document OCR across tables, charts, formatting, and more for AI agents. Open source.

ParseBench is here!📊

We've just released ParseBench, an open benchmark + dataset for evaluating document parsing at scale.

It includes:
• 2,000+ human-reviewed enterprise documents
• 167,000 evaluation rules
• Coverage across 5 key areas: tables, charts, content faithfulness, semantic formatting, and visual grounding

👩‍💻 Repo: https://github.com/AstraBert/sunbears
GitHub - AstraBert/sunbears: A CSV data loader for TypeScript with an API similar to Polars and Pandas, written in pure Rust.

I'm building sunbears, a CSV data loader library for TS written in Rust🦀
In Node, it can read a file with 1,000,000 rows in 0.3s and write the same number of rows in 0.15s: 4x and 2x faster, respectively, than the `csv` package⚡
sunbears uses a DataFrame as its primary data structure: a columnar format with strict typing, null and NaN filtering, and convenient column-to-array transformations📄
📦 Get started now: `npm install @cle-does-things/sunbears`
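To make the DataFrame description concrete, here is a minimal TypeScript sketch of a columnar frame with one type per column, null and NaN filtering, and column-to-array conversion. It is a hypothetical illustration, not the sunbears API: the `MiniFrame` class, its `toArray` method, and the naive CSV parsing (comma-separated, no quoted fields) are all assumptions made for this example.

```typescript
// Hypothetical sketch of a columnar DataFrame. This is NOT the sunbears API:
// it only illustrates column-oriented storage, one type per column,
// null/NaN filtering, and column-to-array conversion.
// Assumption: naive CSV input (comma-separated, no quoted fields).

type Cell = number | string | null;

interface Column {
  kind: "number" | "string"; // strict typing: every column has exactly one kind
  values: Cell[]; // empty cells are stored as null
}

class MiniFrame {
  readonly columns: { [name: string]: Column } = {};
  readonly names: string[];

  constructor(csv: string) {
    const lines = csv.trim().split("\n");
    this.names = lines[0].split(",").map((h) => h.trim());
    const rows = lines.slice(1).map((line) => line.split(","));
    for (let c = 0; c < this.names.length; c++) {
      // Gather column c across all rows (columnar layout).
      const raw = rows.map((r) => (r[c] === undefined ? "" : r[c].trim()));
      // A column is numeric only if every non-empty cell parses as a number.
      const numeric = raw.every((s) => s === "" || s === "NaN" || !isNaN(Number(s)));
      const values: Cell[] = raw.map((s) => (s === "" ? null : numeric ? Number(s) : s));
      this.columns[this.names[c]] = { kind: numeric ? "number" : "string", values };
    }
  }

  // Number of rows, taken from the first column.
  get height(): number {
    return this.names.length === 0 ? 0 : this.columns[this.names[0]].values.length;
  }

  // Column-to-array conversion that drops nulls and NaNs.
  toArray(name: string): (number | string)[] {
    const col = this.columns[name];
    if (col === undefined) throw new Error(`unknown column: ${name}`);
    const out: (number | string)[] = [];
    for (const v of col.values) {
      if (v === null) continue;
      if (typeof v === "number" && isNaN(v)) continue;
      out.push(v);
    }
    return out;
  }
}

const df = new MiniFrame("id,price,city\n1,9.5,Rome\n2,,Milan\n3,NaN,\n");
console.log(df.columns["price"].kind); // "number"
console.log(df.toArray("price")); // [ 9.5 ]
console.log(df.toArray("city")); // [ 'Rome', 'Milan' ]
console.log(df.height); // 3
```

Because each column already lives in its own array, turning a column into a clean array is a single filtering pass over that column rather than a walk over every row.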
With our eval dataset, the agent got near-perfect scores on most of the complex QA tasks, showing how a strong parsing foundation and multimodal retrieval can really improve your search🚀
Read the full breakdown here: https://www.lancedb.com/blog/smart-parsing-meets-sharp-retrieval-combining-liteparse-and-lancedb
Smart Parsing Meets Sharp Retrieval: Combining LiteParse and LanceDB

Build a structure-aware PDF QA agent with LiteParse, LanceDB, and Claude to answer complex questions over visually rich documents.

How can you improve your agentic search pipeline?
I just wrote a blog post in collaboration with LanceDB to answer exactly that.
TLDR:
- Parse files and take page-level screenshots with LiteParse, the parser we just open-sourced at LlamaIndex
- Chunk and embed text, and store everything (text, image bytes, vector data) in a local LanceDB instance
- Expose text and image retrieval tools to a Claude agent, and let it reason on both data types
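The text side of those steps can be sketched in a few dozen lines of TypeScript. This is a toy, self-contained illustration under loud assumptions: it does not call LiteParse, LanceDB, or Claude; `pages` stands in for parsed page text, `embed` is a hashed bag-of-words vector rather than a real embedding model, and a plain array replaces the LanceDB table. Names like `chunkText`, `buildStore`, and `searchText` are hypothetical.

```typescript
// Hypothetical sketch of the "chunk, embed, store, retrieve" steps.
// Stand-ins, NOT real library APIs:
//  - `pages` plays the role of per-page text a parser such as LiteParse returns;
//  - `embed` is a toy hashed bag-of-words vector, not a real embedding model;
//  - the `Chunk[]` array stands in for a LanceDB table.

interface Chunk {
  page: number; // 1-based page number, so the matching page screenshot can be looked up
  text: string;
  vector: number[];
}

const DIM = 64;

// Split text into windows of `size` words that overlap by `overlap` words.
function chunkText(text: string, size: number, overlap: number): string[] {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  const step = Math.max(1, size - overlap);
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // last window already reaches the end
  }
  return chunks;
}

// Toy embedding: hash each lowercase word into one of DIM buckets, then L2-normalize.
function embed(text: string): number[] {
  const v: number[] = [];
  for (let i = 0; i < DIM; i++) v.push(0);
  for (const word of text.toLowerCase().split(/\W+/)) {
    if (word === "") continue;
    let h = 0;
    for (let i = 0; i < word.length; i++) h = (h * 31 + word.charCodeAt(i)) % DIM;
    v[h] += 1;
  }
  const norm = Math.sqrt(v.reduce((s, x) => s + x * x, 0)) || 1;
  return v.map((x) => x / norm);
}

// Vectors are unit-length, so the dot product equals cosine similarity.
function dot(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// "Ingest": chunk every page, embed each chunk, and keep the page number as metadata.
function buildStore(pages: string[]): Chunk[] {
  const store: Chunk[] = [];
  pages.forEach((pageText, i) => {
    for (const text of chunkText(pageText, 50, 10)) {
      store.push({ page: i + 1, text, vector: embed(text) });
    }
  });
  return store;
}

// What a text-retrieval tool exposed to the agent might do: return the top-k chunks.
function searchText(store: Chunk[], query: string, k: number): Chunk[] {
  const q = embed(query);
  return store
    .map((c) => ({ c, score: dot(q, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.c);
}

const store = buildStore([
  "Revenue grew 12 percent in the third quarter, driven by cloud sales.",
  "The board approved a new dividend policy for shareholders.",
]);
console.log(searchText(store, "dividend policy", 1)[0].page); // 2
```

Keeping the page number on every chunk is what would let an agent pair a text hit with the screenshot of the same page when it switches to image retrieval.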

That's why we teamed up with Auth0 to build a real-world demo of a secure document processing and retrieval pipeline, powered by fine-grained authorization so only trusted actors can access specific content.

📚 Learn how it works in the blog post: https://auth0.com/blog/securing-ai-documents-llamaindex-auth0/
🦙 Get started with LlamaParse: https://cloud.llamaindex.ai/signup

Securing AI Document Agents with LlamaIndex and Auth0

Learn how to build secure AI document agents using LlamaIndex Workflows and Auth0 FGA. Implement fine-grained, relationship-based access ...

At @llamaindex, we're committed to building the most capable document agents.

That starts with powerful document processing building blocks like LlamaParse and LlamaExtract, but great agents also need the right access controls, as they should only see the documents they're authorized to use.
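One way to picture "agents should only see the documents they're authorized to use" is a relationship-based check that runs before retrieved documents reach the agent. The sketch below is hypothetical and in-memory: it is not the Auth0 FGA or LlamaIndex API, and the `(user, relation, object)` tuple format, `TupleStore`, and `authorizedDocs` are assumptions loosely modeled on Zanzibar-style authorization systems.

```typescript
// Hypothetical, in-memory sketch of relationship-based access control (ReBAC).
// NOT the Auth0 FGA or LlamaIndex API: Tuple, TupleStore, and authorizedDocs
// are illustrative assumptions loosely modeled on Zanzibar-style systems.

interface Tuple {
  user: string; // a subject ("user:anne") or a userset ("group:finance#member")
  relation: string; // e.g. "viewer" or "member"
  object: string; // e.g. "document:q3-report" or "group:finance"
}

class TupleStore {
  private readonly tuples: Tuple[];

  constructor(tuples: Tuple[]) {
    this.tuples = tuples;
  }

  // True if `user` has `relation` on `object`, directly or via a userset
  // (e.g. every member of a group that is a viewer of the document).
  check(user: string, relation: string, object: string, seen: string[] = []): boolean {
    const key = `${relation}@${object}`;
    if (seen.indexOf(key) !== -1) return false; // stop on cyclic group definitions
    const path = seen.concat([key]);
    for (const t of this.tuples) {
      if (t.relation !== relation || t.object !== object) continue;
      if (t.user === user) return true; // direct relationship
      const hash = t.user.indexOf("#");
      if (hash !== -1) {
        // Userset "group:x#member": does `user` hold `member` on `group:x`?
        const setObject = t.user.slice(0, hash);
        const setRelation = t.user.slice(hash + 1);
        if (this.check(user, setRelation, setObject, path)) return true;
      }
    }
    return false;
  }
}

// Keep only the documents the caller may view, before anything reaches the agent.
function authorizedDocs(store: TupleStore, user: string, docIds: string[]): string[] {
  return docIds.filter((id) => store.check(user, "viewer", `document:${id}`));
}

const fga = new TupleStore([
  { user: "user:anne", relation: "member", object: "group:finance" },
  { user: "group:finance#member", relation: "viewer", object: "document:q3-report" },
  { user: "user:bob", relation: "viewer", object: "document:handbook" },
]);
console.log(authorizedDocs(fga, "user:anne", ["q3-report", "handbook"])); // [ 'q3-report' ]
console.log(authorizedDocs(fga, "user:bob", ["q3-report", "handbook"])); // [ 'handbook' ]
```

Filtering before the agent sees anything, rather than asking the model to ignore forbidden content, keeps the access decision outside the LLM.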

📝 PS: I'll follow up with a blog post about my experience creating this library!

In benchmarks, sunbears can load a CSV with 1 million rows in about 0.4 seconds, making it roughly 3× faster than `csv-parse`, although still about 2× slower than Polars in Python ⚖️

For now, sunbears focuses on fast CSV reading, but I'm planning to expand the library further and keep improving performance over time 🚀

⭐ Give it a star: https://github.com/AstraBert/sunbears
📦 Install with `npm install @cle-does-things/sunbears`