What if the problem isn't the model, but the data you're feeding it? ⚡

In this session, Eric Deandrea shows how Docling Java turns complex documents into structured, AI-ready data, helping Java devs build better RAG pipelines and AI applications.

🎟️ https://www.dev2next.com/schedule

#Java #Docling #AI

Goodbye PDF? The Linux Foundation launches DocLang, the first AI-native format

IBM, NVIDIA, Red Hat, and ABBYY are joining forces to design a universal, machine-readable document standard to make RAG and agentic architectures more reliable.

Goodtech

I'm experimenting with #docling
Nice: I ran it on a PDF, the Markdown output was 10 times the original size, and it took only a little over an hour of intense compute

Guess adapting my procedures isn't necessary 😊

Article Update:

Taming Unstructured Data: From PDFs to JSON with Quarkus and Docling

https://www.the-main-thread.com/p/quarkus-docling-data-preparation-for-ai

#java #quarkus #docling

From PDFs to JSON with Quarkus and Docling

Build a fast, scalable converter to turn business documents into structured data using Quarkus and Docling—for RAG pipelines, search indexing, and LLM prep.

The Main Thread
RAG over documents with tables and images using Docling and Langflow - Qiita

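Handling documents with tables and images in RAG: With RAG or agentic AI, there are likely cases where you want to use unstructured data such as PDFs or PPTX files that contain tables and images, rather than plain text. With enough implementation effort you could probably parse unstructured data in all sorts of ways, but...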

Qiita
Docling

Docling converts messy documents into structured data and simplifies downstream document and AI processing by detecting tables, formulas, reading order, OCR, and much more.

Docling vs MarkItDown: Which is the best tool for document processing for GenAI? - Qiita

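Introduction: When building GenAI (generative AI) projects or RAG (retrieval-augmented generation) systems, data cleansing and preparation is a very important step. In practice, though, a company's internal documents are almost never in clean text form. Multi-column PDFs, complex tables...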

Qiita

RE: https://fedi.simonwillison.net/@simon/116457708120212477

I'm going to test #liteparse locally in comparison with #Docling; so far I've only used it via the web through #llamaindex.

Join Ming and me for a #Docling workshop at @pycon_austria this weekend! It's a free event with a wide range of talks, hands-on workshops, and networking opportunities.

"Workshop: Learn to Unlock Document Intelligence with Open-Source AI" will be on Sunday, April 19, from 10:00 to 12:00 in room E.HG 209. More details, including venue and registration: https://2026.pycon.at/

#PyCon #PyConAT #PyConAT26 #opensource