Docling vs MarkItDown: Which is the best tool for document processing for GenAI?
https://qiita.com/TOMOSIA-LinhND/items/8ff4b27c4d9097380c18?utm_campaign=popular_items&utm_medium=feed&utm_source=popular_items
RE: https://fedi.simonwillison.net/@simon/116457708120212477
I'm going to test #liteparse locally against #Docling; so far I've only used it over the web via #llamaindex.
Join Ming and me for a #Docling workshop at @pycon_austria this weekend! It's a free event with a wide range of talks, hands-on workshops, and networking opportunities.
"Workshop: Learn to Unlock Document Intelligence with Open-Source AI" will be on Sunday, April 19, at 10:00-12:00 in room E.HG 209. More details including venue & registration: https://2026.pycon.at/
Google for Developers (@googledevs)
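An introduction to optimizing RAG pipelines to build more sophisticated AI agents. It covers techniques for developing retrieval-augmented generation agents: structuring documents with Docling, improving efficiency with dot product, and improving accuracy with re-ranking.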
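The two retrieval steps named in that post (dot-product scoring, then re-ranking) can be sketched roughly as follows. This is only an illustration under stated assumptions: the embeddings are toy values, and `retrieve`, `rerank`, and the scoring lambda are hypothetical names, not anything taken from the linked talk or from Docling.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(query_vec, doc_vecs, k=2):
    """First stage: rank documents by dot product and keep the top k indices."""
    q = normalize(query_vec)
    scored = [(dot(q, normalize(d)), i) for i, d in enumerate(doc_vecs)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]


def rerank(candidates, score_fn):
    """Second stage: reorder the shortlisted indices with a costlier scorer."""
    return sorted(candidates, key=score_fn, reverse=True)


# Toy embeddings (hypothetical values, not produced by any real model).
docs = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
top = retrieve([1.0, 0.1], docs, k=2)

# Hypothetical re-ranker scores; a real system might use a cross-encoder here.
reranked = rerank(top, score_fn=lambda i: {0: 0.2, 1: 0.9, 2: 0.5}[i])
print(top, reranked)
```

In practice the document vectors would be normalized once at indexing time rather than on every query; the sketch normalizes inline only to stay self-contained.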
It was such a pleasure to share a stage with @cybette at #OpenSearchCon, and even more so to share the work of the #Docling team and how it can be integrated with #OpenSearch.
Check out the video of the talk here: https://www.youtube.com/watch?v=IqUJVGyI5to

Here's the presentation @philnash and I gave at #OpenSearchCon China about integrating #Docling with OpenSearch for advanced RAG: https://www.youtube.com/watch?v=IqUJVGyI5to
Our slides are available on Sessionize: https://opensearchcon-china-2026.sessionize.com/session/1115191
Thanks to @OpenSearchProject, @krisfreedain for having us in #Shanghai!

Contrary to what I'm reading, #Claude is far worse than I remember. It often doesn't fully understand simple questions and needs nudging - and as in this case, can't solve even trivial problems which only require careful reading of the log.
It took #Kimi 10 seconds to solve a problem with a #docling server crashing when accessed over its API. I used Kimi in the browser. #ClaudeCode Sonnet 4.6 had full access to the actual server and couldn't solve the issue after 30 minutes of repeated attempts.
```
The root cause is clear: **Docling workers are crashing due to missing EasyOCR dependency**, causing the async jobs to fail. OpenWebUI polls for results, gets a 404 because the job died, and reports "Task result not found."
**Fix the Docling installation:**
```bash
# Stop the service
sudo launchctl unload /Library/LaunchDaemons/com.docling-serve.plist
# Install EasyOCR in the correct uv tool environment
uv tool install --upgrade docling-serve --with easyocr
# Or if that doesn't work, run the specific Python interpreter:
/Users/urisharf/.local/share/uv/tools/docling-serve/bin/python -m pip install easyocr
# Reload
sudo launchctl load /Library/LaunchDaemons/com.docling-serve.plist
```
**Alternative (if you don't need OCR):** Configure Docling to use a different OCR engine or disable it. Check Docling's env vars for `DOCLING_OCR_ENGINE` or similar.
**Verify the fix:**
```bash
curl -X POST http://mbp.lan:5001/v1/convert/source \
-H "Content-Type: application/json" \
-d '{"source": "https://arxiv.org/pdf/2206.01062.pdf"}'
```
The duplicate `libavdevice` dylibs warning is harmless noise from Gradio/CV2 conflicts, but the EasyOCR error is fatal.
```
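The diagnosis quoted above depends on whether EasyOCR can be imported by the interpreter that runs docling-serve. A minimal sketch for checking that, assuming it is executed with that same interpreter (for example the one inside the uv tool environment); `missing_modules` is a hypothetical helper name, and only the standard library is used:

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names the current interpreter cannot find."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    # "easyocr" is the dependency named in the quoted diagnosis.
    missing = missing_modules(["easyocr"])
    print("interpreter:", sys.executable)
    print("missing:", missing or "none")
```

If `easyocr` is listed as missing, the package is not installed in that environment, which would be consistent with the worker crash described in the post.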
Build Agent-Ready RAG Systems in Java with Quarkus and Docling https://www.the-main-thread.com/p/enterprise-rag-quarkus-docling-pgvector-tutorial
@karstenpe I've now saved two versions of the notebooks from the Remarkable locally: one as a PDF with a bitmap inside and one as a PDF with vectors.
Which CLI tool would you recommend for #OCR? #Tesseract?
While I'm at it, I'll also try #Docling with the OCR option, though I don't think it has its own engine.
Can this also be done with #Ollama directly from a PDF and a local LLM? Does anyone have ideas?