heard about #docling at today's #OpenSource at Siemens
https://www.docling.ai/
https://github.com/docling-project/docling
https://pypi.org/project/docling/
Docling converts messy documents into structured data and simplifies downstream document and AI processing by detecting tables, formulas, and reading order, performing OCR, and much more.
Open Source at Siemens is the annual event series by Siemens covering all topics around open source software. Learn more at opensource.siemens.com