Chinni Sree Addagalla

@chinnisree
MS in Computer Science @ University of Oklahoma
Interested in AI, LLMs and Open Source
Outreachy 2026 applicant | Contributing to RamaLama RAG
Learning by building

What I learned:

Font style matters MORE than language support!

Old Telugu manuscript → all engines struggled
Modern Telugu novel → all engines worked well

And surprisingly, EasyOCR beat Tesseract on the modern Telugu novel!

Full writeup and all outputs here:
https://github.com/ChinniSree/Docling-processing-multilingual-documents
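
For anyone who wants to put a number on "beat": the posts do not say how the engines were scored, so this is only one possible approach. A minimal sketch of character error rate (CER), assuming you have a hand-typed reference transcription for each page. The names `levenshtein` and `cer` are made up for this example.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings, counted in Unicode code points."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    # Normalize first so the same text encoded two ways is not counted as an error.
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        raise ValueError("reference must be non-empty")
    return levenshtein(ref, hyp) / len(ref)
```

Lower is better. Telugu vowel signs are separate code points, so a dropped sign counts as one error here. Grouping by grapheme cluster would be a stricter alternative.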


What I tested:

→ Tesseract: classical OCR, 100+ languages
→ EasyOCR: deep-learning-based OCR
→ ocrmac: Python wrapper around Apple's built-in Vision framework

Documents: French+English textbook, Italian reader,
old Telugu manuscript and a modern Telugu novel.
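
A rough sketch of how the engine could be swapped in Docling, assuming PDF input and a recent Docling release. This is not taken from the linked repo. The helper names (`build_converter`, `ocr_to_markdown`) and the language lists are assumptions for illustration, and the Tesseract CLI variant is a guess at which Tesseract backend was used.

```python
# Language codes differ per engine. These lists are illustrative guesses,
# not the settings used in the original experiment.
# EasyOCR only allows Telugu together with English.
# No Telugu code is listed for ocrmac; check what Vision supports on your macOS.
OCR_LANGS = {
    "tesseract": {"french_english": ["fra", "eng"], "italian": ["ita"], "telugu": ["tel"]},
    "easyocr": {"french_english": ["fr", "en"], "italian": ["it"], "telugu": ["te", "en"]},
    "ocrmac": {"french_english": ["fr-FR", "en-US"], "italian": ["it-IT"]},
}


def build_converter(engine, langs):
    """Return a Docling DocumentConverter that runs OCR with the chosen engine."""
    # Imported lazily so this file loads even where Docling is not installed.
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import (
        EasyOcrOptions,
        OcrMacOptions,
        PdfPipelineOptions,
        TesseractCliOcrOptions,
    )
    from docling.document_converter import DocumentConverter, PdfFormatOption

    option_classes = {
        "tesseract": TesseractCliOcrOptions,  # assumes the tesseract binary is installed
        "easyocr": EasyOcrOptions,
        "ocrmac": OcrMacOptions,              # macOS only
    }
    pipeline_options = PdfPipelineOptions(
        do_ocr=True,
        ocr_options=option_classes[engine](lang=langs),
    )
    return DocumentConverter(
        format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)}
    )


def ocr_to_markdown(path, engine, langs):
    """Convert one scanned PDF and return the recognized text as Markdown."""
    converter = build_converter(engine, langs)
    return converter.convert(path).document.export_to_markdown()
```

Usage would look like `ocr_to_markdown("novel.pdf", "easyocr", OCR_LANGS["easyocr"]["telugu"])`, where `novel.pdf` is a placeholder file name.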

Why does this matter?

RamaLama uses Docling to convert documents into text before
feeding them into AI models. If OCR fails, the AI gets garbage.

So getting multilingual OCR right is really important for
building good RAG pipelines in a global community like Fedora.

This task gave me my first real experience with OCR technology
and I came out of it with findings I did not see coming at all!

I used Docling to process scanned documents in French, Italian
and Telugu using three different OCR engines.

Here is what happened 👇

🚀 Just published another blog as part of my Outreachy journey!

I wrote a step-by-step guide to help beginners get started with Outreachy — from choosing the right project to making your first contribution and avoiding common mistakes.

https://chinnisree.hashnode.dev/a-step-by-step-guide-to-getting-started-with-outreachy

Would love to hear your thoughts! 😊

#Outreachy #OpenSource #Fedora