I built a new tool: https://tools.simonwillison.net/ocr - it runs OCR against images and PDFs entirely in your browser (no file upload needed) using Tesseract.js and PDF.js

I wrote more about the tool and how I built it (with copious amounts of Claude 3 Opus and a little bit of ChatGPT) here: https://simonwillison.net/2024/Mar/30/ocr-pdfs-images/


Johannes Baiter

👀 #demotime Ever wish you could search through a #IIIF manifest, but the provider had neither #OCR, nor a #ContentSearch endpoint available? 🪄 You can soon help yourself: Fully client-side OCR and Content Search + Autocomplete for Mirador 3. And it survives page reloads! ✨

OpenBiblio.Social