Mastodawn

Simon Willison Mar 30, 2024

I built a new tool: https://tools.simonwillison.net/ocr - it runs OCR against images and PDFs entirely in your browser (no file upload needed) using Tesseract.js and PDF.js

I wrote more about the tool and how I built it (with copious amounts of Claude 3 Opus and a little bit of ChatGPT) here: https://simonwillison.net/2024/Mar/30/ocr-pdfs-images/

OCR PDFs and images directly in your browser

Show thread

8thcross

@simon interested to hear more about your analysis on "very promising results with Gemini Pro 1.5, Claude 3 and GPT-4 Vision recently—I’ll write more about that soon. But those tools are still inconvenient for most people to use."

Show thread

Simon Willison Mar 30, 2024

@8thcross this is the issue to watch https://github.com/simonw/llm/issues/331

Multi-modal support for vision models such as GPT-4 vision · Issue #331 · simonw/llm

https://platform.openai.com/docs/guides/vision I think this is best handled by command line options --image and --image-urls to either encode and pass as base64, or to pass a URL.

GitHub