I built a new tool: https://tools.simonwillison.net/ocr - it runs OCR against images and PDFs entirely in your browser (no file upload needed) using Tesseract.js and PDF.js

I wrote more about the tool and how I built it (with copious amounts of Claude 3 Opus and a little bit of ChatGPT) here: https://simonwillison.net/2024/Mar/30/ocr-pdfs-images/

@simon Very cool. Though I get a Heroku error when I try to go to your site ("Application error: An error occurred in the application and your page could not be served. If you are the application owner, check your logs for details. You can do this from the Heroku CLI with the command heroku logs --tail")
@aaronjschaffer Huh... it looks like it's the Mastodon effect, where sending out a link causes thousands of Mastodon servers to all hit /.well-known/webfinger?resource=acct:[email protected] at the same time - but I've survived these storms just fine in the past, not sure why it's hurting the site today
@aaronjschaffer Worked through it here, should be working OK again now https://github.com/simonw/simonwillisonblog/issues/415
Get Cloudflare to cache /.well-known/webfinger · Issue #415 · simonw/simonwillisonblog

I'm getting a huge flurry of hits to this URL right now because I tooted a link: https://simonwillison.net/.well-known/webfinger?resource=acct:[email protected] It seems to have made my site ...
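
For anyone curious what "get the CDN to cache WebFinger" can look like on the origin side, here is a minimal sketch. It is not the fix from the issue linked above: the function name `webfinger_response`, the one-hour `max_age`, and the `acct:user@example.com` resource are all assumptions for illustration. The idea is just that the response carries a `Cache-Control` header, so a CDN that is configured to cache this path can answer repeat lookups without touching the app.

```python
import json


def webfinger_response(resource: str, max_age: int = 3600):
    """Build a (status, headers, body) tuple for a WebFinger lookup.

    Hypothetical helper: the shape of the return value and the default
    max_age are illustrative choices, not taken from the thread.
    """
    # Minimal JSON Resource Descriptor; a real one would list profile links.
    body = json.dumps({"subject": resource, "links": []})
    headers = {
        # Media type defined for WebFinger responses in RFC 7033.
        "Content-Type": "application/jrd+json",
        # Marks the response as cacheable by shared caches for max_age seconds.
        # Whether a given CDN actually caches it depends on its own rules.
        "Cache-Control": f"public, max-age={max_age}",
    }
    return 200, headers, body


status, headers, body = webfinger_response("acct:user@example.com")
print(status, headers["Cache-Control"])
```

Because the cache key would normally include the `?resource=` query string, each distinct account gets its own cached entry, and a burst of thousands of servers asking for the same account collapses into very few origin hits.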