Ah, the age-old quest for the elusive "plain text" hidden within the arcane tomes of #PDF lore 😱. Introducing #OlmOCR, the magical #open-source incantation that demands a sacrifice of #JavaScript before revealing the secrets of your own documents 📜. Because clearly, nothing screams "user-friendly" like the classic techie combo of forced scripting and DIY text extraction 🔧.
https://olmocr.allenai.org/ #text-extraction #HackerNews #ngated
olmOCR – Open-Source OCR for Accurate Document Conversion

olmOCR is an open-source tool for converting PDFs to text with high accuracy, preserving reading order and supporting tables, equations, and handwriting.

OlmOCR: Open-source tool to extract plain text from PDFs — https://olmocr.allenai.org/
#HackerNews #OlmOCR #OpenSource #PDFText #Extraction #Tool #OCR #Tech #Innovation
olmOCR – Open-Source OCR for Accurate Document Conversion

olmOCR is an open-source tool for converting PDFs to text with high accuracy, preserving reading order and supporting tables, equations, and handwriting.

OlmOCR: Open-source tool to extract plain text from PDFs — https://olmocr.allenai.org/
#HackerNews #OlmOCR #OpenSource #PDFText #Extraction #Tool #OCR #Tech #Innovation
olmOCR – Open-Source OCR for Accurate Document Conversion

olmOCR is an open-source tool for converting PDFs to text with high accuracy, preserving reading order and supporting tables, equations, and handwriting.

New setup guide explains how to extract accurate text from PDFs while preserving reading order, handling tables, equations and handwriting with open-source tools on standard Mac hardware.

• 🖥️ Download and run #LMStudio as local inference server to host the #OlmOCR model

#OlmOCR Runs on #macOS with #LMStudio: Simple #PDF Text Extraction 📄

#OlmOCR by #AllenAI now works on regular #macOS systems without specialized #GPU requirements using #LMStudio as inference server

🧵 👇 #OCR #ai #llm

#開源分享 一款新出的PDF文本提取工具:olmOCR,可以從PDF和文件圖像中提取乾淨且結構化的純文本
可以處理包含複雜布局、表格、方程式以及手寫文件

處理100萬頁PDF的成本約為190美元,相當於GPT-4o 1/32的成本

以Markdown格式輸出文本,可以準確處理方程、表格和手寫內容,能在複雜的多欄文件布局中保持正確的閱讀順序

性能優於Marker、MinerU以及GOT-OCR 2.0等

專案地址: github.com/allenai/olmocr

#文件處理工具 #文件文本提取工具 #olmOCR #OCR