**Fresh news for developers! 🚀 Welcome extractous-go, a fast document extraction library with OCR support!**

🔹 Extracts from PDF, DOCX, XLSX, HTML…
🔹 OCR with Tesseract (for images).
🔹 Streaming API (memory-efficient).
🔹 Works on Windows/macOS/Linux.

Try it now and send feedback! https://github.com/rahulpoonia29/extractous-go

#documentextraction #OCR #Golang #FextractionGO #tecnologiatáán #gohaxe

https://www.reddit.com/r/SideProject/comments/1oakd6

🤖🎉 Breaking news: A budget model outsmarts the #AI giants in document extraction! Apparently, $196 is the price of embarrassing #OpenAI with a "fine-tuned" solution that sounds like a model from a sci-fi B movie. Who knew cutting-edge tech was one step away from being outclassed by a bargain bin special? 😂🔍
https://arxiv.org/abs/2509.22906 #Outsmarted #BudgetModel #DocumentExtraction #TechNews #HackerNews #ngated
Extract-0: A Specialized Language Model for Document Information Extraction

This paper presents Extract-0, a 7-billion parameter language model specifically optimized for document information extraction that achieves performance exceeding models with parameter counts several orders of magnitude larger. Through a novel combination of synthetic data generation, supervised fine-tuning with Low-Rank Adaptation (LoRA), and reinforcement learning via Group Relative Policy Optimization (GRPO), Extract-0 achieves a mean reward of 0.573 on a benchmark of 1,000 diverse document extraction tasks, outperforming GPT-4.1 (0.457), o3 (0.464), and GPT-4.1-2025 (0.459). The training methodology employs a memory-preserving synthetic data generation pipeline that produces 280,128 training examples from diverse document sources, followed by parameter-efficient fine-tuning that modifies only 0.53% of model weights (40.4M out of 7.66B parameters). The reinforcement learning phase introduces a novel semantic similarity-based reward function that handles the inherent ambiguity in information extraction tasks. This research demonstrates that task-specific optimization can yield models that surpass general-purpose systems while requiring substantially fewer computational resources.
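
For anyone who wants to sanity-check the numbers quoted in the abstract, here is a minimal Python sketch. The helper names `trainable_fraction` and `relative_gain` are made up for illustration and do not come from the paper; the only inputs are the figures stated above. "Relative gain" here is simply the score difference divided by the baseline score, which is one assumed way to compare mean rewards, not a metric the abstract reports.

```python
# Figures quoted in the Extract-0 abstract.
TRAINABLE_PARAMS = 40.4e6  # weights modified by LoRA fine-tuning
TOTAL_PARAMS = 7.66e9      # total model parameters

MEAN_REWARD = {
    "Extract-0": 0.573,
    "GPT-4.1": 0.457,
    "o3": 0.464,
    "GPT-4.1-2025": 0.459,
}


def trainable_fraction(trainable: float, total: float) -> float:
    """Hypothetical helper: share of weights updated, as a percentage."""
    return 100.0 * trainable / total


def relative_gain(score: float, baseline: float) -> float:
    """Hypothetical helper: how far `score` exceeds `baseline`, in percent of baseline."""
    return 100.0 * (score - baseline) / baseline


if __name__ == "__main__":
    share = trainable_fraction(TRAINABLE_PARAMS, TOTAL_PARAMS)
    print(f"trainable share: {share:.2f}%")
    for name, reward in MEAN_REWARD.items():
        if name != "Extract-0":
            gain = relative_gain(MEAN_REWARD["Extract-0"], reward)
            print(f"Extract-0 vs {name}: +{gain:.1f}%")
```

The trainable share rounds to the 0.53% stated in the abstract, and the mean-reward lead over each of the three baselines comes out at roughly a quarter of the baseline score.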
