Mastodawn

Show GN: Underlog (언더로그) - an iOS app where you photograph underlined sentences and AI moves them into your library

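Underlog is an iOS app that uses AI (Gemini) to extract the text of underlined sentences from photos and organize it by book. It is built with Swift, SwiftUI, Supabase, and Gemini Vision, and offers features such as batch capture, background analysis, and push notifications.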

https://news.hada.io/topic?id=27217

#ai #ios #textextraction #gemini #swift

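Underlog (언더로그) - an iOS app where you photograph underlined sentences and AI moves them into your library

Hello, I'm the developer building Underlog. Many people who underline a sentence while reading take a photo of it so they can revisit it later...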

GeekNews

Reddit Tech VN Bot Dec 12

Any up-to-date methods for extracting text from multi-page PDF files? The user needs a solution that handles non-English files and tables. Further discussion: #textextraction #PDFtools #dataextraction #tríchxuấtchữ #côngthứcràpchữ #thảobảng #MastodonTech

https://www.reddit.com/r/LocalLLaMA/comments/1pklo87/any_latest_methods_to_extract_text_from_pdfs_with/

N-gated Hacker News Nov 25

🤖🔍 Behold the epic gladiatorial showdown of... text extraction models. The OCR Arena lets you upload your grocery list in JPEG form to watch anonymous #algorithms duke it out like socially awkward robots in a battle of paper cuts. Who knew PDFs could be so competitive? 😂📄
https://www.ocrarena.ai/battle #textExtraction #OCRArena #competition #groceryList #paperCuts #HackerNews #ngated

OCR Arena

OCR Arena is a free playground for testing and evaluating leading foundation VLMs and open source OCR models side-by-side. Upload a document, measure accuracy, and vote for the best models on a public leaderboard.

OCR Arena

OSTechNix Oct 31, 2025

Learn how to extract text from screenshots and images with spectacle-ocr utility in Linux. Go from image to text in one step!

Full details here: https://ostechnix.com/extract-text-from-screenshots-images-linux/

#Tesseract #OCR #Spectacle-ocr #Spectacle #KDE #Linux #TextExtraction #Opensource

How To Extract Text From Screenshots And Images In Linux - OSTechNix

OSTechNix

ResearchBuzz: Firehose Jul 16, 2025

British Library Digital Scholarship Blog: Automatic Text Recognition in Cultural Heritage Institutions survey: a brief analysis and a published dataset. “A few months ago, we circulated a brief survey to understand how other institutions use Automatic Text Recognition and to discuss the creation of a working group on the subject… I am happy to report that the anonymised data are available […]

https://rbfirehose.com/2025/07/16/automatic-text-recognition-in-cultural-heritage-institutions-survey-a-brief-analysis-and-a-published-dataset-british-library-digital-scholarship-blog/

Automatic Text Recognition in Cultural Heritage Institutions survey: a brief analysis and a published dataset (British Library Digital Scholarship Blog) | ResearchBuzz: Firehose

ResearchBuzz: Firehose | Individual posts from ResearchBuzz

Peter Beens Jul 10, 2025

🚀 Just released: Find Keyword in PDFs
A Python tool to recursively search PDFs for any keyword (case-insensitive), show text snippets in context, and optionally export results to an HTML file with clickable links.

Check it out here:
🔗 https://github.com/pbeens/Find-Keyword-in-PDFs

It’s fast, lightweight, and works great for researchers, educators, and anyone managing large PDF archives.

Feedback and suggestions are welcome — would love to hear how you might use it!

#OpenSource #Python #PDFTools #TextExtraction #GitHub

GitHub - pbeens/Find-Keyword-in-PDFs: Find every mention of a word in a folder full of PDFs—quickly scan, preview context, and save results to a file.

Find every mention of a word in a folder full of PDFs—quickly scan, preview context, and save results to a file. - pbeens/Find-Keyword-in-PDFs

GitHub
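
The "snippets in context" idea from the post above can be sketched roughly as follows. This is not the code from the linked repository: `find_snippets` and its `context` parameter are hypothetical names, and the sketch assumes the PDF text has already been extracted into a plain string by some other tool.

```python
import re


def find_snippets(text, keyword, context=30):
    """Return a short excerpt around every case-insensitive match.

    Hypothetical helper for illustration; `context` is the number of
    characters kept on each side of the match.
    """
    # re.escape makes the keyword match literally, even if it contains
    # characters that are special in regular expressions.
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    snippets = []
    for match in pattern.finditer(text):
        start = max(0, match.start() - context)
        end = min(len(text), match.end() + context)
        # Flatten line breaks so each snippet prints on one line.
        snippets.append(text[start:end].replace("\n", " "))
    return snippets


if __name__ == "__main__":
    page_text = "The OCR engine. Another ocr pass."
    for snippet in find_snippets(page_text, "ocr", context=4):
        print(snippet)
```

Matching on the escaped keyword with `re.IGNORECASE` keeps the search literal and case-insensitive; walking a folder tree and extracting text from each PDF would sit around this function.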

Philip Kiff Dec 20, 2024

Looking for a handy free tool to quickly extract text from an image in Windows?

I've started using the "Text Extractor" feature from Microsoft PowerToys. I don't think it uploads text to the cloud or uses AI: results are instant, and I'm pretty sure it's just a simple, local OCR engine.

Use Win+Shift+T to activate a cross pointer to capture the text you want and it then gets saved to your clipboard immediately.

https://learn.microsoft.com/en-us/windows/powertoys/text-extractor

#Windows #PowerToys #TextExtraction #AltText #OCR

PowerToys Text Extractor Utility for Windows

Learn how to use PowerToys Text Extractor to copy text from anywhere on your Windows screen, including images and videos. Extract text with OCR technology using simple keyboard shortcuts.

doctorambient May 15, 2024

Has anyone else noticed that GPT-4o seems at least slightly better than previous (OpenAI) models at pure text extraction tasks with unstructured text?

It seems like GPT-4 paraphrases more than the new model.

#llm #LLMs #ai #textextraction #nlp #nlu

Doc Edward Morbius ❌ Dec 29, 2019

Stupid Awk text-processing tricks: Reframe your record and field delimiters

A longer write-up on the text-processing stuff I've been mucking with for the past few weeks.

Changing your RS (record seps) and FS (field seps) values can be ... tremendously useful.

https://joindiaspora.com/posts/16861078

#awk #gawk #scripting #TextProcessing #TextExtraction

Stupid Awk text-processing tricks: Reframe your record and field d...

Stupid Awk text-processing tricks: Reframe your record and field delimiters

TL;DR: sometimes changing record / field separators can be exceptionally useful.

I've been wrestling with document conversions, from PDF, of what's really a set of structured data.[1] The tools for actually getting text out of PDFs have ... improved markedly over the years, the Poppler library's (https://poppler.freedesktop.org/) tools in particular. But you've still got to manage the output. And what I'm getting has semantic columns, spaces, indents, text, unicode, lions, tigers, bears... All structured within multi-paged documents.

Awk's default processing model is to read a line of input at a time, and break that into fields based on whitespace. But ... you're not limited to this. There is a set of arguments and internal variables which can change all of this, as well as some ... surprisingly useful functions. The gawk(1) manpage and GNU Awk User's Guide (https://www.gnu.org/software/gawk/manual/gaw...
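
The record / field reframing described above can be illustrated outside Awk as well. The following is a minimal Python sketch, not taken from the linked write-up: `split_records`, its parameters, and the sample text are all hypothetical. It treats a blank line as the record separator and a newline as the field separator, which is only a rough analogue of setting `RS` and `FS` in Awk.

```python
def split_records(text, record_sep="\n\n", field_sep="\n"):
    """Split text into records, then split each record into fields.

    Hypothetical helper: record_sep plays the role of Awk's RS and
    field_sep the role of FS, using plain string splitting only.
    """
    records = []
    for raw in text.split(record_sep):
        raw = raw.strip("\n")
        if not raw:
            # Skip empty chunks produced by runs of blank lines.
            continue
        records.append(raw.split(field_sep))
    return records


if __name__ == "__main__":
    # Made-up sample: each block of lines is one record.
    sample = "Name: Ada\nRole: Engineer\n\nName: Grace\nRole: Admiral\n"
    for fields in split_records(sample):
        print(" | ".join(fields))
```

With the default separators each multi-line block becomes one record; passing `record_sep="\n"` and `field_sep=" "` instead gives something closer to a line-at-a-time, whitespace-delimited view.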

La Bécasse Apr 6, 2018

#time #linkedData #data #textextraction

Database Research Group: HeidelTime Demonstration
http://heideltime.ifi.uni-heidelberg.de/heideltime/