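Show GN: Underlog (언더로그) - an iOS app where AI moves underlined sentences into your library when you photograph them
Underlog is an iOS app that uses AI (Gemini) to extract the text of underlined sentences from photos and organize them by book. It is built with Swift, SwiftUI, Supabase, and Gemini Vision, and offers features such as batch capture, background analysis, and push notifications.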
Any updated methods for extracting text from multi-page PDF files? Users need solutions for non-English files and tables. Further discussion: #textextraction #PDFtools #dataextraction #tríchxuấtchữ #côngthứcràpchữ #thảobảng #MastodonTech
Learn how to extract text from screenshots and images with spectacle-ocr utility in Linux. Go from image to text in one step!
Full details here: https://ostechnix.com/extract-text-from-screenshots-images-linux/
#Tesseract #OCR #Spectacle-ocr #Spectacle #KDE #Linux #TextExtraction #Opensource
British Library Digital Scholarship Blog: Automatic Text Recognition in Cultural Heritage Institutions survey: a brief analysis and a published dataset. “A few months ago, we circulated a brief survey to understand how other institutions use Automatic Text Recognition and to discuss the creation of a working group on the subject… I am happy to report that the anonymised data are available […]
🚀 Just released: Find Keyword in PDFs
A Python tool to recursively search PDFs for any keyword (case-insensitive), show text snippets in context, and optionally export results to an HTML file with clickable links.
Check it out here:
🔗 https://github.com/pbeens/Find-Keyword-in-PDFs
It’s fast, lightweight, and works great for researchers, educators, and anyone managing large PDF archives.
Feedback and suggestions are welcome — would love to hear how you might use it!
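For anyone curious what a case-insensitive keyword search with context snippets can look like, here is a minimal sketch. It is not the code from the linked repository: the function name `find_snippets` and the `context` parameter are made up for illustration, and it assumes the PDF text has already been extracted to a plain string by some other tool.

```python
import re


def find_snippets(text, keyword, context=30):
    """Return each case-insensitive match of keyword with surrounding context.

    Hypothetical helper for illustration; assumes `text` was already
    extracted from a PDF by some other tool.
    """
    snippets = []
    # re.escape makes the keyword match literally, even if it has regex characters
    for match in re.finditer(re.escape(keyword), text, flags=re.IGNORECASE):
        start = max(0, match.start() - context)
        end = min(len(text), match.end() + context)
        # Flatten line breaks so each snippet prints on one line
        snippets.append(text[start:end].replace("\n", " "))
    return snippets


page_text = (
    "Optical character recognition (OCR) converts images to text.\n"
    "Modern ocr engines run locally."
)
hits = find_snippets(page_text, "ocr", context=10)
for hit in hits:
    print("...", hit, "...")
```

A recursive version would run the same function over the text of every PDF found under a folder and collect the hits per file.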
Looking for a handy free tool to quickly extract text from an image in Windows?
I've started using the "Text Extractor" feature from Microsoft PowerToys. I don't think it uploads text to the cloud or uses AI: results are instant, and I'm pretty sure it's just a simple, local OCR engine.
Press Win+Shift+T to get a cross pointer, select the text you want to capture, and it is saved to your clipboard immediately.
https://learn.microsoft.com/en-us/windows/powertoys/text-extractor
Stupid Awk text-processing tricks: Reframe your record and field delimiters
A longer write-up on the text-processing stuff I've been mucking with for the past few weeks.
Changing your RS (record seps) and FS (field seps) values can be ... tremendously useful.
TL;DR: sometimes changing record / field separators can be exceptionally useful.

I've been wrestling with document conversions, from PDF, of what's really a set of structured data.[1] The tools for actually getting text out of PDFs have ... improved markedly over the years. The Poppler library's (https://poppler.freedesktop.org/) tools in particular. But you've still got to manage the output. And what I'm getting has semantic columns, spaces, indents, text, unicode, lions, tigers, bears... All structured within multi-paged documents.

Awk's default processing model is to read a line of input at a time, and break that into fields based on whitespace. But ... you're not limited to this. There is a set of arguments and internal variables which can change all of this, as well as some ... surprisingly useful functions. The gawk(1) manpage and GNU Awk User's Guide (https://www.gnu.org/software/gawk/manual/gaw...
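To illustrate the record / field separator idea outside of Awk, here is a rough Python sketch. It is not taken from the write-up: the function name `parse_records` and the sample data are made up, and it only loosely mirrors what setting RS to blank lines and FS to a newline does in Awk.

```python
import re


def parse_records(text, record_sep=r"\n\s*\n", field_sep=r"\n"):
    """Split text into records, then split each record into fields.

    Hypothetical helper: record_sep plays the role of Awk's RS and
    field_sep the role of FS. Both are treated as regular expressions.
    """
    records = [r for r in re.split(record_sep, text.strip()) if r]
    return [re.split(field_sep, r) for r in records]


# Blank-line-separated blocks, one field per line (made-up sample data)
sample = "Name: Ada\nRole: Engineer\n\nName: Grace\nRole: Admiral\n"
rows = parse_records(sample)
for fields in rows:
    print(fields)

# The same function with different separators: ';' between records, ',' between fields
pairs = parse_records("a,b;c,d", record_sep=";", field_sep=",")
print(pairs)
```

The point is the same as in the post: once a "record" is a whole block instead of a line, multi-line structured output becomes much easier to pick apart.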
#time #linkedData #data #textextraction
Database Research Group: HeidelTime Demonstration
http://heideltime.ifi.uni-heidelberg.de/heideltime/