Google just changed how data is extracted from documents.

It has launched #LangExtract: a tool that turns long, messy texts into clear, verifiable data.

It's free and open-source 👇
https://bsky.app/profile/jesusgallent.com/post/3meja22duo22v

Jesús Gallent (@jesusgallent.com)

🧠 #LangExtract is an open-source Python library from #Google designed to extract structured information from unstructured text using #LLM
👉 The details: https://www.linkedin.com/posts/alessiopomaro_langextract-google-llm-activity-7426884315931668480-ROEI

___ 
✉️ If you want to stay up to date on these topics, subscribe to my newsletter: https://bit.ly/newsletter-alessiopomaro

#AI #GenAI #GenerativeAI #IntelligenzaArtificiale #LLM 

Tech with Mak (@techNmak)

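An open-source, free document extraction tool called LangExtract has been introduced. It extracts structured data from unstructured text, maps every entity to its exact location in the source, and is claimed to handle documents of 100+ pages. It is touted as outperforming existing enterprise tools that cost tens of thousands of dollars, and could have a major impact on the document extraction market.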

https://x.com/techNmak/status/2020867240753819983

#langextract #documentextraction #nlp #opensource

Tech with Mak (@techNmak) on X

Google just killed the document extraction industry. LangExtract: Open-source. Free. Better than $50K enterprise tools. What it does: → Extracts structured data from unstructured text → Maps EVERY entity to its exact source location → Handles 100+ page documents with high

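The "maps every entity to its exact source location" claim above refers to source grounding. As an illustration of the idea only, here is a minimal Python sketch that pins each extracted string to character offsets, assuming every extraction is a verbatim substring of the source. `GroundedSpan` and `ground_extractions` are hypothetical names, not LangExtract's API, and the library's own alignment logic is not reproduced here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class GroundedSpan:
    """Hypothetical record linking an extracted string to its place in the source."""
    label: str
    text: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def ground_extractions(source: str, extractions: list[tuple[str, str]]) -> list[GroundedSpan]:
    """Map each (label, text) pair to the character span where `text` occurs verbatim.

    Matching scans forward from the previous hit, so repeated strings land on
    successive occurrences; if nothing is found ahead, it retries from the start.
    Strings that never occur verbatim are dropped instead of given a guessed location.
    """
    spans: list[GroundedSpan] = []
    cursor = 0
    for label, text in extractions:
        start = source.find(text, cursor)
        if start == -1:
            start = source.find(text)  # extraction order may not follow text order
        if start == -1:
            continue  # ungrounded: not a verbatim substring of the source
        end = start + len(text)
        spans.append(GroundedSpan(label, text, start, end))
        cursor = end
    return spans


source = "Patient was given 250 mg of amoxicillin twice daily."
pairs = [("dosage", "250 mg"), ("medication", "amoxicillin"), ("medication", "ibuprofen")]
for span in ground_extractions(source, pairs):
    print(span.label, span.start, span.end, repr(source[span.start:span.end]))
```

Dropping strings that never occur verbatim, rather than guessing a position, is what keeps offsets like these checkable against the original text.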
Le site de #Korben: #LangExtract - #Google's new gem for extracting structured data with #IA korben.info/langextract-...

LangExtract - Google's new gem for extracting structured data with AI

There are battles like this one that hardly anyone thinks about but that matter a great deal. I am talking, of course, about the fight against the chaos of unstructured text. If you have ever tried to pull clean data out of a pile of PDFs (after OCR), reports, or scribbled notes, you know what I mean: it's hell! (Yes, I like to torture myself by attempting impossible regexes.)

Le site de Korben

#ITByte: #LangExtract is a new open-source Python library from Google that uses large language models (LLMs) to extract structured information from unstructured text.

Instead of requiring domain-specific training, it uses prompts and examples to instruct LLMs on how to structure the data.

https://knowledgezone.co.in/posts/Google-LangExtract-68ee737e35590fd1b52bf506
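Steering a model with "prompts and examples" instead of domain-specific training is few-shot prompting. A minimal sketch of how a task description and worked examples might be assembled into one prompt is shown below. The `Example` class, the `build_prompt` function, and the prompt layout are hypothetical illustrations; LangExtract's own API and prompt format are not reproduced here, so check its documentation for the real interface.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass
class Example:
    """Hypothetical few-shot example: a snippet plus the extractions wanted for it."""
    text: str
    extractions: list[dict[str, str]] = field(default_factory=list)


def build_prompt(task: str, examples: list[Example], input_text: str) -> str:
    """Assemble a few-shot prompt: instructions, worked examples, then the new input."""
    parts = [task.strip(), ""]
    for i, ex in enumerate(examples, start=1):
        parts.append(f"Example {i} input:\n{ex.text}")
        # Show the target structure as JSON so the model imitates the output format.
        parts.append(f"Example {i} output:\n{json.dumps(ex.extractions, ensure_ascii=False)}")
        parts.append("")
    parts.append(f"Input:\n{input_text}")
    parts.append("Output:")
    return "\n".join(parts)


prompt = build_prompt(
    "Extract medication names exactly as they appear in the text.",
    [Example("Take 5 mg aspirin daily.", [{"class": "medication", "text": "aspirin"}])],
    "Give 10 mg ibuprofen at night.",
)
print(prompt)
```

Changing the task, or the shape of the output, then means editing the description and examples rather than retraining a model.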

Discover how LangExtract turns URLs and plain-text lists into structured data using LLMs. From Gutenberg books to API endpoints, the open-source toolkit shows seamless extraction with gpt-4o, Gemini 2.5 Flash, and Ollama. See the code, benchmarks, and tips for your own projects. #LangExtract #LLM #gpt4o #Ollama

🔗 https://aidailypost.com/news/how-langextract-uses-urls-text-lists-data-extraction-llms

Meet #LangExtract - an #opensource #Python library!

Developers can now extract structured information from unstructured text using large language models such as Gemini.

Learn more on #InfoQ 👉 https://bit.ly/45a1krY

#Google #LLMs #AI

🧠 #Google has released #LangExtract, an open-source Python library that turns unstructured text into structured data.

👉 The details: https://www.linkedin.com/posts/alessiopomaro_google-langextract-llm-activity-7359097710513111040-oTij

#AI #GenAI #GenerativeAI #IntelligenzaArtificiale #LLM 

[LangExtract](https://developers.googleblog.com/en/introducing-langextract-a-gemini-powered-information-extraction-library/) has me curious, but I don't see what makes it different from a [spacy-llm/prodigy](https://prodi.gy/docs/large-language-models) setup. Is it just that it spares me from writing the Python code myself to chunk long input and/or build the output JSON from entities and offsets?...

Ah, one more difference: LangExtract is #OpenSource, whereas Prodigy is not (?). (On the other hand, Prodigy integrates better with a correction-and-training workflow.)

#llm #google #langextract #nlp #spacy #prodigy #ner

Introducing LangExtract: A Gemini powered information extraction library - Google Developers Blog

Explore LangExtract: a Gemini-powered, open-source Python library for reliable, structured information extraction from unstructured text with precise source grounding.
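The question in the post above is essentially whether LangExtract saves you from writing the chunk-and-reassemble plumbing yourself. For a sense of what that plumbing involves, here is a rough sketch; it is not LangExtract's or spacy-llm's code. It splits a long text into overlapping windows, runs a per-chunk extractor, shifts chunk-local offsets back to document offsets, de-duplicates hits seen twice in an overlap, and emits JSON. `chunk_text`, `find_terms`, and `extract_long_document` are hypothetical names, and `find_terms` is a plain substring search standing in for an LLM call.

```python
from __future__ import annotations

import json


def chunk_text(text: str, size: int, overlap: int) -> list[tuple[int, str]]:
    """Split text into windows of at most `size` chars, each sharing `overlap` chars with the previous one.

    Returns (offset, chunk) pairs so chunk-local positions can be mapped back to the document.
    """
    if not 0 <= overlap < size:
        raise ValueError("expected 0 <= overlap < size")
    step = size - overlap
    return [(start, text[start:start + size]) for start in range(0, max(len(text) - overlap, 1), step)]


def find_terms(chunk: str, terms: list[str]) -> list[tuple[str, int, int]]:
    """Stand-in for a per-chunk LLM call: every occurrence of each term, with chunk-local offsets."""
    hits = []
    for term in terms:
        start = chunk.find(term)
        while start != -1:
            hits.append((term, start, start + len(term)))
            start = chunk.find(term, start + 1)
    return hits


def extract_long_document(text: str, terms: list[str], size: int = 200, overlap: int = 40) -> str:
    """Chunk, extract per chunk, shift offsets to document coordinates, de-duplicate, emit JSON.

    An entity longer than `overlap` that straddles a chunk boundary can be missed,
    so the overlap should be at least as long as the longest expected entity.
    """
    found: set[tuple[str, int, int]] = set()
    for offset, chunk in chunk_text(text, size, overlap):
        for term, start, end in find_terms(chunk, terms):
            # Hits inside an overlap are seen twice with identical document offsets; the set keeps one.
            found.add((term, offset + start, offset + end))
    records = [{"text": t, "start": s, "end": e} for t, s, e in sorted(found, key=lambda r: (r[1], r[2]))]
    return json.dumps(records, ensure_ascii=False)


document = "Romeo met Juliet. " * 20
print(extract_long_document(document, ["Romeo", "Juliet"], size=50, overlap=10)[:120])
```

The overlap plus set-based de-duplication is the simplest way to avoid losing entities at chunk edges without double-counting them; a real pipeline would also have to reconcile conflicting labels for the same span.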