[en] Paper: LLMs can be used to perform at-scale #deanonymization

"With full Internet access, our #agent can re-identify Hacker News users and Anthropic Interviewer participants at high precision, given #pseudonymous online profiles and conversations alone, matching what would take hours for a dedicated human investigator."

"Our results show that the practical #obscurity protecting pseudonymous users online no longer holds and that #threat models for online #privacy need to be reconsidered."

"We demonstrate that LLMs fundamentally change the picture, enabling fully automated deanonymization attacks that operate on #unstructured text at scale."

Note: also check the "Potential harms" and "Potential benefits" paragraphs.

https://arxiv.org/html/2602.16800

#llm #research

Large-scale online deanonymization with LLMs

Cerebras (@cerebras)

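Cerebras announced a research agent that parses, analyzes, and synthesizes 10 academic papers in under 10 seconds. Built with Cerebras Inference and Unstructured, the agent is presented as quickly processing entire literature reviews so that users become experts on a given topic faster.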

https://x.com/cerebras/status/2026749042907304120

#researchagent #cerebras #unstructured #literaturereview #ai

Cerebras (@cerebras) on X

10 academic papers. Parsed, analyzed, and synthesized. In under 10 seconds. We built a research agent with @Cerebras Inference and @Unstructured that processes entire literature reviews so you become a subject expert faster

I’m starting to think that this is the easiest way to describe #Web3. #Structureddata uses a spreadsheet-like format that machines read with speed and confidence. #Unstructured data is everything else and needs formatting before machines can search, group, and analyze it.
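
A minimal sketch of that distinction, using made-up sample data: the CSV rows can be read straight into named columns, while the free-text sentence needs a formatting step first. The regex here is a hypothetical pattern that fits only this one sentence shape, so treat it as an illustration and not a general parser.

```python
import csv
import io
import re

# Structured: spreadsheet-like rows with named columns (made-up sample data).
STRUCTURED = "name,city,amount\nAda,London,120\nGrace,New York,80\n"

# Unstructured: the same kind of fact buried in free text (made-up sample).
UNSTRUCTURED = "Yesterday Linus from Helsinki paid 45 for the workshop."

# Hypothetical pattern for this one sentence shape; real text needs far more.
PATTERN = re.compile(
    r"(?P<name>[A-Z]\w+) from (?P<city>[A-Z][\w ]+?) paid (?P<amount>\d+)"
)


def read_structured(text):
    """Rows are usable immediately: every value already has a column name."""
    return [
        {"name": r["name"], "city": r["city"], "amount": int(r["amount"])}
        for r in csv.DictReader(io.StringIO(text))
    ]


def parse_unstructured(text):
    """Free text must be formatted into the same row shape before analysis."""
    m = PATTERN.search(text)
    if m is None:
        return None
    return {"name": m["name"], "city": m["city"], "amount": int(m["amount"])}


rows = read_structured(STRUCTURED)
extra = parse_unstructured(UNSTRUCTURED)
if extra is not None:
    rows.append(extra)

# Once everything shares one shape, grouping and summing is trivial.
total = sum(r["amount"] for r in rows)
```

The structured input needs no guessing, while the unstructured input only becomes searchable after a fragile extraction step.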

#BackToSchool #Recess #GetOutside #Play #Unstructured

In California, "EC Section 49056 prohibits school staff members from restricting a student’s recess unless there is an immediate threat to the physical safety of the student or the physical safety of one or more of the student’s peers. If a student’s recess period is denied, school staff members shall make all reasonable efforts to resolve such threats and minimize exclusion from recess (EC Section 49056(a)(4))." https://www.cde.ca.gov/fg//it/sb291letter.asp / https://www.kpbs.org/news/health/2024/01/30/for-the-first-time-california-law-will-protect-students-right-to-recess

Other states also address this; e.g., in Texas, "Senate Bill 25 bars schools from taking away recess as punishment for younger students." https://www.houstonpublicmedia.org/articles/news/texas/2025/08/31/529807/more-than-830-new-texas-laws-take-effect-sept-1-heres-whats-changing/

Which parser is best for preprocessing PDFs before analyzing them with an LLM? (pdfminer, PyMuPDF, pypdf, Unstructured)
https://qiita.com/cyberBOSE/items/142cdf91e0ee20b3114f?utm_campaign=popular_items&utm_medium=feed&utm_source=popular_items

#qiita #Python #pdfminer #PyMuPDF #pyPDF #Unstructured

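Which parser is best for preprocessing PDFs before analyzing them with an LLM? (pdfminer, PyMuPDF, pypdf, Unstructured) - Qiita

Current LLMs cannot process PDF files directly, so the files must first be converted to some plain-text format. (Even the chat apps that can read PDFs internally use some means to produce a plain-text format…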

Qiita

I'm reading "Assisted Serendipity, Random Coffee and the power of the unstructured meeting" https://emilywebber.co.uk/assisted-serendipity-random-coffee-and-the-power-of-the-unstructured-meeting/ by @ewebber, from March 2020.

#meetings
#serendipity
#unstructured
#randomcoffee

Assisted Serendipity, Random Coffee and the power of the unstructured meeting

We have some of the best conversations when they are unstructured and happen by chance. That moment when you bump into someone when you are out and about, and they happen to mention something that really helps you. Or you sit down to lunch with a work colleague, and it sparks a great new idea.

Emily Webber

testing it with Mark Twain's "The Adventures of Tom Sawyer"

#AI #GenAI #OpenAI #Langchain #Unstructured #JupyterNotebooks #Python

I've been wanting to try out #maturin for a while now, and with some of the #LLM tinkering I've done at work, I finally had an excellent use case for it.
It's an opinionated #rust implementation of splitting #langchain documents as well as some #unstructured post processors. For cleaning and splitting, I've clocked it at between 40 and 75x faster than the Python implementation, and on my machine it can clean and split 25,000 documents in a second.

Check it out at https://github.com/cam-barts/rs_document
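
For readers unfamiliar with what "cleaning and splitting" means here, a rough plain-Python sketch follows. It is not the API of rs_document, LangChain, or Unstructured: the function names `clean` and `split` and the fixed-size character chunking with overlap are assumptions chosen only to illustrate the general idea.

```python
import re


def clean(text):
    """Collapse every run of whitespace to one space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def split(text, chunk_size, overlap=0):
    """Cut text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters, so context that
    straddles a boundary appears in both neighbours.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining tail is already covered by the previous chunk.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


document = "  Tom   appeared on the\n\nsidewalk with a bucket of whitewash.  "
chunks = split(clean(document), chunk_size=20, overlap=5)
```

Real splitters usually prefer to break on paragraph or sentence boundaries before falling back to raw character counts; this sketch skips that for brevity.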

GitHub - cam-barts/rs_document: A opinionated Rust implementation of various common functions of LangChain's Document model as well as Unstructured.io's post processors.

And this is also interesting for the upcoming #textplusplenary @Textplus regarding the integration of #unstructured #data

The usually interesting Paul Graham has written a longgg piece (I guess c. 8k-10k words) which meanders a bit; it is a bit unstructured, and is a gathering of thoughts around getting ‘great work done’. However - that’s the point: it’s a really interesting and worthwhile read, with lots of nuggets of wisdom throughout. The journey is the point!

http://paulgraham.com/greatwork.html
#paulgraham #unstructured #work #wisdom

How to Do Great Work