Threats by #AI:
“We show that large language models can be used to perform
at-scale #deanonymization. With full Internet access, our agent
can re-identify Hacker News users and Anthropic Interviewer
participants at high precision, given #pseudonymous online
profiles and conversations alone, matching what would take
hours for a dedicated human investigator.”

“Deanonymization is a two-step process at heart, involving
profiling an anonymous person from their posts, and then
matching them to a known identity. It’s well-known that large
language models can infer personal attributes from
text on online forums. Given this, it makes sense to
ask: how good are LLMs at full end-to-end deanonymization,
and is this a practical threat to pseudonymous accounts?”
https://arxiv.org/pdf/2602.16800v2

Wes Roth (@WesRoth)

A new study has been published showing that AI agents (built on large language models) can trace anonymous social accounts at scale and reveal their owners' identities. It is a cautionary finding: by scraping and autonomously connecting clues from scattered, seemingly harmless posts, these agents sharply increase the risk to anonymity and privacy.

https://x.com/WesRoth/status/2030916504410272227

#deanonymization #privacy #llm #aisafety

Wes Roth (@WesRoth) on X

A terrifying new study reveals that AI agents can now easily unmask anonymous social media accounts at scale. By simply scraping scattered, seemingly harmless details from pseudonymous posts, large language models can autonomously connect the dots and link them to a user's…

X (formerly Twitter)

[en] Paper: LLMs can be used to perform at-scale #deanonymization

"With full Internet access, our #agent can re-identify Hacker News users and Anthropic Interviewer participants at high precision, given #pseudonymous online profiles and conversations alone, matching what would take hours for a dedicated human investigator."

"Our results show that the practical #obscurity protecting pseudonymous users online no longer holds and that #threat models for online #privacy need to be reconsidered."

"We demonstrate that LLMs fundamentally change the picture, enabling fully automated deanonymization attacks that operate on #unstructured text at scale."

Note: also check paragraphs "Potential harms" and "Potential benefits".

https://arxiv.org/html/2602.16800

#llm #research

Large-scale online deanonymization with LLMs

"TL;DR: We show that LLM agents can figure out who you are from your anonymous online posts. Across Hacker News, Reddit, LinkedIn, and anonymized interview transcripts, our method identifies users with high precision – and scales to tens of thousands of candidates.

While it has been known that individuals can be uniquely identified by surprisingly few attributes, this was often practically limited. Data is often only available in unstructured form and deanonymization used to require human investigators to search and reason based on clues. We show that from a handful of comments, LLMs can infer where you live, what you do, and your interests – then search for you on the web. In our new research, we show that this is not only possible but increasingly practical."

https://simonlermen.substack.com/p/large-scale-online-deanonymization

#AI #GenerativeAI #Anonymity #Deanonymization #LLMs

Large-Scale Online Deanonymization with LLMs

We measure the capabilities of LLMs to deanonymize users online.

Simon Lermen
LLMs can unmask pseudonymous users at scale with surprising accuracy https://arstechni.ca/5sbv #deanonymization #Security #privacy #Biz&IT #LLMs #AI
LLMs can unmask pseudonymous users at scale with surprising accuracy

Pseudonymity has never been perfect for preserving privacy. Soon it may be pointless.

Ars Technica
Large-scale online deanonymization with LLMs

We show that large language models can be used to perform at-scale deanonymization. With full Internet access, our agent can re-identify Hacker News users and Anthropic Interviewer participants at high precision, given pseudonymous online profiles and conversations alone, matching what would take hours for a dedicated human investigator. We then design attacks for the closed-world setting. Given two databases of pseudonymous individuals, each containing unstructured text written by or about that individual, we implement a scalable attack pipeline that uses LLMs to: (1) extract identity-relevant features, (2) search for candidate matches via semantic embeddings, and (3) reason over top candidates to verify matches and reduce false positives. Compared to classical deanonymization work (e.g., on the Netflix prize) that required structured data, our approach works directly on raw user content across arbitrary platforms. We construct three datasets with known ground-truth data to evaluate our attacks. The first links Hacker News to LinkedIn profiles, using cross-platform references that appear in the profiles. Our second dataset matches users across Reddit movie discussion communities; and the third splits a single user's Reddit history in time to create two pseudonymous profiles to be matched. In each setting, LLM-based methods substantially outperform classical baselines, achieving up to 68% recall at 90% precision compared to near 0% for the best non-LLM method. Our results show that the practical obscurity protecting pseudonymous users online no longer holds and that threat models for online privacy need to be reconsidered.
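The three-step closed-world pipeline in the abstract can be sketched in miniature. This is my own illustrative stand-in, not the paper's implementation: `extract_features` plays the role of the LLM profiling step (here just a bag of words so the example runs without a model), cosine similarity stands in for semantic-embedding search, and a similarity threshold stands in for the LLM verification step that reduces false positives.

```python
from collections import Counter
import math

def extract_features(text):
    # Step 1 (stand-in): reduce a user's raw posts to identity-relevant
    # features. The paper uses an LLM for profiling; a token bag is
    # enough to illustrate the data flow.
    return Counter(text.lower().split())

def cosine(a, b):
    # Step 2 helper: cosine similarity between sparse feature vectors,
    # playing the role of semantic-embedding candidate search.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def match(anon_posts, candidates, threshold=0.5):
    # candidates: {identity: text written by or about that identity}.
    q = extract_features(anon_posts)
    scored = [(cosine(q, extract_features(t)), name)
              for name, t in candidates.items()]
    score, best = max(scored)
    # Step 3 (stand-in): verification. The paper reasons over top
    # candidates with an LLM to reject false matches; here a plain
    # threshold guards against low-confidence links.
    return best if score >= threshold else None
```

The threshold is what trades recall against precision: raising it rejects more true matches but admits fewer false positives, which is the axis behind the paper's "68% recall at 90% precision" figure.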

arXiv.org

Large-scale online deanonymization with LLMs
From Cornell University Computer Science

Simon Lermen, Daniel Paleka, Joshua Swanson, Michael Aerni, Nicholas Carlini, Florian Tramèr

https://arxiv.org/abs/2602

#computerscience #cornelluniversity #AIResearch #privacy #anonymity #llm #HackerNews #Anthropic #pseudonymity
#deanonymization

You've got nothing to hide, do you?

»We show that large language models can be used to perform at-scale #deanonymization. With full Internet access, our agent can re-identify Hacker News users and Anthropic Interviewer participants at high precision, given pseudonymous online profiles and conversations alone, matching what would take hours for a dedicated human investigator. We then design attacks for the closed-world setting. Given two databases of pseudonymous individuals, each containing unstructured text written by or about that individual, we implement a scalable attack pipeline«

https://arxiv.org/abs/2602.16800

"#AI" #privacy #pseudonymity #anonymity #LLM

Large-scale online deanonymization with LLMs

arXiv.org

RE: https://tldr.nettime.org/@remixtures/116148578797801271

#privacy #identity #deanonymization

This is going to become more and more problematic as we go, and I guarantee many companies are already using this to identify you or any individual of interest to them.

This is not surprising, but it reinforces that we must have better methods for privacy protection and data anonymization. Changing names and removing PII is not enough.

#metadata matters more than ever.
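A tiny example (mine, not from the paper) of why scrubbing explicit PII fails: after names and email addresses are removed, the remaining quasi-identifiers, such as a rare combination of job, place, and hobby, can still single a person out.

```python
import re

def scrub_pii(text):
    # Naive scrubber: mask email addresses, then two-word
    # capitalized name-like phrases. Real anonymizers are more
    # sophisticated, but face the same limitation.
    text = re.sub(r"\b[\w.]+@[\w.]+\b", "[EMAIL]", text)
    return re.sub(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b", "[NAME]", text)

post = ("Jane Doe here (jane@example.org), the only goat-cheese maker "
        "in Tromso who also maintains a Haskell compiler.")
scrubbed = scrub_pii(post)
# The name and email are gone, but "goat-cheese maker in Tromso" plus
# "Haskell compiler maintainer" likely identifies one person anyway,
# and that is exactly the kind of clue an LLM agent can search on.
```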

Large-scale online deanonymization with LLMs

https://arxiv.org/html/2602.16800v1

#AI #GenerativeAI #LLMs #Anonymity #Privacy #Deanonymization
