Large-scale online deanonymization with LLMs

"We show that large language models can be used to perform at-scale deanonymization. With full Internet access, our agent can re-identify Hacker News users and Anthropic Interviewer participants at high precision, given pseudonymous online profiles and conversations alone, matching what would take hours for a dedicated human investigator. We then design attacks for the closed-world setting. Given two databases of pseudonymous individuals, each containing unstructured text written by or about that individual, we implement a scalable attack pipeline that uses LLMs to: (1) extract identity-relevant features, (2) search for candidate matches via semantic embeddings, and (3) reason over top candidates to verify matches and reduce false positives. Compared to prior deanonymization work (e.g., on the Netflix prize) that required structured data or manual feature engineering, our approach works directly on raw user content across arbitrary platforms. We construct three datasets with known ground-truth data to evaluate our attacks. The first links Hacker News to LinkedIn profiles, using cross-platform references that appear in the profiles. Our second dataset matches users across Reddit movie discussion communities; and the third splits a single user’s Reddit history in time to create two pseudonymous profiles to be matched. In each setting, LLM-based methods substantially outperform classical baselines, achieving up to 68% recall at 90% precision compared to near 0% for the best non-LLM method. Our results show that the practical obscurity protecting pseudonymous users online no longer holds and that threat models for online privacy need to be reconsidered."

https://arxiv.org/html/2602.16800v1
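
For anyone unsure how to read the abstract's "68% recall at 90% precision" figure, here is a minimal sketch of how such a metric can be computed. It is not the paper's evaluation code. The function name, the toy numbers, and the simplifying assumption that every query profile has exactly one true match (so recall is the number of correct accepted matches divided by the number of queries) are all illustrative assumptions.

```python
def recall_at_precision(confidences, correct, target_precision=0.9):
    """Best recall over all confidence thresholds whose precision meets the target.

    confidences: one score per query for its top proposed match (hypothetical input).
    correct:     True/1 if that proposed match is the right person, else False/0.
    Assumes every query has exactly one true match, so recall = correct accepted / queries.
    """
    ranked = sorted(zip(confidences, correct), key=lambda pair: pair[0], reverse=True)
    n_queries = len(ranked)
    best_recall = 0.0
    true_positives = 0
    for k, (conf, is_correct) in enumerate(ranked, start=1):
        true_positives += int(is_correct)
        # A threshold cannot separate tied scores, so only evaluate at the end of a tie run.
        if k < n_queries and ranked[k][0] == conf:
            continue
        precision = true_positives / k  # share of accepted matches that are right
        if precision >= target_precision:
            best_recall = max(best_recall, true_positives / n_queries)
    return best_recall


# Toy data: ten proposed matches, sorted here only for readability.
toy_confidences = [0.99, 0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
toy_correct = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0]
print(recall_at_precision(toy_confidences, toy_correct, target_precision=0.9))
```

In other words, the attacker only commits to the matches it is most sure of. The metric asks how many people can still be identified while keeping at least 9 out of 10 of those committed guesses correct.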

#AI #GenerativeAI #LLMs #Anonymity #Privacy #Deanonymization

@remixtures

This is exactly what they will be used for.

Here's a correlation:

1 - DHS is building a number of high-capacity concentration camps for "illegal immigrants", although we know it's really about ethnic and demographic engineering, since they're also capturing legal immigrants, citizens, etc., based on racial profiling.
Next, the camps will be used to imprison dissidents, aka "radical left-wing agitators" or "antisemitic" or "terrorists", whatever label suits best.

2 - Simultaneously, there's a great expansion of massive data centres guzzling power and water for what appears to be quite an unprofitable endeavour. Kind of a mystery, right?

But what if a totalitarian government is willing to pay top dollar to be able to identify, track and monitor dissidents, disseminate effective propaganda with ease, and create believable content to "flood the zone", effectively weaponising computing power to shape not just a narrative, but perceived reality?

I feel that this, along with media concentration among loyal oligarchs (TikTok, X, Meta, Paramount, Warner Brothers, Fox), is the perfect trifecta of propaganda and surveillance to crush any potential resistance, or at least to muddy the waters enough to sow doubt and incite paralysis.

@remixtures

That, and of course, our old friends fear and self-censorship. How many will change their behaviour and discourse online, precisely because they know they can be tracked and identified, despite any countermeasures or caution?
How many of us in Europe and the US, from the top echelon of government representatives to humble working-class protesters, avoid referring to Gaza as a genocide, for example?

As Larry Ellison said, citizens "will be on their best behavior".