Revisiting 2 of the 5 docs from the Snowden leaks that mention 'cookies'.

GCHQ 2009 on 'target detection identifiers':
https://snowden.glendon.yorku.ca/items/show/188/

NSA 2011 on 'selector types':
https://snowden.glendon.yorku.ca/items/show/172

...featuring cookie/browser IDs from Google/Doubleclick, Facebook, Microsoft and many more.

It's breathtaking how the surveillance marketing industry has still managed to claim for many years that unique personal identifiers processed in the web browser are 'anonymous', and sometimes still does.

Browser-based personal identifiers aka 'target detection identifiers' are 'unique and persistent for a user/machine', and they are a 'SIGINT standardised code', according to GCHQ (2009).

Ryan Gallagher reported on Snowden docs that mention browser/cookie IDs aka 'target detection identifiers' in 2014 and 2015:
https://theintercept.com/2015/09/25/gchq-radio-porn-spies-track-web-users-online-identities/
https://theintercept.com/2014/12/13/belgacom-hack-gchq-inside-story/

This slide shows that the GCHQ stored 'all TDIs seen in last 6 months' in 'bulk' in their 'MUTANT BROTH' system.

'TDI <-> website correlations' were stored in the 'KARMA POLICE' system, which has been widely reported on and has its own Wikipedia entry:
https://en.wikipedia.org/wiki/Karma_Police_(surveillance_programme)

They also stored 'TDI correlations' in another system they referred to as 'AUTO ASSOC'. I wonder whether anyone has ever reported on this. Was this about linking identifiers associated with the same person/entity to each other?

Not sure whether this report was referring to data on a single person?

In addition to Facebook's famous 'datr' identifier, it shows 'utma' and 'utmz' identifiers processed by Google Analytics on YouPorn.
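To make concrete why such a cookie works as a persistent identifier, here is a minimal Python sketch that splits a classic Google Analytics `__utma` value (the 'utma' above) into its parts. The six-field layout is as commonly documented for the old ga.js library, not something taken from the slide, and the sample value and the `parse_utma` helper are made up for illustration.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Utma(NamedTuple):
    domain_hash: str        # hash of the site's cookie domain
    visitor_id: str         # random ID, stable across visits: the tracking-relevant part
    first_visit: datetime
    previous_visit: datetime
    current_visit: datetime
    session_count: int


def parse_utma(value: str) -> Utma:
    """Split a __utma cookie value into its six dot-separated fields."""
    parts = value.split(".")
    if len(parts) != 6:
        raise ValueError(f"expected 6 dot-separated fields, got {len(parts)}")
    domain_hash, visitor_id, first, prev, cur, count = parts

    def to_utc(seconds: str) -> datetime:
        # The three timestamp fields are Unix epoch seconds.
        return datetime.fromtimestamp(int(seconds), tz=timezone.utc)

    return Utma(domain_hash, visitor_id, to_utc(first), to_utc(prev), to_utc(cur), int(count))


# Invented sample value, not taken from any real browser or document.
sample = parse_utma("123456789.987654321.1230000000.1230000500.1230001000.3")
print(sample.visitor_id, sample.first_visit.isoformat(), sample.session_count)
```

The visitor ID and the first-visit timestamp stay the same for as long as the cookie lives, which is what makes the value 'unique and persistent' in practice.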

It also shows non-persistent session IDs, which were possibly not very useful.

The GCHQ even specified in detail whether browser-based personal identifiers refer to a user account or device, and whether they are processed in a web request or in a browser cookie.
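This kind of categorization can be sketched as a small data structure. The Python below is purely illustrative: the class names, the field names and the category assigned to each entry are my assumptions, not GCHQ's actual schema or findings.

```python
from dataclasses import dataclass
from enum import Enum


class RefersTo(Enum):
    USER_ACCOUNT = "user account"
    DEVICE = "device/browser"


class SeenIn(Enum):
    WEB_REQUEST = "web request"
    COOKIE = "browser cookie"


@dataclass(frozen=True)
class IdentifierType:
    name: str
    operator: str
    refers_to: RefersTo
    seen_in: SeenIn
    persistent: bool


# Hypothetical catalog entries; the categories are assumptions for illustration.
catalog = [
    IdentifierType("datr", "Facebook", RefersTo.DEVICE, SeenIn.COOKIE, True),
    IdentifierType("utma", "Google Analytics", RefersTo.DEVICE, SeenIn.COOKIE, True),
    IdentifierType("session_id", "(hypothetical site)", RefersTo.DEVICE, SeenIn.COOKIE, False),
]


def persistent_device_ids(types):
    """Return the names of identifier types that are device-level and persistent."""
    return [t.name for t in types if t.refers_to is RefersTo.DEVICE and t.persistent]


print(persistent_device_ids(catalog))
```

Separating "what it refers to" from "where it is seen" keeps the two questions independent, so a new identifier type only needs one new row.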

Ironically, I also spent quite some time doing the same work - categorizing browser and cookie identifiers - much later, starting in 2018/19. And I spent less than 250 hours per ID 🤖

In 2009, they 'discovered' only '70 distinct TDI types', though. Today, there are thousands of distinct browser/cookie IDs.

Another slide explains that they collected 18 billion 'target detection identifiers' in the period between 25 Dec 2007 and 20 Jun 2008.

This was certainly a large number in 2007/8. There's a much larger number of digital identifiers per person available today.

And yep, these revelations led to most of today's HTTP traffic being encrypted. Still, many entities have access to browser/cookie IDs, and some of these IDs are now exposed to many parties via trillions of RTB bid requests in digital marketing.

And then there's this perhaps better known 2011 NSA slide on 'selector types'.

A few major cookie IDs.

'Browser tags': was this related to malware/adware?

For mobile phones, they accessed only IMEIs and 'Apple UDID' in 2011.

Google AAID and Apple IDFA were introduced later (2014/16), and then became the most pervasive digital identifiers for digital tracking and profiling across a myriad of entities. To this day, Google and Apple have not been held accountable.

And there was Bluetooth, already.

Not least, the GCHQ had so much fun discussing 'target detection identifiers' (TDIs) and mass surveillance, oh my 🥴

There's another Snowden doc from 2011 that provides more information about the GCHQ's AUTO ASSOC system, which calculates correlations between 'target detection identifiers' (TDIs):
https://maths.ed.ac.uk/~tl/docs/Problem-Book-Redacted.pdf

I think the doc was first published by @pluralistic in 2016:
https://boingboing.net/2016/02/02/doxxing-sherlock-3.html

Seems like the GCHQ operated a kind of probabilistic ID graph that aims to link cookie IDs, device IDs, email addresses and other TDI identifiers based on communication, timing and geolocation behavior.
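To illustrate the general idea of such an ID graph (not the GCHQ's actual method, which the doc does not spell out at this level), here is a toy Python sketch. It links identifiers that keep showing up at the same source IP within the same time window and scores each pair with a Jaccard-style ratio. The event data, the `correlate` function and the 60-second window are all invented assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations

# (identifier, unix_timestamp, source_ip): all values invented for illustration.
events = [
    ("cookie:abc", 1000, "10.0.0.1"),
    ("email:alice@example.org", 1010, "10.0.0.1"),
    ("cookie:abc", 5000, "10.0.0.7"),
    ("email:alice@example.org", 5030, "10.0.0.7"),
    ("cookie:xyz", 5040, "10.0.0.9"),
]


def correlate(events, window=60):
    """Score identifier pairs by how often they share an (IP, time bucket)."""
    buckets = defaultdict(set)
    for ident, ts, ip in events:
        # Fixed-size time buckets: crude, pairs straddling a boundary are missed.
        buckets[(ip, ts // window)].add(ident)

    seen = Counter()   # number of buckets each identifier appears in
    pairs = Counter()  # number of buckets each pair of identifiers shares
    for idents in buckets.values():
        seen.update(idents)
        for a, b in combinations(sorted(idents), 2):
            pairs[(a, b)] += 1

    # Jaccard-style score: shared buckets / buckets containing either identifier.
    return {
        (a, b): n / (seen[a] + seen[b] - n)
        for (a, b), n in pairs.items()
    }


for pair, score in correlate(events).items():
    print(pair, round(score, 2))
```

A real system would presumably need far more careful handling of shared IPs, NAT and coincidental overlap; the point here is only that repeated co-occurrence in time and place is enough to turn separate identifiers into edges of a graph.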

In the doc, they think about how to improve the ID graph by adding further data sources ("SIGINT truthed datasets").

This includes potential internal collaborations that aim to "automatically determine the relationship between entities based on communication content" and even based on finding "triggers in audio content".

Btw, what inspired me to revisit these docs is @byrontau's excellent book Means of Control, which not only details how US defense, intelligence and law enforcement buy commercial data from digital marketing but also provides deep historical context, tracing back to early-2000s debates on Total Information Awareness (TIA).

While I was casually following debates on mass surveillance from Echelon and TIA to Snowden, I didn't seriously start to investigate commercial data practices until 2014/15.