A Word on Data Anonymization 🔥🔒:

Data anonymization is the process of removing any identifiable information to ensure a piece of data can no longer be linked to an individual.

Anyone using this technique must be extremely careful about it.

Removing only the obvious identifiers, such as name and email, might not be enough. When applying anonymization techniques, it is vital to consider the data in context.

Here are a couple of examples
to illustrate my point 🧵👇:

1/4 #DataAnonymization #Privacy #TinyPrivacyTip

CASE 1 ✅
It's okay to do this in many cases:

IP addresses (Internet Protocol addresses) are collected by an app. Each IP address can be linked to an individual and a location. For anonymization, the IP addresses are completely deleted and only the information about the country of origin is kept.

The data is then detached from the user and aggregated, keeping only the percentage of usage per country. It is impossible to recover a list of users or IP addresses from this, because everything else has been thoroughly deleted; the only data retained is the percentage of the app's usage per country.
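The aggregation step can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical `ip_to_country` lookup table (in practice you would use a geolocation database, and the raw IPs would be deleted after this step):

```python
# Sketch of Case 1: replace each IP with its country, drop the IPs,
# and keep only aggregate percentages per country.
from collections import Counter

def country_usage_percentages(ip_addresses, ip_to_country):
    """Map IPs to countries, then return only per-country usage percentages."""
    counts = Counter(ip_to_country[ip] for ip in ip_addresses)
    total = sum(counts.values())
    return {country: 100 * n / total for country, n in counts.items()}

# Example with made-up documentation IPs:
lookup = {"203.0.113.5": "FR", "198.51.100.7": "FR", "192.0.2.9": "US"}
print(country_usage_percentages(list(lookup), lookup))
```

Note that the anonymization comes from *deleting* the original IP list afterwards; the percentages alone cannot be reversed into individual addresses.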

This can be an acceptable example of data anonymization in many situations.

2/4 #DataAnonymization #Privacy

CASE 2 ❌
It is not enough to only do this:

An AI feature keeps the content of text sent to it for improvement purposes. All directly identifiable information is removed from the text before storage. For example, names, emails, phone numbers, and addresses are automatically removed. The rest of the text is stored and used for AI training purposes (some apps are actually doing this, by the way; remain vigilant).
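A minimal sketch of this kind of identifier scrubbing, using illustrative regex patterns (real products use more sophisticated detectors, but the limitation shown here is the same):

```python
# Sketch of Case 2: stripping direct identifiers with simple regexes.
# The patterns below are illustrative assumptions, not a complete scrubber.
import re

PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),  # email addresses
    re.compile(r"\+?\d[\d\s().-]{7,}\d"),    # phone-like number runs
]

def strip_direct_identifiers(text):
    """Replace anything matching a direct-identifier pattern with REDACTED."""
    for pattern in PATTERNS:
        text = pattern.sub("REDACTED", text)
    return text

note = ("Contact alice@example.com or +1 555 0100. He bought the social "
        "media company after trouble at his car company.")
print(strip_direct_identifiers(note))
```

The email and phone number are gone, but the story itself can still identify the person, which is exactly the problem described below.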

This data might not be anonymized at all.

It depends on the content of the text. If the content is a very personal and specific story, this anonymization technique might be useless.

For example, let’s imagine a psychologist puts a report of a patient's consultation in this app for writing optimization. Even without any name or email or address, this story could be so specific the patient is easily identifiable from the content.

For example, let’s imagine:
“The patient REDACTED was anxious about the expensive purchase of this popular social media company REDACTED. He was already having some trouble with his other car company REDACTED, and was hoping things would go better with this new one. He then confided this very personal story that happened on his private jet: ...”

Despite containing no direct personal identifiers, this data would NOT be properly anonymized.

3/4 #DataAnonymization #Privacy

Keep in mind that data anonymization is a difficult task that must always be considered in context.

When keeping anonymized data isn't essential, it is much safer to simply delete the data entirely. Whenever possible, it is even better not to collect it in the first place.

Not only must you have very good reasons to keep anonymized data, but you also expose yourself to potential legal problems if you neglect a proper anonymization process that considers context.

Be careful with the promise of
data anonymization, for yourself and for others.

Remember, data is a toxic asset: https://www.schneier.com/blog/archives/2016/03/data_is_a_toxic.html

4/4 #DataAnonymization #Privacy


@Em0nM4stodon
Thank you very much, Em, for such useful information 👍🏽👏🏽🙂

@Em0nM4stodon my dad worked on selling anonymized transaction data at a major credit card company before moving to "AI". I always felt it was an invasion of privacy.

Yet he is the type of person who claims if we mine every piece of information, we "make better decisions" even when that's clearly false.

Big data is a massive disaster.

@Em0nM4stodon Note also that the Schneier article predated the GDPR. Under the GDPR, there are situations where, if you anonymise data, you can still be liable if it is later deanonymised. At the very least, if you’re considering this, you need a privacy red team that can try to determine which other data sets can be merged with your data to deanonymise it and can quantify the risks.

I like the toxic material analogy. There are lots of reasons that businesses may need to store toxic chemicals, but they need to have a clear understanding of the risks if they do, and have processes to mitigate that risk. Just sticking them in a warehouse is not enough.

@Em0nM4stodon What's worse is that vulnerable individuals are often easier to identify from anonymised data, as they often have unusual indicators.
@Em0nM4stodon This is basically the same as a problem that goes back as far as WWII, though computers and big data have made it worse. I believe it was called “data aggregation” back then, though the term is used a little differently today. The classification (actually, I guess it’s a handling caveat) EFTO—Encrypted For Transmission Only—was created to deal with this. The example I was given was: you have a bunch of Generals who like to compare their golf scores. There is nothing secret about their golf scores, but sharing them also reveals their location, which might reveal clues about operations in progress. (That was an example—Generals probably don’t have much time to golf during wars.)
@Em0nM4stodon haha love the example😂

@Em0nM4stodon

I do love a good data de-anonymization hacker conference talk...

It's really the only way to scare some people/bosses/companies into taking this seriously...