So, I recently saw some quiet discussion about a paper where researchers reverse-engineered and disclosed some attacks against PhotoDNA, the very-super-duper-secret algorithm used by tech megacorps to scan for illegal images.

They didn't make any code public, and so... I did: https://github.com/ArcaneNibble/open-alleged-photodna

A _complete_ reverse-engineering and commented Python reimplementation of the algorithm from publicly-leaked binaries.

This means that studying the algorithm and any potential flaws is now much more accessible.

This took only about two days (once I knew that there even _was_ a leaked binary to compare against), which just goes to show, once again, that security through obscurity never works.

🔁 encouraged

I don't think I'm going to implement any of the published attacks, but other people are certainly free to have a go at it.

It's certainly scary how just one fuckywucky leak and... honestly not _that_ much research nor computational complexity can have major impacts on this algorithm. Especially when said algorithm serves a purpose that deeply affects lives....

Also, the leaked binary this is derived from is from 2021.

If anything, it's a shock it took _this long_ for whitebox attacks and other such holes to surface.

Oh, and guess how much all the secrecy amounts to?

only 500 lines of Python, including comments

@r Great work! I was curious and added the distance calculation (basic stupid Euclidean distance) to compare two images:

https://github.com/adulau/open-alleged-photodna/commit/c0275801088442cd4f5693b6403678daf5f75b7a

and the results are surprisingly good with rescaled images.

adulau@blakley:~/git/open-alleged-photodna$ python3.10 oaphotodna.py /home/adulau/Downloads/55147310088_ced977bdee_c.jpg /home/adulau/Downloads/55147310088_45f9e4b2cc_k.jpg
Distance (euclidean): 8.4261
Similarity: 0.997246
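
For anyone who wants to see what that comparison boils down to, here is a minimal sketch, not the code from the commit. It assumes a hash is 144 bytes with each byte in 0..255, and the names `euclidean_distance`, `similarity`, `HASH_LEN` and `MAX_DISTANCE` are made up for illustration. The normalisation (dividing by the largest possible distance, 12 * 255 = 3060) is an inference: it happens to reproduce the numbers above, since 1 - 8.4261 / 3060 rounds to 0.997246, but check the commit for what it actually does.

```python
import math

# Assumption: a hash is 144 bytes, each in the range 0..255.
HASH_LEN = 144
# Largest possible distance: every byte differs by 255 -> sqrt(144 * 255^2) = 3060.0
MAX_DISTANCE = math.sqrt(HASH_LEN * 255 ** 2)


def euclidean_distance(a: bytes, b: bytes) -> float:
    """Plain Euclidean distance between two equal-length byte vectors."""
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity(a: bytes, b: bytes) -> float:
    """Map the distance to a score: 1.0 for identical hashes, 0.0 for maximally distant ones."""
    return 1.0 - euclidean_distance(a, b) / MAX_DISTANCE


if __name__ == "__main__":
    # Two synthetic hashes that differ in only two bytes (by 3 and by 4).
    h1 = bytes(HASH_LEN)
    h2 = bytes([3, 4] + [0] * (HASH_LEN - 2))
    print(f"Distance (euclidean): {euclidean_distance(h1, h2):.4f}")
    print(f"Similarity: {similarity(h1, h2):.6f}")
```

Dividing by a fixed maximum keeps the score comparable across image pairs, but any match threshold on top of it is a separate choice that this sketch does not make.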

The source image: https://www.flickr.com/photos/adulau/55147310088/

I just opened a PR (feel free to discard it if you think it's out-of-scope ;-)

Commit message: "new: [compare images] Add a quick way to compare two images with Euclidian distance. Output is normalised to ease the comparison (1 is very close, 0 is far away)"

@adulau @r huh, no perceptual hashing? That's surprising

Edit: Ah, should've looked at the code first. Distance between hashes that I suppose are perceptual then

@dngrs Indeed, it's a perceptual hash. Having the reversed version is pretty nifty, since it can be used for other projects. I'll do some tests to see how good it is compared to other perceptual hash algorithms.

@r I had never seen the original repository until now, and... holy shit

https://github.com/jankais3r/pyPhotoDNA/blob/main/install.sh#L29
