So, I recently saw some quiet discussion about a paper where researchers reverse-engineered and disclosed some attacks against PhotoDNA, the very-super-duper-secret algorithm used by tech megacorps to scan for illegal images.

They didn't make any code public, and so... I did: https://github.com/ArcaneNibble/open-alleged-photodna

A _complete_ reverse-engineering of the algorithm from publicly leaked binaries, plus a commented Python reimplementation.

This means that studying the algorithm and any potential flaws is now much more accessible.

This took only about two days (once I knew that there even _was_ a leaked binary to compare against), which just goes to show, once again, that security through obscurity never works.

🔁 encouraged


@r Great work! I was curious and added a distance calculation (basic stupid Euclidean distance) to compare two images:

https://github.com/adulau/open-alleged-photodna/commit/c0275801088442cd4f5693b6403678daf5f75b7a

and the results are surprisingly good with rescaled images.

adulau@blakley:~/git/open-alleged-photodna$ python3.10 oaphotodna.py /home/adulau/Downloads/55147310088_ced977bdee_c.jpg /home/adulau/Downloads/55147310088_45f9e4b2cc_k.jpg
Distance (euclidean): 8.4261
Similarity: 0.997246
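The PR's exact normalisation isn't spelled out here, but the two numbers above are consistent with a simple scheme: similarity = 1 - distance / maximum possible distance. Assuming a PhotoDNA-style hash of 144 byte values (0 to 255), the maximum Euclidean distance is 255 × √144 = 3060, and 1 - 8.4261 / 3060 ≈ 0.997246. A minimal sketch under that assumption; the names `euclidean_distance`, `similarity` and `MAX_DISTANCE` are hypothetical and not taken from the PR:

```python
import math
from typing import Sequence

# Assumption: the hash is 144 bytes, each value in 0..255.
HASH_LEN = 144
MAX_BYTE = 255
# Largest possible Euclidean distance between two such hashes: 255 * sqrt(144) = 3060.
MAX_DISTANCE = MAX_BYTE * math.sqrt(HASH_LEN)


def euclidean_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Plain Euclidean distance between two equal-length hash vectors."""
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    # math.dist (Python 3.8+) computes sqrt(sum((x - y)**2)) for two points.
    return math.dist(a, b)


def similarity(distance: float, max_distance: float = MAX_DISTANCE) -> float:
    """Map a distance to [0, 1]: 1 means identical, 0 means maximally far apart."""
    return 1.0 - distance / max_distance


if __name__ == "__main__":
    # Reproduce the similarity from the distance reported in the output above.
    print(f"{similarity(8.4261):.6f}")  # prints 0.997246
```

A linear mapping like this is easy to read, but with 3060 as the denominator almost any two "related" images will score close to 1, so a fixed distance threshold is usually a more practical way to decide whether two hashes match.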

The source image: https://www.flickr.com/photos/adulau/55147310088/

I just opened a PR (feel free to discard it if you think it's out of scope ;-)

Commit c027580, "new: [compare images] Add a quick way to compare two images with Euclidian distance": output is normalised to ease the comparison (1 is very close, 0 is far away).

@adulau @r huh, no perceptual hashing? That's surprising

Edit: Ah, should've looked at the code first. So it's a distance between hashes, which I suppose are perceptual.

@dngrs Indeed, it’s a perceptual hash. Having the reversed version is pretty nifty, since it can be reused in other projects. I’ll run some tests to see how it compares to other perceptual hash algorithms.

@r