So, I recently saw some quiet discussion about a paper where researchers reverse-engineered and disclosed some attacks against PhotoDNA, the very-super-duper-secret algorithm used by tech megacorps to scan for illegal images.

They didn't make any code public, and so... I did: https://github.com/ArcaneNibble/open-alleged-photodna

A _complete_ reverse-engineering and commented Python reimplementation of the algorithm from publicly-leaked binaries.

This means that studying the algorithm and any potential flaws is now much more accessible.

This took only about two days (once I knew that there even _was_ a leaked binary to compare against), which just goes to show, yet again, that security through obscurity never works.



I don't think I'm going to implement any of the published attacks, but other people are certainly free to have a go at it.

It's certainly scary how just one fuckywucky leak and... honestly not _that_ much research nor computational complexity can have major impacts on this algorithm. Especially when said algorithm serves a purpose that deeply affects lives....

Also, the leaked binary this is derived from is from 2021

If anything, it's a shock it took _this long_ for whitebox attacks and other such holes to turn up.

Oh, and guess how much all the secrecy amounts to?

only 500 lines of Python, including comments

Adding on to this thread, I now have a harness which can load the leaked PhotoDNA DLL *on Linux and macOS*

https://github.com/ArcaneNibble/open-alleged-photodna/blob/main/binary-harness.py

It also checks intermediate computations against my re-implementation, so that it's possible to further prove/validate that Alleged-PhotoDNA produces the same results as the binary.
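
A rough sketch of what that kind of stage-by-stage check could look like (this is not the code from binary-harness.py; the function name, the stage names, and the tolerance parameter are all hypothetical). It walks through named intermediate results from a reference run and a candidate run and reports the first place they diverge:

```python
def first_mismatch(reference_stages, candidate_stages, tol=0.0):
    """Return (stage_name, index) of the first differing value, or None.

    Both arguments map a stage name to a sequence of numbers.
    The stage names are made up for illustration only.
    An index of None means the stage is missing or has a different length.
    """
    for name, ref in reference_stages.items():
        cand = candidate_stages.get(name)
        if cand is None or len(cand) != len(ref):
            return (name, None)
        for i, (r, c) in enumerate(zip(ref, cand)):
            # tol=0.0 demands exact equality; raise it for float-heavy stages
            if abs(r - c) > tol:
                return (name, i)
    return None


# Hypothetical intermediate values from the binary and from the reimplementation
from_binary = {"grid": [1.0, 2.0, 3.0], "hash": [10, 20, 30]}
from_python = {"grid": [1.0, 2.0, 3.5], "hash": [10, 20, 30]}
print(first_mismatch(from_binary, from_python))
```

Reporting the first divergent stage, rather than only comparing final hashes, narrows down which step of a reimplementation is wrong.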

Understanding the harness requires quite a bit of knowledge about platforms, ABIs, and similar nonsense, but it also goes to show _how_ something such as Wine could possibly work.
@r crimes against python /lh
@ww come arrest me then
@r i'm not the python police! i think your crimes are cool :)
@ww heh, that was the goal (to make people realize that such things are even _possible_! slowly creating more "ABI and platforms and systems programming" wizards)
@r it's impressive! idk how much you have to learn to be able to do that, but it's a lot fewer lines of python than i'd expect to be required for that kinda thing

i tried downloading photodna using the bat file linked in the hackerfactor blog article linked in your readme, apparently it's the same dll as yours, but signed an hour later. it's weird there's not just one, but two leaked copies of the same version of the library!

https://www.virustotal.com/gui/file/b91f77124065ae7d7c3cbd382d7cf8ab8283af4a942aff3fd9fdacd55af08091/details
https://www.virustotal.com/gui/file/90b8043030793cd3948ab2c0561511276fec19f6b6d2acacd9548e89f7a48ed6/details

@ww this is a really simple case because the DLL doesn't depend on much system functionality

tavis ormandy has a much more complete implementation that was being used to do things like fuzz windows defender at scale

and yeah, i spent much of my teens studying how other teens managed to cheat at MapleStory, which was a good way to learn a lot of this

@r Great work! I was curious and added the distance calculation (basic stupid Euclidean distance) to compare two images:

https://github.com/adulau/open-alleged-photodna/commit/c0275801088442cd4f5693b6403678daf5f75b7a

and the results are surprisingly good with rescaled images.

adulau@blakley:~/git/open-alleged-photodna$ python3.10 oaphotodna.py /home/adulau/Downloads/55147310088_ced977bdee_c.jpg /home/adulau/Downloads/55147310088_45f9e4b2cc_k.jpg
Distance (euclidean): 8.4261
Similarity: 0.997246

The source image: https://www.flickr.com/photos/adulau/55147310088/

I just opened a PR (feel free to discard it if you think it's out of scope ;-)
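
A minimal sketch of what such a comparison might look like (this is not the code from the commit). It assumes a hash is a sequence of 144 values in the range 0..255, which is not stated in this thread, and it normalises by the largest distance possible under that assumption. The numbers posted above are at least consistent with this: 1 - 8.4261 / 3060 is roughly 0.997246. The actual commit may normalise differently.

```python
import math

# Assumption: one hash is 144 values, each in 0..255
HASH_LEN = 144
MAX_DISTANCE = math.sqrt(HASH_LEN) * 255  # 3060.0, all-0 hash vs all-255 hash


def euclidean_distance(a, b):
    """Plain Euclidean distance between two equal-length hash vectors."""
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity(a, b):
    """Map distance onto 0..1, where 1 is identical and 0 is as far as possible."""
    return 1.0 - euclidean_distance(a, b) / MAX_DISTANCE


# Two made-up hashes that differ slightly in a single position
h1 = [128] * HASH_LEN
h2 = [128] * (HASH_LEN - 1) + [131]
print(f"Distance (euclidean): {euclidean_distance(h1, h2):.4f}")
print(f"Similarity: {similarity(h1, h2):.6f}")
```

A fixed normalisation constant keeps scores comparable across image pairs, but where to draw the line between "same image" and "different image" still has to be found by experiment.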

new: [compare images] Add a quick way to compare two images with Euclidian distance. Output is normalised to ease the comparison (1 is very close, 0 is far away) · adulau/open-alleged-photodna@c027580

@adulau @r huh, no perceptual hashing? That's surprising

Edit: Ah, should've looked at the code first. Distance between hashes that I suppose are perceptual then

@dngrs Indeed, it's a perceptual hash. Having the reversed version is pretty nifty, since it can be used for some other projects. I'll do some tests to see how good it is compared to other perceptual hash algorithms.

@r I had never seen the original repository until now, and... holy shit

https://github.com/jankais3r/pyPhotoDNA/blob/main/install.sh#L29
