I found this Veritasium documentary on the xz Jia Tan backdoor adventure quite good and surprisingly detailed:

https://www.youtube.com/watch?v=aoag03mSuXQ

The Internet Was Weeks Away From Disaster and No One Knew

@bagder I'm confused as to why binary blobs are allowed to be stored in public source code repositories anyway.

I mean, I understand if you want to include assets for a game, but wouldn't it then be safer to store them in a readable format before compression? As a simplified example, PNGs could be stored as XPM in source and then converted into the better format using provided tools, also in the repo.

TL;DR: if blobs are to be used in tests, write a tool that generates them.

@thanius @bagder Just guessing, but it might be to hit certain corner cases. For example, you might want a file with a certain type of noise to test that your changes to the algorithm didn't cause it to spit out a packed file that's significantly bigger than the incompressible source. Or something that was known to cause crashes or lossy behaviour in previous versions, to prevent regressions.

@duckz The whole point of unit tests is that they are reproducible. They're tailored for specific scenarios, and should thus be recreatable imho.

If you know how to reproduce a certain scenario, where the application expects a blob for the mockup, then build a tool that creates the blob before testing.

Prepare -> mock -> test
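
To make the prepare -> mock -> test idea concrete, here is a minimal Python sketch for the "incompressible noise" case mentioned earlier in the thread. It is not taken from xz or any real test suite: the function names, the seed, the blob size, and the 256-byte overhead limit are all assumptions chosen for illustration, and Python's standard `lzma` module stands in for whatever compressor is under test.

```python
import lzma
import random


def make_noise_blob(size: int, seed: int = 1234) -> bytes:
    """Prepare: build pseudo-random bytes from a fixed seed.

    The same seed yields the same bytes within a given Python
    implementation, so the blob never has to be committed.
    (Seed and size are illustrative assumptions.)
    """
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(size))


def check_noise_roundtrip(size: int = 4096, max_overhead: int = 256) -> int:
    """Test: noise must round-trip and must not grow by much when packed."""
    blob = make_noise_blob(size)        # prepare the input in memory
    packed = lzma.compress(blob)        # run the code under test
    assert lzma.decompress(packed) == blob, "round-trip changed the data"

    overhead = len(packed) - len(blob)  # container and framing bytes
    assert overhead <= max_overhead, f"packed output grew by {overhead} bytes"
    return overhead


if __name__ == "__main__":
    print("overhead in bytes:", check_noise_roundtrip())
```

The generator is plain, reviewable source, so anyone can read exactly what goes into the test input. It does not cover the case of one specific byte sequence taken from a bug report.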

@thanius I get that, and you can probably do what you say to generate certain sequences, including noise, even though it might be non-trivial. But if there is a certain binary sequence that was once a chunk of a file that someone attached to a bug report, you can't reasonably generate that. It doesn't have to be megs either, so why look for a function that generates 10 specific bytes when you can just commit those 10 bytes?

@duckz Because it wouldn't be source code. :)
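
One possible middle ground between the two positions, sketched here as an assumption and not something either poster proposed: keep the specific bytes, but write them as a hex literal in the test source so they show up in diffs and code review. The 10 bytes below are made up for illustration, and the names `REGRESSION_HEX` and `regression_blob` are hypothetical.

```python
# Hypothetical 10-byte regression input, kept as readable hex in the
# test source instead of as an opaque binary file. The byte values
# are invented for this example.
REGRESSION_HEX = "de ad be ef 00 ff 10 20 30 40"


def regression_blob() -> bytes:
    """Turn the reviewed hex text into the exact bytes the test needs."""
    return bytes.fromhex(REGRESSION_HEX)


if __name__ == "__main__":
    blob = regression_blob()
    print(len(blob), blob.hex(" "))
```

This only stays practical for small inputs; a multi-megabyte sample would be just as hard to review as hex as it is as a binary file.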