@jcoglan When I first met @suhacker years ago, I actually asked about this sort of thing (using "prepared statements" from SQL as an analogy), but she patiently explained how much worse the whole ecosystem is than I was imagining with my (admittedly naive) question.
The amount of Pickle exploits is too damn high
@jcoglan Yeah, that was my intuition.
If you haven't seen Suha's excellent DEFCON talk, it's a good watch: https://www.youtube.com/watch?v=Z38pTFM0FyU
Machine learning (ML) pipelines are vulnerable to model backdoors that compromise the integrity of the underlying system. Although many backdoor attacks limi...
@soatok @jcoglan in my experience a lot of it comes from a YAGNI attitude around caring about security or robustness.
At least half the ecosystem is grad students or other researchers slapping things together to get a paper out the door, and that stuff is later maybe absorbed into something calling itself a library. And I don't mean to criticize those people because those things *aren't* usually concerns for them.
But blurring the line between "we need to publish this paper" and "this code trains a model that decides if you get health insurance" is… not great.
@soatok @jcoglan oh btw did I mention that the pickles in question are usually downloaded from the web without host/content verification?
imagine there will be some attacks exploiting that when the domains start to evaporate once the VC flood recedes. haven't heard of any to date, but that's just a ticking clock
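For what it's worth, the content-verification half of that can be sketched in a few lines. This is a minimal sketch, assuming the publisher distributes a SHA-256 digest over a channel you already trust; `load_verified_pickle` and its parameters are hypothetical names, not part of any library. It only protects against the file being swapped out (for example, after a domain lapses). It does nothing if the original publisher's pickle was malicious to begin with.

```python
import hashlib
import hmac
import pickle


def load_verified_pickle(data: bytes, expected_sha256: str):
    """Unpickle `data` only if it matches a pinned SHA-256 digest.

    Hypothetical helper: the digest must come from a source you trust,
    not from the same server that served the file.
    """
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    if not hmac.compare_digest(actual, expected_sha256.lower()):
        raise ValueError(f"digest mismatch: got {actual}")
    return pickle.loads(data)


# Usage: pin the digest when the artifact is published...
blob = pickle.dumps({"weights": [0.1, 0.2]})
pinned = hashlib.sha256(blob).hexdigest()

# ...and check it before every load.
model = load_verified_pickle(blob, pinned)
```

Hashing the bytes before `pickle.loads` ever sees them is the point: once unpickling starts, it is too late to reject the file.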
I have spent many years as a software engineer who was a total outsider to machine learning, but with some curiosity and occasional peripheral interactions with it. During this time, a recurring theme for me was horror (and, to be honest, disdain) every time I encountered the widespread usage of Python pickle in the Python ML ecosystem. In addition to its major security issues¹, pickle-based serialization tends to be very brittle, leading to all kinds of nightmares as you evolve your code and upgrade libraries and Python versions.
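To make the security concern concrete, here is a minimal, deliberately harmless sketch of the underlying mechanism. A pickle can name any importable callable and have the loader invoke it; the `Payload` class below is a hypothetical stand-in that calls the builtin `len`, where a real exploit would name something far less benign. The `NoGlobalsUnpickler` class follows the `find_class` override pattern described in the standard library documentation; its name and the blanket "forbid everything" policy are assumptions made for illustration.

```python
import io
import pickle


class Payload:
    """Hypothetical object whose pickle tells the loader to call len("abc")."""

    def __reduce__(self):
        # (callable, args): the unpickler calls callable(*args) on load.
        return (len, ("abc",))


class NoGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any global (class or function)."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Load a pickle that may contain only plain built-in data types."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()


blob = pickle.dumps(Payload())

# The loader runs len("abc") and hands back its result, not a Payload.
print(pickle.loads(blob))  # 3

# Plain data (dicts, lists, floats, strings) needs no globals and still loads.
print(restricted_loads(pickle.dumps({"weights": [0.1, 0.2]})))

# The payload is rejected before the callable is ever invoked.
try:
    restricted_loads(blob)
except pickle.UnpicklingError as exc:
    print("rejected:", exc)
```

A restricted unpickler like this narrows the attack surface, but it also rejects most real model files, which usually contain library classes. That trade-off is part of why the problem is hard to fix without changing the format.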