Mastodawn

I used to give a lecture on software engineering in the scientific computing context, entitled "R is a Ford Pinto". I hadn't thought at the time that "unsafe at any speed" would include a nasty CVE with RCE. https://hiddenlayer.com/research/r-bitrary-code-execution/

HiddenLayer Research | R-bitrary Code Execution

HiddenLayer uncovered a zero-day deserialization vulnerability in the popular programming language R, widely used within government and medical research, that could result in a supply chain attack.

Merovius Apr 30, 2024

@kortschak "R-bitrary code execution" is a great pun, though.

Dan Kortschak Apr 30, 2024

@Merovius Though expected in that ecosystem; everything starts with "R" or has a capitalised "R" somewhere in the name that is intended to be pronounced /ˈɑːr/.

Merovius Apr 30, 2024

@kortschak "Arbitrary Go execution"

Merovius Apr 30, 2024

@kortschak Or maybe "Gode"?

Koantig Apr 30, 2024

@kortschak
What's particularly unsafe with #RStats, newly discovered CVE excepted?
Compared to what and in what context?

Dan Kortschak Apr 30, 2024

@koantig There are a bunch of things that make it unreliable for reproducible science if care is not taken. I think probably things have improved since, but in no particular order: library loading order impacts on what functions are run due to name shadowing (and this can happen transitively); version pinning is not commonly done and semantics can change subtly between versions; there are subtle class- and type-dependent behavioural differences (e.g. `reshape2::melt`); there are arbitrary and undocumented parameter-dependent behavioural differences in some functions (`summary.lm` with and without an intercept, `MASS::lda` randomly chooses which side of the boundary you are on if you are within 1e-5 of the boundary); what does `fn()` return when `fn<-function() { return (1+2+3+4)/5 }` (this kind of thing is easily missed in code review); global state persists (packages, default options, variables…).
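
For anyone wondering about the `return` question, here is a minimal sketch in base R. `fn` is the function from the post; `fn_fixed` is a hypothetical name added here only for contrast. In R, `return` is an ordinary function call, so the division is written outside the call and never runs.

```r
# `return (1+2+3+4)/5` parses as `return(1+2+3+4) / 5`.
# The call to return() exits the function with 10 before the
# division by 5 is ever evaluated.
fn <- function() { return (1+2+3+4)/5 }

# Hypothetical corrected version (not from the post): the whole
# expression sits inside the return() call.
fn_fixed <- function() { return((1+2+3+4)/5) }

print(fn())        # prints [1] 10
print(fn_fixed())  # prints [1] 2
```

The two definitions differ by a single pair of parentheses, which is probably why this is easy to miss in review.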

Koantig Apr 30, 2024

@kortschak Thank you, I think it's mostly fair and I confirm that many things have improved on that front. (e.g. tibbles are thankfully much less forgiving than data frames)

There's always going to be a tension between rapid development of data analysis and statistical research, and a hardened finished product.

I can't help but ask what you would suggest instead for the same use-case though. Python comes to mind but I don't think it's any better.