I just learned that MRAN will be shut down this year.
So I am copying all of their CRAN binaries, to eventually put them on a public page.

This is for `groundhog`, but will be open to all

If you know of alternatives or have thoughts: email me
#rstats
(BTW: http://datacolada.org/100)

[100] Groundhog 2.0: Further addressing the threat R poses to reproducible research - Data Colada

About a year ago I wrote Colada[95], a post on the threat R poses to reproducible research. The core issue is the 'packages'. When using R, you can run library(some_package) and R can all of a sudden scrape a website, cluster standard errors, maybe even help you levitate. The problem is that packages get updated...


@urisohn Also, as a counterpoint to the title of your blog post, that R poses a threat to reproducibility, I found exactly the opposite: https://www.brodrigues.co/blog/2022-12-21-longevity/

Generally speaking, R itself is quite good when it comes to running old code. The problem is that script authors don’t write their scripts so that they can be rerun programmatically, hence the very low success rate in that Nature paper (for example, they would typically call setwd() on a path that exists only on their own machines).

Code longevity of the R programming language
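A minimal base-R sketch of the `setwd()` problem mentioned above. The `read_survey()` helper and the `data/survey.csv` folder layout are hypothetical, for illustration only. The idea is that a path built relative to a project folder still works when someone else runs the script, while a hard-coded absolute path does not.

```r
# Fragile: an absolute path that exists only on the author's machine.
# setwd("C:/Users/author/Dropbox/paper")
# survey <- read.csv("survey.csv")

# More portable: build the path relative to the project folder.
# read_survey() and the data/survey.csv layout are hypothetical examples.
read_survey <- function(project_dir = ".") {
  path <- file.path(project_dir, "data", "survey.csv")
  if (!file.exists(path)) {
    stop("Cannot find ", path, "; run the script from the project folder.")
  }
  read.csv(path)
}

# Usage, assuming the working directory is the unzipped project folder:
# survey <- read_survey()
```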

@brodriguesco
I have wondered about that: to what extent are the failures to reproduce really due to that kind of stuff? The authors of the paper I discuss claim it is not just that, since they fixed those things by hand. I have since heard from an editor at a journal that reproduces all submitted code that they had plenty of issues.

I think looking at examples from pkgs gives an upper bound, and automatically running posted code gives a lower bound. I wonder what the expected value is.

@urisohn Yeah, for sure: code from examples very likely gives you the absolute best-case scenario. But I’m not sure it could be argued that the lower bound is itself R’s fault, and thus that R poses a threat to reproducibility. I would argue that a lack of skills on the part of researchers is to blame...

@brodriguesco
I am sure it plays a role.
But there are notable examples of the developers being at fault:

1) Switching the `stringsAsFactors` default from TRUE to FALSE (in R 4.0.0) broke a high % of existing code.

2) The tidyverse routinely introduces breaking changes, which they document.
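A small base-R illustration of why the `stringsAsFactors` change in point 1 could break old code silently. The snippet sets the argument explicitly so it behaves the same on any R version; the column name and values are made up for the example. Calling `as.numeric()` on a factor returns the level codes, not the numbers written in the data, so a script written when TRUE was the default can give different results without raising any error.

```r
x <- c("10", "20", "10")

# What data.frame() did by default before R 4.0.0
old_default <- data.frame(x = x, stringsAsFactors = TRUE)
# What data.frame() does by default from R 4.0.0 on
new_default <- data.frame(x = x, stringsAsFactors = FALSE)

# Same code, different answers, and no error:
as.numeric(old_default$x)  # 1 2 1   (factor level codes)
as.numeric(new_default$x)  # 10 20 10 (the actual values)

# Converting via as.character() first (or setting stringsAsFactors
# explicitly) makes a script independent of the default.
as.numeric(as.character(old_default$x))  # 10 20 10
```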

Blame aside, are you skeptical that providing access to old versions of packages helps?

@urisohn Oh no, I believe providing access to old packages is crucial! Definitely needed, but IMHO it only really works in conjunction with Docker, because installing older versions of R can be quite difficult.

@brodriguesco
OK.
On Windows it is trivial.
On Mac it is easy-ish.
On Unix it is harder, I understand.

I don't know Docker well enough.
How accessible a solution is it for the masses? For example, if I am a user unfamiliar with Docker, could a Docker-based solution help me?

@urisohn OK, so it’s actually quite nice that you are not familiar with Docker, because I was going to include a short how-to in my blog post. Could you wait until then and see for yourself? I think it is not that hard, but it is difficult for me to judge now that I’m familiar with it. All I can say is that it didn’t take me long to start using it, but I may be more knowledgeable about this kind of stuff than the average researcher...
@urisohn btw, you might not need to save every binary from MRAN, as Posit’s package manager has binaries starting from October 2017: https://packagemanager.rstudio.com/client/#/repos/2/overview

@brodriguesco
That's great.
Do you know if there is an easy way to pull all the dates for which they have binaries? (They don't have daily snapshots.)
@urisohn Hmm, no I don’t, sorry.
@brodriguesco
Alright, easy enough to figure out with a short script.