Ethan White May 5, 2023

Code for Publication - Level 1

Take whatever you've got, archive it online, put a link in the paper. This is low cost & provides an extended methods section where the detailed decisions you made can be found by a sufficiently motivated person. This is a major positive even if no one can successfully run the code.

Everyone's code is messy. We understand. Post your code & I will personally fight anyone who complains that it's messy.

When I say archive I mean https://zenodo.org/ or similar

Ethan White May 5, 2023

Code for Publication - Level 2 (Level 1+)

Written instructions for rerunning the code to replicate the analysis (ideally in a file called README). E.g., 1) install packages A, B, & C; 2) download data D; 3) run script E; 4) manually change F; 5) run script G, etc.

Ideally go through this process yourself on a computer other than the one you did the analysis on to make sure it works. Even better have a friend do this to make sure someone other than you can follow the steps.

Ethan White May 5, 2023

Code for Publication - Level 3 (Level 1/2+)

Automate all the steps for rerunning the analysis using a script. This could be a bash script or a script in the language your code is written in. This should include package installation (ideally w/fixed version numbers if the language allows it). Test this yourself on a computer other than the one you did the analysis on, have a friend test it, and try to test it on multiple operating systems. Make sure the outputs match the paper.
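As a rough illustration of Level 3, here is a minimal sketch of a single driver script in Python. Everything named in it is a hypothetical placeholder rather than part of any real project: `requirements.txt` is assumed to hold pinned package versions, and `download_data.py`, `analysis.py`, and `make_figures.py` stand in for whatever scripts your analysis actually uses.

```python
"""run_all.py: hypothetical single entry point that reruns an analysis."""
import subprocess
import sys

# Hypothetical file names; replace with the ones in your project.
REQUIREMENTS = "requirements.txt"  # pinned versions, e.g. "numpy==1.24.3"
SCRIPTS = ["download_data.py", "analysis.py", "make_figures.py"]


def build_steps(python=sys.executable):
    """Return the commands to run, in order, as argument lists."""
    # Step 1: install the pinned packages with the same interpreter.
    steps = [[python, "-m", "pip", "install", "-r", REQUIREMENTS]]
    # Remaining steps: run each analysis script in the listed order.
    steps += [[python, script] for script in SCRIPTS]
    return steps


def run_all(steps, runner=subprocess.run):
    """Run each step in order, stopping at the first failure."""
    for cmd in steps:
        print("Running:", " ".join(cmd))
        # check=True makes subprocess.run raise CalledProcessError on failure.
        runner(cmd, check=True)
```

Calling `run_all(build_steps())` at the bottom of such a file would rerun everything from a clean checkout. Keeping the step list separate from the runner is just one possible design; it makes the order of steps easy to read and to check without executing anything.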

Ethan White May 5, 2023

For research paper code if you get to Level 3 (a single script that reproduces your analysis including package install and any external data acquisition) you're doing great. Thank you. You've provided a clear accounting for exactly how you produced your final results. I'm happy. Others should be happy. If you do this & someone complains because it's not Level 4+ I will kindly tell them they shouldn't let the perfect be the enemy of the very good & that there are real tradeoffs going further.

Ethan White May 5, 2023

Code for Publication - Level 4+ (Level 3+)

There are lots of extra things you can do to make all of this even better:

* Use a workflow system instead of a script for automation
* Provide a container (e.g., Docker) with code and data
* Have your code produce either a documented version of itself or the entire paper using literate programming tools (e.g., notebooks, R Markdown) (I see @tpoisot already getting ahead of me here in the replies; listen to Tim, he's awesome)

Ethan White May 5, 2023

The Level 4+ stuff is great. I love these kinds of tools. We use them daily (though not always for publishing code). But, don't let not using them stop you from publishing code at Levels 1, 2, or 3.

Timothée Poisot May 6, 2023

@ethanwhite I have never learned how to use containers (I am still processing the trauma of pyenv, in my defense). Do you find the overhead worth it for code that isn't going to be widely reused?

Titus Brown May 8, 2023

Ethan White

@tpoisot for paper code (which we don't expect to be widely reused, otherwise we'd package it) we've moved away from containers. I think the likelihood of anyone spinning them up is low and the code mostly serves the "extended documentation" role anyway. Much of their value also assumes long-term public hosting and with recent goings-on at Docker I'm not sure that's a safe assumption.

Noam Ross May 8, 2023

@ethanwhite @tpoisot I think if you have a strong interest in a container working long-term you should store the built container rather than the Dockerfile. We should probably template up the workflow of depositing the container binary with code in a Zenodo-like repository.
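One way the "store the built container" step might look, as a minimal sketch and not an established template: export the built image to a tar file with `docker save -o` and record a checksum to deposit alongside it. The helper names here are hypothetical, the sketch assumes Docker is installed and the image is already built locally, and the upload to a Zenodo-like repository is left out entirely.

```python
import hashlib
import subprocess


def save_command(image, tar_path):
    """Build the `docker save` command that writes an image to a tar file."""
    return ["docker", "save", "-o", tar_path, image]


def sha256sum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def export_image(image, tar_path):
    """Export a locally built image and return the archive's checksum."""
    # Assumes the Docker CLI is on PATH and `image` exists locally.
    subprocess.run(save_command(image, tar_path), check=True)
    return sha256sum(tar_path)
```

The resulting tar file and its checksum could then be deposited with the code, so that someone can later restore the image with `docker load` without rebuilding from the Dockerfile.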

Ethan White May 8, 2023

@noamross agreed. that would definitely be handy. @tpoisot