Mastodawn

Ethan White May 5, 2023

Code for Publication - Level 1

Take whatever you've got, archive it online, put a link in the paper. This is low cost & provides an extended methods section where the detailed decisions you made can be found by a sufficiently motivated person. This is a major positive even if no one can successfully run the code.

Everyone's code is messy. We understand. Post your code & I will personally fight anyone who complains that it's messy.

When I say archive I mean https://zenodo.org/ or similar

Ethan White May 5, 2023

Code for Publication - Level 2 (Level 1+)

Written instructions for rerunning the code to replicate the analysis (ideally in a file called README). E.g., 1) install packages A, B, & C; 2) download data D; 3) run script E; 4) manually change F; 5) run script G, etc.

Ideally go through this process yourself on a computer other than the one you did the analysis on to make sure it works. Even better, have a friend do this to make sure someone other than you can follow the steps.

Ethan White May 5, 2023

Code for Publication - Level 3 (Level 1/2+)

Automate all the steps for rerunning the analysis using a script. This could be a bash script or a script in the language your code is written in. This should include package installation (ideally w/fixed version numbers if the language allows it). Test this yourself on a computer other than the one you did the analysis on, have a friend test it, and try to test it on multiple operating systems. Make sure the outputs match the paper.
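
A minimal sketch of what such a driver script might look like in Python. The package pins and script names here are hypothetical placeholders, not taken from any real project; the point is only the shape: pinned install first, then each analysis step in order, stopping at the first failure.

```python
"""run_all.py: a hypothetical one-command driver for a paper's analysis."""
import shlex
import subprocess
import sys

# Hypothetical pinned packages and script names; replace with your own.
PINNED_PACKAGES = ["numpy==1.24.3", "pandas==2.0.1"]
ANALYSIS_SCRIPTS = ["download_data.py", "fit_models.py", "make_figures.py"]


def build_steps(python=sys.executable):
    """Return the ordered list of commands that reproduce the analysis."""
    steps = [[python, "-m", "pip", "install", *PINNED_PACKAGES]]
    steps += [[python, script] for script in ANALYSIS_SCRIPTS]
    return steps


def run_steps(steps, dry_run=True):
    """Print each command, and execute it unless dry_run is set.

    check=True stops at the first failing step, so later steps
    never run on incomplete outputs.
    """
    for number, command in enumerate(steps, start=1):
        print(f"[{number}/{len(steps)}] {shlex.join(command)}")
        if not dry_run:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Dry run by default; pass --run to actually execute the steps.
    run_steps(build_steps(), dry_run="--run" not in sys.argv)
```

Running `python run_all.py` only lists the steps; `python run_all.py --run` executes them. Defaulting to a dry run is a design choice for this sketch, so a curious reader can see what would happen before anything is installed.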

Ethan White May 5, 2023

For research paper code, if you get to Level 3 (a single script that reproduces your analysis, including package install and any external data acquisition) you're doing great. Thank you. You've provided a clear accounting of exactly how you produced your final results. I'm happy. Others should be happy. If you do this & someone complains because it's not Level 4+ I will kindly tell them they shouldn't let the perfect be the enemy of the very good & that there are real tradeoffs in going further.

Ethan White May 5, 2023

Code for Publication - Level 4+ (Level 3+)

There are lots of extra things you can do to make all of this even better:

* Use a workflow system instead of a script for automation
* Provide a container (e.g., Docker) with code and data
* Have your code produce either a documented version of itself or the entire paper using literate programming tools (e.g., notebooks, Rmarkdown) (I see @tpoisot already getting ahead of me here in the replies; listen to Tim, he's awesome)

Ethan White May 5, 2023

The Level 4+ stuff is great. I love these kinds of tools. We use them daily (though not always for publishing code). But, don't let not using them stop you from publishing code at Levels 1, 2, or 3.

Timothée Poisot

@ethanwhite I have never learned how to use containers (I am still processing the trauma of pyenv, in my defense). Do you find the overhead worth it for code that isn't going to be widely reused?

Naupaka Zimmerman May 6, 2023

@tpoisot @ethanwhite I do -- it's fast/easy if you start with a good base. I make huge use of parameterized docker containers for teaching in most of my classes. For me the selling point is getting any project up to speed on any server basically instantly. And, for example having a loop take a csv to make environments for 50 students at once. I'm not using them as they were intended I suppose, but I think they're pretty great as lightweight VMs (vs something slightly heavier like vagrant)
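
A rough sketch of the "loop over a csv" idea, under stated assumptions: the roster columns (`username`, `password`), the `rocker/rstudio` image with its `PASSWORD` variable and port 8787, and the starting host port are all illustrative choices, not the setup described in the post. The sketch only builds the `docker run` commands, giving each student their own host port.

```python
import csv
import io

# Assumptions for illustration only: image, ports, and roster columns.
IMAGE = "rocker/rstudio"
CONTAINER_PORT = 8787      # port RStudio Server listens on inside the container
BASE_HOST_PORT = 8001      # first host port; each student gets the next one


def docker_commands(roster_file):
    """Build one `docker run` command per row of a roster CSV."""
    commands = []
    for offset, row in enumerate(csv.DictReader(roster_file)):
        host_port = BASE_HOST_PORT + offset
        commands.append([
            "docker", "run", "-d",
            "--name", f"class-{row['username']}",
            "-p", f"{host_port}:{CONTAINER_PORT}",
            "-e", f"PASSWORD={row['password']}",
            IMAGE,
        ])
    return commands


# A two-student roster standing in for a real CSV file on disk.
roster = io.StringIO("username,password\nalice,pw1\nbob,pw2\n")
for command in docker_commands(roster):
    print(" ".join(command))
```

One host port per student is also why a setup like this needs a range of open ports on the server.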

Naupaka Zimmerman May 6, 2023

@tpoisot @ethanwhite e.g. https://figshare.com/articles/presentation/Docker_for_Teaching/8132849

Docker for Teaching

Slides for Carpentries Skill Share on using Docker for Teaching held on May 15, 2019.

Timothée Poisot May 6, 2023

@naupaka @ethanwhite this is brilliant, and will in no way be supported by our local IT...

Naupaka Zimmerman May 6, 2023

@tpoisot @ethanwhite depending on computational needs, easy to do on cloud compute as well. but yeah, need lots of open ports (can be only open inside campus firewall if students use vpn) and root on a large-ish server. Singularity (https://docs.sylabs.io/guides/3.5/user-guide/introduction.html) might be more palatable to IT than docker, but you still need open ports

Ethan White May 8, 2023

@naupaka Singularity + inside the campus firewall is the exact combination that makes it possible for us to convince our IT folks to let us do things in this space @tpoisot

Ethan White May 8, 2023

@tpoisot for paper code (which we don't expect to be widely reused, otherwise we'd package it) we've moved away from containers. I think the likelihood of anyone spinning them up is low and the code mostly serves the "extended documentation" role anyway. Much of their value also assumes long-term public hosting, and with the recent goings-on at Docker I'm not sure that's a safe assumption.

Noam Ross May 8, 2023

@ethanwhite @tpoisot I think if you have a strong interest in a container working long-term you should store the built container rather than the Dockerfile. We should probably template up the workflow of depositing the container binary with code in a Zenodo-like repository.
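
One possible shape for the first half of that workflow, as a sketch only: export the built image to a tar archive with `docker save`, then record a SHA-256 checksum to deposit alongside it. The image and file names are hypothetical, and the upload to the repository itself is left out.

```python
import hashlib


def save_command(image, archive):
    """Command that exports a built image to a tar archive for deposit."""
    return ["docker", "save", "--output", str(archive), image]


def sha256_of(path, chunk_size=1 << 20):
    """Checksum a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical image tag and archive name, for illustration only.
print(" ".join(save_command("my-paper-analysis:1.0", "my-paper-analysis.tar")))
```

After running the printed command, `sha256_of("my-paper-analysis.tar")` gives a checksum that can go in the README, so a later reader can confirm the archive they downloaded is the one that was deposited.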

Ethan White May 8, 2023

@noamross agreed. that would definitely be handy. @tpoisot