Mastodawn

Ethan White May 5, 2023

Code for Publication - Level 1

Take whatever you've got, archive it online, put a link in the paper. This is low cost & provides an extended methods section where the detailed decisions you made can be found by a sufficiently motivated person. This is a major positive even if no one can successfully run the code.

Everyone's code is messy. We understand. Post your code & I will personally fight anyone who complains that it's messy.

When I say archive, I mean https://zenodo.org/ or similar.

Ethan White May 5, 2023

Code for Publication - Level 2 (Level 1+)

Written instructions for rerunning the code to replicate the analysis (ideally in a file called README). E.g., 1) install packages A, B, & C; 2) download data D; 3) run script E; 4) manually change F; 5) run script G; etc.

Ideally, go through this process yourself on a computer other than the one you did the analysis on to make sure it works. Even better, have a friend do this to make sure someone other than you can follow the steps.

Ethan White May 5, 2023

Code for Publication - Level 3 (Level 1/2+)

Automate all the steps for rerunning the analysis using a script. This could be a bash script or a script in the language your code is written in. This should include package installation (ideally with fixed version numbers if the language allows it). Test this yourself on a computer other than the one you did the analysis on, have a friend test it, and try to test it on multiple operating systems. Make sure the outputs match the paper.
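As a rough illustration of Level 3, here is a minimal Python driver script, assuming the analysis is already split into separate command-line steps. The step names and paths (`requirements.txt`, `scripts/download_data.py`, `scripts/analysis.py`, `scripts/figures.py`) are hypothetical placeholders, not part of any real project. The idea is that package installation, data download, and analysis all run from one entry point, and the run stops at the first failing step.

```python
"""run_all.py: rerun the whole analysis from one entry point (hypothetical sketch)."""
import subprocess
import sys

# Hypothetical pipeline: replace these with your project's real steps.
# Pinning versions in requirements.txt (e.g. "numpy==1.24.3") keeps installs repeatable.
STEPS = [
    ("install packages", [sys.executable, "-m", "pip", "install", "-r", "requirements.txt"]),
    ("download data", [sys.executable, "scripts/download_data.py"]),
    ("run analysis", [sys.executable, "scripts/analysis.py"]),
    ("make figures", [sys.executable, "scripts/figures.py"]),
]


def run_pipeline(steps):
    """Run each (name, command) step in order and stop at the first failure.

    Returns the names of the steps that completed successfully.
    """
    completed = []
    for name, cmd in steps:
        print(f"==> {name}: {' '.join(cmd)}", flush=True)
        # check=True raises CalledProcessError if the command exits non-zero,
        # so later steps never run on top of a broken earlier step.
        subprocess.run(cmd, check=True)
        completed.append(name)
    return completed


if __name__ == "__main__":
    run_pipeline(STEPS)
```

Testing on a fresh machine then means running this one script. A final step that compares key numbers in the outputs against the values reported in the paper would cover the "outputs match the paper" check.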

Ethan White May 5, 2023

For research paper code, if you get to Level 3 (a single script that reproduces your analysis including package install and any external data acquisition), you're doing great. Thank you. You've provided a clear accounting of exactly how you produced your final results. I'm happy. Others should be happy. If you do this & someone complains because it's not Level 4+, I will kindly tell them they shouldn't let the perfect be the enemy of the very good & that there are real tradeoffs in going further.

Ethan White May 5, 2023

Code for Publication - Level 4+ (Level 3+)

There are lots of extra things you can do to make all of this even better:

* Use a workflow system instead of a script for automation
* Provide a container (e.g., Docker) with code and data
* Have your code produce either a documented version of itself or the entire paper using literate programming tools (e.g., notebooks, R Markdown) (I see @tpoisot already getting ahead of me here in the replies; listen to Tim, he's awesome)
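Workflow systems such as GNU Make track which outputs depend on which inputs and rerun only the steps whose inputs changed since the outputs were last built. The toy Python sketch below illustrates that core idea using file modification times, the same staleness rule make uses. It is not any real tool's API: the rule format (a dict mapping each target path to its input paths and a function that writes the target) is a hypothetical choice for illustration, and it omits what real workflow systems add, such as cycle detection and parallel execution.

```python
"""A toy make-style workflow: rebuild a target only when it is missing or stale."""
from pathlib import Path


def is_stale(target, inputs):
    """True if target doesn't exist or any input was modified after it."""
    target = Path(target)
    if not target.exists():
        return True
    target_mtime = target.stat().st_mtime
    return any(Path(p).stat().st_mtime > target_mtime for p in inputs)


def build(rules, goal, built=None):
    """Recursively bring `goal` up to date.

    `rules` maps each target path to (list_of_input_paths, function_that_writes_target).
    Inputs with no rule are treated as source files that must already exist.
    Returns the targets that were (re)built, in build order.
    """
    if built is None:
        built = []
    inputs, action = rules[goal]
    # First bring every input that is itself a target up to date.
    for dep in inputs:
        if dep in rules:
            build(rules, dep, built)
    # Then rebuild this target only if it is missing or older than an input.
    if is_stale(goal, inputs):
        action()
        built.append(goal)
    return built
```

Calling `build(rules, "result.txt")` a second time with nothing changed rebuilds nothing; editing a raw data file rebuilds only the targets downstream of it. That selective rerun is what a workflow system offers over a plain script that reruns every step each time.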

Boud May 9, 2023

@ethanwhite @tpoisot

In #Maneage [1], level 4+ is different:

* in analysis/ we use 'make' for the higher-level workflow, encouraging bash scripts for details;

* in software/ we use 'make' to build all the software with sha512sum checks on the downloads, starting from a minimal unix-like system;

* the makefiles initialize.mk and paper.mk are the workflow for the paper

Fully reproduce:
./project configure
./project make

Example: [2]

[1] https://maneage.org
[2] https://zenodo.org/record/7792910
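The sha512sum step mentioned above guards against a download silently changing or being corrupted. Maneage does this inside its makefiles; the Python sketch below only illustrates the same idea (record a SHA-512 checksum for each download and refuse to continue on a mismatch). It is not Maneage code, and the function names `sha512_of` and `verify_download` are hypothetical. The hex digest it computes is the same value `sha512sum` prints as its first field.

```python
"""Verify a downloaded file against a recorded SHA-512 checksum before using it."""
import hashlib


def sha512_of(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, reading it in chunks to bound memory."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path, expected_sha512):
    """Return the digest if it matches the recorded value; raise ValueError otherwise."""
    actual = sha512_of(path)
    if actual != expected_sha512.lower():
        raise ValueError(
            f"checksum mismatch for {path}:\n"
            f"  expected {expected_sha512}\n"
            f"  got      {actual}"
        )
    return actual
```

The expected checksum would be stored in the repository next to the download URL, so anyone rerunning the analysis fails loudly instead of quietly analysing different data.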

Ethan White May 9, 2023

@boud very nice! @tpoisot

Timothée Poisot

@ethanwhite @boud Oh yeah, data semantics and provenance break my brain in ways that code can't come close to. I'm definitely going to have a look at Maneage!

Boud May 10, 2023

@tpoisot

Cool! :) The main 'tasks' and 'bugs' are coordinated at savannah [1]; we have some loosely organised irc/matrix channels e.g. [2].

To test a real (peer-reviewed) science paper that is not too heavy computationally, I recommend [3] (a not-quite-final version ran fully from scratch on a PinePhone).

[1] https://savannah.nongnu.org/support/?func=additem&group=reproduce

[2] irc: https://libera.chat ##maneage
matrix bridge: #maneage-community:matrix.org

[3] https://arxiv.org/abs/2112.14174 = https://zenodo.org/record/6794222

@ethanwhite