Code for Publication - Level 1

Take whatever you've got, archive it online, put a link in the paper. This is low cost & provides an extended methods section where the detailed decisions you made can be found by a sufficiently motivated person. This is a major positive even if no one can successfully run the code.

Everyone's code is messy. We understand. Post your code & I will personally fight anyone who complains that it's messy.

When I say archive I mean https://zenodo.org/ or similar

Code for Publication - Level 2 (Level 1+)

Written instructions for rerunning the code to replicate the analysis (ideally in a file called README). E.g., 1) install packages A, B, & C; 2) download data D; 3) run script E; 4) manually change F; 5) run script G, etc.

Ideally, go through this process yourself on a computer other than the one you did the analysis on to make sure it works. Even better, have a friend do this to make sure someone other than you can follow the steps.

Code for Publication - Level 3 (Level 1/2+)

Automate all the steps for rerunning the analysis using a script. This could be a bash script or a script in the language your code is written in. This should include package installation (ideally w/fixed version numbers if the language allows it). Test this yourself on a computer other than the one you did the analysis on, have a friend test it, & try to test it on multiple operating systems. Make sure the outputs match the paper.
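As a rough illustration of what a Level 3 driver script could look like, here is a minimal Python sketch. Everything project-specific in it is an assumption: the script names (download_data.py, fit_model.py, make_figures.py), the requirements.txt file with pinned versions, and the idea of comparing output hashes are placeholders, not a prescribed layout.

```python
import hashlib
import subprocess
import sys
from pathlib import Path


def run_steps(steps):
    """Run each command in order and stop at the first failure."""
    for cmd in steps:
        print("running:", " ".join(cmd))
        # check=True raises CalledProcessError on a non-zero exit code
        subprocess.run(cmd, check=True)


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, for comparison with a recorded value."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


# Hypothetical pipeline: every file name below is a placeholder.
STEPS = [
    # requirements.txt is assumed to pin exact versions, e.g. "somepackage==1.2.3"
    [sys.executable, "-m", "pip", "install", "-r", "requirements.txt"],
    [sys.executable, "download_data.py"],
    [sys.executable, "fit_model.py"],
    [sys.executable, "make_figures.py"],
]
```

Calling run_steps(STEPS) reruns everything from package install to figures. To check that outputs match the paper, you could compare sha256_of() for a results file against a hash you recorded when the paper was written. Byte-identical output is not always achievable (floating point, timestamps in files), so a looser comparison may be needed.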

For research paper code, if you get to Level 3 (a single script that reproduces your analysis including package install and any external data acquisition) you're doing great. Thank you. You've provided a clear accounting for exactly how you produced your final results. I'm happy. Others should be happy. If you do this & someone complains because it's not Level 4+ I will kindly tell them they shouldn't let the perfect be the enemy of the very good & that there are real tradeoffs going further.

Code for Publication - Level 4+ (Level 3+)

There are lots of extra things you can do to make all of this even better:

* Use a workflow system instead of a script for automation
* Provide a container (e.g., Docker) with code and data
* Have your code produce either a documented version of itself or the entire paper using literate programming tools (e.g., notebooks, R Markdown) (I see @tpoisot already getting ahead of me here in the replies; listen to Tim, he's awesome)
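To make the first bullet a bit more concrete: the main thing a make-style workflow system adds over a plain script is that it reruns a step only when its inputs have changed. The Python sketch below shows just that one idea, using file modification times. The function names is_stale and build are made up for illustration, and real workflow tools do much more (dependency graphs, parallelism, logging).

```python
import os


def is_stale(target, sources):
    """True if target is missing or older than any of its source files."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_time for src in sources)


def build(target, sources, action):
    """Run action() only when target is out of date; return whether it ran."""
    if is_stale(target, sources):
        action()
        return True
    return False
```

With this, something like build("fig1.png", ["data.csv", "plot.py"], make_fig1) would skip the plotting step if neither the data nor the plotting script changed since the figure was made (all three names here are hypothetical).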

@ethanwhite @tpoisot

In #Maneage [1], level 4+ is different:

* in analysis/ we use 'make' for the higher-level workflow, encouraging bash scripts for details;

* in software/ we use 'make' to build all the software with sha512sum checks on the downloads, starting from a minimal unix-like system;

* the makefiles initialize.mk and paper.mk are the workflow for the paper
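The checksum idea from the software/ bullet is easy to reuse even outside Maneage. This is not how Maneage does it (that is make plus sha512sum); it is only a small Python sketch of the same check, with a hypothetical function name sha512_matches: hash the downloaded file and refuse to continue unless it matches the digest recorded in the project.

```python
import hashlib


def sha512_matches(path, expected_hex, chunk_size=1 << 20):
    """Hash the file in chunks and compare with the recorded SHA-512 digest."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        # Read in chunks so large tarballs do not need to fit in memory
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()
```

A build step would call this right after each download and stop with an error when it returns False, so a changed or corrupted upstream file cannot silently alter the results.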

Fully reproduce:
./project configure
./project make

Example: [2]

[1] https://maneage.org
[2] https://zenodo.org/record/7792910

Maneage -- Managing data lineage

@boud very nice! @tpoisot
@ethanwhite @boud Oh yeah, data semantics and provenance break my brain in ways that code can't come close to -- I'm definitely going to have a look at Maneage!

@tpoisot

Cool! :) The main 'tasks' and 'bugs' are coordinated on Savannah [1]; we also have some loosely organised IRC/Matrix channels, e.g. [2].

If you want to test a real, peer-reviewed science paper that is not too heavy computationally, I recommend [3] (a not-quite-final version ran fully from scratch on a PinePhone).

[1] https://savannah.nongnu.org/support/?func=additem&group=reproduce

[2] irc: https://libera.chat ##maneage
matrix bridge: #maneage-community:matrix.org

[3] https://arxiv.org/abs/2112.14174 = https://zenodo.org/record/6794222

@ethanwhite
