Remember how I have been posting about the #SnakemakeHackathon2025?

I never really finished that series. But now we have two late contributions by Ward Deboutte and @johanneskoester: one describing the polishing of #Snakemake's multiple-extension handling for named inputs (https://zenodo.org/records/17121446), the other stabilizing the JSON validator (https://zenodo.org/records/17121551).

Cool!

#ReproducibleComputing #OpenScience

Snakemake Multiextension Support for named In- & Output

Brief update on bringing named input and output support to the `multiext` function in Snakemake; work done during the Snakemake Hackathon 2025.

Zenodo
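To illustrate the idea, here is a plain-Python sketch of what multiext-style expansion does; this is a hypothetical helper, not Snakemake's actual `multiext` implementation, and the keyword-argument form is my reading of how named in-/outputs could be addressed:

```python
# Sketch of multiext-style expansion: one base path plus several
# extensions. The keyword form models *named* in-/outputs, which is
# what this contribution polishes (illustrative helper only).
def multiext_like(base, *exts, **named_exts):
    positional = [base + ext for ext in exts]
    named = {name: base + ext for name, ext in named_exts.items()}
    return positional, named

# Classic, positional use:
files, _ = multiext_like("results/sample", ".bam", ".bam.bai")
# Named use: each file can then be addressed by name:
_, named = multiext_like("results/sample", bam=".bam", bai=".bam.bai")
```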

The #isc25 is over and I have half-recovered from the weekend, too. Time to continue my thread summing up the #SnakemakeHackathon2025!

To me, an important contribution was from Michael Jahn from the Charpentier Lab: A complete re-design of the workflow catalogue. Have a look: https://snakemake.github.io/snakemake-workflow-catalog/ - findability of ready-to-use workflows has greatly improved! Also, the description on how to contribute is now easy to find.

A detailed description has been published in the #researchequals collection https://www.researchequals.com/collections/hm1w-cg under https://doi.org/10.5281/zenodo.15574642

#Snakemake #ReproducibleComputing #ReproducibleResearch #OpenScience

Snakemake workflow catalog

Returning from the #isc25 I will continue this thread with something applicable everywhere, not just on #HPC clusters:

Workflow runs can crash, for any number of reasons. Snakemake offers a `--rerun-incomplete` flag (or `--ri` for short) which lets a user resume a workflow.

This contribution from Filipe G. Viera describes a small fix to stabilize the feature: not only are incomplete files removed after a crash, it is now ensured that their metadata are deleted too, before resuming: https://zenodo.org/records/15490098
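The idea behind the fix can be sketched in plain Python; file layout and helper names are illustrative only, not Snakemake's internal code:

```python
import os

# Sketch: when resuming after a crash, remove the incomplete output
# *and* its associated metadata record, so stale metadata cannot
# confuse the resumed run (paths and names are hypothetical).
def clean_incomplete(output_path, metadata_path):
    for path in (output_path, metadata_path):
        if os.path.exists(path):
            os.remove(path)
```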

#Snakemake #SnakemakeHackathon2025 #ReproducibleComputing #OpenScience

Metadata Cleanup for Snakemake


Today tooting from the #ISC25 - the International Supercomputing Conference. What better opportunity to brag about something I've done to facilitate using GPUs with Snakemake?

Here is my contribution, simpler job configuration for GPU jobs:

https://doi.org/10.5281/zenodo.15551797

Not alone, though: this came together with valuable input from @dryak. Without him, I would have overlooked something crucial.

And when we talk about reproducible AI, my take is that we ought to consider workflow managers, too: tools that record what you have done, with little effort.

#SnakemakeHackathon2025 #Snakemake #ReproducibleComputing #OpenScience

Snakemake's SLURM Executor Plugin - improved Support for GPUs

This work was completed as part of the Snakemake Hackathon 2025. Previously, using GPUs or other special resources with Snakemake's SLURM executor plugin required the `slurm_extra` resource, a catch-all parameter that required nested quoting. This contribution describes how the Snakemake executor plugin for the SLURM batch system, used on HPC systems, adds support for direct GPU resource requests and improved SLURM account handling. Building on the existing plugin, these enhancements significantly simplify submission of workflows using GPU resources (e.g. artificial intelligence or molecular dynamics simulations).
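For illustration, a rule using the new-style resources might look like the following Snakefile sketch; the resource names reflect my reading of the plugin's GPU support, and exact spellings may differ:

```
rule train_model:
    input: "data/train.csv"
    output: "models/model.pt"
    resources:
        slurm_account="my_account",  # improved account handling
        gpu=1,                       # direct GPU request ...
        gpu_model="a100"             # ... instead of slurm_extra="'--gres=gpu:a100:1'"
    shell: "python train.py {input} {output}"
```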


Before I continue uploading - and I do have a couple of more contributions to add to the #ResearchEquals collection - first another contribution by Johanna Elena Schmitz and Jens Zentgraf made at the #SnakemakeHackathon2025

One difficulty when facing a slightly different scientific question: do I need to re-invent the wheel (read: write a workflow from scratch) just to address it?

Snakemake already allowed incorporating "alien" workflows, even #Nextflow workflows, into one's own. The new contribution allows for a more dynamic inclusion - with very few changes.

Check it out: https://zenodo.org/records/15489694
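For context, the (previously static) module mechanism looks roughly like this in a Snakefile; the contribution makes loading such modules more dynamic:

```
# Static module inclusion, as already supported by Snakemake:
module external_wf:
    snakefile: "path/to/other/Snakefile"
    config: config

# Re-use all rules from the external workflow under a prefix:
use rule * from external_wf as external_*
```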

#Snakemake #ReproducibleComputing #OpenScience

Allowing Dynamic Load of Modules for Snakemake

Snakemake modules had to be explicitly defined and loaded at the beginning of a workflow. This limited the flexibility of workflows, particularly when dealing with complex dependency structures or when modules needed to be loaded conditionally based on runtime parameters. This contribution eases the procedure of dynamically adding third-party modules.


Let's take a look at another contribution of Johanna Elena Schmitz and Jens Zentgraf from the #SnakemakeHackathon2025

Snakemake users probably know that

`$ snakemake [args] --report`

will generate a self-contained HTML report, including all plots and #metadata a researcher's heart longs for.

Now, why trigger this manually? If the workflow runs successfully, we can now write (or configure):

`$ snakemake [args] --report-after-run`

and Snakemake will autogenerate the same report.

For details see https://doi.org/10.5281/zenodo.15489764

#Snakemake #ReproducibleComputing
#OpenScience

Create Report After Running a Snakemake Workflow

This contribution adds a flag to Snakemake to allow for immediate report creation after a workflow finished.


One important feature implemented in the #SnakemakeHackathon2025: Snakemake will calculate file checksums to detect changes. If a file changes, the rule producing it needs to be re-executed when a workflow is re-triggered. But what if a file is too big for reasonable checksum calculation? You do not want to wait forever, after all.

This contribution describes the implementation of a threshold users may set: https://doi.org/10.5281/zenodo.15489401
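The idea can be sketched in plain Python; this is a hypothetical helper, not Snakemake's internal API, and the default threshold is illustrative:

```python
import hashlib
import os

# Sketch: only checksum files below a user-set size threshold;
# larger files fall back to cheaper change detection (e.g. mtime).
def maybe_checksum(path, max_bytes=10 * 1024 * 1024):
    if os.path.getsize(path) > max_bytes:
        return None  # too big: skip the expensive hash
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()
```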

#Snakemake #ReproducibleComputing #OpenScience

Adjusting Snakemake's Maximum File Size for Checksums

Snakemake calculates checksums for input and output files during DAG building to determine whether a rule needs to be re-executed. For huge files, computing these checksums can be a significant performance bottleneck, slowing down DAG generation. This paper describes limiting checksum calculation based on file size.


One important bug fix during the #SnakemakeHackathon2025: config replacement. Now, users can overwrite existing configurations entirely with `--replace-workflow-config`.

Details: https://zenodo.org/records/15479268

More at https://www.researchequals.com/collections/hm1w-cg/

#Snakemake #ReproducibleComputing #openscience

Config Replacement in Snakemake

This work was completed as part of the Snakemake Hackathon 2025. Previously, when a configuration file was specified via the Snakemake CLI (e.g., `snakemake --configfile config.yaml`), the contents of that file were *merged* with any configuration variables already defined within the Snakefile. This meant that the CLI config would only extend or update existing values, rather than completely replacing the workflow's default configuration. This work item introduces the `--replace-workflow-config` parameter to the Snakemake CLI.
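The difference between the old merge behaviour and the new replacement can be sketched with plain dicts; these are illustrative helpers, not Snakemake internals:

```python
# Default behaviour: CLI config is *merged* into the workflow's config,
# so keys not set on the CLI keep their in-Snakefile defaults.
def merge_config(workflow_cfg, cli_cfg):
    merged = dict(workflow_cfg)
    merged.update(cli_cfg)
    return merged

# With --replace-workflow-config: the CLI config wins entirely,
# and in-Snakefile defaults are discarded.
def replace_config(workflow_cfg, cli_cfg):
    return dict(cli_cfg)
```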


Did you know? During the #SnakemakeHackathon2025 we had a staggering 194 work items!

It took a while, but now we are gathering contribution reports and presenting them online as a ResearchEquals collection:

https://www.researchequals.com/collections/hm1w-cg

The first 10 are online and I will post some highlights in the coming weeks.

#Snakemake #ReproducibleComputing #ReproducibleResearch #OpenScience


Busy year:

- Workflow programming for Data Analysis on #HPC Systems (Course in Mainz in January): βœ…
- Same Course in Dresden (February) βœ…
- #SnakemakeHackathon2025 at CERN in March: βœ…
- upcoming: #OpenScience Retreat (no hashtag, yet?) in April
- International Supercomputing Conference in June (so, @boegel, I will be there after all, and hope to meet people from @irods too; will you be there, folks from #iRODS?)
- German Conference for #Bioinformatics and NHR Conference in September

And I do not know whether this will be all. I have a nagging feeling there is more to come πŸ˜‰