Mastodawn

It is quite fun to occasionally come back to using #slurm to send commands to a supercomputer node.

Let's see if running a #TopicModelling script with a V100 GPU reduces the running time from 80+ hours to a few minutes, as I expect :)

Snakemake Release Robot Jun 1

Beep, Beep - I am your friendly #Snakemake release announcement bot.

There is a new release of the 𝐒𝐧𝐚𝐤𝐞𝐦𝐚𝐤𝐞 𝐄𝐱𝐞𝐜𝐮𝐭𝐨𝐫 𝐏𝐥𝐮𝐠𝐢𝐧 𝐟𝐨𝐫 𝐒𝐋𝐔𝐑𝐌 systems. Its version now is 2.7.1!

Give us some time, and you will automatically find the plugin on #Bioconda and #Pypi.

This plugin is relevant for #HPC users using the #SLURM batch system.
The maintainers are here on Mastodon -
@rupdecat and @johanneskoester.

If you discover any issues, please report them on https://github.com/snakemake/snakemake-executor-plugin-slurm/issues.

See https://github.com/snakemake/snakemake-executor-plugin-slurm/releases/tag/v2.7.1 for details. Here is the header of the changelog:
𝑅𝑒𝑙𝑒𝑎𝑠𝑒 𝑁𝑜𝑡𝑒𝑠 (𝑝𝑜𝑠𝑠𝑖𝑏𝑙𝑦 𝑎𝑏𝑟𝑖𝑑𝑔𝑒𝑑):
𝐁𝐮𝐠 𝐅𝐢𝐱𝐞𝐬

* gpu submission: https://github.com/snakemake/snakemake-executor-plugin-slurm/issues/464

Christian Meesters May 29

Today I've done:

- accepted a review request (difficult to even see them these days in the wave of other mails). It is from some colleagues at a place where I have some acquaintances. I know none of the authors personally, which is lucky, as otherwise I would have declined. As the review is anonymous, and I know many people in the field, mentioning this here on Mastodon will not become an issue.
- followed an online meeting. Actually one of the non-boring ones.
- re-installed an environment I involuntarily screwed up yesterday when testing things for a user
- had a nice lunch with @KrawallHamster , @moschlar and others
- written a number of mails
- debugged code, debugged code, debugged code - will postpone a new #Snakemake plugin release for #SLURM to next week, when I can think straight. At least the CI pipeline is fine for this PR I was working on. But I always do live tests on an actual cluster if the change is not trivial.
- actually finished the review task (first round)

And I 🚴 , up- and downhill, through the May heat (this is a thing these days!!!). Lesson learned: Next time, I will take a break, sit on a bench, read and drink to have a rest for the last leg. The afternoon heat is no fun!

Christian Meesters May 22

Today:

- written a blurb for my presentation at the upcoming #nanopub session
- released the #SLURM executor plugin for #Snakemake v2.7.0 - see https://fediscience.org/@snakemake/116617420491776431
- tried to mitigate the issue that TMOUT on an HPC login brings: it sends SIGHUP to all detached multiplexers (so far no remedy, and I tried a lot(!); don't send me tips).
- futile further debugging attempts. In the end it worked. Might result in a new release next week.

#academicchatter

Snakemake Release Robot May 22

Beep, Beep - I am your friendly #Snakemake release announcement bot.

There is a new release of the 𝐒𝐧𝐚𝐤𝐞𝐦𝐚𝐤𝐞 𝐄𝐱𝐞𝐜𝐮𝐭𝐨𝐫 𝐏𝐥𝐮𝐠𝐢𝐧 𝐟𝐨𝐫 𝐒𝐋𝐔𝐑𝐌 systems. Its version now is 2.7.0!

Give us some time, and you will automatically find the plugin on #Bioconda and #Pypi.

This plugin is relevant for #HPC users using the #SLURM batch system.
The maintainers are here on Mastodon -
@rupdecat and @johanneskoester.

If you discover any issues, please report them on https://github.com/snakemake/snakemake-executor-plugin-slurm/issues.

See https://github.com/snakemake/snakemake-executor-plugin-slurm/releases/tag/v2.7.0 for details. Here is the header of the changelog:
𝑅𝑒𝑙𝑒𝑎𝑠𝑒 𝑁𝑜𝑡𝑒𝑠 (𝑝𝑜𝑠𝑠𝑖𝑏𝑙𝑦 𝑎𝑏𝑟𝑖𝑑𝑔𝑒𝑑):
𝐅𝐞𝐚𝐭𝐮𝐫𝐞𝐬

* cpu jobs skip gpu partitions: https://github.com/snakemake/snakemake-executor-plugin-slurm/issues/454

𝐁𝐮𝐠 𝐅𝐢𝐱𝐞𝐬

* [#446]: https://github.com/snakemake/snakemake-executor-plugin-slurm/issues/446
* pipeline hangs when submitting from compute nodes: https://github.com/snakemake/snakemake-executor-plugin-slurm/issues/450
* relaxed dependencies: https://github.com/snakemake/snakemake-executor-plugin-slurm/issues/460

Rackslab May 12

🚀 Slurm-web v6.1.0 is out!

New core allocation visualization, optional RacksDB lazy loading, expanded SLES/openSUSE docs, and dependency/security fixes.

Warm thanks to NVIDIA and Zuse Institute Berlin (ZIB) for their contributions ❤️

https://rackslab.io/en/blog/slurm-web-v6.1.0/

#HPC #AI #Slurm #OpenSource #webui #clustermanagement


Andreas Skau Apr 13

Just did a major overhaul of my "top, but for #slurm" util! Might be useful to #hpc admins and users alike. Appreciate any bug reports, especially crashes or incompatibility!

https://github.com/buzh/slop

GitHub - buzh/slop: A `top`-like utility for the Slurm HPC batch job scheduler


Habr Apr 12

From mining on associated gas to AI factories: the story of Crusoe

The AI industry has a serious problem: how do you deploy compute infrastructure earlier and faster (and cheaper, too) than your competitors? The main scarce resource right now is electricity, not chips or their components, as you might have assumed. Tech giants are thinking about where to put the racks and how to cool them, but above all about where to get the energy to power the whole AI system. And one startup from Denver has an unconventional solution: portable, modular AI data centers that can be placed in the most unusual conditions. The company came to IT from the crypto world: it originally installed mining machines that drew their energy from associated gas at oil rigs. Today I will tell you about Crusoe, a company that turns energy into computing power in a highly unconventional way. We will break down their business model and work out what vertically integrated AI infrastructure is.

https://habr.com/ru/companies/ru_mts/articles/1022116/

#Crusoe #AIинфраструктура #датацентры #GPUоблако #облачные_вычисления #inference #Kubernetes #Slurm #edge_computing #энергетика


Christian Meesters Mar 26

RE: https://fediscience.org/@snakemake/116295568336688286

This is a big step forward: The SLURM plugin for Snakemake now supports so-called job arrays. These are groups of cluster jobs with roughly equal requirements in terms of memory and compute resources.

The change in itself was big: The purpose of a workflow system is to make use of the vast resources of an HPC cluster. Hence, jobs are submitted to run concurrently. However, for a job array, we have to "wait" for all eligible jobs to be ready. And then we submit.

To preserve concurrent execution of other jobs which are ready to be executed, a thread pool has been introduced. In itself, I do not see job arrays as such a big feature: The LSF system profited much more from arrays than the rather lean SLURM implementation does.

BUT: the new code base will ease further development towards pooling many shared-memory tasks (applications which support no parallel execution or are confined to one computer by "only" supporting threading). Until then, there is more work to do.
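
To illustrate the idea of waiting for an array to fill up while other jobs keep flowing, here is a minimal Python sketch. It is not the plugin's actual code: `submit_single`, `submit_array`, `schedule`, and the `expected_per_rule` mapping are hypothetical names made up for this example, and the two submit helpers only stand in for real `sbatch` calls (an array would be submitted with `sbatch --array=...`).

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def submit_single(job):
    # Hypothetical stand-in for one plain `sbatch` call.
    return f"submitted {job['name']}"


def submit_array(rule, jobs):
    # Hypothetical stand-in for one `sbatch --array=0-(n-1)` call.
    return f"submitted array for {rule} with {len(jobs)} tasks"


def schedule(ready_jobs, expected_per_rule):
    """Submit ordinary jobs at once; hold array candidates until their group is complete.

    `expected_per_rule` (an assumption of this sketch) maps a rule name to the
    number of jobs that must be ready before the array is submitted.
    """
    pending = defaultdict(list)
    futures = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        for job in ready_jobs:
            rule = job["rule"]
            if rule not in expected_per_rule:
                # Not an array candidate: submit concurrently, do not wait.
                futures.append(pool.submit(submit_single, job))
                continue
            pending[rule].append(job)
            if len(pending[rule]) == expected_per_rule[rule]:
                # The group is complete: hand the whole array to the pool.
                futures.append(pool.submit(submit_array, rule, pending.pop(rule)))
    # Results are collected in the order the submissions were queued.
    return [f.result() for f in futures]


jobs = [
    {"name": "align_a", "rule": "align"},
    {"name": "report", "rule": "report"},
    {"name": "align_b", "rule": "align"},
]
results = schedule(jobs, {"align": 2})
print(results)
```

In this toy run the `report` job is queued for submission while the `align` group is still incomplete; the array follows once its second member is ready.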

#HPC #SLURM #Snakemake #SnakemakeHackathon2026 #ReproducibleComputing #OpenScience

Tyler Smith Mar 19

A few #slurm tidbits:

#bash #hpc