Mastodawn

Nov 6, 2022

#algorithmicbioinformatics what are the big challenges facing our field? What are the things we are doing well that we should keep on doing?

Luis Pedro Coelho Nov 6, 2022

@Pashadag a lot of tools require too much memory because they optimize for speed; it would be good to be able to trade the two off more flexibly

Tommi Mäklin Nov 6, 2022

@luispedro @Pashadag adding onto this, many tools are also poorly maintained and/or unnecessarily difficult to use. We really should make sure that projects which result in widely used software do not rely on a single person to keep them running.

Eric Pelletier

@themaklin @luispedro @Pashadag it is not only a question of maintainers, but also of funding: once your tool / database is published, it becomes harder to get funding to maintain it.

Paul Medvedev

Nov 6, 2022

@EricPelletier @themaklin @luispedro I agree about software maintenance. I think it's a big problem in #algorithmicbioinformatics. For usability, don't you think that Galaxy and similar frameworks solve this problem? Or not really?

Luis Pedro Coelho Nov 7, 2022

@Pashadag @EricPelletier @themaklin Galaxy solves some problems, but far from all problems

Luis Pedro Coelho Nov 7, 2022

@EricPelletier @themaklin @Pashadag

In my experience, maintenance as in "keeping it alive" can be done at very little cost if things are set up correctly ahead of time

We commit to supporting software for 5 years post-publication (https://www.big-data-biology.org/software/commitments/ and, in practice, we still support things from 15 years ago). This mostly means that I have high standards for ensuring that things are maintainable after the original trainee leaves (some of this is learned from past experience)

BDB-Lab Software Tool Commitments

Paul Medvedev

Nov 7, 2022

@luispedro @EricPelletier @themaklin this is impressive. Do you have any pointers on what a lab can do to implement such a policy? Your experience could be very valuable to others and myself.

Luis Pedro Coelho Nov 7, 2022

@Pashadag @EricPelletier @themaklin automated tests are a godsend. If, for example, a new version of Python breaks something, it is invaluable to (1) have a test that flags it and (2) have tests that can check that your fix does not break anything else

If more than one person was involved in the code before publication, it is more likely that the code can still be understood after publication
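
As an illustration of the point about automated tests, here is a minimal sketch of what such regression tests could look like. The function `gc_content` is a hypothetical stand-in for a piece of a tool and does not come from any software mentioned in this thread; the tests are written as plain functions with `assert`, which a runner such as pytest would pick up.

```python
def gc_content(seq):
    """Hypothetical tool function: fraction of G/C bases in a DNA sequence."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def test_known_value():
    # Pins down current behaviour, so that a new Python or dependency
    # version that changes the result is flagged right away
    assert gc_content("ATGC") == 0.5


def test_lowercase_input():
    # Guards a case that an earlier (hypothetical) bug fix addressed,
    # so that later fixes cannot silently undo it
    assert gc_content("atgc") == 0.5


def test_empty_sequence_is_rejected():
    try:
        gc_content("")
    except ValueError:
        return
    raise AssertionError("expected ValueError for an empty sequence")
```

Running these on every supported Python version (for example in continuous integration) covers both uses: a failing test flags the breakage, and the remaining tests check that the fix did not break anything else.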

Luis Pedro Coelho Nov 7, 2022

@Pashadag @EricPelletier @themaklin most problems that people report will be faced by others too, so when you get a bug report, try to fix the underlying issue and not just this specific person's problem

Over the short term, this is more work for you, but you get fewer people being stuck and reaching out

This means things like (1) do not just answer this message, but make the docs clearer, or (2) make the code detect this error case and print a better error message so the next person does not contact you
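
Option (2) might look roughly like the sketch below. Everything in it is an assumption made for illustration: `UserInputError`, `count_fasta_records` and `main` are hypothetical names, not part of any tool discussed here. The idea is only that the code checks for a failure mode that users have reported and explains what to do about it, instead of failing later with an obscure traceback.

```python
import os
import sys


class UserInputError(Exception):
    """Hypothetical error type for problems the user can fix themselves."""


def count_fasta_records(path):
    """Count records in a FASTA file, failing early with actionable messages."""
    if not os.path.exists(path):
        raise UserInputError(
            f"Input file '{path}' does not exist. "
            "Check the path passed on the command line."
        )
    with open(path, errors="replace") as handle:
        n_records = sum(1 for line in handle if line.startswith(">"))
    if n_records == 0:
        raise UserInputError(
            f"No FASTA records found in '{path}'. "
            "Is the file empty or in another format?"
        )
    return n_records


def main(argv):
    """Return an exit code; user errors get a short message, not a traceback."""
    if len(argv) != 2:
        print(f"usage: {argv[0]} INPUT.fasta", file=sys.stderr)
        return 2
    try:
        print(count_fasta_records(argv[1]))
    except UserInputError as err:
        print(f"error: {err}", file=sys.stderr)
        return 1
    return 0
```

Keeping user-fixable problems in their own exception type lets `main` print a one-line explanation for them while real bugs still produce a full traceback.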

Luis Pedro Coelho Nov 7, 2022

@Pashadag @EricPelletier @themaklin

Here is a good example where we didn't do the right thing at first:

https://github.com/BigDataBiology/SemiBin/issues/106

First user reports an error (related to memory usage) and we propose a work-around, so the user closes the issue

Two months later, another user reports the same issue. Only then did we actually look into whether the code could be better, and now we will fix it for everyone. Just solving the problem for one user at a time would lead to this coming up again and again

OSError: [Errno 12] Cannot allocate memory · Issue #106 · BigDataBiology/SemiBin

Hey, I have met some new problems. When I run with "SemiBin train -i ${wd}/*fa --data ${wd}/output/data.csv --data-split ${wd}/output/data_split.csv -c ${wd}/output/cannot/cannot.txt -o ${wd}/...

Luis Pedro Coelho Nov 7, 2022

@Pashadag @EricPelletier @themaklin

Also, I assume that other users faced the issue and never reported it, so we might have lost some users (and an opportunity to gather citations to the manuscript) because of this issue (which, in the end, will be solved with 3-4 lines of code)

Tommi Mäklin Nov 7, 2022

@luispedro @EricPelletier @Pashadag This is great! Something similar is definitely something for other labs & individual researchers to consider.

Luis Pedro Coelho Nov 7, 2022

@themaklin @EricPelletier @Pashadag I've thought of writing a commentary piece in a proper journal, and this Mastodon discussion makes me think there is a readership

Tommi Mäklin Nov 7, 2022

@luispedro @EricPelletier @Pashadag I think it would be very useful, or at least make more PIs and funders realize the importance of planning maintenance and sustained development ahead of time.

At least for me, reading this related paper (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-022-02625-x) made me spend more time looking for alternative solutions when deploying new pipelines rather than flocking to the most widely known ones, so communicating the issues (and solutions!) should help.

Luis Pedro Coelho Mar 16, 2024

@themaklin @EricPelletier @Pashadag

This is now published at PLOS CB!
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1011920

For long-term sustainable software in bioinformatics