Mastodawn

James Ferguson

pyslow5 is now updated to v1.2.0 (along with slow5lib) and works with the latest numpy 2.0.0 release

Check out the release notes
We also added big-endian support, never thought we would need to do that 😅

Stellar work from @hasindu2008 as always

https://github.com/hasindu2008/slow5lib/releases/tag/v1.2.0

Release slow5lib-v1.2.0 · hasindu2008/slow5lib

What's Changed slow5lib easy multi-thread API is no-longer beta and is fully documented now at https://hasindu2008.github.io/slow5lib/slow5_api/slow5_mt_api.html. Examples at here new low-level AP...

Mat Beale Jun 27, 2024

@Psy_Fer_ @hasindu2008 The advantages of SLOW5 over fast5 were obvious. Given that POD5 has now become the standard output for ONT, how does SLOW5 compare? What are the potential advantages and reasons to adopt it? Thanks

James Ferguson

Jun 27, 2024

@typeMAT12 @hasindu2008 Slow5 is a much simpler format, with fewer dependencies on other libs (@hasindu2008, post the dep graph!). It is also super easy to integrate with any tools you might be building, and our versioning is more stable than ONT's and doesn't break backward compatibility.
We have made libraries to make conversion easy, so no matter what you choose, you can always go back to the other format if you want to.
Our conversion tools don't corrupt the data like ONT's have in the past.

James Ferguson

Jun 27, 2024

@typeMAT12 @hasindu2008 In addition to all of this, there are some issues we have with the design and engineering behind pod5.
It uses a column-based system for data that 99% of users will read in a sequential, row-based manner. Slow5 is row-based.
It abuses virtual memory, which doesn't play well with HPC, and without it you lose a lot of its speed.
Slow5 is fully open source and will remain that way; there is no such guarantee from ONT (look at the dorado licence and closed-source libs).
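To make the row-vs-column point concrete, here is a toy Python sketch. It is an illustration under loose assumptions, not the real on-disk layout of either SLOW5 or POD5: the `reads` data and the function names are made up. It just counts how many separate places get touched when you read every record in order from each layout.

```python
# Toy model only: records are dicts and "storage" is plain Python lists.
# Neither layout reflects the actual on-disk encoding of SLOW5 or POD5.

reads = [
    {"read_id": "r1", "offset": 10.0, "signal": [430, 431, 429]},
    {"read_id": "r2", "offset": 12.0, "signal": [501, 499]},
    {"read_id": "r3", "offset": 9.0, "signal": [388, 390, 391, 389]},
]

# Row-based: one list of complete records, stored in order.
row_store = list(reads)

# Column-based: one list per field.
column_store = {
    field: [r[field] for r in reads] for field in ("read_id", "offset", "signal")
}


def read_all_rows(store):
    """Read every record in order from the row store, counting lookups."""
    lookups = 0
    out = []
    for record in store:
        lookups += 1  # the whole record comes from one place
        out.append(record)
    return out, lookups


def read_all_columns(store):
    """Read every record in order from the column store, counting lookups."""
    lookups = 0
    out = []
    for i in range(len(store["read_id"])):
        record = {}
        for field, column in store.items():
            lookups += 1  # each field comes from a different column
            record[field] = column[i]
        out.append(record)
    return out, lookups
```

Both give back the same records, but the column store needs one lookup per field per read instead of one per read. The trade-off flips if you only want one field (say, every read_id) across the whole file, which is the access pattern column stores are built for.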

Mat Beale Jun 27, 2024

@Psy_Fer_ @hasindu2008 Thanks - this is really informative and helpful - I appreciate the explainer!

James Ferguson

Jun 27, 2024

@typeMAT12 @hasindu2008 And also, just to add: we designed it to be familiar to anyone who knows SAM/BAM. The ability to easily look at the data is something any bioinformatician can appreciate

slow5tools view data.blow5 | less

Familiar formatting and tool design for the community was part of our plan for adoption. The pod5 stuff just seems like a hodgepodge and has changed many times already.
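As an example of how approachable the text form is, here is a rough Python sketch that parses records from SLOW5 ASCII output like the `slow5tools view` command above prints. It assumes the primary fields come tab-separated in the order read_id, read_group, digitisation, offset, range, sampling_rate, len_raw_signal, raw_signal (raw signal comma-separated), that header lines start with `#` or `@`, and it ignores any auxiliary fields; the function names are hypothetical, so check the SLOW5 spec before relying on it. The pA conversion is the usual (raw + offset) * range / digitisation.

```python
# Sketch only: assumes the primary SLOW5 ASCII columns in this order.
# Auxiliary fields after raw_signal, if present, are dropped by zip().
PRIMARY_COLUMNS = ["read_id", "read_group", "digitisation", "offset",
                   "range", "sampling_rate", "len_raw_signal", "raw_signal"]


def parse_record(line):
    """Turn one tab-separated record line into a dict with typed values."""
    fields = line.rstrip("\n").split("\t")
    rec = dict(zip(PRIMARY_COLUMNS, fields))
    rec["read_group"] = int(rec["read_group"])
    for key in ("digitisation", "offset", "range", "sampling_rate"):
        rec[key] = float(rec[key])
    rec["len_raw_signal"] = int(rec["len_raw_signal"])
    rec["raw_signal"] = [int(x) for x in rec["raw_signal"].split(",")]
    return rec


def to_picoamps(rec):
    """Convert raw ADC values to pA: (raw + offset) * range / digitisation."""
    scale = rec["range"] / rec["digitisation"]
    return [(raw + rec["offset"]) * scale for raw in rec["raw_signal"]]


def iter_records(lines):
    """Yield parsed records, skipping header lines (assumed '#' or '@') and blanks."""
    for line in lines:
        if line.startswith(("#", "@")) or not line.strip():
            continue
        yield parse_record(line)
```

Feeding it `sys.stdin` while piping `slow5tools view data.blow5` into the script would stream one dict per read.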

Hasindu Gamaarachchi Jun 27, 2024

@hasindu2008 @typeMAT12 I'd love to see people try and build old versions of pod5 in a few years 😅

Mat Beale Jun 27, 2024

@Psy_Fer_ @hasindu2008 Definitely true that ONT is unstable, in terms of versioning (both in the lab and computationally). It’s incredibly frustrating to develop and validate a method, only for all the kits, basecalling models and tools (eg shift from guppy to dorado), compute requirements (latest dorado incompatible with older Macs), and downstream tool compatibility (eg latest Clair3 models don’t play well with guppy now) to change every 6-12 months.

Mat Beale Jun 27, 2024

@Psy_Fer_ @hasindu2008 Are there any risks that ONT will make further changes that break your toolkit and prevent long-term compatibility?

Hasindu Gamaarachchi Jun 27, 2024

@typeMAT12 @Psy_Fer_

The good thing with slow5 is that only the XYZ -> BLOW5 converter has to be fixed every time ONT makes a breaking change to their file formats. Any tool written on top of slow5 doesn't need to change. Because of this, we only have to fix the converter, instead of fixing 100 tools separately :D
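The design being described is basically a hub format: every vendor format gets one converter into SLOW5, and tools only ever read SLOW5. A hypothetical Python sketch of that shape follows; the `Read` record, the converter registry, and the fast5/pod5 dict keys are all made up for illustration and are not slow5lib or blue-crab APIs.

```python
# Hypothetical sketch of "one converter per format, tools read one format".
# Nothing here is a real slow5lib / blue-crab API.
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Read:
    """Stand-in for a SLOW5/BLOW5 record (the hub format)."""
    read_id: str
    signal: List[int]


# Registry of converters: vendor format name -> function producing a Read.
CONVERTERS: Dict[str, Callable[[dict], Read]] = {}


def converter(fmt: str):
    """Decorator that registers a converter for one input format."""
    def register(fn: Callable[[dict], Read]) -> Callable[[dict], Read]:
        CONVERTERS[fmt] = fn
        return fn
    return register


@converter("fast5")
def from_fast5(raw: dict) -> Read:
    # Made-up input shape for an "old" vendor format.
    return Read(raw["read_id"], raw["Signal"])


@converter("pod5")
def from_pod5(raw: dict) -> Read:
    # Made-up input shape for a "new" vendor format.
    return Read(str(raw["read_id"]), list(raw["signal"]))


def to_common(fmt: str, records: Iterable[dict]) -> List[Read]:
    """Convert vendor records into the hub format."""
    return [CONVERTERS[fmt](r) for r in records]


def mean_signal(reads: Iterable[Read]) -> Dict[str, float]:
    """An example 'tool': it only knows about Read, never about vendor formats."""
    return {r.read_id: sum(r.signal) / len(r.signal) for r in reads}
```

When a new vendor format shows up, only a new `@converter` function is needed and `mean_signal` stays untouched. With N input formats and M tools, that is N converters to maintain rather than N × M format readers, which is the "fix one converter, not 100 tools" point.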

Hasindu Gamaarachchi Jun 27, 2024

@typeMAT12 @Psy_Fer_

For instance, when ONT moved to POD5,
we simply wrote the blue-crab converter to do pod5->slow5 conversion.

All my tools and the scripts in our sequencing facility that use slow5 did not have to undergo a single change.

Hasindu Gamaarachchi Jun 27, 2024

@hasindu2008 @typeMAT12 This right here... the way the basecallers are written, they do a LOT of random access, whereas buttery-eel does sequential access. Pod5 is terrible at random access.
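A toy Python illustration of the sequential vs random access difference. Records are written to an in-memory file, an index maps read_id to byte offset (loosely like a BLOW5 `.idx` file, but not its actual format), and we count how often the reader has to jump somewhere other than where it already is. Everything here is hypothetical and says nothing about pod5's internals.

```python
import io
import struct


def write_records(records):
    """Write (read_id, payload) pairs as length-prefixed blobs; return file and index."""
    buf = io.BytesIO()
    index = {}
    for read_id, payload in records:
        index[read_id] = buf.tell()  # byte offset where this record starts
        blob = read_id.encode() + b"\t" + payload
        buf.write(struct.pack("<I", len(blob)))  # 4-byte little-endian length
        buf.write(blob)
    buf.seek(0)
    return buf, index


def read_one(f):
    """Read the record at the current position, or None at end of file."""
    header = f.read(4)
    if len(header) < 4:
        return None
    (n,) = struct.unpack("<I", header)
    read_id, payload = f.read(n).split(b"\t", 1)
    return read_id.decode(), payload


def sequential(f):
    """Stream every record in file order; the position only ever moves forward."""
    f.seek(0)
    out = []
    while (rec := read_one(f)) is not None:
        out.append(rec)
    return out


def random_access(f, index, wanted):
    """Fetch records by read_id in the given order, counting non-sequential jumps."""
    f.seek(0)
    jumps = 0
    out = []
    for read_id in wanted:
        target = index[read_id]
        if target != f.tell():
            jumps += 1  # the reader must move somewhere it isn't already
            f.seek(target)
        out.append(read_one(f))
    return out, jumps
```

Reading in file order never jumps, while fetching the same reads in a different order jumps on every read. On a local SSD those jumps are cheap, but on networked HPC filesystems each one can cost a round trip, which is one reason a sequential streaming reader tends to cope better there.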