Bioinformatician at the Peter Doherty Institute for Infection and Immunity
GitHub: https://github.com/rrwick
Blog: https://rrwick.github.io
Twitter: https://twitter.com/rrwick

New blog post!

metaMDBG (Gaëtan Benoit) and Myloasm (Jim Shaw) have had recent releases, so I updated the benchmarks from the Autocycler paper:
https://rrwick.github.io/2025/09/23/autocycler-benchmark-update.html

Both tools improved considerably! Time to update your conda environments 😄

Benchmark update: metaMDBG and Myloasm

a blog for miscellaneous bioinformatics stuff

Ryan Wick’s bioinformatics blog

🧬 Check out agtools, an open-source Python framework for analysing & manipulating assembly graphs.

🔗 GitHub: https://github.com/Vini2/agtools
📜 Preprint: https://www.biorxiv.org/content/10.1101/2025.09.14.676178v1

🙏 Thanks to my amazing co-authors @rrwick @GB13Faithless @griggo_grig @npbhavya @linsalrob

#Bioinformatics #genomics #assembly #assemblygraphs #software

GitHub - Vini2/agtools: A Software Framework to Manipulate Assembly Graphs

New blog post!

I added a new feature to George Bouras's Pypolca: homopolymer-only polishing. It's potentially useful for cross-sample polishing, and an early test on Cryptosporidium looks promising.

Check it out here:
https://rrwick.github.io/2025/09/04/homopolymer-polishing.html

Cross-sample homopolymer polishing with Pypolca

Dorado v0.9.1 now includes a bacterial model for genome polishing, so I put it to the test! How does it compare to Medaka? And does move-table data improve polishing accuracy? Read my analysis here:
https://rrwick.github.io/2025/02/07/dorado-polish.html
Medaka vs Dorado polish

To make producing soft core alignments easier, we developed Core-SNP-filter, a simple and efficient tool to process SNP alignments with user-defined thresholds.
https://github.com/rrwick/Core-SNP-filter
(4/4)
GitHub - rrwick/Core-SNP-filter: a tool to filter sites in a FASTA-format whole-genome pseudo-alignment

And the benefits grow with dataset size! A 100% strict core may work fine for small datasets (e.g. ~10 genomes) but is devastating for very large ones (e.g. 1000+ genomes). A 95% soft core works well across all dataset sizes.
(3/4)
Our key finding: a 95% soft core (allowing up to 5% missing data per site) is usually better than a 100% strict core. It retains more information, often leading to better phylogenetic resolution and stronger temporal signal.
(2/4)
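
To make the threshold idea concrete, here is a minimal Python sketch of core filtering on a toy alignment. It is not how Core-SNP-filter is implemented; the function names are hypothetical, and treating only A, C, G and T as "data present" (everything else, such as `-` or `N`, as missing) is an assumption made for illustration.

```python
def core_sites(sequences, min_fraction=0.95):
    """Return indices of alignment columns where at least `min_fraction`
    of the sequences have an unambiguous base.

    Assumption for this sketch: only A/C/G/T (any case) count as data;
    gaps, N and other symbols count as missing.
    """
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same aligned length")

    kept = []
    for i in range(length):
        # Count how many samples have a called base at this site.
        present = sum(1 for seq in sequences if seq[i].upper() in "ACGT")
        if present / len(sequences) >= min_fraction:
            kept.append(i)
    return kept


def filter_alignment(sequences, min_fraction=0.95):
    """Return the sequences restricted to the columns kept by core_sites()."""
    sites = core_sites(sequences, min_fraction)
    return ["".join(seq[i] for i in sites) for seq in sequences]


# Toy alignment (made up): columns 1 and 2 each have one sample with missing data.
toy = ["ACGT", "AC-T", "ANGT", "ACGT"]
strict = core_sites(toy, min_fraction=1.0)   # strict core: no missing data allowed
soft = core_sites(toy, min_fraction=0.75)    # soft core: up to 25% missing per site
```

With `min_fraction=1.0` the sketch behaves like a strict core, and lowering the value gives a softer core that keeps sites where a few samples lack data.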
Do you make core genome alignments for phylogenomics? Mona Taouk and I explored how including sites with some missing data (a soft core) can improve analysis, especially for large datasets.
https://www.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.001346
(1/4)
Exploring SNP filtering strategies: the influence of strict vs soft core

Phylogenetic analyses are crucial for understanding microbial evolution and infectious disease transmission. Bacterial phylogenies are often inferred from SNP alignments, with SNPs as the fundamental signal within these data. SNP alignments can be reduced to a ‘strict core’ by removing those sites that do not have data present in every sample. However, as sample size and genome diversity increase, a strict core can shrink markedly, discarding potentially informative data. Here, we propose and provide evidence to support the use of a ‘soft core’ that tolerates some missing data, preserving more information for phylogenetic analysis. Using large datasets of Neisseria gonorrhoeae and Salmonella enterica serovar Typhi, we assess different core thresholds. Our results show that strict cores can drastically reduce informative sites compared to soft cores. In a 10 000-genome alignment of Salmonella enterica serovar Typhi, a 95% soft core yielded ten times more informative sites than a 100% strict core. Similar patterns were observed in N. gonorrhoeae. We further evaluated the accuracy of phylogenies built from strict- and soft-core alignments using datasets with strong temporal signals. Soft-core alignments generally outperformed strict cores in producing trees displaying clock-like behaviour; for instance, the N. gonorrhoeae 95% soft-core phylogeny had a root-to-tip regression R² of 0.50 compared to 0.21 for the strict-core phylogeny. This study suggests that soft-core strategies are preferable for large, diverse microbial datasets. To facilitate this, we developed Core-SNP-filter (https://github.com/rrwick/Core-SNP-filter), an open-source software tool for generating soft-core alignments from whole-genome alignments based on user-defined thresholds.

microbiologyresearch.org
Autocycler is still new and evolving. I'll continue to improve it based on your feedback. A big thank you to the alpha testers who gave me feedback before this release! Give it a try and share your thoughts.
(5/5)