My poster at #biodata22
Last Saturday at #biodata22, I talked about our recent work on spectrum preserving tilings (SPTs). The slides for that talk can be found here (https://umd.box.com/s/b0klpykjkdui5ptq34fmnz68o62bbtbn).
One key highlight is that we have made the initial release of piscem. Why is that exciting? A short 🧵1/12

Short thread on my poster from #biodata22 on detecting allelic imbalance at the isoform level and in single cells, work with Rob Patro, Noor Singh, Euphy Wu et al.

PDF here: https://www.dropbox.com/s/yjr0d4mndnozwmd/DATA_22_Love.pdf?dl=0

@timtriche yeah I’ve been manually cross posting.

At #biodata22 I found Twitter easier to use, e.g. I want to quickly look up handles and draft posts / threads during sessions, mostly to promote work by PhD students. Both of those are hard to do here (the latter isn't possible with the main app).

But for me this is week 1 of trying a new thing, entirely OSS and hosted/moderated by volunteers, so I’ve got lots of patience to figure things out.

That's a wrap on #biodata22, next one is #biodata24 on November 6-9, 2024.

Wish I could've attended in person this year, but a quick shout out to all the organizers who enabled a quick pivot to virtual! You da real MVPs!

Markus Sommer #biodata22 closing us out with "Structure‐guided isoform analysis for the human transcriptome".

Problem: We have many more transcript annotations than genes. Which isoforms actually represent functional proteins?

Uses structure-prediction tools (e.g. AlphaFold2) to score each isoform; a higher score means the isoform is more likely to be functional. Showed some examples where this scoring approach matches experimental data. He says it's not perfect, but it's a helpful data point.

Website: https://www.isoform.io/

Here we provide open access to 3D structure predictions for 194,780 human protein isoforms. Individual predicted structures and scores can be found in the Isoforms Summary tables, and all structures can be found in Downloads. Search your own protein structure against our database in Foldseek.

Harun Mustafa #biodata22 on "A modular multi‐label framework for aligning sequences to large read set databases and (pan)genomes".

Problem: Aligning to low-coverage pan-sample (or pan-genome?) graphs is challenging because of gaps in the graphs (in both sequence and labels), which lead to shorter alignments downstream.

Described a method, "MetaGraph-MLA", that lets the aligner leverage "similar" samples (i.e. ones without the gap) to extend alignments.

Pre-print: https://www.biorxiv.org/content/10.1101/2022.11.04.514718v1

Katharine Jenike #biodata22 on "Establishing a Solanum pan‐genome to dissect dynamics of paralog evolution".

Building a Solanum pan-genome, so far with 17 fully assembled genomes: HiFi reads assembled with hifiasm, then scaffolded with Hi-C and Bionano. Assemblies are chromosome-scale and then annotated.

Described "Panagram", a tool for visualizing the constructed pangenome and exploring unique sequence, synteny, etc.:
https://github.com/kjenike/Panagram

Robert Patro (@rob) #biodata22 on "Keeping k‐mers in check—Building fast, small, and composable indices based on the De Bruijn graph".

Problem: Reference indexing is challenging: as we add references (e.g. for a pangenome), the index grows rapidly. How do we keep the index small?

Suggests a model that splits the index into two "tables", allowing for modular implementations and isolating bottlenecks.
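To make the two-table idea a bit more concrete, here is a toy Python sketch. It assumes (my reading, not something stated in the talk) that one table maps each k-mer to a position on a unitig of the compacted De Bruijn graph, and the other maps each unitig to where it occurs in the references. `TwoTableIndex` and its methods are hypothetical names; the sketch is forward-strand only, with no canonical k-mers and no compression, so it is not how piscem or pufferfish2 actually store anything.

```python
class TwoTableIndex:
    """Toy index split into two independent tables (hypothetical sketch, not piscem's code)."""

    def __init__(self, k):
        self.k = k
        # Table 1: k-mer -> (unitig_id, offset of the k-mer within that unitig).
        self.kmer_table = {}
        # Table 2: unitig_id -> list of (reference_name, start position of the unitig).
        self.occ_table = {}

    def add_unitig(self, unitig_id, seq, occurrences):
        # Each k-mer of the unitig is stored once, however many references contain it.
        for off in range(len(seq) - self.k + 1):
            self.kmer_table[seq[off:off + self.k]] = (unitig_id, off)
        self.occ_table[unitig_id] = list(occurrences)

    def add_occurrence(self, unitig_id, ref, pos):
        # A new reference that reuses an existing unitig only grows table 2.
        self.occ_table[unitig_id].append((ref, pos))

    def query(self, kmer):
        # Look the k-mer up in table 1, then project its offset through table 2.
        hit = self.kmer_table.get(kmer)
        if hit is None:
            return []
        unitig_id, off = hit
        return [(ref, pos + off) for ref, pos in self.occ_table[unitig_id]]


idx = TwoTableIndex(k=3)
idx.add_unitig(0, "ACGTA", [("chr1", 10), ("chr2", 0)])
print(idx.query("GTA"))  # [('chr1', 12), ('chr2', 2)]
idx.add_occurrence(0, "chr3", 5)
print(idx.query("ACG"))  # [('chr1', 10), ('chr2', 0), ('chr3', 5)]
print(len(idx.kmer_table))  # 3
```

Because the two tables only talk to each other through unitig IDs, either one could in principle be swapped for a more compact structure without touching the other, which is roughly the kind of modularity the talk described.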

Two repos mentioned:
Piscem: https://github.com/COMBINE-lab/piscem
Pufferfish2: https://github.com/COMBINE-lab/pufferfish2

@rob's new implementation, Piscem, is already in use in different projects the lab is working on. #biodata22