Mehmet Tekman

GNU-obsessed bioinformatician, in love with all things mountain
matrix: @mtekman:matrix.org
gitlab and github: https://www.gitlab.com/mtekman and https://www.github.com/mtekman
ORCID and GSC: https://orcid.org/0000-0002-4181-2676 and https://scholar.google.com/citations?hl=en&tzom=-60&user=HVwU31YAAAAJ

Really good post about the state of social mobility in academia:

https://www.tanjabhuiyan.com/blog/the-empty-promise-of-social-mobility-through-education

The Empty Promise of Social Mobility through Education — Dr. Tanja Bhuiyan

About the empty promise of social mobility into the German higher education system


@galaxyproject almost everyone from the Freiburg Galaxy team going to #gcc2024 in Brno this June is going to travel there by train. Per person that saves ~ 200 kg of CO2.

Overall carbon footprint from traveling to GCC will, of course, be dominated by flights from outside the EU so how much potential is there for tweaking, e.g., the footprint of transatlantic flights?

Following are the results of the quick research I did this morning:

A nice little paper arguing that Chord diagrams are generally harder to understand than Sankey plots for the same data. Relevant for when people next request the "nice circular diagrams" for a bioinformatics analysis: https://dl.acm.org/doi/fullHtml/10.1145/3544548.3581119
#hci #plots #bioinformatics
Showing Flow: Comparing Usability of Chord and Sankey Diagrams | Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems

LineageOS 21 has been released: https://lineageos.org/Changelog-28/
Changelog 28 - Fantastic Fourteen, Amazing Applications, Undeniable User-Experience

Let's have a drink!

Have you ever had two #chromosome 9's? Well today I have.

One more reason that I prefer #R Dataframes to #Python Dataframes (#pandas). In R, there is rarely any uncertainty when it comes to loading in genomic data.

Image1 shows a 140k row table generated with Pandas containing just "9" or "X" for the chromosome.

Image2 shows how that dataset is read easily by R, but misinterpreted by pandas unless you set the datatypes yourself.

I've heard of #chromosome_duplication but this is pushing it
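
A minimal sketch of what I assume is happening here: when pandas reads a big file it can infer column types chunk by chunk, so a stretch containing only "9" is parsed as integers while a stretch containing "X" stays as text, and the final column ends up holding both the number 9 and the string "9". The toy table, the column name `chrom` and the chunk size of 4 are made up for illustration; the explicit `chunksize` is only there to force the effect on a tiny table.

```python
import io

import pandas as pd

# Hypothetical toy data: six rows of chromosome "9", then two rows of "X".
rows = ["9"] * 6 + ["X"] * 2
csv_text = "chrom\n" + "\n".join(rows) + "\n"

# Read in chunks of 4 rows so that each chunk infers its own dtype:
# chunk 1 holds only 9s (parsed as integers), chunk 2 holds 9s and Xs (text).
chunks = pd.read_csv(io.StringIO(csv_text), chunksize=4)
mixed = pd.concat(chunks, ignore_index=True)

# The column now contains integer 9, string "9" and string "X".
print(mixed["chrom"].map(type).value_counts())

# Setting the datatype yourself keeps every chromosome label as text.
fixed = pd.read_csv(io.StringIO(csv_text), dtype={"chrom": str})
print(fixed["chrom"].unique())
```

Passing `dtype` up front is the "set the datatypes yourself" route mentioned above; for real genomic tables it is probably worth doing for every chromosome column regardless of file size.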

@bebatut Thank you for making the Galaxy ecosystem so much better on so many fronts! You helped so many people and you should be proud of what you achieved. Best of luck. Bon Voyage Bérénice! 🌌🇫🇷

Turn an old eReader into an Information Screen (Nook STR)

Here's a quick tutorial for turning an old Nook into a passive display. This is an update to my 2013 post. End result: an eInk screen which displays the trains I can catch from my local station. It shows the next few available trains, and whether they're delayed. It also shows how long until the […]

Terence Eden’s Blog

Well that don’t look right at all

So they’re -9 compressed bz2 files

$ file *.bz2
[...]
DRR187559_1.fastqsanger.bz2: bzip2 compressed data, block size = 900k
DRR187559_2.fastqsanger.bz2: bzip2 compressed data, block size = 900k

And when looking for the bzip2 header that indicates compression and start of file we see:

$ grep BZh9 -c *.bz2
1.bz2:0
2.bz2:0
3.bz2:0
4.bz2:0
5.bz2:0
6.bz2:0
7.bz2:0
8.bz2:0
9.bz2:1
DRR187559_1.fastqsanger.bz2:229
DRR187559_2.fastqsanger.bz2:259

The first 8 lines are expected: the header is BZh followed by the compression level, so BZh9 wouldn't appear in files 1-8, which were compressed with the corresponding compression levels.

But the last two, uhhh, how did you possibly generate bzip2 files with that many headers? Apparently that can happen through concatenation.
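
A small sketch of how concatenation produces extra headers, using Python's standard `bz2` module. The two FASTQ-like payloads are made up; the magic bytes are the stream header `BZh9` followed by the block marker 0x314159265359 from the bzip2 format. Gluing two compressed files together byte for byte (what `cat a.bz2 b.bz2` does) gives one file containing two complete streams, each with its own header.

```python
import bz2

# Two independent bzip2 streams at compression level 9 (toy payloads).
part_a = bz2.compress(b"@read1\nACGT\n+\nIIII\n", compresslevel=9)
part_b = bz2.compress(b"@read2\nTTGA\n+\nIIII\n", compresslevel=9)

# Byte-wise concatenation, like `cat a.bz2 b.bz2 > both.bz2`.
both = part_a + part_b

# A level-9 stream starts with "BZh9", directly followed by the
# 6-byte block marker 0x314159265359.
STREAM_MAGIC = b"BZh9" + bytes.fromhex("314159265359")
header_count = both.count(STREAM_MAGIC)

print("stream headers found:", header_count)
```

Note that `grep -c` counts matching lines rather than occurrences, so its numbers are only a rough indication of how many streams a file holds.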

Fun fact: bzip2 reads _2 fine.
Funner fact: basically no other implementations do, i.e. most bioinformatics tools. They just read the first entry and are done. But we only know this because _2 is split mid-read, unlike _1, which runs successfully while actually failing.

$ fastqc DRR187559_1.fastqsanger.bz2
application/x-bzip2
Started analysis of DRR187559_1.fastqsanger.bz2
Analysis complete for DRR187559_1.fastqsanger.bz2
fastqc DRR187559_1.fastqsanger.bz2 4.67s user 0.35s system 150% cpu 3.334 total

FastQC reports 1927 reads, which is off by a lot (451782 is the correct value). We'd never know unless we carefully checked this.

So if your tool breaks on a bzip2 file, try decompressing and recompressing, and updating your resume on LinkedIn while you find a new career.
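
A rough sketch of both the failure and the fix with Python's standard `bz2` module. The helper names and toy reads are hypothetical. `bz2.BZ2Decompressor` stops at the end of the first stream, which is my stand-in for how a single-stream reader behaves, while `bz2.decompress` walks through every concatenated stream. This works in memory, so it only suits small files.

```python
import bz2


def first_stream_only(data: bytes) -> bytes:
    """Mimic a reader that stops once the first bzip2 stream ends."""
    return bz2.BZ2Decompressor().decompress(data)


def recompress_single_stream(data: bytes, level: int = 9) -> bytes:
    """Decode every concatenated stream, then write back one single stream."""
    return bz2.compress(bz2.decompress(data), compresslevel=level)


# Toy multi-stream input: two reads, each compressed on its own.
read_1 = b"@r1\nACGT\n+\nIIII\n"
read_2 = b"@r2\nTTGA\n+\nIIII\n"
multi = bz2.compress(read_1) + bz2.compress(read_2)

# The single-stream reader silently returns only the first read.
truncated = first_stream_only(multi)

# After recompression the same reader sees everything.
single = recompress_single_stream(multi)
complete = first_stream_only(single)

print(len(truncated), "bytes before,", len(complete), "bytes after")
```

For real data the same round trip on the command line would be `bzip2 -dc in.bz2 | bzip2 -9 > out.bz2`, where `in.bz2` and `out.bz2` are placeholder file names.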

#bioinformatics #software


New preprint from Tanja Bhuiyan
(@TanjaBhuiyan6, @tanjabhuiyan.bsky.social)

> Delighted to share our study on basal #transcription factor TFIID and an unexpected link to #RNA splicing. #GeneRegulation #condensates

https://www.biorxiv.org/content/10.1101/2024.02.05.578926v1