Mastodawn

Elendol Nov 29, 2022

A question for all the people working on #massspec #massspectometry #teammassspec #proteomics, especially those working in platforms: how do you store and manage all your raw / mzML files? Anything beyond chucking files on a big enough disk? Folks in other fields of #bioinformatics, do you have a raw file management system for all your raw data from sequencers and other machines?

https://hachyderm.io/@Elendol/109427080898974176

Elendol (@Elendol@hachyderm.io)

If anyone knows, they are likely to be here. I am looking for a software solution (or pointers to build one) able to organise and manage BIG data: not billions of small text records, but tens of thousands of ~1 GB binary files. Something better than a basic file system, so maybe a database for the metadata (and something to extract the metadata). We want to keep the raw files as the source of truth but also track files converted to friendlier formats (with loss). #programming #devops #dataOps

Hachyderm.io


Zhenbo Li Nov 30, 2022

@Elendol @makingions for me, this is even worse. Our lab server doesn’t have a big enough disk :(


Cris Lapthorn Nov 30, 2022

@zhenboli @Elendol

Yep, in industry the need is typically to keep data on servers for regulatory/IP reasons, and in core facilities, group labs etc. it becomes a challenge to manage data (like @zhenboli says), with diverse approaches from external HDDs to AWS.

@UCDProteomics captured thoughts on Twitter about data wrangling e.g. https://twitter.com/UCDProteomics/status/1482104659858771970

@mingxunwang is the main developer of MassQL, which enables querying of raw mass spectrometry data. https://mwang87.github.io/MassQueryLanguage_Documentation/

Brett Phinney on Twitter

“We finally return for 2022 “Proteomics old time radio hour” with @neely615. Thursday, Jan 20 at 11:30 AM PST in @clubhouse. Join us! https://t.co/DEPvEBm2fh”

Twitter


Elendol

@makingions @zhenboli @UCDProteomics @mingxunwang thank you.

I need to give it a listen.

Yes, very useful to query raw files. My issue now is how to query 10k raw files 😁 (well, the issue is more: how to do it well).
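
For anyone landing on this thread with the same problem, the idea floated above (raw files stay on disk as the source of truth, a small database holds the metadata and tracks lossy conversions) could be sketched roughly like this. It is only an illustration, not something anyone in the thread has proposed or tested: the table names, column names and helper functions are all made up, SQLite is just a convenient stand-in for whatever database you would actually use, and the instrument and acquisition date are passed in by hand because extracting them from vendor raw files needs vendor-specific tooling.

```python
import hashlib
import sqlite3
from pathlib import Path

# Hypothetical schema: one row per raw file, one row per converted file.
SCHEMA = """
CREATE TABLE IF NOT EXISTS raw_file (
    sha256      TEXT PRIMARY KEY,      -- content hash identifies the file
    path        TEXT NOT NULL UNIQUE,  -- where the raw file lives on disk
    size_bytes  INTEGER NOT NULL,
    instrument  TEXT,                  -- supplied by the caller, not parsed
    acquired_on TEXT
);
CREATE TABLE IF NOT EXISTS derived_file (
    path       TEXT PRIMARY KEY,
    raw_sha256 TEXT NOT NULL REFERENCES raw_file(sha256),
    format     TEXT NOT NULL,          -- e.g. 'mzML'
    lossy      INTEGER NOT NULL        -- 1 if the conversion loses information
);
"""


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so ~1 GB files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def open_catalogue(db_path=":memory:"):
    """Open (or create) the catalogue database and make sure the tables exist."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def register_raw(conn, path, instrument=None, acquired_on=None):
    """Record a raw file; registering the same file twice is a no-op."""
    path = Path(path)
    checksum = sha256_of(path)
    conn.execute(
        "INSERT OR IGNORE INTO raw_file VALUES (?, ?, ?, ?, ?)",
        (checksum, str(path), path.stat().st_size, instrument, acquired_on),
    )
    conn.commit()
    return checksum


def register_derived(conn, path, raw_sha256, fmt, lossy):
    """Link a converted file back to the raw file it came from."""
    conn.execute(
        "INSERT OR REPLACE INTO derived_file VALUES (?, ?, ?, ?)",
        (str(path), raw_sha256, fmt, int(lossy)),
    )
    conn.commit()


def raw_without_format(conn, fmt):
    """List raw files that have no converted copy in the given format yet."""
    rows = conn.execute(
        """SELECT r.path FROM raw_file r
           WHERE NOT EXISTS (SELECT 1 FROM derived_file d
                             WHERE d.raw_sha256 = r.sha256 AND d.format = ?)
           ORDER BY r.path""",
        (fmt,),
    )
    return [row[0] for row in rows]
```

With something like this in place, "which of my 10k raw files still need converting to mzML?" becomes a single call to `raw_without_format(conn, "mzML")`, and the heavy per-file querying (with MassQL or anything else) can be pointed at just the files the catalogue selects.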