
Assistant Professor @UCR_CSE. Computational Mass Spectrometry, Bioinformatics. Loves collaborative science!

https://www.cs.ucr.edu/~mingxunw/

#massspec #teammassspec #molecularnetworking #GNPS #MassQL #Bioinformatics #datascience

Are you interested in a 3-year #postdoc with dedicated international mobility and open science training? Check out the #YUFE4Postdocs program and join our team at the University of Antwerp!

Contact me to discuss your project addressing one or more urban opportunities and challenges, with a focus on sustainability or digital society.

More info: https://www.yufe4postdocs.eu/
Deadline: May 7, 2023

Does anyone have Waters DDA data they'd be willing to share for testing? Trying to more thoroughly test workflows for converting raw data to work seamlessly with GNPS/GNPS2!
@metamorpheus Would second this, having RAW and mzML would be great. RAW for provenance of everything, and mzML for convenience if it were converted properly.
The Genome Sciences Department (https://www.gs.washington.edu) at UW is seeking candidates for the department chair. Our department is known for research in the areas of model organism and human genomics, #computationalbiology, and #genomics and #proteomics #techdev. Please boost ... https://apply.interfolio.com/114030

Wrapping up my first quarter as a professor at UC Riverside, the lab had its first outing together. One of my favorite parts of the job is the opportunity to work with these energetic and talented students. Excited to see what the next year holds!
@Elendol One final thing that I've been having a lot of fun with is converting the mass spec data into columnar data formats like Parquet or Arrow. That, plus some out-of-core compute strategies, makes it super fast to access a ton of data, especially when paired with the flash ZFS arrays we've put together. If you're a company, totally affordable!
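To illustrate why a columnar layout helps here, a minimal sketch using only the Python standard library. It is not the lab's actual pipeline: the `spectra` input, the column names, and both helper functions are hypothetical, and a real setup would write Parquet or Arrow files with a dedicated library rather than in-memory arrays. The idea shown is only that peaks stored as parallel, m/z-sorted columns can be range-queried without parsing each spectrum.

```python
from array import array
from bisect import bisect_left, bisect_right

# Hypothetical row-oriented input: one (scan, [(mz, intensity), ...]) per spectrum.
spectra = [
    (1, [(100.05, 1200.0), (250.10, 300.0)]),
    (2, [(100.06, 900.0), (400.20, 50.0)]),
]


def to_columns(spectra):
    """Flatten spectra into three parallel typed columns, sorted by m/z."""
    rows = sorted(
        (mz, scan, intensity)
        for scan, peaks in spectra
        for mz, intensity in peaks
    )
    return {
        "mz": array("d", (r[0] for r in rows)),
        "scan": array("l", (r[1] for r in rows)),
        "intensity": array("d", (r[2] for r in rows)),
    }


def peaks_in_window(columns, mz_low, mz_high):
    """Return (scan, mz, intensity) for peaks with mz_low <= mz <= mz_high."""
    # Binary search on the sorted m/z column gives the matching row range.
    lo = bisect_left(columns["mz"], mz_low)
    hi = bisect_right(columns["mz"], mz_high)
    return list(
        zip(
            columns["scan"][lo:hi],
            columns["mz"][lo:hi],
            columns["intensity"][lo:hi],
        )
    )


columns = to_columns(spectra)
hits = peaks_in_window(columns, 100.0, 100.1)
```

Sorting by m/z is a design choice for this toy query only; a different access pattern (for example, per-scan retrieval) would favor a different sort key.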
@Elendol Yeah, good luck haha, distribution is hard in the native format. With regards to conversion, if you're running on any system with containers, Nextflow + MSConvert is kind of amazing and can distribute the work. That's one way I want to convert all the data. Side note: converting/summarizing all the data, if it's running 24/7, can actually churn through more than you'd think. Ran through 400 TB of proteomics/metabolomics data in a few weeks, not terrible.
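For a rough sense of scale, a back-of-the-envelope sketch of the sustained throughput that implies. The post only says "a few weeks", so the 21-day figure below is an assumption, as are the decimal units (1 TB = 10^6 MB) and the function name.

```python
def sustained_throughput_mb_s(total_tb: float, days: float) -> float:
    """Average MB/s needed to process total_tb terabytes in the given days, running 24/7."""
    total_mb = total_tb * 1_000_000  # decimal units: 1 TB = 10^6 MB
    seconds = days * 24 * 60 * 60
    return total_mb / seconds


# Assumption: "a few weeks" taken as 21 days; the post gives no exact duration.
rate = sustained_throughput_mb_s(400, 21)
print(f"{rate:.1f} MB/s sustained")
```

Under that assumption the average works out to a bit over 200 MB/s, and a shorter or longer run scales the figure proportionally.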
@Elendol I hope reasonably soon we can have data replicated in multiple places, so you can compute on the raw data wherever you are, in the cloud or on our systems. It's not a super easy thing, but it's not terrible either.
@Elendol As for conversions, in MassIVE we had built things to autoconvert and it's reasonably successful, but I'm trying to really consolidate in my new lab. I really feel you about the concern of where to put the data: should it all be on a single file system or in the cloud, and who pays? In my new lab, we're going down the route of all-flash systems so that we can recompute on tons of data without even thinking about IOPS.
@Elendol So in general I agree with you: having it all local in a file system is sometimes not super ideal, but that's the way it works now. However, we're working on systems to make it more available (in computable forms), since reading mzML files is honestly the slowest part of any compute right now. As for the metadata, there are several things we use. For just knowing what exists, we have automated dataset caches so we know the basics about everything, plus crowdsourced metadata in ReDU.