Nice collaboration led by Niek de Jonge just got published in Journal of Cheminformatics 🚀.

In this work, we implemented and evaluated an extensive cleaning pipeline for MS/MS data.

https://link.springer.com/article/10.1186/s13321-024-00878-1

#Python #matchms #cheminformatics #opensource #openscience

Big thanks to all co-authors for this very nice collaboration!

Reproducible MS/MS library cleaning pipeline in matchms - Journal of Cheminformatics

Mass spectral libraries have proven to be essential for mass spectrum annotation, both for library matching and training new machine learning algorithms. A key step in training machine learning models is the availability of high-quality training data. Public libraries of mass spectrometry data that are open to user submission often suffer from limited metadata curation and harmonization. The resulting variability in data quality makes training of machine learning models challenging. Here we present a library cleaning pipeline designed for cleaning tandem mass spectrometry library data. The pipeline is designed with ease of use, flexibility, and reproducibility as leading principles.

Scientific contribution
This pipeline will result in cleaner public mass spectral libraries that will improve library searching and the quality of machine-learning training datasets in mass spectrometry. This pipeline builds on previous work by adding new functionality for curating and correcting annotated libraries, by validating structure annotations. Due to the high quality of our software, the reproducibility, and improved logging, we think our new pipeline has the potential to become the standard in the field for cleaning tandem mass spectrometry libraries.

SpringerLink
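The abstract describes cleaning as a sequence of curation and validation steps with improved logging. As a rough illustration of that idea only (plain Python, not the matchms API), a reproducible pipeline can be an ordered list of filter functions, each returning the cleaned spectrum or None to drop it, with a per-filter count of removed spectra logged at the end. The Spectrum class, the metadata keys, and all filter names below are hypothetical.

```python
import copy
import logging
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("cleaning")


@dataclass
class Spectrum:
    # Hypothetical minimal spectrum record (not matchms.Spectrum).
    mz: List[float]
    intensities: List[float]
    metadata: Dict[str, object] = field(default_factory=dict)


Filter = Callable[[Spectrum], Optional[Spectrum]]


def harmonize_keys(s: Spectrum) -> Spectrum:
    # Map a variant metadata key onto one canonical key (hypothetical key names).
    if "precursormz" in s.metadata and "precursor_mz" not in s.metadata:
        s.metadata["precursor_mz"] = s.metadata.pop("precursormz")
    return s


def require_precursor_mz(s: Spectrum) -> Optional[Spectrum]:
    # Drop spectra without a precursor m/z.
    return s if s.metadata.get("precursor_mz") is not None else None


def normalize_intensities(s: Spectrum) -> Optional[Spectrum]:
    # Scale intensities so the highest peak is 1.0; drop spectra without signal.
    top = max(s.intensities, default=0.0)
    if top <= 0:
        return None
    s.intensities = [i / top for i in s.intensities]
    return s


def require_min_peaks(n: int) -> Filter:
    # Build a filter that drops spectra with fewer than n peaks.
    def _filter(s: Spectrum) -> Optional[Spectrum]:
        return s if len(s.mz) >= n else None
    _filter.__name__ = f"require_min_peaks_{n}"
    return _filter


def run_pipeline(spectra: List[Spectrum], filters: List[Filter]) -> Tuple[List[Spectrum], Dict[str, int]]:
    cleaned: List[Spectrum] = []
    removed: Dict[str, int] = {}
    for original in spectra:
        spectrum: Optional[Spectrum] = copy.deepcopy(original)  # leave the input untouched
        for step in filters:
            spectrum = step(spectrum)
            if spectrum is None:
                # Record which step removed the spectrum, then stop processing it.
                removed[step.__name__] = removed.get(step.__name__, 0) + 1
                break
        if spectrum is not None:
            cleaned.append(spectrum)
    for name, count in removed.items():
        log.info("%s removed %d spectra", name, count)
    return cleaned, removed


spectra = [
    Spectrum([100.0, 150.0, 200.0], [10.0, 50.0, 20.0], {"precursormz": 250.0}),
    Spectrum([100.0, 150.0], [5.0, 5.0], {"precursor_mz": 180.0}),
    Spectrum([120.0, 130.0, 140.0], [1.0, 2.0, 3.0], {}),
]
filters = [harmonize_keys, require_precursor_mz, normalize_intensities, require_min_peaks(3)]
cleaned, removed = run_pipeline(spectra, filters)
```

Keeping the filter order in one explicit list is what makes such a run repeatable: the same list applied to the same input gives the same output and the same removal log.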

🚀 New #matchms release is out!

Version 0.27 comes with much faster fingerprint comparisons (thanks Tornike), cleaner handling of neutral losses, and updated dependencies and Python support (thanks Julian).
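The post does not detail how either change works. For readers new to the two concepts, here is a plain-Python sketch (not matchms code): fingerprint comparison as Tanimoto similarity on binary fingerprints, and neutral losses as the precursor m/z minus each fragment m/z. Storing fingerprints as Python ints, the `max_loss` cutoff, and returning 0.0 for two empty fingerprints are assumptions made for this illustration.

```python
from typing import List, Tuple


def tanimoto(fp_a: int, fp_b: int) -> float:
    # Tanimoto (Jaccard) similarity of two binary fingerprints stored as int bit sets:
    # shared on-bits divided by the on-bits present in either fingerprint.
    union = bin(fp_a | fp_b).count("1")
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return bin(fp_a & fp_b).count("1") / union


def neutral_losses(
    precursor_mz: float,
    mz: List[float],
    intensities: List[float],
    max_loss: float = 200.0,  # hypothetical cutoff for this sketch
) -> List[Tuple[float, float]]:
    # Neutral loss = precursor m/z minus fragment m/z; keep losses within [0, max_loss].
    losses = []
    for m, intensity in zip(mz, intensities):
        loss = precursor_mz - m
        if 0.0 <= loss <= max_loss:
            losses.append((loss, intensity))
    return sorted(losses)


print(tanimoto(0b1011, 0b0011))
print(neutral_losses(300.0, [100.0, 250.0, 320.0], [1.0, 2.0, 3.0], max_loss=150.0))
```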

https://github.com/matchms/matchms

#massspec #Python #opensource #openscience

GitHub - matchms/matchms: Python library for processing (tandem) mass spectrometry data and for computing spectral similarities.

Python library for processing (tandem) mass spectrometry data and for computing spectral similarities. - matchms/matchms

GitHub

Very glad that Niek de Jonge greatly improved the MS/MS data cleaning in #matchms!

--> Preprint out now: https://chemrxiv.org/engage/chemrxiv/article-details/6560c5e629a13c4d47e66013

Project in collaboration with Helge Hecht and Justin van der Hooft.

Reproducible MS/MS library cleaning pipeline in matchms

Mass spectral libraries have proven to be essential for mass spectrum annotation, both for library matching and training new machine learning algorithms. A key step in training machine learning models is having high-quality training data. Public libraries of mass spectrometry data that are open to user submission often suffer from limited metadata curation and harmonization. The resulting variability in data quality makes training of machine learning models challenging. Here we present a library cleaning pipeline designed for cleaning tandem mass spectrometry library data. The pipeline is designed with ease of use, flexibility and reproducibility as leading principles.

ChemRxiv

We just made it in time for the Friday deadline! 💫
2 new releases:

#matchms 0.22.0 --> https://pypi.org/project/matchms/

#ms2deepscore 0.5.0 --> https://pypi.org/project/ms2deepscore/

#opensource #openscience #massspec #Python
Many thanks to Niek de Jonge & Helge Hecht 🙏

matchms

Python library for large-scale comparisons and processing of tandem mass spectral data

PyPI

ENPKG integrates or is built on many computational metabolomics tools, such as #LOTUS, #SIRIUS, #GNPS, #matchms, #spec2vec, #GNPSDashboard, or #MassQL! A big thank you to the people behind them 🙏

➡ More info in the preprint: https://doi.org/10.26434/chemrxiv-2023-sljbt

A Sample-Centric and Knowledge-Driven Computational Framework for Natural Products Drug Discovery

Modern natural products (NPs) research relies on untargeted liquid chromatography coupled with mass spectrometry metabolomics. Together with cutting-edge processing and computational annotation strategies, such approaches can yield extensive spectral and structural information. However, current processing workflows require feature-alignment steps based on retention time, which hinders the comparison of samples originating from different batches or analyzed using different instrumental setups. In addition, there is currently no analytical framework available to efficiently match processed metabolomics data and associated metadata with external resources. To address these limitations, we present a new sample-centric and knowledge-driven framework allowing multi-modal data alignment (e.g. through chemical structures, biological activities, or spectral features) and demonstrate its value in exploring large and chemodiverse natural extract datasets. Here, the experimental data is processed at the sample level, matched with external identifiers where possible, semantically enriched, and integrated into a unified knowledge graph. The use of semantic web technology enables comparison of processed and standardized data, information, and knowledge at the repository scale. We demonstrate the utility of the developed framework, the Experimental Natural Products Knowledge Graph (ENPKG), to leverage the results obtained from screening 1,600 plant extracts against trypanosomatids and streamline the identification of new antiparasitic compounds. Thanks to its versatility, the proposed approach allows for a radically novel exploitation of metabolomics data. Semantic web technologies are a fundamental asset and we anticipate that their adoption will complement the current computational metabolomics pipelines and enable the community to advance in the description of global chemodiversity and drug discovery projects.

ChemRxiv

The tutorial blog posts for getting started with #matchms and #spec2vec, written together with @eScienceCenter, have been (slightly) updated to work with matchms 0.18.0:

--> https://blog.esciencecenter.nl/build-your-own-mass-spectrometry-analysis-pipeline-in-python-using-matchms-part-i-d96c718c68ee

#OpenScience #MassSpec #tutorials

Build your own mass spectrometry analysis pipeline in Python using matchms — part I

Mass spectrometry data is at the heart of numerous applications in the biomedical and life sciences. With growing use of high-throughput techniques and increasing availability of public datasets, it…

Netherlands eScience Center
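The tutorial builds its pipeline around comparing spectra. As a simplified plain-Python illustration of one such comparison (roughly the greedy cosine idea behind matchms's CosineGreedy, without its extra options; this is not matchms code), peaks within an m/z tolerance are paired greedily by intensity product, each peak is used at most once, and the summed products are normalized. Representing a spectrum as a list of (mz, intensity) tuples and the default `tolerance=0.1` are assumptions for this sketch.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (mz, intensity)


def greedy_cosine(spec_a: List[Peak], spec_b: List[Peak], tolerance: float = 0.1) -> Tuple[float, int]:
    # Collect every peak pair within the m/z tolerance together with its intensity product.
    candidates = []
    for i, (mz_a, int_a) in enumerate(spec_a):
        for j, (mz_b, int_b) in enumerate(spec_b):
            if abs(mz_a - mz_b) <= tolerance:
                candidates.append((int_a * int_b, i, j))

    # Greedily accept the largest products first; each peak may be matched only once.
    candidates.sort(reverse=True)
    used_a, used_b = set(), set()
    total, matches = 0.0, 0
    for product, i, j in candidates:
        if i in used_a or j in used_b:
            continue
        used_a.add(i)
        used_b.add(j)
        total += product
        matches += 1

    # Normalize by the Euclidean norms of both intensity vectors.
    norm_a = sum(x * x for _, x in spec_a) ** 0.5
    norm_b = sum(x * x for _, x in spec_b) ** 0.5
    if norm_a == 0 or norm_b == 0:
        return 0.0, 0
    return total / (norm_a * norm_b), matches


query = [(100.0, 1.0), (200.0, 2.0)]
reference = [(100.05, 1.0), (300.0, 2.0)]
print(greedy_cosine(query, reference))
```

Returning the number of matched peaks alongside the score is a design choice here: a high score based on a single shared peak is usually less trustworthy than one supported by many matches.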

New release of #matchms (0.18.0) and other key pieces of the matchms ecosystem: #spec2vec (0.8.0) & #ms2deepscore (0.3.1).😊
--> https://github.com/matchms/matchms

Main changes:
✨ Similarity scores are stored as sparse arrays
✨ New Pipeline class to assemble matchms workflows
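Most spectrum pairs in a large library have near-zero similarity, which is why sparse storage of scores pays off. A minimal plain-Python sketch of the idea, assuming a coordinate (COO) layout and a hypothetical score threshold (this is not how matchms stores its scores internally): only entries above the threshold are kept as (row, col, value) triples, and lookups fall back to a default for anything not stored.

```python
from typing import List, Tuple

COO = Tuple[List[int], List[int], List[float]]


def to_sparse(scores: List[List[float]], threshold: float) -> COO:
    # Keep only scores above the threshold, as parallel row/column/value lists.
    rows, cols, vals = [], [], []
    for i, row in enumerate(scores):
        for j, value in enumerate(row):
            if value > threshold:
                rows.append(i)
                cols.append(j)
                vals.append(value)
    return rows, cols, vals


def lookup(coo: COO, i: int, j: int, default: float = 0.0) -> float:
    # Linear scan for clarity; a real implementation would index by row.
    for r, c, v in zip(*coo):
        if r == i and c == j:
            return v
    return default


dense = [
    [1.0, 0.1, 0.0],
    [0.1, 1.0, 0.7],
    [0.0, 0.7, 1.0],
]
sparse = to_sparse(dense, threshold=0.5)
print(sparse)
print(lookup(sparse, 1, 2), lookup(sparse, 0, 2))
```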

#OpenSource #OpenScience

Thanks to all developers involved in recent changes, including @twitter@hecht_h, Maxim Skoryk, Niek de Jonge, @twitter@jjjvanderhooft, and David Joas.🙏
