@nicholdav @astrojuanlu I think maybe the most important law of software architecture is Conway's Law - the structure of the program will tend to reflect the structure of the organization. Most scientific apps will probably reflect the very simple structure of the lab that supports them.

#rse #ScientificSoftware

➡️ Explore the OSF Open Source Ecosystem: https://www.cos.io/ose

➡️ 2026 BSSw Fellows & Honorable Mentions: https://lnkd.in/e-yaEih7

#OpenScience #OpenSource #ScientificSoftware #OSF #ResearchInfrastructure #CenterForOpenScience

Advancing Open Science Through Open Source Development

Scholarship advances best when knowledge, tools, and data are shared as public goods. Just as societies invest in roads, power grids, and internet infrastructure for the benefit of all, research communities need durable, open infrastructure to support discovery. Open source provides that foundation by ensuring that research systems are transparent, adaptable, and built for the long term.

Building ML Tools Scientists Will Actually Use

The Gap Between Models and Tools

I've seen a lot of impressive ML models in biopharma that never get used. Not because the science is wrong, but because the tool doesn't fit into anyone's workflow. The model might be published in Nature Methods with beautiful receiver operating characteristic curves, but if a discovery scientist can't access it without filing an IT ticket or if it requires command-line expertise, it sits unused. This is the reality of building ML tools for scientific users: […]

https://kemal.yaylali.uk/building-ml-tools-scientists-will-actually-use/


New Preprint Alert!

We're excited to share our latest work on #ChemRxiv! MARCUS (Molecular Annotation and Recognition for Curating Unravelled Structures) is a web-based platform for extracting chemical information from scientific papers.

📄 Preprint: https://doi.org/10.26434/chemrxiv-2025-9p1q1

🔗 Try it out: https://marcus.decimer.ai

#Cheminformatics #OpenScience #ChemicalDatabases #AIinScience #ScientificSoftware #ResearchTools

MARCUS: Molecular Annotation and Recognition for Curating Unravelled Structures

The exponential growth of chemical literature necessitates the development of automated tools for extracting and curating molecular information from unstructured scientific publications into open-access chemical databases. Current optical chemical structure recognition (OCSR) and named entity recognition solutions operate in isolation, which limits their scalability for comprehensive literature curation. Here we present MARCUS (Molecular Annotation and Recognition for Curating Unravelled Structures), a tool to aid curators in performing literature curation in the field of natural products. This integrated web-based platform combines automated text annotation, multi-engine OCSR, and direct submission capabilities to the COCONUT database. MARCUS employs a fine-tuned GPT-4 model to extract chemical entities and utilises an ensemble approach integrating DECIMER, MolNexTR, and MolScribe for structure recognition. The platform aims to streamline the data extraction workflow from PDF upload to database submission, significantly reducing curation time. MARCUS bridges the gap between unstructured chemical literature and machine-actionable databases, enabling FAIR data principles and facilitating AI-driven chemical discovery. Through open-source code, accessible models, and comprehensive documentation, the web application enhances accessibility and promotes community-driven development. This approach facilitates unrestricted use and encourages the collaborative advancement of automated chemical literature curation tools. We dedicate MARCUS to Dr Marcus Ennis, the longest-serving curator of the ChEBI database, on the occasion of his 75th birthday.

ChemRxiv

At the request of a journal editor, I reviewed a paper by leading researchers on one of my favorite #chemistry topics - tautomers! This article was featured in the Journal of Chemical Information and Modeling. I am grateful for the #PeerReview certificate presented by the American Chemical Society. It was an honor to be entrusted with this responsibility.

Reminder that I'm #OpenToWork for #cheminformatics or #scientificSoftware development. Let's discuss how my skills can benefit your team.

The 2025_03_1 release of #RDKit includes my contribution, which speeds up part of generating 2D fingerprints for a molecule by ~75x! So if you generate #chemical fingerprints, now is a good time to upgrade.

Reminder that I'm #OpenToWork so if you're hiring for #cheminformatics or #scientificSoftware development, let's talk.

#chemistry #DrugDiscovery #pharma #PythonForChemists

https://github.com/rdkit/rdkit/releases/tag/Release_2025_03_1

Release 2025_03_1 (Q1 2025) Release · rdkit/rdkit

Release_2025.03.1 (Changes relative to Release_2024.09.1) Acknowledgements (Note: I'm no longer attempting to manually curate names. If you would like to see your contribution acknowledged with you...

GitHub

I'm excited to present "Finding Tautomers" at the first North American #RDKit User Group Meeting in the #Boston #MA area on Friday April 11!

Reminder that I'm #OpenToWork so if you're in the area and hiring for #cheminformatics or #scientificSoftware development, let me know and we can meet to discuss your needs.

Interested in #MPI and #OpenMP parallel programming to speed up your scientific applications written in #C, #Cpp, #Fortran or #Python (with #numpy)?

Join our 4-day course in #Mainz at the Johannes Gutenberg University (#JGU) from 1 to 4 April 2025!

See our announcement page for further details and to register: https://indico.zdv.uni-mainz.de/event/34/

Note: this is an on-site course.

#RSE #HPC #scientificsoftware

Parallel Programming with MPI and OpenMP (4-Day Workshop)

Dive into the world of high-performance computing with our hands-on workshop, focusing on the programming models MPI and OpenMP. Gain practical experience with Message Passing Interface (MPI) basics and shared memory directives of OpenMP through interactive sessions in C or Fortran. Agenda: A preliminary course outline can be found here. Location: Takes place at the computing centre of the University of Mainz. Detailed travel directions will be provided to accepted participants in advance....

Indico

The #Energy #Climate & #Environment program at #IIASAVienna had its quarterly meeting last Friday (~100 researchers), so I had to reflect on our role as a community data hub and what to present on behalf of the #ScenarioServices & #ScientificSoftware team.

We developed a new #ScenarioExplorer front-end last year, and we made a lot of progress with our #opensource packages for scenario analysis, validation & data-management.

Step by step towards #OpenScience and reusable, reproducible analysis...

Working with #NUTS administrative EU 🇪🇺 regions is one of the little nuisances in #energysystems modelling and scenario analysis.

So the #IIASA #ScenarioServices team put together a little #opensource #python utility package so that modelers can focus on #freethemodels and don’t have to spend too much time on data-wrangling…
#pysquirrel #ScientificSoftware
https://github.com/iiasa/pysquirrel
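
To give a sense of the data-wrangling involved, here is a minimal standard-library sketch of the NUTS code hierarchy. It is not pysquirrel's API: the names `NutsCode`, `parse_nuts`, and `group_by_level` are hypothetical and exist only for this illustration. It relies on one property of the classification: a NUTS code is a two-letter country prefix followed by up to three characters, one per level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NutsCode:
    """Hypothetical helper (not part of pysquirrel): one NUTS region code."""

    code: str

    @property
    def level(self) -> int:
        # The two-letter country prefix is level 0; each extra character adds a level.
        return len(self.code) - 2

    @property
    def country(self) -> str:
        return self.code[:2]

    def parents(self) -> list[str]:
        """Enclosing regions, from the country down to the direct parent."""
        return [self.code[:n] for n in range(2, len(self.code))]


def parse_nuts(raw: str) -> NutsCode:
    """Normalise a raw string and check that it has the shape of a NUTS code."""
    code = raw.strip().upper()
    if not (2 <= len(code) <= 5) or not code[:2].isalpha() or not code.isalnum():
        raise ValueError(f"not a valid NUTS code: {raw!r}")
    return NutsCode(code)


def group_by_level(codes: list[str], level: int) -> dict[str, list[str]]:
    """Group region codes under their ancestor at the requested NUTS level."""
    groups: dict[str, list[str]] = {}
    for raw in codes:
        nuts = parse_nuts(raw)
        if nuts.level < level:
            raise ValueError(f"{nuts.code} is coarser than level {level}")
        groups.setdefault(nuts.code[: 2 + level], []).append(nuts.code)
    return groups


if __name__ == "__main__":
    # AT130 is the NUTS 3 code for Wien; its parents are AT, AT1 and AT13.
    print(parse_nuts("at130").parents())
    print(group_by_level(["AT130", "AT127", "DE212"], level=1))
```

This sketch only checks the shape of a code, not whether the region exists in a given NUTS version, which is the kind of lookup a dedicated package is for.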
