Jan Jensen

@janhjensen
Computational chemist at the University of Copenhagen. Editor-in-Chief of PeerJ Physical Chemistry.

SAVE THE DATE! The 2024 RDKit UGM will take place from 11-13 September in Zurich, Switzerland.

We'll post more information and open registration in Q1 of next year.

Another onymous, impact neutral review sent off. This time for Nature Computational Science. https://proteinsandwavefunctions.blogspot.com/2016/01/writing-impact-neutral-review.html
Writing an impact neutral review

The idea of impact neutral reviewing was pioneered by PLoS ONE ten years ago this year. The idea is that ... PLOS ONE only verifies ...

Generation of conformational ensembles of small molecules via Surrogate Model-Assisted Molecular Dynamics | ChemRxiv - https://go.shr.lc/3RlsdCt #compchem

The accurate prediction of thermodynamic properties is crucial in various fields such as drug discovery and materials design. This task relies on sampling from the underlying Boltzmann distribution, which is challenging using conventional approaches such as simulations. In this work, we introduce Surrogate Model-Assisted Molecular Dynamics (SMA-MD), a new procedure to sample the equilibrium ensemble of molecules. First, SMA-MD leverages Deep Generative Models to enhance the sampling of slow degrees of freedom. Subsequently, the generated ensemble undergoes statistical reweighting, followed by short simulations. Our empirical results show that SMA-MD generates more diverse and lower energy ensembles than conventional Molecular Dynamics simulations. Furthermore, we showcase the application of SMA-MD for the computation of thermodynamical properties by estimating implicit solvation free energies.

LLamol: A Dynamic Multi-Conditional Generative Transformer for De Novo Molecular Design https://arxiv.org/abs/2311.14407 #compchem
Identifying opportunities for late-stage C-H alkylation with high-throughput experimentation and in silico reaction screening https://www.nature.com/articles/s42004-023-01047-5 #compchem
Identifying opportunities for late-stage C-H alkylation with high-throughput experimentation and in silico reaction screening - Communications Chemistry

Late-stage functionalization of drug molecules can tune their properties without the need for entirely new syntheses; however, predicting reactivity and planning synthesis for late-stage C-H activation remains challenging. Here, the authors develop a reaction screening approach combining high-throughput experimentation with computational graph neural networks to identify suitable substrates that can be used for late-stage C-H alkylation via Minisci-type chemistry.

Another onymous, impact neutral review sent off. This time for Nature Computational Materials https://proteinsandwavefunctions.blogspot.com/2016/01/writing-impact-neutral-review.html

On The Difficulty of Validating Molecular Generative Models Realistically: A Case Study on Public and Proprietary Data | ChemRxiv - https://go.shr.lc/47CpxFJ #compchem

While a multitude of deep generative models have recently emerged there exists no best practice for their practically relevant validation. On the one hand, novel de novo-generated molecules cannot be refuted by retrospective validation (so that this type of validation is biased); but on the other hand prospective validation is expensive and then often biased by the human selection process. In this case study, we frame retrospective validation as the ability to mimic human drug design, by answering the following question: Can a generative model trained on early-stage project compounds generate middle/late-stage compounds de novo? To this end, we used experimental data that contains the elapsed time of a synthetic expansion following hit identification from five public (where the time series was pre-processed to better reflect realistic synthetic expansions) and six in-house project datasets, and used REINVENT as a widely adopted RNN-based generative model. After splitting the dataset and training REINVENT on early-stage compounds, we found that rediscovery of middle/late-stage compounds was much higher in public projects (at 1.60%, 0.64%, and 0.21% of the top 100, 500, and 5,000 scored generated compounds) than in in-house projects (where the values were 0.00%, 0.03%, and 0.04%, respectively). Similarly, average single nearest neighbour similarity between early- and middle/late-stage compounds in public projects was higher between active compounds than inactive compounds; however, for in-house projects the converse was true, which makes rediscovery (if so desired) more difficult. We hence show that the generative model recovers very few middle/late-stage compounds from real-world drug discovery projects, highlighting the fundamental difference between purely algorithmic design and drug discovery as a real-world process. Evaluating de novo compound design approaches appears, based on the current study, difficult or even impossible to do retrospectively. 
Scientific Contribution: This contribution illustrates aspects of evaluating the performance of generative models in a real-world setting that have not been extensively described previously, and which we hope will contribute to their further development.

A Genetic Optimization Strategy with Generality in Asymmetric Organocatalysis as Primary Target | ChemRxiv - https://go.shr.lc/47DMkki #compchem

A catalyst possessing a broad substrate scope, in terms of both turnover and enantioselectivity, is sometimes called “general”. Despite their great utility in asymmetric synthesis, truly general catalysts are difficult or expensive to discover via traditional high-throughput screening and are, therefore, rare. Existing computational tools accelerate the evaluation of reaction conditions from a pre-defined set of experiments to identify the most general ones, but cannot generate entirely new catalysts with enhanced substrate breadth. For these reasons, we report an inverse design strategy based on the open-source genetic algorithm NaviCatGA and on the OSCAR database of organocatalysts to simultaneously probe the catalyst and substrate scope and optimize generality as primary target. We apply this strategy to the Pictet–Spengler condensation, for which we curate a database of 820 reactions, used to train statistical models of selectivity and activity. Starting from OSCAR, we define a combinatorial space of millions of catalyst possibilities, and perform evolutionary experiments on a diverse substrate scope that is representative of the whole chemical space of tetrahydro-β-carboline products. While privileged catalysts emerge, we show how genetic optimization can address the broader question of generality in asymmetric synthesis, extracting structure–performance relationships from the challenging areas of chemical space.


I'm happy to announce that the 2023.09.1 release of the #RDKit is now out.

Release notes are here:
https://github.com/rdkit/rdkit/releases/tag/Release_2023_09_1

The conda-forge and NPM builds are already available, and I guess the PyPI builds will show up soon as well.

OPSIN 2.8 ("Open Parser for Systematic IUPAC Nomenclature") was released last week: https://github.com/dan2097/opsin/releases/tag/2.8.0 #chemistry

Changes:
- Support for undecahectane/undecadictane (previously only hendeca was supported)
- Support for dicarboximido
- Improved support for lysergic acid derivatives
- Added a few more sugars e.g. digitalose
- Added borodeuteride and hydro contractions of pharmaceutical salts e.g. hydromethanesulfonate
- Support substitution on glyceric acid
- Corrected interpretation of imidazolium, trioxane and phthalhydrazide