SAVE THE DATE! The 2024 RDKit UGM will take place from 11-13 September in Zurich, Switzerland.
We'll post more information and open registration in Q1 of next year.
The accurate prediction of thermodynamic properties is crucial in various fields such as drug discovery and materials design. This task relies on sampling from the underlying Boltzmann distribution, which is challenging using conventional approaches such as simulations. In this work, we introduce Surrogate Model-Assisted Molecular Dynamics (SMA-MD), a new procedure to sample the equilibrium ensemble of molecules. First, SMA-MD leverages Deep Generative Models to enhance the sampling of slow degrees of freedom. Subsequently, the generated ensemble undergoes statistical reweighting, followed by short simulations. Our empirical results show that SMA-MD generates more diverse and lower-energy ensembles than conventional Molecular Dynamics simulations. Furthermore, we showcase the application of SMA-MD for the computation of thermodynamic properties by estimating implicit solvation free energies.
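The abstract does not say how the reweighting step is carried out. As a rough illustration only, the sketch below assumes self-normalized importance reweighting: each generated conformer gets a weight proportional to its Boltzmann factor divided by the surrogate model's density for that conformer. The function names, the kJ/mol energy units, and the availability of a log-density from the generative model are all assumptions, not details from the paper.

```python
import math

# Molar gas constant in kJ/(mol*K); assumes energies are given in kJ/mol.
R_KJ_PER_MOL_K = 0.008314462618


def boltzmann_weights(energies, log_q, temperature=300.0):
    """Self-normalized importance weights w_i proportional to exp(-U_i / RT) / q(x_i).

    energies: potential energies U_i of the generated samples (hypothetical input).
    log_q:    log-densities the surrogate model assigns to each sample
              (assumed to be available; not every generative model exposes this).
    """
    rt = R_KJ_PER_MOL_K * temperature
    log_w = [-u / rt - lq for u, lq in zip(energies, log_q)]
    shift = max(log_w)  # subtract the maximum for numerical stability
    unnormalized = [math.exp(lw - shift) for lw in log_w]
    total = sum(unnormalized)
    return [w / total for w in unnormalized]


def effective_sample_size(weights):
    """Kish effective sample size for normalized weights; equals N for uniform weights."""
    return 1.0 / sum(w * w for w in weights)
```

A low effective sample size relative to the number of generated conformers would suggest that the surrogate distribution overlaps poorly with the Boltzmann distribution, which is one plausible reason to follow the reweighting with short simulations.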
Late-stage functionalization of drug molecules can tune their properties without the need for entirely new syntheses; however, predicting reactivity and planning synthesis for late-stage C-H activation remains challenging. Here, the authors develop a reaction screening approach combining high-throughput experimentation with computational graph neural networks to identify suitable substrates that can be used for late-stage C-H alkylation via Minisci-type chemistry.
While a multitude of deep generative models have recently emerged, there exists no best practice for their practically relevant validation. On the one hand, novel de novo-generated molecules cannot be refuted by retrospective validation (so this type of validation is biased); on the other hand, prospective validation is expensive and often biased by the human selection process. In this case study, we frame retrospective validation as the ability to mimic human drug design by answering the following question: can a generative model trained on early-stage project compounds generate middle/late-stage compounds de novo? To this end, we used experimental data containing the elapsed time of a synthetic expansion following hit identification from five public project datasets (where the time series was pre-processed to better reflect realistic synthetic expansions) and six in-house project datasets, and used REINVENT, a widely adopted RNN-based generative model. After splitting the datasets and training REINVENT on early-stage compounds, we found that rediscovery of middle/late-stage compounds was much higher in public projects (1.60%, 0.64%, and 0.21% of the top 100, 500, and 5,000 scored generated compounds) than in in-house projects (0.00%, 0.03%, and 0.04%, respectively). Similarly, in public projects the average single nearest neighbour similarity between early- and middle/late-stage compounds was higher for active compounds than for inactive compounds; for in-house projects the converse was true, which makes rediscovery (if so desired) more difficult. We hence show that the generative model recovers very few middle/late-stage compounds from real-world drug discovery projects, highlighting the fundamental difference between purely algorithmic design and drug discovery as a real-world process. Based on the current study, evaluating de novo compound design approaches retrospectively appears difficult or even impossible.
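One reading of the rediscovery percentages quoted above is the share of the top-k scored generated compounds that exactly match a held-out middle/late-stage compound. The sketch below implements that reading. It assumes both lists are already canonical SMILES strings, so that string equality means structural identity; the function name and inputs are hypothetical and the paper's exact protocol may differ.

```python
def rediscovery_rate(ranked_generated, held_out, k):
    """Percentage of the top-k generated compounds found in the held-out set.

    ranked_generated: generated compounds sorted best-score-first
                      (assumed to be canonical SMILES strings).
    held_out:         middle/late-stage compounds withheld from training
                      (canonicalized the same way).
    """
    top_k = ranked_generated[:k]
    if not top_k:
        return 0.0
    held = set(held_out)
    hits = sum(1 for smiles in top_k if smiles in held)
    return 100.0 * hits / len(top_k)


# Toy usage with made-up molecules: one of the two top-ranked compounds is held out.
ranked = ["CCO", "CCN", "c1ccccc1", "CCC"]
late_stage = ["CCN", "CCC"]
top2_rate = rediscovery_rate(ranked, late_stage, k=2)
```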
"Scientific Contribution" This contribution hence illustrates aspects of evaluating the performance of generative models in a real-world setting which have not been extensively described previously and which hopefully contribute to their further future development.
A catalyst possessing a broad substrate scope, in terms of both turnover and enantioselectivity, is sometimes called “general”. Despite their great utility in asymmetric synthesis, truly general catalysts are difficult or expensive to discover via traditional high-throughput screening and are, therefore, rare. Existing computational tools accelerate the evaluation of reaction conditions from a pre-defined set of experiments to identify the most general ones, but cannot generate entirely new catalysts with enhanced substrate breadth. For these reasons, we report an inverse design strategy based on the open-source genetic algorithm NaviCatGA and on the OSCAR database of organocatalysts to simultaneously probe the catalyst and substrate scope and optimize generality as the primary target. We apply this strategy to the Pictet–Spengler condensation, for which we curate a database of 820 reactions that is used to train statistical models of selectivity and activity. Starting from OSCAR, we define a combinatorial space of millions of catalyst possibilities and perform evolutionary experiments on a diverse substrate scope that is representative of the whole chemical space of tetrahydro-β-carboline products. While privileged catalysts emerge, we show how genetic optimization can address the broader question of generality in asymmetric synthesis, extracting structure–performance relationships from the challenging areas of chemical space.
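To make the idea of optimizing generality concrete, here is a toy genetic algorithm in plain Python. It is not the NaviCatGA API and not the authors' fitness function: catalysts are represented as tuples of fragment choices, `score` is a hypothetical stand-in for the trained statistical models, and generality is assumed to be the mean predicted score over a substrate panel.

```python
import random


def generality(catalyst, substrates, score):
    """Mean predicted performance of one catalyst over the whole substrate panel."""
    return sum(score(catalyst, s) for s in substrates) / len(substrates)


def evolve(fragment_pools, substrates, score, pop_size=20, generations=30, seed=0):
    """Toy elitist GA: a catalyst is a tuple with one fragment chosen per pool."""
    rng = random.Random(seed)

    def fitness(catalyst):
        return generality(catalyst, substrates, score)

    population = [
        tuple(rng.choice(pool) for pool in fragment_pools) for _ in range(pop_size)
    ]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection, parents survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            if rng.random() < 0.2:  # occasional point mutation
                i = rng.randrange(len(child))
                child[i] = rng.choice(fragment_pools[i])
            children.append(tuple(child))
        population = parents + children
    return max(population, key=fitness)


def toy_score(catalyst, substrate):
    """Made-up scorer: penalizes the distance between each fragment and the substrate."""
    return -sum(abs(fragment - substrate) for fragment in catalyst)


pools = [list(range(5))] * 3   # three positions, five hypothetical fragments each
panel = [1, 2, 3]              # three hypothetical substrates
best = evolve(pools, panel, toy_score)
```

Averaging over the panel rewards catalysts that perform acceptably on every substrate rather than exceptionally on one; other aggregations (for example the worst-case score) would encode a stricter notion of generality.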
I'm happy to announce that the 2023.09.1 release of the #RDKit is now out.
Release notes are here:
https://github.com/rdkit/rdkit/releases/tag/Release_2023_09_1
The conda-forge and NPM builds are already available, and I guess that the PyPI builds will show up soon as well.
OPSIN 2.8 ("Open Parser for Systematic IUPAC Nomenclature") was released last week: https://github.com/dan2097/opsin/releases/tag/2.8.0 #chemistry
Changes:
- Support for undecahectane/undecadictane
- Support for dicarboximido
- Improved support for lysergic acid derivatives
- Added a few more sugars e.g. digitalose
- Added borodeuteride and hydro contractions of pharmaceutical salts e.g. hydromethanesulfonate
- Support substitution on glyceric acid
- Corrected interpretation of imidazolium, trioxane and phthalhydrazide