Marwin Segler

@marwinsegler
{Organic, Medicinal, Comp.} Chemistry, ML, Drug Discovery, Catalysis, Computer Assisted Scientific Discovery & Creativity, Music. At Microsoft Research (MSR) AI4Science
Quite a pleasant sea view, and I'm delighted not to have spent 24 h crammed in a plane. You can still check out our ICML paper “Retrosynthetic Planning with Dual Value Networks”, led by my fabulous colleague Guoqing Liu; I'm happy to set up a virtual coffee to chat about it. Also on arXiv: https://arxiv.org/abs/2301.13755
Retrosynthetic Planning with Dual Value Networks

Retrosynthesis, which aims to find a route to synthesize a target molecule from commercially available starting materials, is a critical task in drug discovery and materials design. Recently, the combination of ML-based single-step reaction predictors with multi-step planners has led to promising results. However, the single-step predictors are mostly trained offline to optimize the single-step accuracy, without considering complete routes. Here, we leverage reinforcement learning (RL) to improve the single-step predictor, by using a tree-shaped MDP to optimize complete routes. Specifically, we propose a novel online training algorithm, called Planning with Dual Value Networks (PDVN), which alternates between the planning phase and updating phase. In PDVN, we construct two separate value networks to predict the synthesizability and cost of molecules, respectively. To maintain the single-step accuracy, we design a two-branch network structure for the single-step predictor. On the widely-used USPTO dataset, our PDVN algorithm improves the search success rate of existing multi-step planners (e.g., increasing the success rate from 85.79% to 98.95% for Retro*, and reducing the number of model calls by half while solving 99.47% molecules for RetroGraph). Additionally, PDVN helps find shorter synthesis routes (e.g., reducing the average route length from 5.76 to 4.83 for Retro*, and from 5.63 to 4.78 for RetroGraph).


If you want to intern with us at Microsoft Research AI4Science (including my team), please apply here by the 24th of Dec!

https://careers.microsoft.com/us/en/job/1497779/Internship-Opportunity-AI4Science

Internship Opportunity: AI4Science in Cambridge, Cambridgeshire, United Kingdom | Research, Applied, & Data Sciences at Microsoft


We're organising a workshop on Physics for ML at #ICLR2023.

Submit your work on physics-based ML, equivariance, etc.

Site: https://physics4ml.github.io
OpenReview: https://openreview.net/group?id=ICLR.cc/2023/Workshop/Physics4ML

Deadline 3rd Feb.

https://twitter.com/tk_rusch/status/1603791044398702595

#Physics4ML #AI4Science #GeometricDeepLearning

ICLR 2023 Workshop on Physics for Machine Learning

After a few days here, it's time for an #Introduction. I am

... a computational chemist interested in #OrganicChemistry and #MolecularChemistry,
... a researcher / lecturer at the university (#WWU), giving courses for undergraduate and MSc students,
... living in #muenster with a small family and a #dog,
... traveling by #bicycle whenever possible.

I would like to discuss scientific and other topics: #compchem, #python, #Fortran, #Cycling, #hiking, #norway, #ElectronicMusic, #muenster, etc ...

@Phlogiston1

3/3 In many cases I was unconvinced at first, but then, upon digging in Reaxys/SciFinder, found the suggested reactions to be well precedented. Machines don’t just learn, sometimes they can also teach a bit…

@Phlogiston1 2/3

Can PMI be reasonably predicted? Or is it too complex?

The point on obvious routes is an interesting one. For comp chem, where 10,000s of molecules need to be screened, this seems desirable, but for process chemistry much less so. In my experience, with certain kinds of models, the algorithms can also return non-obvious solutions, in particular for more exotic heterocycle formation.

@Phlogiston1 good points! What I still need to make up my mind about is to what extent the “best” routes need to be returned during search, or whether the search algorithm should just return a long list of routes that can then be refined according to the desired criteria. Maybe that gives more flexibility to explore the results. I would guess that setting up the right multi-objective scoring upfront might be tricky, similar to algorithmic molecular design.
@Phlogiston1 great! We want to investigate this in more detail now. If you have suggestions on what to look into, I’d love to hear them and discuss. From a chemical perspective, besides outright errors, if I had to pick 4 challenges, they would be 1) regio- & chemoselectivity, 2) poor synthesis strategy (repetitive steps, particularly with protecting groups), 3) handling of conditions, and 4) lack of diversity of solutions. What do you see as most problematic in the current tools?
@[email protected] the field is certainly not yet where it needs to be, but we will (slowly) change that
@b00gizm I have to say this really works well in the UK, on all levels. Germany could learn a lot from how things are done here.