This week's #ChemSciPicks is a fantastic Edge article from Charlotte Deane et al. (University of Oxford).

This Edge article reports PoseBusters, a Python package that performs a series of standard quality checks using the well-established cheminformatics toolkit RDKit.

You can read the work for free here:

https://doi.org/10.1039/D3SC04185A

#Chemistry

(This was the paper that I was excited to share and I am so glad that the rest of the Chemical Science editorial team agreed with me!)

The PoseBusters test suite validates the chemical and geometric consistency of a ligand, including its stereochemistry, and the physical plausibility of intra- and intermolecular measurements such as the planarity of aromatic rings, standard bond lengths, and protein-ligand clashes.
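To give a flavour of the kind of geometric check involved, here is a minimal aromatic-ring flatness test in plain Python. This is only an illustrative sketch, not PoseBusters' actual implementation (which uses RDKit), and the 0.25 Å tolerance is an assumption chosen for the example, not the package's threshold:

```python
import math

def plane_deviation(coords):
    """Max out-of-plane deviation (in Å) of a closed ring of 3D points.

    Fits a plane through the centroid, using the average cross product
    of consecutive centred vertices as the plane normal, then measures
    each point's distance from that plane.
    """
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    centred = [(x - cx, y - cy, z - cz) for x, y, z in coords]

    # Average cross product of consecutive centred vertices gives a
    # robust normal for a roughly planar ring.
    nx = ny = nz = 0.0
    for i in range(n):
        ax, ay, az = centred[i]
        bx, by, bz = centred[(i + 1) % n]
        nx += ay * bz - az * by
        ny += az * bx - ax * bz
        nz += ax * by - ay * bx
    norm = math.sqrt(nx * nx + ny * ny + nz * nz)
    nx, ny, nz = nx / norm, ny / norm, nz / norm

    # Distance of each centred point from the fitted plane.
    return max(abs(x * nx + y * ny + z * nz) for x, y, z in centred)

# Idealised flat benzene ring (regular hexagon, C-C = 1.39 Å).
flat_ring = [(1.39 * math.cos(k * math.pi / 3),
              1.39 * math.sin(k * math.pi / 3), 0.0) for k in range(6)]

# Same ring with one carbon pushed 0.8 Å out of plane: clearly puckered.
bent_ring = [(x, y, 0.8 if k == 0 else z)
             for k, (x, y, z) in enumerate(flat_ring)]

FLATNESS_TOL = 0.25  # Å; illustrative threshold, not PoseBusters' value
print(plane_deviation(flat_ring) < FLATNESS_TOL)  # True: flat ring passes
print(plane_deviation(bent_ring) < FLATNESS_TOL)  # False: puckered ring fails
```

A pose that passes such checks can still be a poor binder, of course; the point of these tests is to catch predictions that are outright unphysical before any scoring happens.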
PoseBusters was then used in this study to compare five deep learning-based docking methods and two well-established standard docking methods, with and without an additional post-prediction energy minimisation step using a molecular mechanics force field. In terms of physical plausibility and the ability to generalise to examples distinct from the training data, no deep learning-based method yet outperforms classical docking tools.

Okay! So some personal thoughts!

My background is in computational chemistry, primarily applied to catalysis. I worked mainly with DFT, but also with some QM/MM methods.

As much as I think AI is exciting, it often feels like you are getting to the answer without understanding, or doing, the working.

And it is in the working where there are interesting observations to be uncovered, new mechanisms to be found and potentially some really crucial insight.

So this paper is a timely reminder that while AI can definitely speed things up, sometimes skipping those in-between steps - which rely on clear physical models and describe chemistry accurately (for the most part!) - isn't the best.

It is a really interesting paper, and one that I can't speak about in a whole lot of depth (this sort of pose prediction and docking wasn't my area), but I think it is an absolutely great read, and Charlotte Deane's group have produced an amazing piece of work.

@EllisCrawford Here, yes, but others work with XAI - explainable AI - and some people also have approaches to pull out the explanations from deep learning (I haven't yet tried the latter). For chemistry, of course, it seems to be easier than for more complex systems.
@annakcroft XAI sounds brilliant and I'll need to look more into developments with that and chemistry!
@EllisCrawford @annakcroft
Yes, XAI is fascinating, it feels like we’re only scratching the surface of it. We did some work here https://chemrxiv.org/engage/chemrxiv/article-details/64f67524dd1a73847f341bee trying to understand what the learned representation space looks like (we’re definitely not the only ones!). The field of chemistry is interesting for XAI as it is complex enough to be non-trivial, yet small enough to be tractable.
Global interpretability and geometry of graph convolutional neural networks for chemistry in terms of chemical moieties

Graph convolutional neural nets, such as SchNet [Schütt et al., Journal of Chemical Physics, 2018, 148, 241722], provide accurate predictions of chemical quantities without invoking any direct physical or chemical principles. These methods learn a hidden statistical representation of molecular systems in an end-to-end fashion; from xyz coordinates to molecular properties with many hidden layers in between. This naturally leads to the interpretability question: what underlying chemical model determines the algorithm's accurate decision-making? To answer this question, we analyze the hidden layer activations of QM9-trained SchNet, also known as "embedding vectors", with dimension reduction, linear discriminant analysis and Euclidean-distance measures. The result is a quantifiable geometry of the model's decision making that identifies chemical moieties and has a low parametric space of ∼5 important parameters from the fully-trained 128-parameter embedding. The geometry of the embedding space organizes these moieties with sharp linear boundaries that can classify each chemical environment within < 5 × 10⁻⁴ error. Euclidean distance between embedding vectors can be used to demonstrate a versatile molecular similarity measure, outperforming other popular hand-crafted representations such as Smooth Overlap of Atomic Positions (SOAP). We also reveal that the embedding vectors can be used to extract observables that are related to chemical environments such as pKa and NMR. The work is in line with the recent push for explainable AI and gives insights into the depth of modern statistical representations of chemistry, such as graph convolutional neural nets, in this rapidly evolving technology.

@EllisCrawford I'd strongly second this as a software engineer and mathematician... I work in fields where being able to automate has been the norm, and a vital tool, for decades. But extracting true insight necessary for next level discoveries and innovations still relies, fundamentally, on getting down and dirty with the details using a human mind; jumping over that step jumps over crucial pattern formation and ends up blocking the entire discovery process down the line.
Charlotte Deane is a Professor of Structural Bioinformatics at the University of Oxford, where she leads the Oxford Protein Informatics Group (OPIG), a research group of over 20 people working on diverse problems across immunoinformatics, protein structure and small molecule drug delivery, using statistics, AI and computation to generate biological and medical insight.
Charlotte's research covers several areas in protein structure prediction and protein interaction networks, with work mainly focused on protein structure, immunoinformatics, biological networks and small molecules.
@EllisCrawford Sent to the group Slack (your posts make me look smart when I forward them  )
@SRLevine oh no. Now I need to hope that what I say is at least somewhat reasonable 🤣