Interestingly, only a few days after writing this toot, I somehow wound up with an invitation to use the molab online notebooks from #Marimo (https://marimo.io/).

I am happy to announce that molab notebooks can indeed load and run #RDKit; this will help immensely for people like me who are limited to browser-based tools due to various constraints.

If y'all would indulge me a late addition: the appropriate lines in the #RDKit code (https://github.com/rdkit/rdkit/blob/master/rdkit/Chem/Draw/__init__.py#L190-L224) can of course be modified to use the B&R blob I introduced in the essay.
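
To make the "swap the function" idea concrete, here is a minimal pure-Python sketch of a molecular heatmap: one weighted kernel per atom, summed on a regular grid. It is not the RDKit code linked above, and it does not reproduce the B&R blob formula; the names `gaussian_kernel` and `heatmap`, the grid parameters, and the toy coordinates are all assumptions made for illustration. The only point is that the kernel is a plain callable, so a different blob function could be passed in its place.

```python
import math


def gaussian_kernel(d2, sigma=0.3):
    """Unnormalized isotropic Gaussian, as a function of squared distance d2."""
    return math.exp(-d2 / (2.0 * sigma * sigma))


def heatmap(coords, weights, kernel=gaussian_kernel, step=0.25, pad=1.0):
    """Sum one weighted kernel per atom on a regular 2D grid.

    coords  -- list of (x, y) atom positions (hypothetical 2D depiction coordinates)
    weights -- one descriptor value per atom
    kernel  -- any callable taking a squared distance; another blob can go here
    """
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    nx = int(round((max(xs) - min(xs) + 2 * pad) / step)) + 1
    ny = int(round((max(ys) - min(ys) + 2 * pad) / step)) + 1
    gxs = [min(xs) - pad + i * step for i in range(nx)]
    gys = [min(ys) - pad + j * step for j in range(ny)]
    grid = [
        [
            sum(w * kernel((gx - x) ** 2 + (gy - y) ** 2)
                for (x, y), w in zip(coords, weights))
            for gx in gxs
        ]
        for gy in gys
    ]
    return gxs, gys, grid


# Toy input: two "atoms" with made-up descriptor values.
gxs, gys, grid = heatmap([(0.0, 0.0), (1.5, 0.0)], [1.0, -0.5])
peak = max(max(row) for row in grid)
print(f"{len(gys)} x {len(gxs)} grid, peak value {peak:.3f}")
```

Passing `kernel=some_other_blob` is all it takes to compare falloff shapes on the same grid, which is roughly the experiment described above.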

What's keeping me from actually trying this out myself is that RDKit is not yet supported by #Pyodide, so a lot of the web services that can run #Python in the browser (e.g. the online version of #jupyter) are not able to load RDKit either. (I have seen at least two proposals to use the JS version of RDKit instead, but I have not managed to make that work either.) Since I wrote that essay, and am currently writing this toot, from a smartphone (for a number of reasons, I don't have access to a computer I can use), indirectly using RDKit through Wolfram Cloud (basically the online version of Mathematica) is pretty much how I attempt to do cheminformatics experiments. :)

#chemistry #cheminformatics #visualization

I posted a short computational essay on Wolfram Community a little while ago (https://community.wolfram.com/groups/-/m/t/3727989) regarding what I call "molecular heatmaps" for visualizing atomic descriptors. This is based on previous work by #RDKit's Landrum and Riniker (cf. https://doi.org/10.1186/1758-2946-5-43).

Something I did not talk about in that essay is the virtue of reading journal articles that are outside of your usual purview (or, you might even say, "comfort zone"). If I had completely restricted myself to the cheminformatics literature, I would not have found out about "blob functions" (cf. https://doi.org/10.1080/10867651.2001.10487549), which are often used in computer graphics. This isn't the first time I was able to profitably apply knowledge from one subject to another. It takes conscious effort, but I warmly recommend having a wide reading appetite.

#chemistry #cheminformatics #visualization

What if hitting multiple drug targets wasn't a problem to solve, but an advantage to exploit? I'll be exploring that idea at the #marimo Community Call on April 9, 3–4 PM ET, covering drug discovery with a compound-first approach.
RSVP at https://luma.com/6p89x1s2?tk=TGd9ER
#chemistry #drugDiscovery #notebook #python #RDKit #sqlite

Here's an #RDKit #cheminformatics quiz for you all. What do you think this code will output?

from rdkit import Chem
mol = Chem.MolFromSmiles("C" + "C(C)(C)" * 50 + "C")
pat = Chem.MolFromSmarts("[$([CD4H0X4](-*)(-*)(-*)-*)]")
print(len(mol.GetSubstructMatches(pat)))

No cheating by actually running the code! :) Feel free to explain your reasoning in the comments.

Poll results: 0 (0%) · 38 (0%) · 42 (50%) · 50 (50%). Poll ended.

I just submitted a #cheminformatics preprint to ChemRxiv, based on the #RDKit count fingerprints, #chemfp, and some one-off R&D code I wrote over the last few months.

"Superimposed Coding of Count Fingerprints to Binary Fingerprints"

In short, my superimposed coding method gives k-recall@k nearest-neighbor scores of ~0.9 relative to searching with the full count fingerprints and the multiset Tanimoto (aka MinMax, aka Ruzicka similarity). Recall can exceed 0.95 with 8192 bits!

https://chemfp.com/SuperimposedCounts.pdf
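
For readers unfamiliar with the terms, here is a small sketch of the two quantities mentioned above. The multiset Tanimoto (MinMax / Ruzicka) is the sum of element-wise minima divided by the sum of element-wise maxima. The `recall_at_k` helper reflects one reading of "k-recall@k" (the fraction of the true k nearest neighbors that appear among the k returned hits) and is an assumption, as are the function names and the toy fingerprints; this is not the superimposed coding method from the preprint.

```python
def multiset_tanimoto(a, b):
    """MinMax / Ruzicka similarity between two count fingerprints.

    a, b -- dicts mapping feature id -> non-negative count
    Returns 0.0 when both fingerprints are empty (a convention chosen here).
    """
    keys = a.keys() | b.keys()
    shared = sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)
    total = sum(max(a.get(k, 0), b.get(k, 0)) for k in keys)
    return shared / total if total else 0.0


def recall_at_k(reference_ids, candidate_ids):
    """Fraction of the reference top-k ids that also occur in the candidate top-k."""
    reference = set(reference_ids)
    return len(reference & set(candidate_ids)) / len(reference)


# Toy count fingerprints with made-up feature ids.
fp1 = {101: 2, 205: 1}
fp2 = {101: 1, 999: 1}
print(multiset_tanimoto(fp1, fp2))               # min-sum 1 / max-sum 4 = 0.25

# Toy neighbor lists: 3 of the 4 reference neighbors were recovered.
print(recall_at_k([1, 2, 3, 4], [2, 3, 4, 9]))   # 0.75
```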

#OpenBabel is dead, long live #RDKit!

https://github.com/RMeli/spyrmsd/issues/149

On a more serious note, it would be cool to have a cheminformatics library that actually works. Don't get me wrong, RDKit is very cool - but you can feel all the underlying problems it has when using it.

#Cheminformatics

Remove Open Babel support? · Issue #149 · RMeli/spyrmsd

Open Babel seems to have become abandonware. The last commit on master is from December 2024. The last release on GitHub is from 2020, and the same goes for the last release in PyPI. Open Babel is ...

Hey, @egonw - I'm working on a preprint.

How do I cite a source code file in #rdkit, and a commit message? FWIW, I use #Zotero.

"The RDKit implementation [of the multiset Tanimoto] was added in 2009, using fuzzy set operations already available for multiset Dice similarity."

"added in 2009" is commit 104efc5b607baa54ce0804c6a76d484bf9f78b57 at https://github.com/rdkit/rdkit/commit/423433a3e47df64af4a31888e835144e8b3a6c07#diff-d7a0f684fa993bfd84319df4d23b199973d13599b94ad6a4b3a6c79ed7d46719

"fuzzy set operations" is a reference to the two operations starting at https://github.com/rdkit/rdkit/blob/af4e6c05eca09efa8e8f61603937e0d997fc1499/Code/DataStructs/SparseIntVect.h#L132

Or am I overthinking?

#RDKit Atom Pair count fingerprints are wild! If you sum the per-record counts you'll see patterns like:

sum num_records
399 489
400 6
401 0
402 0
403 36
404 0
405 3
406 133423
407 9
408 6
409 42
410 6
411 0
412 36
...
493 83
494 1
495 3
496 116222
497 10
498 3
499 47
500 0

Not easy to plot! No doubt due to rings and chains causing highly repetitive path lengths. (406=2x7x29, 496=8x31)
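
One arithmetic observation about the two big spikes: 406 = 29×28/2 and 496 = 32×31/2, so both are "number of pairs" values n(n-1)/2. The sketch below checks every sum from the listing for that property. The interpretation is an assumption rather than something stated above: if every pair of heavy atoms contributes exactly one count, a molecule with n heavy atoms would sum to exactly n(n-1)/2. The helper name `atoms_for_pair_count` is made up for this example.

```python
import math


def atoms_for_pair_count(total):
    """Return n if total == n*(n-1)/2 for an integer n >= 2, else None."""
    n = (1 + math.isqrt(1 + 8 * total)) // 2
    return n if n >= 2 and n * (n - 1) // 2 == total else None


# (sum of counts, number of records), copied from the listing above.
rows = [
    (399, 489), (400, 6), (401, 0), (402, 0), (403, 36), (404, 0), (405, 3),
    (406, 133423), (407, 9), (408, 6), (409, 42), (410, 6), (411, 0), (412, 36),
    (493, 83), (494, 1), (495, 3), (496, 116222), (497, 10), (498, 3),
    (499, 47), (500, 0),
]

for total, num_records in rows:
    n = atoms_for_pair_count(total)
    if n is not None:
        print(f"sum={total} records={num_records} -> pairs among {n} atoms")
# sum=406 records=133423 -> pairs among 29 atoms
# sum=496 records=116222 -> pairs among 32 atoms
```

Only the two spike values pass the check; none of the sparsely populated sums in the listing is of the form n(n-1)/2.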

Tonight I'm taking the train to Prague for the European edition of the 2025 #RDKit UGM.
I'm really looking forward to meeting a bunch of the community there!
We don't have space for any last-minute in-person registrations, but info on joining the live streams is here:
https://github.com/rdkit/UGM_2025/