Mastodawn

Spent most of the week writing an EState count #cheminformatics fingerprint for #chemfp .

It should have taken a few hours to build on RDKit's EState code. Perhaps a bit longer to implement a faster version using the same SMARTS patterns.

I then realized the RDKit implementation and patterns had problems, e.g., not matching both atoms in "CC", and unexpected handling of explicit hydrogens, as in deuterium, [2H]. See https://git.sr.ht/~dalke/rdkit/log

The hard part was finding good test cases.

Andrew Dalke Mar 10

SureChEMBL has the banner "By using the site you are agreeing to our Privacy Policy". It links to https://chembl.gitbook.io/surechembl/privacy-notices, which links to https://chembl.gitbook.io/surechembl/privacy-notices/surechembl-website, which links to https://1396459327-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fty0JOfWwPnEbs5wW271w%2Fuploads%2F8dCXtuh9PSd4NKw4f7Cd%2FPrivacy%20Notice%20SureChEMBL%20website.pdf?alt=media&token=bc640502-e42a-4776-92e7-24ea3e871380

They use Google Analytics to track my visit.

I sent email to the #EMBL data controller via [email protected] to highlight issues with Google and the US, and asked that they stop using Google Analytics, or clarify why they couldn't use one of https://european-alternatives.eu/alternatives-to .

You should email them too. #cheminformatics

Privacy notices | SureChEMBL

Find out what personal data is collected by the specific EMBL-EBI services you are requesting and for what purposes, as well as how your data are processed and kept secure.

Egon Willighagen Mar 8

pyBacting 0.2.16 (with CDK 2.12) is now available at https://pypi.org/project/pybacting/

#python #openscience #cheminformatics

Egon Willighagen Mar 8

CDK 2.12 blogpost https://chem-bla-ics.linkedchemistry.info/2026/03/08/cdk-2.12.html https://doi.org/10.59350/gw9at-srp84

Replies will show up in the blog

#cdk #java #cheminformatics #openscience

CDK 2.12

Version 2.12 of the Chemistry Development Kit has been released. It is the last release with contributions by our NWO Open Science grant. This release adds some nice new APIs:

chem-bla-ics

Egon Willighagen Mar 6

"LOTUS Wikidata Explorer" https://adafede.github.io/marimo/apps/lotus_wikidata_explorer.html (by @adafede)

With @wikidata, QLever, IDSM, the @cdk, and more

#openscience #cheminformatics

François Ferron 🇪🇺 🔷️🔶️ Mar 4

The Coordinated Action “Diagnostic, Therapeutic and Vaccine Viral Targets” of #ANRS MIE is organising a #webinar on #AI for molecular discovery. This webinar will explore how AI and #computational #modelling are advancing #drugdesign and protein research.
Speakers will present approaches for molecular design, prediction of protein variant effects and dynamics, and structural modelling of protein–protein interactions. The session will also take a critical perspective, addressing current limitations in #cheminformatics and practical considerations for researchers.

🚨 MARK YOUR CALENDAR 🚨
⚠️ AI for molecular discovery: From drug design to protein dynamics⚠️
Wednesday April 15th 2026, from 12:30 to 14:00, online (Zoom)

Programme :

1️⃣ "AI for drug design" — Dragos Horvath, Strasbourg University
2️⃣ "Computational approaches for protein variant effect and motion prediction" — Elodie Laine, Sorbonne University
3️⃣ "Structural modelling and binding affinity prediction of the Human PDZ-PBM interactome" — Victor Reys, Utrecht University

➡️ Registration : https://services.hosting.augure.com/Response/c7juk/%7B6f9b7a92-f72c-4db8-9259-d92aaa0f0cc3%7D

Webinar Registration "AI for Molecular Discovery: From Drug Design to Protein Dynamics" - CA Viral Targets ANRS MIE

The Chemistry Development Kit Mar 3

CDK 2.12 was released: https://doi.org/10.5281/zenodo.18850648

The release notes are here: https://github.com/cdk/cdk/releases/tag/cdk-2.12

One new feature is support for atropisomers, see the screenshot

#openscience #cheminformatics

Andrew Dalke Mar 3

Do you #cheminformatics folks know about count fingerprint use in machine learning?

My paper (in review) argues superimposed coding incorporates most advantages of count fps while using binary fps with Tanimoto.

I want to see if it's useful for ML. I'm not an ML person. The papers I've seen seem to only use binary fps like Morgan, but not count variants. They do use simple count descriptors (eg, num. N, or rings), so the underlying method supports counts.

Why aren't they using count fps?
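To make the binary versus count distinction concrete, here is a minimal sketch of the two similarity measures. It assumes fingerprints are plain feature-to-count mappings; the feature names and counts are made up for illustration and do not come from any toolkit.

```python
from collections import Counter


def binary_tanimoto(a, b):
    """Tanimoto on the sets of features present (counts are ignored)."""
    sa, sb = set(a), set(b)
    union = len(sa | sb)
    return len(sa & sb) / union if union else 0.0


def count_tanimoto(a, b):
    """Count (min/max) Tanimoto: sum of min counts over sum of max counts."""
    keys = set(a) | set(b)
    num = sum(min(a[k], b[k]) for k in keys)
    den = sum(max(a[k], b[k]) for k in keys)
    return num / den if den else 0.0


# Hypothetical feature counts for two molecules (invented for this example).
mol_a = Counter({"aromatic_c": 6, "hydroxyl": 1})
mol_b = Counter({"aromatic_c": 12, "hydroxyl": 1})

print(binary_tanimoto(mol_a, mol_b))  # same features present, so 1.0
print(count_tanimoto(mol_a, mol_b))   # (6 + 1) / (12 + 1) = 7/13
```

The binary measure cannot tell the two molecules apart, while the count measure can, which is the information a count fingerprint would hand to an ML model.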

Jeremy Monat Mar 3

New Blog Post: Prioritizing Drug-Like 💊 ChEMBL Compounds Within Target 🎯 Profiles

In this post, I go through how to use the #Python #ChEMBL #API and #SQLite to:
• Retrieve compound and target activity data programmatically
• Build a local database of molecules and their associated targets
• Rank compounds based on Lipinski Rule of Five violations

Read it at https://bertiewooster.github.io/2026/01/05/ChEBML-database.html. Marimo and Jupyter notebooks too!

#cheminformatics #drugDiscovery #chemistry #medChem #medicinalChemistry

Prioritizing Drug-Like ChEMBL Compounds Within Target Profiles

When reviewing data to find pharma compounds for virtual screening, we might want to check what their target profiles are and rank candidates by how many Lipinski's rule of five violations they have (the fewer the better). Here, a target profile refers to the set of targets a compound is known to be active against. This post uses the ChEMBL API and a SQLite database to do that.

Jeremy Monat, PhD
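The ranking step described in the post can be sketched with the standard-library `sqlite3` module. This is not the blog's code: the table layout, column names, and property values below are invented placeholders (not ChEMBL data), and the thresholds are the usual rule-of-five limits (molecular weight over 500, logP over 5, more than 5 H-bond donors, more than 10 H-bond acceptors).

```python
import sqlite3

# In-memory database with a hypothetical "compound" table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE compound ("
    "name TEXT PRIMARY KEY, mw REAL, logp REAL, hbd INTEGER, hba INTEGER)"
)

# Placeholder rows; the values are made up for illustration only.
rows = [
    ("cmpd_a", 320.4, 2.1, 2, 5),
    ("cmpd_b", 612.8, 6.3, 3, 9),
    ("cmpd_c", 540.0, 4.2, 6, 12),
]
conn.executemany("INSERT INTO compound VALUES (?, ?, ?, ?, ?)", rows)

# SQLite comparisons evaluate to 0 or 1, so summing them counts violations.
query = """
SELECT name,
       (mw > 500) + (logp > 5) + (hbd > 5) + (hba > 10) AS violations
FROM compound
ORDER BY violations, name
"""
ranked = conn.execute(query).fetchall()

for name, violations in ranked:
    print(name, violations)
```

Doing the count inside SQL keeps the ranking next to the stored data; the same expression could be joined against a hypothetical activity table to rank compounds within one target profile.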

Andrew Dalke Feb 21

The #cheminformatics fingerprint tool #chemfp version 5.1b1 is out! https://chemfp.com/

The big feature is integration of the new "superimposed" count simulation method to RDKit byte fingerprint generation.

Use it if you want Tanimoto similarity of count fingerprints, but don't want to toss out all of your existing byte fingerprint tools for similarity search, clustering, etc., or take a big performance loss.

Instead, use superimposed and get a ~0.95 recall using "normal" byte fps.
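The post does not describe how the superimposed method works, so the following is not chemfp's algorithm. It is only a sketch of one generic way to let binary Tanimoto reflect counts: unroll each count into one item per occurrence, then hash the items into a fixed number of bit positions. All function names, the `max_count` cap, and the bit length are assumptions made for illustration.

```python
import hashlib


def unroll(counts, max_count=4):
    """Turn {feature: n} into {(feature, 1), ..., (feature, min(n, max_count))}."""
    return {
        (feature, k)
        for feature, n in counts.items()
        for k in range(1, min(n, max_count) + 1)
    }


def fold(items, num_bits=2048):
    """Hash each item to a bit position; distinct items may share a bit."""
    bits = set()
    for feature, k in items:
        digest = hashlib.sha256(f"{feature}:{k}".encode()).digest()
        bits.add(int.from_bytes(digest[:8], "big") % num_bits)
    return bits


def tanimoto(x, y):
    """Binary Tanimoto on two sets."""
    union = len(x | y)
    return len(x & y) / union if union else 0.0


# Hypothetical feature counts (invented for this example).
a = {"aromatic_c": 2, "hydroxyl": 1}
b = {"aromatic_c": 4, "hydroxyl": 1}

# Before folding, binary Tanimoto on the unrolled sets equals the
# min/max count Tanimoto: (2 + 1) / (4 + 1).
exact = tanimoto(unroll(a), unroll(b))

# After folding, bit collisions can shift the value slightly.
approx = tanimoto(fold(unroll(a)), fold(unroll(b)))

print(exact, approx)
```

The folded bit sets behave like ordinary binary fingerprints, so existing binary Tanimoto search code could consume them unchanged; the cost is the collision error and the cap on how high a count can be represented.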

A fast and comprehensive Python package for cheminformatics fingerprints.