chemfp 5.1, my package for #cheminformatics fingerprints, is out.

Most of the new features are related to "superimposed" binary coding of count fingerprints. Computing the binary Tanimoto on these fingerprints gives a good approximation of the count Tanimoto.

- integrated "superimposed" into RDKit binary fp generation
- added:
  - EState count & superimposed fps (RDKit and OEChem)
  - Gobbi 2D pharmacophore count fps (RDKit)
  - LINGO count & superimposed fps from SMILES strings

More at https://chemfp.com/docs/whats_new_in_51.html .
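
To make the "superimposed" idea above concrete, here is a minimal pure-Python sketch. It is not chemfp's implementation. It assumes one simple scheme: a feature with count n sets one hashed bit for each of (feature, 1) ... (feature, n). The function names, the toy feature counts, and the hashing choice are all hypothetical.

```python
import hashlib


def count_tanimoto(a, b):
    """Count (min/max) Tanimoto between two {feature: count} dicts."""
    keys = set(a) | set(b)
    shared = sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)
    total = sum(max(a.get(k, 0), b.get(k, 0)) for k in keys)
    return shared / total if total else 0.0


def superimpose(counts, num_bits=2048):
    """Assumed coding: set one hashed bit per (feature, i) for i = 1..count."""
    bits = set()
    for feature, count in counts.items():
        for i in range(1, count + 1):
            digest = hashlib.sha256(f"{feature}:{i}".encode()).digest()
            bits.add(int.from_bytes(digest[:8], "big") % num_bits)
    return bits


def binary_tanimoto(x, y):
    """Binary Tanimoto between two sets of on-bit positions."""
    union = len(x | y)
    return len(x & y) / union if union else 0.0


# Toy feature counts, made up for illustration only.
fp_a = {"C": 3, "O": 1, "N": 2}
fp_b = {"C": 2, "O": 1, "S": 1}

print(count_tanimoto(fp_a, fp_b))  # 3/7: shared counts 2+1, max counts 3+1+2+1
print(binary_tanimoto(superimpose(fp_a), superimpose(fp_b)))
```

Under this assumed scheme the two scores agree exactly when no two (feature, i) pairs hash to the same bit. Bit collisions in a fixed-length fingerprint are what make the binary score an approximation.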


#SWAT4HCLS is over.

Today, I am at the #KNCV CTC meeting, where we are launching the #Cheminformatics and #DrugDiscovery working group as part of the CTC Section.

JH shows recent work where they use LLMs together with chemical similarity knowledge to do ChEBI ontology classification, the chebifier: https://chebifier.hastingslab.org/ and https://doi.org/10.1039/D3DD00238A

#swat4hcls #cheminformatics

Two papers for the ICCS Collection in the Journal of Cheminformatics have now been tagged as accepted, with another four under review and at least one more paper to be submitted. The deadline was extended one last time.

But we are looking forward to a nice collection of work presented at the ICCS 2025!

#cheminformatics #noordwijkerhout #ICCS2025

OPSIN 2.9.0 has been released: https://chembl.blogspot.com/2026/03/opsin-v290-released.html

OPSIN is an IUPAC name parser that returns SMILES

"The release notes describe a mixture of minor bug fixes and improvements:"

#smiles #chemistry #iupac #cheminformatics #openscience

Just a quick note to say that Daniel Lowe has released OPSIN v.2.9.0 , the first release since Oct 2023. This is now available via the EMBL-...

Here's an #RDKit #cheminformatics quiz for you all. What do you think this code will output?

from rdkit import Chem
mol = Chem.MolFromSmiles("C" + "C(C)(C)" * 50 + "C")
pat = Chem.MolFromSmarts("[$([CD4H0X4](-*)(-*)(-*)-*)]")
print(len(mol.GetSubstructMatches(pat)))

No cheating by actually running the code! :) Feel free to explain your reasoning in the comments.

Poll results: 0 (0%), 38 (0%), 42 (50%), 50 (50%).

Spent most of the week writing an EState count #cheminformatics fingerprint for #chemfp .

It should have been a few hours to build on RDKit's EState code. Perhaps a bit longer to implement a faster version using the same SMARTS patterns.

I then realized the RDKit implementation and patterns had problems, e.g., not matching both atoms in "CC", and unexpected handling of explicit hydrogens, as in deuterated [2H]. See https://git.sr.ht/~dalke/rdkit/log

The hard part was finding good test cases.

SureChEMBL has the banner "By using the site you are agreeing to our Privacy Policy". It links to https://chembl.gitbook.io/surechembl/privacy-notices , which links to https://chembl.gitbook.io/surechembl/privacy-notices/surechembl-website , which in turn links to https://1396459327-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fty0JOfWwPnEbs5wW271w%2Fuploads%2F8dCXtuh9PSd4NKw4f7Cd%2FPrivacy%20Notice%20SureChEMBL%20website.pdf?alt=media&token=bc640502-e42a-4776-92e7-24ea3e871380

They use Google Analytics to track my visit.

I sent an email to the #EMBL data controller via [email protected] to highlight issues with Google and the US, and asked that they stop using Google Analytics, or clarify why they couldn't use one of https://european-alternatives.eu/alternatives-to .

You should email them too. #cheminformatics


pyBacting 0.2.16 (with CDK 2.12) is now available at https://pypi.org/project/pybacting/

#python #openscience #cheminformatics
