I just submitted a #cheminformatics preprint to ChemRxiv, based on the #RDKit count fingerprints, #chemfp, and some one-off R&D code I wrote over the last few months.
"Superimposed Coding of Count Fingerprints to Binary Fingerprints"
In short, my superimposed coding method gives k-recall@k nearest-neighbor scores of ~0.9 relative to using the full count fingerprints with the multiset Tanimoto (aka MinMax, aka Ruzicka similarity). Recall can be over 0.95 w/ 8192 bits!
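For anyone unfamiliar with the two terms, here is a minimal sketch of the multiset Tanimoto and of recall@k. It is illustrative only and is not code from the preprint: the function names `multiset_tanimoto` and `recall_at_k` are hypothetical, count fingerprints are assumed to be plain dicts mapping feature id to count, and recall@k is assumed to mean the fraction of the exact top-k neighbors that also show up in the approximate top-k.

```python
def multiset_tanimoto(a, b):
    """MinMax (Ruzicka) similarity of two count fingerprints.

    a and b are assumed to be dicts of {feature_id: count}.
    The score is sum(min(a_i, b_i)) / sum(max(a_i, b_i)).
    """
    keys = set(a) | set(b)
    num = sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)
    den = sum(max(a.get(k, 0), b.get(k, 0)) for k in keys)
    # Two empty fingerprints have no features to compare; return 0.0 by convention here.
    return num / den if den else 0.0


def recall_at_k(exact_ids, approx_ids, k):
    """Fraction of the exact top-k ids that also appear in the approximate top-k."""
    return len(set(exact_ids[:k]) & set(approx_ids[:k])) / k


fp1 = {1: 2, 2: 1}
fp2 = {1: 1, 3: 1}
score = multiset_tanimoto(fp1, fp2)          # min sum = 1, max sum = 4

exact = ["a", "b", "c", "d"]                 # neighbors ranked by count fingerprints
approx = ["b", "a", "e", "c"]                # neighbors ranked by some binary encoding
recall = recall_at_k(exact, approx, k=4)     # 3 of the 4 exact neighbors recovered
```

In this reading, a recall of ~0.9 means that about 9 of every 10 true nearest neighbors under the count-based similarity are still found when searching with the binary fingerprints.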


