I am still finding interesting things to do with protein #embeddings — now coloring #AlphaFold models by the 3-component UMAP reduction of the per-residue embedding. Look at those bright Phe residues in the disordered region — somehow #protT5 encodes them as "different" than the others.

In my experience so far, these protein language models are uncannily good at highlighting the regions of a protein that I'm already interested in.

(#ChimeraX, using a sequence coloring format file)
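For anyone wanting to try the coloring step, here is a minimal sketch of one way to turn a 3-component reduction into per-residue colors. It assumes the UMAP step has already produced one (x, y, z) triple per residue; the function name `components_to_hex`, the min-max scaling, and the toy coordinates are all illustrative assumptions, not necessarily how the image above was made, and writing the result out in ChimeraX's coloring file format is left out.

```python
def components_to_hex(coords):
    """Min-max scale each of the 3 components to 0-255; return one hex color per residue."""
    columns = list(zip(*coords))
    lows = [min(c) for c in columns]
    # Guard against a constant component (zero range) to avoid dividing by zero.
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    colors = []
    for row in coords:
        rgb = [round(255 * (v - lo) / span) for v, lo, span in zip(row, lows, spans)]
        colors.append("#{:02x}{:02x}{:02x}".format(*rgb))
    return colors


# Hypothetical 3-component coordinates for four residues (made-up numbers, not real data).
demo_coords = [(0.0, 1.0, 2.0), (1.0, 1.0, 0.0), (2.0, 3.0, 1.0), (0.5, 2.0, 2.0)]
demo_colors = components_to_hex(demo_coords)
```

Scaling each component independently uses the full color range, but it also means colors are only comparable within one protein, not across separately reduced proteins.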

I've been gazing at per-residue #embeddings from the #protT5 model, and it feels like holding a secret tome of insight written in a lost language that we still need to decipher. Large parts of the tome are likely nonsense, but there are certainly some interesting things lurking in there. Just a simple matrix multiplication of one protein's embeddings (DSB-1) with themselves shows some intriguing patterns:
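One way to read that multiplication, sketched under the assumption that it means the per-residue embedding matrix E (one row per residue) times its own transpose: the result is a residue-by-residue matrix whose entry (i, j) is the dot product of the embeddings of residues i and j. The function name `self_similarity` and the toy vectors are hypothetical; with NumPy the same thing is just `E @ E.T`.

```python
def self_similarity(embeddings):
    """Return E @ E^T as nested lists: entry [i][j] is the dot product of rows i and j."""
    return [
        [sum(a * b for a, b in zip(row_i, row_j)) for row_j in embeddings]
        for row_i in embeddings
    ]


# Toy 2-dimensional "embeddings" for three residues (illustrative values only).
toy = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
sim = self_similarity(toy)
```

The matrix is symmetric by construction, and its diagonal holds each residue's squared embedding norm, so normalizing the rows first would turn it into a cosine-similarity map.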