I've been gazing at per-residue #embeddings from the #protT5 model, and it feels like holding a secret tome of insight written in a lost language that we need to decipher. Large parts of the tome are likely nonsense, but there are certainly some interesting things lurking in there. Even a simple matrix multiplication of one protein's embedding (DSB-1) with its own transpose shows some intriguing patterns.
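A minimal sketch of that self-product, assuming the per-residue embedding is already available as an array of shape (L, D), one row per residue. The function name `self_similarity` and the optional row normalization are illustrative choices, not part of the original analysis, and random numbers stand in for a real ProtT5 output.

```python
import numpy as np

def self_similarity(emb, normalize=True):
    """Multiply an (L, D) per-residue embedding by its own transpose.

    With normalize=True each row is scaled to unit length first, so the
    result is a cosine-similarity map; with normalize=False it is the raw
    dot-product matrix.
    """
    emb = np.asarray(emb, dtype=float)
    if normalize:
        norms = np.linalg.norm(emb, axis=1, keepdims=True)
        emb = emb / np.clip(norms, 1e-12, None)  # avoid division by zero
    return emb @ emb.T  # shape (L, L)

# Placeholder data: 8 residues, 1024 dimensions (random, NOT a real embedding).
rng = np.random.default_rng(0)
demo_emb = rng.normal(size=(8, 1024))
sim = self_similarity(demo_emb)
```

Plotting `sim` as a heatmap gives a residue-by-residue map; whether to normalize depends on whether vector magnitude is considered part of the signal.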
I calculated the embeddings for every possible alanine/glycine substitution (X->A and A->G) of DSB-1 and then asked, for each position, which other residues are most affected. Again the key phosphoresidue S186 shows up brightly (very highly changed from its original value), and there is a suggestion of something going on in the disordered domain (circled). Does anyone working with #proteinEmbeddings or #proteinLanguageModels have any pointers for me?
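One way such a substitution scan could be organized is sketched below. It assumes an embedding function that maps a sequence string to an (L, D) array; `toy_embed` is a hypothetical stand-in so the example runs without the model, and the Euclidean distance used to score "affected" is an assumption, since the post does not say which measure was used. The short sequence is made up and is not DSB-1.

```python
import numpy as np

def substitution_scan(seq, embed_fn):
    """Return an (L, L) matrix of embedding shifts.

    Row i is the mutated position (X->A, or A->G when the residue is
    already alanine); column j is how far residue j's embedding moved
    from its wild-type value (Euclidean distance, an assumed metric).
    """
    wild = np.asarray(embed_fn(seq), dtype=float)
    n = len(seq)
    shift = np.zeros((n, n))
    for i, aa in enumerate(seq):
        sub = "G" if aa == "A" else "A"
        mutant = seq[:i] + sub + seq[i + 1:]
        mut = np.asarray(embed_fn(mutant), dtype=float)
        shift[i] = np.linalg.norm(mut - wild, axis=1)
    return shift

def toy_embed(seq, dim=16):
    """Hypothetical stand-in for a real model: each residue's vector is its
    own letter code plus half of each neighbor's (wrapping at the ends)."""
    table = np.random.default_rng(0).normal(size=(26, dim))
    own = table[[ord(c) - ord("A") for c in seq]]
    return own + 0.5 * (np.roll(own, 1, axis=0) + np.roll(own, -1, axis=0))

demo_seq = "MSAGKT"  # made-up sequence for illustration
shift = substitution_scan(demo_seq, toy_embed)

# For each mutated position, the other residues ranked by how much they moved.
ranked = np.argsort(shift, axis=1)[:, ::-1]
```

With a real model, `embed_fn` would wrap the tokenizer and encoder call; a bright column in `shift` then marks a residue whose embedding responds to many different substitutions.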