Mastodawn

Sergey Ovchinnikov Dec 23, 2022

Instead, if you use a language model, which models P(sequence), train an extra structure head on its attention maps (essentially modeling P(structure | sequence)), and optimize both objectives, you get working designs! The LM loves them (lower perplexity) and AlphaFold does not hate them (pTM > 0.5). (5/5)
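A minimal sketch of what "optimize both" could look like when scoring candidates, assuming you already have a log-likelihood from the structure head and per-residue log-probabilities from the LM. The function names and all numbers are hypothetical and only illustrate the arithmetic; this is not the thread's actual pipeline.

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp of the mean negative log-likelihood per residue."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

def joint_log_score(log_p_struct_given_seq, token_log_probs):
    """log p(xyz|seq) + log p(seq); higher is better."""
    return log_p_struct_given_seq + sum(token_log_probs)

# Hypothetical candidate A: decent structure fit, and the LM finds it plausible.
a_struct, a_tokens = -10.0, [math.log(0.5)] * 10
# Hypothetical candidate B: better structure fit, but the LM finds it implausible.
b_struct, b_tokens = -5.0, [math.log(0.05)] * 10

score_a = joint_log_score(a_struct, a_tokens)
score_b = joint_log_score(b_struct, b_tokens)
ppl_a = perplexity(a_tokens)
ppl_b = perplexity(b_tokens)
```

Under these made-up numbers, B wins on the structure term alone, but A wins once the sequence term is included, which is the behavior the joint objective is meant to capture.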

Sergey Ovchinnikov Dec 23, 2022

For comparison, we also used ColabDesign's AfDesign protocol (protocol=fixbb), which only models p(xyz|seq). Not surprisingly, AF2 liked the resulting sequences (high pTM values), but the LM did not (high perplexity values), and most of them also did not work in the lab (not soluble and/or not monomeric by size-exclusion chromatography). (4/5)

Sergey Ovchinnikov Dec 23, 2022

Now we can invert this model to find a sequence that matches a given backbone. (In this case, de novo designed backbones were selected, and any sequences remotely similar to the designed sequences were purged from the LM training set.)

By Bayes' theorem, optimizing both p(xyz|seq) and p(seq) also optimizes p(seq|xyz), since p(xyz) is constant for a fixed backbone. (3/5)
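Written out, the Bayes step is as follows (standard probability, using the thread's own symbols; the only thing added is the logarithm):

```latex
p(\mathrm{seq} \mid \mathrm{xyz})
  = \frac{p(\mathrm{xyz} \mid \mathrm{seq})\, p(\mathrm{seq})}{p(\mathrm{xyz})}
\quad\Longrightarrow\quad
\log p(\mathrm{seq} \mid \mathrm{xyz})
  = \log p(\mathrm{xyz} \mid \mathrm{seq}) + \log p(\mathrm{seq}) - \log p(\mathrm{xyz})
```

For a fixed target backbone, the last term does not depend on seq, so the sequence that maximizes the sum of the first two terms also maximizes p(seq|xyz).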

Sergey Ovchinnikov Dec 23, 2022

Given the observation that attention maps in LMs correspond to contacts, one can train a linear projection from the attention maps to a distogram, allowing the modeling of P(structure | sequence). (2/5)

Papers showing LMs learn contacts:
https://arxiv.org/abs/2006.15222
https://www.biorxiv.org/content/10.1101/2020.12.15.422761v1
https://www.biorxiv.org/content/10.1101/2020.12.21.423882v2
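A rough NumPy sketch of the linear projection from attention maps to a distogram described above. The tensor layout, the symmetrization step, the bin count, and the random inputs are all assumptions for illustration; in practice the weights would be trained against real structures.

```python
import numpy as np

def attention_to_distogram(attn, weights, bias):
    """Project stacked attention maps to per-pair distance-bin probabilities.

    attn:    (H, L, L) attention maps, H = layers * heads (assumed layout)
    weights: (H, B) linear projection, B = number of distance bins
    bias:    (B,)
    returns: (L, L, B) probabilities that sum to 1 over the bin axis
    """
    # Distances are symmetric, so symmetrize the attention maps first.
    attn_sym = 0.5 * (attn + attn.transpose(0, 2, 1))
    # Linear projection over the head axis: one logit vector per residue pair.
    logits = np.einsum("hij,hb->ijb", attn_sym, weights) + bias
    # Numerically stable softmax over the distance bins.
    logits = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)

# Toy inputs standing in for real LM attention maps and trained weights.
rng = np.random.default_rng(0)
H, L, B = 8, 5, 4
attn = rng.random((H, L, L))
attn /= attn.sum(axis=-1, keepdims=True)  # each attention row sums to 1
weights = rng.normal(size=(H, B))
bias = np.zeros(B)

disto = attention_to_distogram(attn, weights, bias)
```

Training `weights` and `bias` with a cross-entropy loss against binned true distances would give a P(structure | sequence) of the kind the post refers to.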

BERTology Meets Biology: Interpreting Attention in Protein Language Models

Transformer architectures have proven to learn useful representations for protein classification and generation tasks. However, these representations present challenges in interpretability. In this work, we demonstrate a set of methods for analyzing protein Transformer models through the lens of attention. We show that attention: (1) captures the folding structure of proteins, connecting amino acids that are far apart in the underlying sequence, but spatially close in the three-dimensional structure, (2) targets binding sites, a key functional component of proteins, and (3) focuses on progressively more complex biophysical properties with increasing layer depth. We find this behavior to be consistent across three Transformer architectures (BERT, ALBERT, XLNet) and two distinct protein datasets. We also present a three-dimensional visualization of the interaction between attention and protein structure. Code for visualization and analysis is available at https://github.com/salesforce/provis.

Sergey Ovchinnikov Dec 23, 2022

One issue with using methods like AlphaFold or RoseTTAFold for design is that they were trained to model P(structure|sequence), and only on valid sequences. So if you use AF for design, you are likely to find adversarial sequences that trick the model. You therefore need a way to model the validity of the sequence, i.e. P(sequence). Enter protein language models! (1/5)