Sergey Ovchinnikov

Scientist, pseudo-PI - Harvard University, #FirstGen
@sokrypton

Efficiently generate de novo proteins by
- optimizing residue logits for max AF confidence
- redesigning the sequence using ProteinMPNN
Tested in the lab, including cryo-EM structures
@chrisfrank662 @AKhoshouei @sokrypton @hendrik_dietz

https://www.biorxiv.org/content/10.1101/2023.02.24.529906v1
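The two-step recipe above (optimize residue logits for confidence, then redesign the sequence) can be sketched in miniature. This is a toy stand-in, not the real pipeline: the actual method backprops through AlphaFold's pLDDT/pTM and then runs ProteinMPNN on the designed backbone; here a low-entropy proxy replaces AF confidence and an argmax replaces ProteinMPNN.

```python
import math
import random

AA = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def entropy(p):
    return -sum(pi * math.log(pi + 1e-9) for pi in p)

def toy_confidence(logits):
    # Toy stand-in for AF confidence: rewards sharp (low-entropy)
    # per-residue distributions. The real objective is pLDDT/pTM.
    return -sum(entropy(softmax(pos)) for pos in logits) / len(logits)

def optimize_logits(length, steps=50, lr=0.5):
    # Step 1: gradient ascent on the confidence proxy over residue logits.
    random.seed(0)
    logits = [[random.gauss(0.0, 0.1) for _ in AA] for _ in range(length)]
    for _ in range(steps):
        for pos in logits:
            p = softmax(pos)
            h = entropy(p)
            # analytic gradient of -entropy w.r.t. logit j: p_j * (log p_j + H)
            for j in range(len(pos)):
                pos[j] += lr * p[j] * (math.log(p[j] + 1e-9) + h)
    return logits

def redesign(logits):
    # Step 2 stand-in for ProteinMPNN: take the argmax residue per position.
    # The real method redesigns the sequence conditioned on the backbone.
    return "".join(AA[max(range(len(pos)), key=pos.__getitem__)] for pos in logits)
```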

Nailed it! I think I'm ready to retire... 😅
The first interview done! Time to prep for the next. 😎

@neuropunk
So during design, as soon as the model predicts the desired structure, there is no longer any signal to keep updating your sequence.

In this case, we wanted the structure to be fully encoded in the LM's contacts, and to avoid a situation where a more complex structure module starts hallucinating or improvising. (2/2)

@neuropunk
It's good for the structure prediction task: you want the model to be robust and to recognize even the bare minimum of signal in the input sequence. But it's not good for the design task.

Let's say you have a suboptimal sequence that only partly encodes the desired structure. If your model is "too good", it will fill in the rest of the structure. (1/2)

One thing to keep in mind: it's critical that this linear projection be as simple as possible. This avoids the "connect the dots" phenomenon we saw with TrRosetta, where the sequence encodes only some of the contacts and the rest of the layers fill in the remaining ones. 🤔
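A minimal sketch of such a linear contact head: a logistic regression over per-head attention values at each residue pair. (This omits the symmetrization and APC correction the real ESM head applies, and the weights here are made up.)

```python
import math

def attn_to_contacts(attn, head_weights, bias):
    # Minimal linear contact head: a logistic regression over the
    # per-head attention values at each residue pair (i, j).
    # attn: [heads][L][L]; head_weights: one scalar per head.
    # Because the projection is linear, a contact can only appear if
    # the attention maps themselves carry it -- no deeper layers exist
    # to "connect the dots".
    n_heads, L = len(attn), len(attn[0])
    contacts = [[0.0] * L for _ in range(L)]
    for i in range(L):
        for j in range(L):
            z = bias + sum(head_weights[h] * attn[h][i][j] for h in range(n_heads))
            contacts[i][j] = 1.0 / (1.0 + math.exp(-z))  # sigmoid
    return contacts
```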
Alright, first attempt at a tooter thread 😅

check out the preprint:
https://www.biorxiv.org/content/10.1101/2022.12.21.521521v1

Thanks to all the amazing collaborators!
@robert_verkuil
@OriKabeli
@du_yilun
@BasileWicky
@LFMilles
@JustasDauparas
David Baker
@UWproteindesign
@TomSercu
@alexrives

Instead, if you use a language model, which models P(sequence), train an extra structure head on the attention maps (essentially modeling p(structure | sequence)), and optimize both objectives, you get working designs! The LM loves them (lower perplexity) and AlphaFold does not hate them (pTM > 0.5). (5/5)
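The two objectives can be combined into one design loss. A hedged sketch (the weighting, the cross-entropy form, and the toy inputs are assumptions, not the paper's exact loss):

```python
import math

def contact_bce(pred, target):
    # Mean binary cross-entropy between predicted and target contact maps.
    L, eps = len(pred), 1e-9
    total = 0.0
    for i in range(L):
        for j in range(L):
            p, t = pred[i][j], target[i][j]
            total += -(t * math.log(p + eps) + (1.0 - t) * math.log(1.0 - p + eps))
    return total / (L * L)

def joint_objective(nll_per_res, pred_contacts, target_contacts, w_struct=1.0):
    # Both terms from the thread: keep the sequence likely under
    # P(sequence) (low mean NLL, i.e. low perplexity) while the
    # structure head p(structure | sequence) matches the target contacts.
    lm_term = sum(nll_per_res) / len(nll_per_res)
    return lm_term + w_struct * contact_bce(pred_contacts, target_contacts)
```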
For comparison, we also used ColabDesign's AfDesign protocol (protocol="fixbb"), which only models p(xyz|seq). Not surprisingly, AF2 liked these sequences (high pTM values), but the LM did not (high perplexity values)... and most of them also did not work in the lab (not soluble and/or not monomeric by size-exclusion chromatography)... (4/5)
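For reference, the perplexity used to score designs under the LM is just the exponential of the mean per-residue negative log-likelihood (a generic definition, not the exact ESM scoring code):

```python
import math

def perplexity(nll_per_residue):
    # Sequence perplexity under the LM: exp of the mean per-residue
    # negative log-likelihood. Lower means the LM "likes" the sequence;
    # a uniform guess over 20 amino acids gives perplexity 20.
    return math.exp(sum(nll_per_residue) / len(nll_per_residue))
```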