New pre-print!

Plug and play generation works for images and text…what about proteins?

We engineer proteins by combining your favorite unsupervised and supervised protein sequence models (even protein language models!) in a fast *gradient-based* discrete MCMC sampler.

🧵

Combining multiple models in sequence space is straightforward if we treat each one as an expert in a product of experts, as in energy-based models.
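A toy sketch of the product-of-experts idea (the expert functions below are made-up stand-ins, not the models from the paper): multiplying each expert's probability is the same as summing their log-probabilities, which defines a single energy function we can sample from.

```python
import numpy as np

def expert_unsupervised(x):
    # toy "evolutionary" expert: prefer sequences matching a fixed consensus
    consensus = np.array([0, 1, 1, 0])
    return -float(np.sum(x != consensus))

def expert_supervised(x):
    # toy "fitness predictor" expert: prefer sequences with more 1s
    return float(np.sum(x))

def product_of_experts_energy(x, weights=(1.0, 1.0)):
    # E(x) = -sum_k w_k * log p_k(x); the sampler targets p(x) ∝ exp(-E(x))
    return -(weights[0] * expert_unsupervised(x)
             + weights[1] * expert_supervised(x))

print(product_of_experts_energy(np.array([0, 1, 1, 1])))  # → -2.0
```

Re-weighting the experts trades off evolutionary plausibility against predicted fitness.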

"But isn't directed evolution just doing brute-force or random search for mutations?" 🤔

2/

Not necessarily! We use gradients to craft an efficient proposal distribution for sampling from a high-dimensional, discrete product of experts.

For example, this lets us maximize the sum of two binary MNIST digits just by flipping binary pixels:
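A minimal sketch of a gradient-informed flip proposal for binary variables, in the spirit of locally-informed samplers like Gibbs-with-gradients (the quadratic energy and all names here are toy assumptions, not the paper's exact sampler):

```python
import numpy as np

def grad_energy(x, w):
    # gradient of a toy relaxed energy E(x) = -x·w with respect to x
    return -w

def flip_proposal_probs(x, w, temp=2.0):
    # First-order Taylor estimate of the energy change from flipping bit i:
    #   dE_i ≈ (1 - 2*x_i) * dE/dx_i
    # then softmax over -dE so low-energy (promising) flips get high probability.
    d = (1 - 2 * x) * grad_energy(x, w)
    logits = -d / temp
    p = np.exp(logits - logits.max())
    return p / p.sum()

rng = np.random.default_rng(0)
x = np.array([0, 1, 0, 1])
w = np.array([1.0, -2.0, 3.0, 0.5])

probs = flip_proposal_probs(x, w)
i = rng.choice(len(x), p=probs)     # sample which bit to flip
x_prop = x.copy(); x_prop[i] = 1 - x_prop[i]
```

One gradient call scores every single-bit flip at once, instead of evaluating the energy of each candidate mutation separately; a Metropolis-Hastings accept/reject step would then correct for the approximation.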

3/

Our in-silico experiments on proteins span a variety of unsupervised evolutionary sequence models like Potts and ESM2 (35M/150M/650M).

Our results suggest the sampler retains the practicality of simple black-box algorithms while outperforming brute-force and random search.

webpage: https://pemami4911.github.io/blog/2023/01/05/ppde.html
paper: https://arxiv.org/abs/2212.09925
colab: https://colab.research.google.com/drive/1s3heukQga1ShfxrAMRxNtZFfSwu_D_m7?usp=sharing

end/

Plug & Play Directed Evolution for Proteins with Gradient-based Discrete MCMC