Kevin K. Yang 楊凱筌

1,061 Followers
423 Following
436 Posts
Senior Researcher in BioML at Microsoft Research New England. He/him/他. 🇹🇼
Website: https://yangkky.github.io/

Contrastive learning for enzyme class prediction improves accuracy, reliability, and sensitivity and identifies promiscuous activity that is then verified in the lab!

Tianhao Yu, @OceanHCui @luoyunan @HuiminZhaoLab

https://www.science.org/doi/10.1126/science.adf2465

RT @ml4proteins
Next week on 4/11 @ 4 pm EST, we'll have @NotinPascal talk about hybrid protein language models for fitness prediction!

https://www.ml4proteinengineering.com/apr-11

Apr-11 — ML Protein Engineering Seminar Series

When you live in Boston, you make your own taro mochi.

Using human-readable text as input and output is incredibly dumb and tedious
RT @Abebab
aii, let's do this... AI/ML version

what’s your critical hot take on AI/ML that would have you in this position
https://twitter.com/Abebab/status/1642550494265589761

Nobody's told my kid yet that he's gonna get replaced by stable diffusion

Machine learning to predict the binding energy between an enzyme and its ligand.

Carlos Ramírez-Palacios @CG_Martini

https://pubs.acs.org/doi/full/10.1021/acs.jctc.2c01227

Collaborator coming into town this week so of course I have a 103 fever

Finetune a masked language model to make edits that improve some function of a sequence, in order to find sequences that are better than anything in the training set.

@vishakh_pk Richard Yuanzhe Pang @hhexiy @ank_parikh

https://arxiv.org/abs/2303.04562

Extrapolative Controlled Sequence Generation via Iterative Refinement

We study the problem of extrapolative controlled generation, i.e., generating sequences with attribute values beyond the range seen in training. This task is of significant importance in automated design, especially drug discovery, where the goal is to design novel proteins that are *better* (e.g., more stable) than existing sequences. Thus, by definition, the target sequences and their attribute values are out of the training distribution, posing challenges to existing methods that aim to directly generate the target sequence. Instead, in this work, we propose Iterative Controlled Extrapolation (ICE) which iteratively makes local edits to a sequence to enable extrapolation. We train the model on synthetically generated sequence pairs that demonstrate small improvement in the attribute value. Results on one natural language task (sentiment analysis) and two protein engineering tasks (ACE2 stability and AAV fitness) show that ICE considerably outperforms state-of-the-art approaches despite its simplicity. Our code and models are available at: https://github.com/vishakhpk/iter-extrapolation.

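The iterative-refinement idea above can be sketched in a few lines. This is a toy illustration, not the authors' implementation: `toy_score` is a hypothetical stand-in for the learned attribute model (e.g. a stability predictor), and single-residue substitutions stand in for the edits a finetuned masked language model would propose. The loop greedily accepts improving local edits until no edit helps, which is how the sequence can extrapolate past its starting point.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_score(seq):
    # Hypothetical attribute model; here simply the fraction of
    # alanines, chosen only so the example is deterministic.
    return seq.count("A") / len(seq)

def propose_local_edits(seq):
    # Stand-in for masked-LM edit proposals: enumerate every
    # single-residue substitution of the sequence.
    for i in range(len(seq)):
        for aa in AMINO_ACIDS:
            if aa != seq[i]:
                yield seq[:i] + aa + seq[i + 1:]

def iterative_refinement(seq, n_rounds=10):
    # Greedily apply the best improving local edit each round,
    # stopping when no edit raises the score.
    score = toy_score(seq)
    for _ in range(n_rounds):
        best_seq, best_score = seq, score
        for cand in propose_local_edits(seq):
            s = toy_score(cand)
            if s > best_score:
                best_seq, best_score = cand, s
        if best_seq == seq:
            break
        seq, score = best_seq, best_score
    return seq, score

refined, final_score = iterative_refinement("MKTVRQ")
```

With this toy scorer the loop drives every position to alanine; in the real setting the scorer and editor are learned models and each round makes a small, plausible improvement instead.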

RT @ChelseaParlett
3 Lectures
118 Slides
♾ hours making graphics

A peek at the graphics I use to explain Recurrent Neural Networks and related topics😅

RT @MartinPacesa
Quite an interesting experiment to have AlphaFold fold the protein sequence sequentially. It's certainly no simulation of how proteins would fold but it's interesting to see what AF2 thinks the intermediates would be.