Yonatan Belinkov

47 Followers
45 Following
7 Posts
Assistant professor of computer science @TechnionLive. #NLProc #NLP
Website: https://www.cs.technion.ac.il/~belinkov/
Excited to be involved in organizing BlackboxNLP next year with Sophie Hao, @jaapjumelet, @hmohebbi, @arya and @boknilev!
Transformers learn in-context by gradient descent
abs: https://arxiv.org/abs/2212.07677
At present, the mechanisms of in-context learning in Transformers are not well understood and remain mostly an intuition. In this paper, we suggest that training Transformers on auto-regressive objectives is closely related to gradient-based meta-learning formulations. We start by providing a simple weight construction that shows the equivalence of data transformations induced by 1) a single linear self-attention layer and by 2) gradient-descent (GD) on a regression loss. Motivated by that construction, we show empirically that when training self-attention-only Transformers on simple regression tasks either the models learned by GD and Transformers show great similarity or, remarkably, the weights found by optimization match the construction. Thus we show how trained Transformers become mesa-optimizers i.e. learn models by gradient descent in their forward pass. This allows us, at least in the domain of regression problems, to mechanistically understand the inner workings of in-context learning in optimized Transformers. Building on this insight, we furthermore identify how Transformers surpass the performance of plain gradient descent by learning an iterative curvature correction and learn linear models on deep data representations to solve non-linear regression tasks. Finally, we discuss intriguing parallels to a mechanism identified to be crucial for in-context learning termed induction-head (Olsson et al., 2022) and show how it could be understood as a specific case of in-context learning by gradient descent learning within Transformers. Code to reproduce the experiments can be found at https://github.com/google-research/self-organising-systems/tree/master/transformers_learn_icl_by_gd .

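A minimal numpy sketch of the paper's key construction (illustrative only, not the authors' released code; the learning rate eta, data sizes, and zero weight initialization are assumed for the example). It shows that one gradient-descent step on an in-context regression loss gives the same query prediction as a single linear self-attention layer whose keys/queries are the inputs x_i and whose values are the targets y_i:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 8, 4                      # context size, input dimension (assumed)
X = rng.normal(size=(N, d))      # in-context inputs x_1..x_N
w_true = rng.normal(size=d)
y = X @ w_true                   # in-context targets y_1..y_N
x_q = rng.normal(size=d)         # query input
eta = 0.5                        # learning rate (assumed value)

# (1) One GD step on L(w) = 1/(2N) * sum_i (w.x_i - y_i)^2, starting from w0 = 0.
#     The gradient at w0 = 0 is -(1/N) * sum_i y_i * x_i, so one step gives:
w1 = eta / N * X.T @ y
pred_gd = w1 @ x_q               # prediction for the query after one GD step

# (2) A single linear self-attention layer (no softmax): with queries/keys set
#     to the x_i and values set to the y_i, the output at the query token is
#     sum_i (x_q . x_i) * y_i; scaling by eta/N matches the GD step exactly.
scores = X @ x_q                 # raw dot-product attention scores x_q . x_i
pred_attn = eta / N * scores @ y

print(np.allclose(pred_gd, pred_attn))   # True: the two transformations coincide
```
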
Hi there! This is ACM, the world's largest computing society. As you might have noticed, we have opened not only our official #Mastodon account but also our own #instance!

Please consider joining @mastodon.acm.org, a community for #computing researchers & practitioners to connect & exchange ideas with each other, whether you are an ACM member or not.

You can use this link to join: https://mastodon.acm.org/invite/FbXaxAHg

Spread the word among your friends and colleagues! Happy tooting!

#introduction


➡️ Article published:

Part-of-Speech and Morphological Tagging of Algerian Judeo-Arabic

-- Ofra Tirosh-Becker, Michal Kessler, Oren Becker, Yonatan Belinkov

https://nejlt.ep.liu.se/article/view/4315

#nlp #nlproc #newpaper

➤ Findings paper at the #EMNLP Blackbox NLP Workshop (https://blackboxnlp.github.io/):

3) Calibrating Trust of Multi-Hop Question Answering Systems with Decompositional Probes
https://arxiv.org/abs/2204.07693
With Kaige Xie and Sarah Wiegreffe

This paper looks at a new XAI technique that helps people determine when a question-answering system might be giving a wrong answer, even when they themselves don't know the correct answer.

BlackboxNLP 2023

Workshop on analyzing and interpreting neural networks for NLP

Analyzing and interpreting neural networks for NLP
#blackboxNLP is happening at #emnlp2022 this Thursday, Dec 8. Join us in Abu Dhabi or virtually.
And please use the rocket chat channel to communicate:
https://emnlp2022.rocket.chat/channel/workshop-14-blackboxnlp

@thegradient @anamarasovic I'm interested in making sure that this instance can support the visibility of early-career scientists. I think people should be encouraged to share their papers with short threads (call them Tootorials). I was intending to make that suggestion on Monday and try to kick it off.