Jasmijn Bastings

@jasmijn@sigmoid.social

Thread: Excited to announce the v1.0 release of the Learning Interpretability Tool (🔥LIT), an interactive platform to debug, validate, and understand ML model behavior. This release brings exciting new features — including layouts, demos, and metrics — and a simplified Python API. https://pair-code.github.io/lit

(1/5)

Learning Interpretability Tool

@mega @davidbau Check out our preprint for more details and analysis: https://arxiv.org/abs/2304.14767

This was a really fun project with @mega, Katja Filippova, and Amir Globerson! #NLProc #NLP #XAI
Dissecting Recall of Factual Associations in Auto-Regressive Language Models

Transformer-based language models (LMs) are known to capture factual knowledge in their parameters. While previous work looked into where factual associations are stored, only little is known about how they are retrieved internally during inference. We investigate this question through the lens of information flow. Given a subject-relation query, we study how the model aggregates information about the subject and relation to predict the correct attribute. With interventions on attention edges, we first identify two critical points where information propagates to the prediction: one from the relation positions followed by another from the subject positions. Next, by analyzing the information at these points, we unveil a three-step internal mechanism for attribute extraction. First, the representation at the last-subject position goes through an enrichment process, driven by the early MLP sublayers, to encode many subject-related attributes. Second, information from the relation propagates to the prediction. Third, the prediction representation "queries" the enriched subject to extract the attribute. Perhaps surprisingly, this extraction is typically done via attention heads, which often encode subject-attribute mappings in their parameters. Overall, our findings introduce a comprehensive view of how factual associations are stored and extracted internally in LMs, facilitating future research on knowledge localization and editing.

@mega @davidbau Our study was inspired by works on knowledge tracing (Kevin Meng, @davidbau @peterbhase) and mechanistic interpretability (@kevrowan @AnthropicAI). It introduces an in-depth view of factual predictions and facilitates new research directions for knowledge localization & editing.
@mega @davidbau Through per-layer gradient × input analysis (similar to @gsarti_ et al.) and “patching” experiments on early-layer representations, we further show the importance of the subject enrichment process for attribute extraction.
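
A minimal sketch of what per-layer gradient × input scoring can look like with GPT-2 and Hugging Face transformers. This is not the paper's code: the prompt and the use of the argmax token as the target are illustrative, and the “patching” interventions are not shown.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Sketch only (not the paper's code): score each (layer, position) by
# ||gradient * hidden_state||, with the gradient taken w.r.t. the log-prob
# of the model's predicted attribute token.
tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

inputs = tok("Beats Music is owned by", return_tensors="pt")
outputs = model(**inputs, output_hidden_states=True)

# Keep gradients for the intermediate hidden states (non-leaf tensors).
for h in outputs.hidden_states:
    h.retain_grad()

logits = outputs.logits[0, -1]
target = logits.argmax()  # e.g. " Apple" if the model recalls the fact
logits.log_softmax(dim=-1)[target].backward()

# Gradient-x-input relevance per layer and position.
scores = torch.stack(
    [(h.grad[0] * h[0]).norm(dim=-1) for h in outputs.hidden_states]
)
print(scores.shape)  # (n_layers + 1, seq_len): embeddings plus each block
```

Under the paper's account, the subject positions should receive high scores in the early-to-middle layers, where the last-subject representation is being enriched.
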
@mega @davidbau Further analysis of these heads in the embedding space (@guy__dar) shows that they often encode subject-attribute mappings in their parameters. Some attention heads act as “knowledge hubs” with hundreds of such encoded mappings.
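
A rough sketch of this kind of embedding-space reading for a single GPT-2 attention head. The layer/head choice is hypothetical, and feeding the raw token embedding into the head is a simplification (the real residual stream also carries layer norm and context); this is not the paper's code.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Sketch only: read one head's value-output ("OV") mapping in vocabulary space.
# For a source token embedding e, the head roughly writes e @ W_V @ W_O into the
# destination position, so decoding that vector with the tied unembedding shows
# which tokens the head promotes for that source token.
tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

d_model, d_head = 768, 64
layer, head = 9, 8  # hypothetical head to inspect

with torch.no_grad():
    attn = model.transformer.h[layer].attn
    W_v = attn.c_attn.weight[:, 2 * d_model:]                      # value projection
    W_v_h = W_v[:, head * d_head:(head + 1) * d_head]              # this head's slice
    W_o_h = attn.c_proj.weight[head * d_head:(head + 1) * d_head]  # output projection rows
    W_ov = W_v_h @ W_o_h                                           # (d_model, d_model)

    E = model.transformer.wte.weight      # token embeddings, tied with the unembedding
    src = tok(" Apple", add_special_tokens=False)["input_ids"][0]
    vocab_scores = E[src] @ W_ov @ E.T    # what the head writes, read in vocab space

print(tok.convert_ids_to_tokens(vocab_scores.topk(10).indices.tolist()))
```
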
@mega @davidbau (B) information from the relation propagates to the prediction, and (C) the prediction representation “queries” the enriched subject to extract a specific attribute. Perhaps surprisingly, this extraction is typically done via attention heads.
@mega @davidbau Analyzing the information at these critical points, we unveil a three-step internal mechanism for attribute extraction: (A) the representation at the last-subject position goes through an enrichment process, driven by the early MLP layers, to encode many subject-related attributes.
@mega We prompt GPT-{2|J} with subject-relation queries (“Beats Music is owned by”) from CounterFact (@mengk20 @davidbau) and intervene on attention edges (similar to @hmohebbi75) to analyze how information is aggregated across layers and positions to predict the attribute (“Apple”).
This reveals two critical points where information propagates to the prediction: one from the relation positions ("is owned by", in the example) followed by another from the subject positions (“Beats Music”).
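
For intuition, here is a toy sketch of what knocking out attention edges means on a bare scaled-dot-product attention function. It only illustrates the intervention; the paper's experiments apply it inside GPT-2/GPT-J at chosen layers and track the change in the attribute's probability, which is not shown here.

```python
import torch
import torch.nn.functional as F

def causal_attention_with_knockout(q, k, v, blocked_edges=()):
    """Scaled dot-product attention with a causal mask, where selected
    (query_pos, key_pos) edges are knocked out: their scores are set to -inf,
    so no information flows from key_pos to query_pos."""
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    n = scores.size(-1)
    causal = torch.tril(torch.ones(n, n, dtype=torch.bool))
    scores = scores.masked_fill(~causal, float("-inf"))
    for q_pos, k_pos in blocked_edges:
        scores[..., q_pos, k_pos] = float("-inf")
    return F.softmax(scores, dim=-1) @ v

# Toy usage: block the last (prediction) position from reading positions 0-1,
# a stand-in for the subject span, and see how much its output changes.
q, k, v = (torch.randn(5, 16) for _ in range(3))
clean = causal_attention_with_knockout(q, k, v)
knocked = causal_attention_with_knockout(q, k, v, blocked_edges=[(4, 0), (4, 1)])
print((clean[-1] - knocked[-1]).norm())
```

Blocking the edges from the subject or relation positions into the last position at different layers is what localizes the two critical propagation points described above.
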
LMs capture many factual associations, but how do they recall them internally during inference? In a new preprint, we find that LMs build attribute-rich subject representations, from which attention heads extract the predicted attribute.
(with Mor Geva @mega, Katja Filippova, and Amir Globerson) 🧵 #NLP #NLProc

Airbnb

Waterbnb

Firebnb

Earthbnb

Long ago, the four bed-and-breakfasts lived in harmony

Everything changed when the Firebnb attacked