David Bau

@davidbau@sigmoid.social
268 Followers
75 Following
21 Posts
Computer Science Professor at Northeastern. Believes AI should be transparent. http://baulab.info
Lab website: https://baulab.info/
Personal blog: http://davidbau.com/

My student Rohit Gandikota is finding that large diffusion models like SDXL can be controlled very precisely with low-rank changes in parameters.

His twitter thread on "Concept Sliders" is also a great survey of current diffusion model controllability work. The nice thing about modern image synthesis is that the results are gorgeous.
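The low-rank control idea can be sketched in a few lines. This is a toy numpy illustration, not Rohit's code: a "slider" is a learned low-rank factor pair (A, B) whose product is added to a frozen weight matrix, scaled by a user-chosen strength (all names and scales here are illustrative).

```python
import numpy as np

# Hypothetical sketch of a concept slider: a low-rank update
# delta_W = B @ A learned for one visual concept, applied to a
# frozen weight matrix W and scaled by a slider value.
rng = np.random.default_rng(0)
d_out, d_in, rank = 64, 64, 4

W = rng.normal(size=(d_out, d_in))       # frozen pretrained weight
A = rng.normal(size=(rank, d_in)) * 0.1  # learned low-rank factors
B = rng.normal(size=(d_out, rank)) * 0.1

def slide(W, A, B, scale):
    """Return the edited weight W + scale * (B @ A)."""
    return W + scale * (B @ A)

x = rng.normal(size=d_in)
neutral = W @ x
stronger = slide(W, A, B, +2.0) @ x  # push toward the concept
weaker = slide(W, A, B, -2.0) @ x    # push away from it

# The edit is rank-limited, so it only changes outputs within a
# small subspace; scale = 0 recovers the original model exactly.
assert np.allclose(slide(W, A, B, 0.0), W)
```

Because the update is rank-4, the slider can shift one concept smoothly while leaving most of the model's behavior untouched.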

Help him spread the word!

https://x.com/RohitGandikota/status/1727410973638852957?s=20

https://sliders.baulab.info/

Rohit Gandikota (@RohitGandikota) on X

Have you ever wanted to make a precise change when generating images with diffusion models?🎨 We present Concept Sliders, which enable smooth control to create your vision, fix common problems, and even add a "fix hands" slider. Here's an explainer on how sliders work🧡

My student Koyena Pal will be presenting some cool work at #CoNLL2023.

Her Future Lens (https://future.baulab.info) can look at a single hidden state of an LLM and see what the transformer is planning several tokens ahead.

It can make reading the internal states much more intuitive!
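As a toy illustration of the Future Lens idea (synthetic data, not the paper's actual method): fit a linear probe from a hidden state at position t to the token at a later position t+k, and check that it predicts well above chance. Everything below is assumed for the sketch.

```python
import numpy as np

# Toy sketch: learn a linear map from a transformer hidden state
# h_t to logits for the token at position t+k, then read off a
# "future" token directly from the single hidden state.
rng = np.random.default_rng(1)
d_model, vocab, n = 32, 10, 500

H = rng.normal(size=(n, d_model))            # hidden states at position t
W_true = rng.normal(size=(d_model, vocab))   # pretend future-logit structure
future_tokens = (H @ W_true).argmax(axis=1)  # token id at position t+k

# Fit the probe by least squares against one-hot targets.
Y = np.eye(vocab)[future_tokens]
W_probe, *_ = np.linalg.lstsq(H, Y, rcond=None)

pred = (H @ W_probe).argmax(axis=1)
accuracy = (pred == future_tokens).mean()  # well above the 10% chance rate
```

If the probe beats chance, the hidden state must already encode information about tokens the model has not yet emitted, which is the paper's conjecture.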

Look for us if you'll be in Singapore for EMNLP/BlackboxNLP/CoNLL.

Work with Jiuding Sun, Andrew Yuan, and Byron Wallace.

https://future.baulab.info

https://twitter.com/kpal_koyena/status/1723026282781581380?s=20

Future Lens

We conjecture that hidden state vectors corresponding to individual input tokens encode information sufficient to accurately predict several tokens ahead.

@mega @davidbau Check out our preprint for more details and analysis: https://arxiv.org/abs/2304.14767

This was a really fun project with @mega, Katja Filippova, and Amir Globerson! #NLProc #NLP #XAI
Dissecting Recall of Factual Associations in Auto-Regressive Language Models

Transformer-based language models (LMs) are known to capture factual knowledge in their parameters. While previous work looked into where factual associations are stored, only little is known about how they are retrieved internally during inference. We investigate this question through the lens of information flow. Given a subject-relation query, we study how the model aggregates information about the subject and relation to predict the correct attribute. With interventions on attention edges, we first identify two critical points where information propagates to the prediction: one from the relation positions followed by another from the subject positions. Next, by analyzing the information at these points, we unveil a three-step internal mechanism for attribute extraction. First, the representation at the last-subject position goes through an enrichment process, driven by the early MLP sublayers, to encode many subject-related attributes. Second, information from the relation propagates to the prediction. Third, the prediction representation "queries" the enriched subject to extract the attribute. Perhaps surprisingly, this extraction is typically done via attention heads, which often encode subject-attribute mappings in their parameters. Overall, our findings introduce a comprehensive view of how factual associations are stored and extracted internally in LMs, facilitating future research on knowledge localization and editing.
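The paper's interventions on attention edges can be sketched abstractly. This is a minimal self-contained illustration, not the authors' code: zero out one attention weight (cutting the edge from a source position to a target position), renormalize, and measure how the output at the prediction position changes.

```python
import numpy as np

# Minimal sketch of an attention-edge "knockout" intervention:
# block the edge from source position s to target position t by
# zeroing that attention weight and renormalizing the row, then
# compare the attended output with and without the edge.
def attention(scores, values, blocked_edge=None):
    """scores: (T, T) pre-softmax, values: (T, d)."""
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    if blocked_edge is not None:
        t, s = blocked_edge
        w[t, s] = 0.0  # cut information flow from s to t
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ values

rng = np.random.default_rng(2)
T, d = 5, 8
scores = rng.normal(size=(T, T))
values = rng.normal(size=(T, d))

clean = attention(scores, values)
blocked = attention(scores, values, blocked_edge=(4, 1))

# Only the target row changes; the size of the change at the
# prediction position measures how much that edge mattered.
effect = np.linalg.norm(clean[4] - blocked[4])
```

Comparing the prediction with and without a given edge is how the paper localizes the critical points where subject and relation information flow into the output.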

Eric Todd has a really interesting new preprint on arXiv:

https://functions.baulab.info

He shows that LLMs contain vector representations of functions that can be composed and applied in diverse contexts.

It could be a powerful tool for understanding reasoning mechanisms within LLMs.

Eric explains more on a twitter thread here

https://x.com/ericwtodd/status/1717277426873766104?s=20

It's a paper from his PhD research. Help him spread the word!

Function Vectors in Large Language Models

LLMs have an embedding space for functions that emerge from in-context learning.
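A toy sketch of the function-vector intuition, on synthetic vectors rather than a real LLM (all quantities below are made up for illustration): averaging hidden-state differences across many in-context prompts cancels prompt-specific noise and leaves a single reusable task vector.

```python
import numpy as np

# Toy sketch of a "function vector": average the hidden-state
# difference a task induces across many in-context prompts, then
# add that single vector to a new context's hidden state to push
# the model toward performing the task.
rng = np.random.default_rng(3)
d_model, n_prompts = 64, 200

task_direction = rng.normal(size=d_model)        # hidden "task" signal
base = rng.normal(size=(n_prompts, d_model))     # prompt-specific states
noise = rng.normal(size=(n_prompts, d_model)) * 0.5
with_task = base + task_direction + noise        # states under ICL demos

# The function vector is the mean difference over prompts;
# prompt noise averages out, the shared task signal remains.
fv = (with_task - base).mean(axis=0)

new_state = rng.normal(size=d_model)             # a zero-shot context
steered = new_state + fv                         # "apply" the function

err = np.linalg.norm(fv - task_direction)        # small vs. the signal norm
```

The punchline of the paper is that something like `fv` exists inside real transformers: a vector extracted from in-context demonstrations that triggers the task in entirely new contexts.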

I want to show the NSF there would be broad support+utility for a "National Deep Inference" service for >100b LLMs.

If your research would be enabled by an inference service on open LLMs w API access+overrides to internal activations, params, gradients: please boost this thread!

(I'm also gathering feedback on twitter - more details here:)

https://twitter.com/davidbau/status/1605609105824964611

Had a great time meeting people at #BlackboxNLP and #EMNLP2022 last week. You can see my talk as well as great talks by Lena Voita and Catherine Olsson on youtube here:

https://youtube.com/@blackboxnlp

RT @mengk20@twitter.com

How & where do large language models (LLMs) like GPT store knowledge? Can we surgically write *new* facts into them, just like we write records into databases?

Explainer 🧵 on how interpretability & model editing go hand-in-hand, and why these emerging areas are so important 👇

πŸ¦πŸ”—: https://twitter.com/mengk20/status/1588581237345595394

I will be at #Neurips2022 this week!

Reach out to me to connect+chat about mechanisms in LLMs, AI alignment, knowledge, privacy, image synthesis, interpretability.

And Wed PM (Session 4 Hall J #226) find Kevin's poster on editing GPT

https://neurips.cc/virtual/2022/poster/53864

https://twitter.com/mengk20/status/1588581237345595394

Our paper visualization for @NeuripsConf '22 is back. It only shows this year's papers, but I am working on the multi-year browser for later this week. For now, just some insights. ...

(1) Graph island:

I have put together a curated list of about 50 papers that (IMO) were seminal in defining what "Deep Learning" is today, to share with my students. Goes from 1943-2022.

https://papers.baulab.info/00_README.html

Question for the AI fediverse: What is missing from this list that should be on it?

Famous Deep Learning Papers

A survey of greatest hits in deep learning research.