Boyang "Albert" Li

33 Followers
81 Following
16 Posts
Nanyang Associate Prof, NRF Fellow, #NTUsg. #AI, #ML, Multimodal, Narrative Intelligence. Formerly Baidu & Disney Research. PhD Georgia Tech.
Website: www.boyangli.org
DSI++: Updating Transformer Memory with New Documents
abs: https://arxiv.org/abs/2212.09744

Differentiable Search Indices (DSIs) encode a corpus of documents in model parameters and use the same model to answer user queries directly. Despite the strong performance of DSI models, deploying them in situations where the corpus changes over time is computationally expensive because reindexing the corpus requires re-training the model. In this work, we introduce DSI++, a continual learning challenge for DSI to incrementally index new documents while being able to answer queries related to both previously and newly indexed documents. Across different model scales and document identifier representations, we show that continual indexing of new documents leads to considerable forgetting of previously indexed documents. We also hypothesize and verify that the model experiences forgetting events during training, leading to unstable learning. To mitigate these issues, we investigate two approaches. The first focuses on modifying the training dynamics. Flatter minima implicitly alleviate forgetting, so we optimize for flatter loss basins and show that the model stably memorizes more documents ($+12\%$). Next, we introduce a generative memory to sample pseudo-queries for documents and supplement them during continual indexing to prevent forgetting for the retrieval task. Extensive experiments on novel continual indexing benchmarks based on Natural Questions (NQ) and MS MARCO demonstrate that our proposed solution mitigates forgetting significantly. Concretely, it improves the average Hits@10 by $+21.1\%$ over competitive baselines for NQ and requires $6$ times fewer model updates compared to re-training the DSI model for incrementally indexing five corpora in a sequence.

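The generative-memory idea in the abstract, sampling pseudo-queries for old documents and mixing them into continual indexing, can be sketched in a few lines. This is a toy illustration under my own assumptions, not the paper's code: `ToyIndexModel`, `pseudo_query`, and `continual_index` are hypothetical stand-ins for a real DSI model, a learned query generator, and the actual training loop.

```python
import random

# Toy sketch (not the paper's implementation): continual indexing with a
# generative memory that replays pseudo-queries for previously indexed docs.

class ToyIndexModel:
    """Stand-in for a DSI model; it just records the training pairs."""
    def __init__(self):
        self.trained_pairs = []
    def train_step(self, batch):
        self.trained_pairs.extend(batch)

def pseudo_query(doc):
    # Stand-in for a learned query generator over an indexed document.
    return "query about " + doc["text"].split()[0]

def continual_index(model, corpora, replay_per_doc=1, seed=0):
    rng = random.Random(seed)
    seen = []  # documents indexed in earlier corpora
    for corpus in corpora:
        for doc in corpus:
            batch = [(doc["text"], doc["docid"])]  # index the new document
            # Supplement the batch with pseudo-queries for old documents,
            # rehearsing their docid mappings to reduce forgetting.
            for old in rng.sample(seen, min(replay_per_doc, len(seen))):
                batch.append((pseudo_query(old), old["docid"]))
            model.train_step(batch)
        seen.extend(corpus)
    return model

corpora = [
    [{"docid": "d1", "text": "cats purr"}, {"docid": "d2", "text": "dogs bark"}],
    [{"docid": "d3", "text": "birds sing"}],
]
model = continual_index(ToyIndexModel(), corpora)
replayed = [p for p in model.trained_pairs if p[0].startswith("query about")]
print(len(replayed))  # pseudo-query pairs mixed into the second corpus
```

In a real system the replayed pseudo-queries would come from a trained query-generation model, and the batch would feed an actual gradient step rather than a list.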
Second day at #EMNLP2022. My personal favorite of the day: Razvan Pascanu's keynote at the Multilingual Representation Learning Workshop. Some ideas here will inform the next decade, I believe. #EMNLP2022livetweet
I will be attending EMNLP in Abu Dhabi until next Monday. My personal favorite from the first day is Timo Schick's keynote at the GEM workshop, on the paper "PEER: A Collaborative Language Model". https://arxiv.org/abs/2208.11663
PEER: A Collaborative Language Model

Textual content is often the output of a collaborative writing process: We start with an initial draft, ask for suggestions, and repeatedly make changes. Agnostic of this process, today's language models are trained to generate only the final result. As a consequence, they lack several abilities crucial for collaborative writing: They are unable to update existing texts, difficult to control and incapable of verbally planning or explaining their actions. To address these shortcomings, we introduce PEER, a collaborative language model that is trained to imitate the entire writing process itself: PEER can write drafts, add suggestions, propose edits and provide explanations for its actions. Crucially, we train multiple instances of PEER able to infill various parts of the writing process, enabling the use of self-training techniques for increasing the quality, amount and diversity of training data. This unlocks PEER's full potential by making it applicable in domains for which no edit histories are available and improving its ability to follow instructions, to write useful comments, and to explain its actions. We show that PEER achieves strong performance across various domains and editing tasks.

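The "multiple instances of PEER able to infill various parts of the writing process" idea can be illustrated with a toy data shape. This is only a sketch under my own assumptions, not PEER's actual format: `EditStep` and `infill_views` are hypothetical names, and a real setup would serialize these fields into text for a language model.

```python
from dataclasses import dataclass

# Toy sketch: one collaborative-writing step as a structured record, plus
# "infilling" views that each hide a different field, so separate model
# instances can learn to predict plans, edits, or explanations.

@dataclass
class EditStep:
    text_before: str
    plan: str         # verbalized intent, e.g. "fix the typo"
    edit: str         # the text after applying the plan
    explanation: str  # why the edit was made

def infill_views(step):
    """Yield (inputs, target_field, target_value) training views: each view
    masks one field so a model learns to generate it from the rest."""
    fields = {"plan": step.plan, "edit": step.edit,
              "explanation": step.explanation}
    for target, value in fields.items():
        inputs = {"text_before": step.text_before,
                  **{k: v for k, v in fields.items() if k != target}}
        yield inputs, target, value

step = EditStep("The metting is Monday.", "fix typo",
                "The meeting is Monday.", "'metting' is misspelled")
views = list(infill_views(step))
print([t for _, t, _ in views])  # each field becomes a prediction target
```

Training instances on complementary views is what lets the self-training loop in the paper generate missing fields (e.g. plans) for corpora that only have raw edit histories.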

The development of AI is full of surprises, with pendulum swings between what is considered easy and what is considered hard. Initially, people thought symbolic reasoning was hard, so they focused on that. After that, people thought pattern recognition was hard. With extraordinarily large models, it now seems that many (though not all) pattern-recognition problems are solved.

Perhaps neither was hard by itself, but combining the two is really, really hard.

https://maximumeffort.substack.com/p/i-taught-chatgpt-to-invent-a-language An interesting read, but I'd say it's much less successful than the author would like to believe.

Basically, like other LLMs, ChatGPT can capture many, many patterns, both short-term and long-term. This is an incredible achievement, but it still has significant shortcomings, especially in symbolic reasoning and combinatorial generalization.

It's important that we distinguish between pattern matching and symbolic reasoning.

I Taught ChatGPT to Invent a Language

In which ChatGPT and I invent a fictional language spoken by slime-people

Maximum Effort, Minimum Reward
@simon Judging by the number of "No, it's not quite right" replies, it's hard to tell whether ChatGPT understands the underlying principles or is just randomly guessing.
@TedUnderwood How about asking the student to generate a response from ChatGPT first and then critique it? Is that meta enough? 😂 (Disclaimer: I haven't tried asking ChatGPT to do that.)
@Chunyuan 😂 reality is stranger than fiction.

Our paper: Here is a dataset where human performance is still much, much higher than that of machines. Room to improve.

Reviewer: Not a fair comparison. Humans are trained with more data. Reject. #AAAI2023 #RealStory

#arXiv announced today a collaboration with #HuggingFace for #ML related papers.

A new "Demo" tab on the arXiv site links to Hugging Face Spaces, showing working #MachineLearning demonstrations made by the authors or the community.

https://blog.arxiv.org/2022/11/17/discover-state-of-the-art-machine-learning-demos-on-arxiv/

Discover State-of-the-Art Machine Learning Demos on arXiv – arXiv blog