New paper 🚨 https://arxiv.org/abs/2211.09260

Can we train a single search system that satisfies our diverse information needs?

We present 𝕋𝔸ℝ𝕋 🥧, the first multi-task instruction-following retriever, trained on 𝔹𝔼ℝℝ𝕀 🫐, a collection of 40 retrieval tasks with instructions! 1/N

#PaperThread #newpaper

Task-aware Retrieval with Instructions

We study the problem of retrieval with instructions, where users of a retrieval system explicitly describe their intent along with their queries, making the system task-aware. We aim to develop a general-purpose task-aware retrieval system using multi-task instruction tuning that can follow human-written instructions to find the best documents for a given query. To this end, we introduce the first large-scale collection of approximately 40 retrieval datasets with instructions, and present TART, a multi-task retrieval system trained on the diverse retrieval tasks with instructions. TART shows strong capabilities to adapt to a new task via instructions and advances the state of the art on two zero-shot retrieval benchmarks, BEIR and LOTTE, outperforming models up to three times larger. We further introduce a new evaluation setup to better reflect real-world scenarios, pooling diverse documents and tasks. In this setup, TART significantly outperforms competitive baselines, further demonstrating the effectiveness of guiding retrieval with instructions.

A user query can have diverse intents (e.g., retrieve relevant documents, find a code implementation, or find similar questions previously asked on a forum).
We often build separate retrieval systems for different intents, training each retriever to model one implicit intent. 2/N

We advocate for a new task formulation, retrieval with instructions, where a retriever takes a query AND an instruction that EXPLICITLY describes the information need.

The goal here is to build a single retriever that can find relevant documents satisfying the instruction. 3/N
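The formulation above can be sketched as a tiny interface: the retriever ranks documents against the query prefixed with its instruction. The scorer below is a toy token-overlap stand-in for a real dense retriever, and all names, documents, and the `[SEP]` convention are illustrative assumptions, not the paper's actual implementation.

```python
def score(text_a: str, text_b: str) -> float:
    """Toy relevance score: Jaccard overlap of lowercase tokens."""
    a, b = set(text_a.lower().split()), set(text_b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def retrieve(instruction: str, query: str, documents: list[str], k: int = 3) -> list[str]:
    """Rank documents for an instruction-prefixed query and return the top k."""
    task_aware_query = f"{instruction} [SEP] {query}"
    ranked = sorted(documents, key=lambda d: score(task_aware_query, d), reverse=True)
    return ranked[:k]

docs = [
    "code implementation of binary search in python",
    "forum question about binary search",
    "encyclopedia article explaining binary search",
]

# The same query, steered toward different documents by the instruction:
print(retrieve("Retrieve a code implementation answering the question", "binary search", docs, k=1))
# → ['code implementation of binary search in python']
print(retrieve("Find a similar question asked on a forum", "binary search", docs, k=1))
# → ['forum question about binary search']
```

Even this toy scorer shows the key point: a single retriever, one query, different results depending on the stated intent.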

Despite the rapid progress of instruction tuning for LLMs, it remains unexplored in retrieval, as retrieval tasks are not included in existing instruction-tuning datasets.
We construct BERRI 🫐, a new large-scale collection of about 40 retrieval datasets with instructions. 4/N
After collecting and preprocessing many datasets into a unified format, we define key aspects that instructions for retrieval tasks should clarify, and newly annotate multiple instructions per dataset in BERRI. 5/N
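One unified training instance might look roughly like the sketch below. The field names and the example entry are assumptions for illustration, not BERRI's actual schema; see the paper for the real format.

```python
# Hypothetical shape of a single BERRI training instance after the
# datasets have been unified. Field names are illustrative assumptions.
berri_example = {
    "dataset": "nq",  # source retrieval dataset (hypothetical entry)
    "instruction": "Retrieve a Wikipedia paragraph that answers the question.",
    "query": "when was the python programming language first released",
    "positive_docs": ["Python was first released in 1991 by Guido van Rossum ..."],
    "negative_docs": ["Monty Python is a British surreal comedy troupe ..."],
}
```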
Using BERRI 🫐, we conduct multi-task instruction tuning for retrieval, resulting in TART-full (a powerful 1.5B-parameter cross-encoder) and TART-dual (an efficient 110M-parameter bi-encoder) 🥧. We also explore many training strategies and negative sampling schemes.
6/N
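The two architectures differ in how the instruction-augmented query meets the document: a cross-encoder (TART-full style) scores the pair jointly, while a bi-encoder (TART-dual style) embeds each side independently so document vectors can be precomputed and indexed. Both "models" below are toy bag-of-words stand-ins I'm using to illustrate the interfaces, not the actual TART networks.

```python
import hashlib

DIM = 32

def embed(text: str) -> list[float]:
    """Toy embedding: hashed bag-of-words token counts."""
    vec = [0.0] * DIM
    for tok in text.lower().split():
        vec[int(hashlib.md5(tok.encode()).hexdigest(), 16) % DIM] += 1.0
    return vec

def bi_encoder_score(instruction: str, query: str, doc: str) -> float:
    """Bi-encoder (TART-dual style): embed each side once, then compare
    with a dot product. Fast: doc embeddings can be indexed offline."""
    q = embed(f"{instruction} [SEP] {query}")
    d = embed(doc)
    return sum(x * y for x, y in zip(q, d))

def cross_encoder_score(instruction: str, query: str, doc: str) -> float:
    """Cross-encoder (TART-full style): query side and document interact
    jointly inside one model (slower, usually more accurate). Toy version:
    count doc tokens matching the instruction-augmented query."""
    q_tokens = set(f"{instruction} {query}".lower().split())
    return float(sum(t in q_tokens for t in doc.lower().split()))
```

The speed/accuracy trade-off follows from the interfaces: the bi-encoder never sees query and document together, which is what makes offline indexing possible.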
Our TART 🥧 achieves state-of-the-art performance on two zero-shot retrieval benchmarks, BEIR and LOTTE, demonstrating its strong ability to adapt to a new task via natural language instructions. We also find that removing instructions at training or test time hurts performance.
7/N
We conduct a set of analyses to see what helps TART learn to follow instructions for retrieval, and find that dataset diversity (# of datasets, # of task categories, # of domains), model size, and good negative samples all help. See our paper for more details & examples!
8/N
This work is from my internship at Meta with amazing collaborators
@timo_schick @PSH_Lewis Xilun @gizacard @riedelcastro @scottyih
Feel free to contact me if you have any questions!
https://arxiv.org/abs/2211.09260