@soerenarlt
Thanks a lot! We found that for some relationship types (e.g., country, nationality), LMs often exploit surface-level cues such as entity names; for example, a person named Akari is likely from Japan.
As a result, even if a model doesn't really know the answer, it can still guess correctly, which weakens the correlations.
More discussion can be found in Sec 3.2, in the "Subject entity popularity predicts memorization" paragraph!
@jacobeisenstein
Thank you so much for the feedback!!
Yes, we totally agree that retrieval augmentation is quite effective and addresses many issues of relying on LMs trained on static text. We tried to fit many findings from the paper into a single post, which may have made it misleading...
Regarding calibration, we didn't try other methods and focused on the simple popularity-based approach as a first step. We're interested in trying more sophisticated (e.g., learned) approaches though!
More interesting results & discussions in our paper!
📝 https://tinyurl.com/2sdeuupn
💻 https://github.com/AlexTMallen/adaptive-retrieval

Work done by Alex (a junior undergrad at UW!), @AkariAsai, Rajarshi Das, Hanna Hajishirzi, and Daniel Khashabi.
Adaptive Retrieval decides when *not* to retrieve based on subject popularity & relationship type. This approach not only improves performance (by up to 5%) but also greatly reduces inference-time latency & API costs (e.g., it halves GPT-3 API costs!). [9/N]
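A minimal sketch of the decision rule described above. All names (`should_retrieve`, `thresholds`, the `lm`/`retriever` callables) are illustrative assumptions, not the paper's actual code; the core idea is just a per-relationship popularity threshold:

```python
def should_retrieve(subject_popularity: int, threshold: int) -> bool:
    """Retrieve only for less popular subjects, where the LM's
    parametric memory is unreliable; skip retrieval otherwise."""
    return subject_popularity < threshold

def answer(question, subject_popularity, relationship,
           thresholds, lm, retriever):
    # One threshold per relationship type, tuned on held-out data
    # (an assumption about how the thresholds would be chosen).
    t = thresholds[relationship]
    if should_retrieve(subject_popularity, t):
        # Long-tail entity: augment the prompt with retrieved text.
        context = retriever(question)
        return lm(question, context=context)
    # Popular entity: trust parametric knowledge, saving the
    # retrieval call and the longer (more expensive) prompt.
    return lm(question, context=None)
```

Skipping retrieval for popular entities is where the latency and API savings come from: the prompt stays short and no retriever call is made.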
In summary, LLMs do memorize a lot now, but they are still not good enough to completely replace non-parametric memories, especially in domains with long-tail distributions. Can we get the best of both worlds? We introduce a simple yet effective method: Adaptive Retrieval. [8/N]
We found that retrieval-augmented LMs (red & green lines) are particularly helpful for questions about less popular entities, where LMs struggle. Conversely, larger models (e.g., GPT-3) even outperform retrieval-augmented models on well-known facts, due to retrieval errors. [7/N]
We show that augmenting LMs with non-parametric memories (retrieved text chunks) helps substantially: GPT-Neo 1.3B assisted by retrieved context outperforms vanilla GPT-3 003! Even for GPT-3, retrieval gives up to 10% accuracy gains. Why is it so effective? [6/N]
We found strong correlations between subject entity popularity and accuracy, indicating that LLMs memorize popular factual knowledge well but fail to memorize less popular knowledge. [4/N]
Surprisingly, in long-tail distributions, scaling LLMs may not be as helpful as we believed: GPT-3 003 performs nearly as poorly as GPT-Neo 2B on less popular entities 🟦
Prior analyses of knowledge learning often use NQ/TriviaQA 🟥, which may inflate the apparent effectiveness of scaling. [5/N]
To answer these questions, we construct a new large open-domain QA dataset, PopQA, whose questions are grounded in Wikidata and sampled from the long-tail popularity distribution of Wikipedia, enabling fine-grained analysis. We then test 10 LLMs in a zero/few-shot manner. [3/N]