Joint work of Nico Daheim,¹ Nouha Dziri,² Mrinmaya Sachan,³ Iryna Gurevych¹ and Edoardo Ponti.⁴
________________
¹ Ubiquitous Knowledge Processing Lab, Computer Science Department, TU Darmstadt, hessian.AI
² Allen Institute for AI (AI2)
³ ETH Zürich
⁴ The University of Edinburgh

See you in Mexico City 🇲🇽 at #NAACL2024! (9/9)

⚖️ Trade-off between faithfulness and abstractiveness
📈 Results on further tasks, such as FaithDial
🧑‍⚖️ Human Evaluation

For further results, check our paper and code!

📄 Paper: https://arxiv.org/abs/2303.17574
💻 Code: https://github.com/UKPLab/naacl2024-ewr

(8/9) #NAACL2024

Elastic Weight Removal for Faithful and Abstractive Dialogue Generation

Ideally, dialogue systems should generate responses that are faithful to the knowledge contained in relevant documents. However, many models generate hallucinated responses instead that contradict it or contain unverifiable information. To mitigate such undesirable behaviour, it has been proposed to fine-tune a 'negative expert' on negative examples and subtract its parameters from those of a pre-trained model. However, intuitively, this does not take into account that some parameters are more responsible than others in causing hallucinations. Thus, we propose to weigh their individual importance via (an approximation of) the Fisher Information matrix, which measures the uncertainty of their estimate. We call this method Elastic Weight Removal (EWR). We evaluate our method -- using different variants of Flan-T5 as a backbone language model -- on multiple datasets for information-seeking dialogue generation and compare our method with state-of-the-art techniques for faithfulness, such as CTRL, Quark, DExperts, and Noisy Channel reranking. Extensive automatic and human evaluation shows that EWR systematically increases faithfulness at minor costs in terms of other metrics. However, we notice that only discouraging hallucinations may increase extractiveness, i.e. shallow copy-pasting of document spans, which can be undesirable. Hence, as a second main contribution, we show that our method can be extended to simultaneously discourage hallucinations and extractive responses. We publicly release the code for reproducing EWR and all baselines.

Adding the abstractiveness expert can improve the baseline in terms of both faithfulness and abstractiveness (gray region on the chart)

(7/9) #NAACL2024

Subtracting the hallucination mitigation expert, which is trained on hallucinated examples, acts like removing those examples from the training data

This drastically reduces hallucinations 📉
And it does so better than other methods 🏆

(6/9) #NAACL2024

We improve on scalar weighting!
How? By weighting each model according to its Fisher Information Matrix ⚖️

It provides a parameter-specific scaling that can better isolate parameters responsible for hallucinations and abstractiveness.
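In pseudocode, the idea looks roughly like this. This is a minimal numpy sketch, not the paper's exact formulation: the diagonal Fisher is approximated by averaged squared per-example gradients, and the max-normalization and `lam` hyperparameter are illustrative assumptions.

```python
import numpy as np

def diagonal_fisher(per_example_grads):
    """Empirical diagonal Fisher: mean of squared per-example gradients."""
    return {name: np.mean([g[name] ** 2 for g in per_example_grads], axis=0)
            for name in per_example_grads[0]}

def ewr_remove(pretrained, negative_expert, fisher, lam=1.0, eps=1e-8):
    """Subtract the negative expert's task vector, scaled per parameter
    by its normalized Fisher importance."""
    out = {}
    for name, w in pretrained.items():
        tau = negative_expert[name] - w                   # negative-expert task vector
        imp = fisher[name] / (fisher[name].max() + eps)   # importance in [0, 1]
        out[name] = w - lam * imp * tau                   # remove mostly where importance is high
    return out

# Parameters with large gradients on hallucinated data are moved the most
pre = {"w": np.zeros(3)}
neg = {"w": np.ones(3)}
grads = [{"w": np.array([2.0, 0.5, 0.01])}]
fisher = diagonal_fisher(grads)
cleaned = ewr_remove(pre, neg, fisher)
```

Unlike a single scalar weight, each parameter gets its own removal strength, so low-importance parameters are left nearly untouched.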

(5/9) #NAACL2024

We train two experts:
💠 a hallucination mitigation expert to discourage hallucinations
💠 an abstractiveness expert to encourage naturalness
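Combining the two experts can be sketched as one merged update. The sign convention (subtract the hallucination expert, add the abstractiveness expert) and the per-expert Fisher normalization below are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

def combine_experts(pretrained, halluc_expert, abstr_expert,
                    f_halluc, f_abstr, lam_h=1.0, lam_a=1.0, eps=1e-8):
    """Discourage the hallucination expert and encourage the abstractiveness
    expert, each task vector scaled per parameter by its own Fisher importance."""
    out = {}
    for name, w in pretrained.items():
        tau_h = halluc_expert[name] - w
        tau_a = abstr_expert[name] - w
        imp_h = f_halluc[name] / (f_halluc[name].max() + eps)
        imp_a = f_abstr[name] / (f_abstr[name].max() + eps)
        out[name] = w - lam_h * imp_h * tau_h + lam_a * imp_a * tau_a
    return out

pre = {"w": np.zeros(2)}
halluc = {"w": np.array([1.0, 0.0])}   # fine-tuned on hallucinated responses
abstr = {"w": np.array([0.0, 1.0])}    # fine-tuned on abstractive responses
f_h = {"w": np.array([1.0, 0.1])}
f_a = {"w": np.array([0.1, 1.0])}
merged = combine_experts(pre, halluc, abstr, f_h, f_a)
```

Because each expert carries its own Fisher weights, the two updates target largely different parameters instead of cancelling each other out.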

(4/9) #NAACL2024

Adding them to the model weights promotes the behavior of the fine-tuned model ✅
Subtracting them discourages that behavior ❌

(3/9) #NAACL2024

Our method builds on Task Arithmetic 🏗️

Task vectors steer model behavior. A task vector is the difference between the model weights after and before fine-tuning on a task.
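In code, a task vector is just an element-wise parameter difference. A minimal numpy sketch (parameter names and shapes here are toy illustrations):

```python
import numpy as np

def task_vector(pretrained, finetuned):
    """Task vector: fine-tuned weights minus pre-trained weights."""
    return {name: finetuned[name] - pretrained[name] for name in pretrained}

def apply_task_vector(weights, tau, scale=1.0):
    """Add (scale > 0) or subtract (scale < 0) a task vector."""
    return {name: w + scale * tau[name] for name, w in weights.items()}

# Toy single-"layer" model
pre = {"w": np.array([1.0, 2.0])}
ft = {"w": np.array([1.5, 1.0])}
tau = task_vector(pre, ft)
promoted = apply_task_vector(pre, tau, scale=1.0)    # recovers the fine-tuned weights
removed = apply_task_vector(pre, tau, scale=-1.0)    # moves away from the fine-tuned behaviour
```

The scale is a single scalar shared by all parameters — the limitation that Fisher weighting later addresses.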

(2/9) #NAACL2024

Dialogue models often hallucinate 😵‍💫
➕ Knowledge grounding can help
➖ but the responses become less natural

Can we reduce hallucinations AND keep naturalness?
Yes 🚀 With Elastic Weight Removal (EWR)!

Learn more about our #NAACL2024 paper 🧵 (1/9)

📃 https://arxiv.org/abs/2303.17574

