▪️ 7 papers authored or co-authored by UKP at this year's #EACL2024

▪️ 5 papers authored or co-authored by UKP at this year's #NAACL

▪️ At #ACL2024NLP in Bangkok, Iryna Gurevych gave a keynote, and the UKP Lab contributed to two Outstanding Paper Awards. Congratulations to the authors Indraneil Paul, Goran Glavaš, Jan-Christoph Klie, Rahul N., Juan Haladjian, Marc Kirchner, and Iryna Gurevych!

✨ As 2024 is coming to its end, we are happy to review some of the achievements of the UKP Lab ✨

▪️ We have established two independent research groups: AI and NLP for Mental Health, led by Shaoxiong Ji, and NLP for Expert Domains, led by Simone Balloccu.

▪️ At this year’s #EMNLP2024 we presented 13 papers: 11 in the Main track and 2 in the Findings track

▪️ 11 papers authored or co-authored by UKP members have been accepted for publication at this year's #ACL2024NLP in Bangkok 🇹🇭!

My #acl2024nlp Presidential Address is now publicly available. Especially if you saw the slides & the discussion of them in August, please have a listen to the actual talk. It starts at about 43'50" in this video:

https://underline.io/events/466/sessions/18203/lecture/104931-test-of-time-awards-lifetime-achievement-awards-presidential-address

"[O]ur results do not mean that AI is not a threat at all," emphasized Iryna Gurevych. "[But future research should] focus on other risks posed by the models, such as their potential to be used to generate fake news." (3/🧵)

Read the full press release here: https://nachrichten.idw-online.de/2024/08/12/independent-complex-thinking-not-yet-possible-after-all-study-led-by-tu-shows-limitations-of-chatgpt-co

#ACL2024NLP

Independent, complex thinking not (yet) possible after all: Study led by TU shows limitations of ChatGPT & co.

The 2024 study, authored by Sheng Lu, Irina Bigoulaeva, Rachneet Sachdeva, Harish Tayyar Madabushi and Iryna Gurevych (BathNLP Lab | Ubiquitous Knowledge Processing (UKP) Lab), was just presented at #ACL2024NLP. It found no evidence of emergent abilities in LLMs that go beyond in-context learning. (2/🧵)

ArXiv: https://arxiv.org/abs/2309.01809

Are Emergent Abilities in Large Language Models just In-Context Learning?

Large language models, comprising billions of parameters and pre-trained on extensive web-scale corpora, have been claimed to acquire certain capabilities without having been specifically trained on them. These capabilities, referred to as "emergent abilities," have been a driving force in discussions regarding the potentials and risks of language models. A key challenge in evaluating emergent abilities is that they are confounded by model competencies that arise through alternative prompting techniques, including in-context learning, which is the ability of models to complete a task based on a few examples. We present a novel theory that explains emergent abilities, taking into account their potential confounding factors, and rigorously substantiate this theory through over 1000 experiments. Our findings suggest that purported emergent abilities are not truly emergent, but result from a combination of in-context learning, model memory, and linguistic knowledge. Our work is a foundational step in explaining language model performance, providing a template for their efficient use and clarifying the paradox of their ability to excel in some instances while faltering in others. Thus, we demonstrate that their capabilities should not be overestimated.
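
To make the core comparison concrete, here is a minimal Python sketch (not the authors' code) contrasting a zero-shot prompt with a few-shot, in-context-learning prompt for the same test instance. The task, instruction wording, and demonstration pairs are illustrative assumptions; the study's point is that apparent "emergent" gains largely track what such in-context examples, together with model memory and linguistic knowledge, already provide.

```python
# Illustrative sketch (not the paper's code): the study contrasts zero-shot
# prompting with in-context learning (ICL), where the prompt carries a few
# worked examples. Comparing the two on the same instances is what lets the
# authors test whether "emergent" abilities go beyond ICL.

TASK_INSTRUCTION = (
    "Decide whether the second sentence follows from the first. "
    "Answer 'yes' or 'no'."
)

# Hypothetical demonstration pairs, used only to build the few-shot prompt.
DEMONSTRATIONS = [
    ("A dog is sleeping on the porch. / An animal is resting.", "yes"),
    ("The cafe is closed on Sundays. / The cafe is open every day.", "no"),
]


def zero_shot_prompt(instance: str) -> str:
    """Zero-shot: instruction + test instance only."""
    return f"{TASK_INSTRUCTION}\n\n{instance}\nAnswer:"


def few_shot_prompt(instance: str) -> str:
    """In-context learning: the same instruction plus a few solved examples."""
    demos = "\n\n".join(f"{x}\nAnswer: {y}" for x, y in DEMONSTRATIONS)
    return f"{TASK_INSTRUCTION}\n\n{demos}\n\n{instance}\nAnswer:"


if __name__ == "__main__":
    test_item = "Two children play football in the park. / Kids are outside."
    # A controlled comparison would send both prompts to the same model and
    # compare accuracies; here we only print them to show the difference.
    print(zero_shot_prompt(test_item))
    print("---")
    print(few_shot_prompt(test_item))
```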

Our colleagues Iryna Gurevych, Yufang Hou and Preslav Nakov presenting the work of Max Glockner on #Missci at #ACL2024NLP 🇹🇭, a collaboration with IBM Research Ireland and MBZUAI.

https://arxiv.org/abs/2406.03181

Missci: Reconstructing Fallacies in Misrepresented Science

Health-related misinformation on social networks can lead to poor decision-making and real-world dangers. Such misinformation often misrepresents scientific publications and cites them as "proof" to gain perceived credibility. To effectively counter such claims automatically, a system must explain how the claim was falsely derived from the cited publication. Current methods for automated fact-checking or fallacy detection neglect to assess the (mis)used evidence in relation to misinformation claims, which is required to detect the mismatch between them. To address this gap, we introduce Missci, a novel argumentation theoretical model for fallacious reasoning together with a new dataset for real-world misinformation detection that misrepresents biomedical publications. Unlike previous fallacy detection datasets, Missci (i) focuses on implicit fallacies between the relevant content of the cited publication and the inaccurate claim, and (ii) requires models to verbalize the fallacious reasoning in addition to classifying it. We present Missci as a dataset to test the critical reasoning abilities of large language models (LLMs), that are required to reconstruct real-world fallacious arguments, in a zero-shot setting. We evaluate two representative LLMs and the impact of different levels of detail about the fallacy classes provided to the LLM via prompts. Our experiments and human evaluation show promising results for GPT 4, while also demonstrating the difficulty of this task.
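
As a rough illustration of the zero-shot setup the abstract describes, here is a hedged Python sketch that assembles a Missci-style prompt from the cited publication's relevant content, the misrepresenting claim, and a list of fallacy classes. The class names, definitions, and prompt wording are illustrative placeholders, not the dataset's actual inventory; the `detail` switch mimics varying how much information about the fallacy classes the LLM receives.

```python
# Minimal sketch of the kind of zero-shot setup the Missci abstract describes:
# the LLM gets the accurate content of the cited publication, the
# misrepresenting claim, and a list of fallacy classes, and must both
# verbalize the faulty reasoning and name the fallacy. The class names and
# prompt wording below are illustrative assumptions, not Missci's inventory.

FALLACY_CLASSES = {
    "Hasty Generalization": "A broad conclusion drawn from limited or unrepresentative evidence.",
    "False Cause": "Treating correlation or co-occurrence as proof of causation.",
    "Impossible Expectations": "Dismissing findings because they are not absolute or perfect.",
}


def build_missci_style_prompt(publication_context: str, claim: str, detail: str = "definitions") -> str:
    """Assemble a zero-shot prompt; `detail` toggles how much class information is given."""
    if detail == "names":
        class_info = ", ".join(FALLACY_CLASSES)
    else:  # richer prompt variant: include short definitions for each class
        class_info = "\n".join(f"- {name}: {desc}" for name, desc in FALLACY_CLASSES.items())
    return (
        "The following claim cites a scientific publication as support, but misrepresents it.\n"
        f"Publication (relevant content): {publication_context}\n"
        f"Claim: {claim}\n\n"
        f"Possible fallacy classes:\n{class_info}\n\n"
        "1. Verbalize the fallacious reasoning that connects the publication to the claim.\n"
        "2. Name the fallacy class."
    )


if __name__ == "__main__":
    print(build_missci_style_prompt(
        publication_context="An in-vitro study observed reduced viral replication at very high drug concentrations.",
        claim="The drug cures the disease in humans.",
    ))
```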

And consider following the authors Fengyu Cai (UKP Lab), Xinran Zhao (Carnegie Mellon University), Hongming Zhang (Tencent AI), Iryna Gurevych, and Heinz Koeppl (Computer Science, TU Darmstadt) for more information or an exchange of ideas.

See you at #ACL2024NLP 🇹🇭!

GeoHard: Towards Measuring Class-wise Hardness through Modelling Class Semantics

Recent advances in measuring hardness-wise properties of data guide language models in sample selection within low-resource scenarios. However, class-specific properties are overlooked for task setup and learning. How will these properties influence model learning, and are they generalizable across datasets? To answer this question, this work formally initiates the concept of class-wise hardness. Experiments across eight natural language understanding (NLU) datasets demonstrate a consistent hardness distribution across learning paradigms, models, and human judgment. Subsequent experiments unveil a notable challenge in measuring such class-wise hardness with the instance-level metrics of previous works. To address this, we propose GeoHard for class-wise hardness measurement by modeling class geometry in the semantic embedding space. GeoHard surpasses instance-level metrics by over 59 percent in Pearson's correlation when measuring class-wise hardness. Our analysis theoretically and empirically underscores the generality of GeoHard as a fresh perspective on data diagnosis. Additionally, we showcase how understanding class-wise hardness can practically aid in improving task learning.
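
For intuition about measuring hardness from class geometry in a semantic embedding space, here is an illustrative numpy sketch. It is not the paper's exact metric: it simply scores a class as harder when its instances are widely dispersed around their own centroid and that centroid sits close to the centroids of other classes.

```python
# Illustrative sketch of class-wise hardness from class geometry in an
# embedding space, in the spirit of GeoHard (the paper's formula may differ).
# Proxy: a class is harder when its intra-class dispersion is high relative
# to its separation from the other class centroids.

import numpy as np


def class_hardness(embeddings: np.ndarray, labels: np.ndarray) -> dict:
    """Return a {class: hardness} map from instance embeddings and labels."""
    classes = np.unique(labels)
    centroids = {c: embeddings[labels == c].mean(axis=0) for c in classes}
    hardness = {}
    for c in classes:
        members = embeddings[labels == c]
        # Intra-class dispersion: mean distance of members to their own centroid.
        intra = np.linalg.norm(members - centroids[c], axis=1).mean()
        # Inter-class separation: mean distance to the other classes' centroids.
        others = [centroids[o] for o in classes if o != c]
        inter = np.mean([np.linalg.norm(centroids[c] - o) for o in others])
        hardness[c] = intra / (inter + 1e-8)  # higher value = geometrically harder class
    return hardness


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Placeholder embeddings standing in for sentence-encoder outputs.
    emb = np.vstack([rng.normal(loc=m, scale=s, size=(50, 16))
                     for m, s in [(0.0, 1.0), (0.5, 1.5), (3.0, 1.0)]])
    lab = np.repeat(np.array([0, 1, 2]), 50)
    print(class_hardness(emb, lab))
```

In practice the placeholder embeddings would be replaced by sentence-encoder outputs for the dataset's instances, with one hardness score per class.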

We demonstrate that, with knowledge of class-wise hardness, class reorganization leads to a more coherent class-wise hardness distribution and further improves model performance.

(7/🧵)

#ACL2024NLP #NLProc

☝️ Moreover, we theoretically prove that intra-class hardness is associated with overfitting, which degrades performance during training.

(6/🧵) #ACL2024NLP #NLProc