Are all classes in #NLProc tasks equally difficult to learn? 🤔
In our #ACL2024NLP paper, we analyze why this is not the case!
Please meet #GeoHard, a metric to measure class-wise difficulty 🔍📊! 🧵 (1/9)

📆 Poster: Tue, Aug 13, 12:15 ICT

📰 https://arxiv.org/abs/2407.12512

GeoHard: Towards Measuring Class-wise Hardness through Modelling Class Semantics

Recent advances in measuring the hardness of data guide language models in sample selection within low-resource scenarios. However, class-specific properties are overlooked in task setup and learning. How do these properties influence model learning, and do they generalize across datasets? To answer this question, this work formally introduces the concept of class-wise hardness. Experiments across eight natural language understanding (NLU) datasets demonstrate a consistent hardness distribution across learning paradigms, models, and human judgment. Subsequent experiments reveal a notable challenge in measuring such class-wise hardness with the instance-level metrics of previous work. To address this, we propose GeoHard, which measures class-wise hardness by modeling class geometry in the semantic embedding space. GeoHard surpasses instance-level metrics by over 59% in Pearson's correlation when measuring class-wise hardness. Our analysis theoretically and empirically underscores the generality of GeoHard as a fresh perspective on data diagnosis. Additionally, we showcase how understanding class-wise hardness can practically aid in improving task learning.


🎯 We've found a consistent pattern of class-wise difficulty across various language models, learning paradigms, and human annotations on eight NLU datasets:

• Fine-tuned LMs: RoBERTa / OPT / Flan-T5
• In-context learning: Llama / OPT

Class-wise difficulty is an intrinsic feature! 🔍🤖 (See the consistency-check sketch below.)

(2/9) #ACL2024NLP #NLProc
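Here is one way such cross-model consistency could be checked — a minimal sketch with dummy labels, assuming per-class accuracy and a rank correlation; the paper's exact protocol may differ:

```python
# Sketch: is the per-class difficulty ranking shared between two models?
# Dummy predictions below; in practice these would come from, e.g.,
# fine-tuned RoBERTa and Flan-T5 evaluated on the same test set.
import numpy as np
from scipy.stats import spearmanr

def per_class_accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> np.ndarray:
    """Accuracy restricted to each gold class, in class-id order."""
    return np.array([(y_pred[y_true == c] == c).mean()
                     for c in np.unique(y_true)])

y_true  = np.array([0, 0, 0, 1, 1, 2, 2, 2])   # gold labels (3 classes)
preds_a = np.array([0, 0, 0, 1, 2, 2, 2, 1])   # model A's predictions
preds_b = np.array([0, 0, 0, 2, 2, 2, 2, 1])   # model B's predictions

rho, _ = spearmanr(per_class_accuracy(y_true, preds_a),
                   per_class_accuracy(y_true, preds_b))
print(f"Spearman rho between the models' class rankings: {rho:.2f}")
```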

🤔 How do we measure the hardness of a class?

GeoHard to the rescue! It combines inter-class and intra-class measures derived from class semantics.

In the embedding space, greater semantic diversity within a class and closer distances between classes indicate higher hardness (see the sketch after this tweet).

(3/9)

#ACL2024NLP #NLProc
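For intuition, here is a minimal Python sketch of a GeoHard-style score built only from that description; the sentence-transformers encoder, Euclidean distances, and the unweighted sum of the two terms are illustrative assumptions, not the paper's exact formulation:

```python
# GeoHard-style class hardness sketch: intra-class dispersion (semantic
# diversity inside a class) plus inter-class proximity (how close the
# class centroid sits to the other centroids). Higher score = harder class.
import numpy as np
from sentence_transformers import SentenceTransformer

def class_hardness(texts_by_class: dict[str, list[str]],
                   encoder_name: str = "all-mpnet-base-v2") -> dict[str, float]:
    encoder = SentenceTransformer(encoder_name)  # assumed encoder choice
    embs = {c: encoder.encode(texts) for c, texts in texts_by_class.items()}
    centroids = {c: e.mean(axis=0) for c, e in embs.items()}

    scores = {}
    for c, e in embs.items():
        # Intra-class term: mean distance of examples to their own centroid.
        intra = np.linalg.norm(e - centroids[c], axis=1).mean()
        # Inter-class term: negated mean distance to the other centroids,
        # so that closer neighbouring classes raise the score.
        inter = -np.mean([np.linalg.norm(centroids[c] - centroids[o])
                          for o in centroids if o != c])
        scores[c] = intra + inter  # unweighted sum is an assumption
    return scores
```

Both terms are computed directly in the embedding space, so a class can be scored before any model is trained on the task.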

To compare GeoHard with baseline metrics, namely aggregations of instance-level hardness scores, we correlate each measure with class-wise performance on eight NLU datasets.

GeoHard outperforms the instance-level aggregations by more than 59%! 🤯

(4/9)

#ACL2024NLP #NLProc
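Concretely, the recipe above amounts to something like the following (class names and numbers are made up; a useful hardness metric should correlate negatively with per-class accuracy):

```python
# Evaluate a hardness metric by correlating its class scores with
# observed per-class performance. Hypothetical NLI-style numbers.
from scipy.stats import pearsonr

hardness = {"entailment": 0.21, "neutral": 0.58, "contradiction": 0.35}
accuracy = {"entailment": 0.91, "neutral": 0.74, "contradiction": 0.86}

classes = sorted(hardness)
r, p = pearsonr([hardness[c] for c in classes],
                [accuracy[c] for c in classes])
print(f"Pearson r = {r:.2f}")  # strongly negative -> the metric tracks hardness
```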

🧠 GeoHard is stable across different semantic encoders and NLP tasks, which means it generalizes well as a measure of class-wise hardness.

(5/9)

#ACL2024NLP #NLProc

โ˜๏ธ Moreover, we theoretically prove that the intra-class hardness is associated with overfitting phenomena, leading to performance degradation in the training process.

(6/๐Ÿงต) #ACL2024NLP #NLProc

We also demonstrate a practical payoff: given knowledge of class-wise hardness, reorganizing classes yields a more coherent class-wise hardness distribution and further improves model performance.

(7/9)

#ACL2024NLP #NLProc

Check out our paper and code!

📰 Paper: https://arxiv.org/abs/2407.12512
💻 Code: https://github.com/TRUMANCFY/geohard

(8/9) #ACL2024NLP #NLProc


For more details or an exchange of ideas, consider following the authors: Fengyu Cai (UKP Lab), Xinran Zhao (Carnegie Mellon University), Hongming Zhang (Tencent AI), Iryna Gurevych, and Heinz Koeppl (Computer Science, TU Darmstadt).

See you at #ACL2024NLP 🇹🇭! (9/9)