Many speak idyllically about a world with "fair" or "unbiased" LLMs, but is that even possible? In our new preprint, we take the most well-defined principle of AI safety/ethics and show that, in reality, an LLM could never be fair under any definition in the current ML literature. [Brief thread: 👇; paper link: https://arxiv.org/abs/2406.03198]
The Impossibility of Fair LLMs

The rise of general-purpose artificial intelligence (AI) systems, particularly large language models (LLMs), has raised pressing moral questions about how to reduce bias and ensure fairness at scale. Researchers have documented a sort of "bias" in the significant correlations between demographics (e.g., race, gender) in LLM prompts and responses, but it remains unclear how LLM fairness could be evaluated with more rigorous definitions, such as group fairness or fair representations. We analyze a variety of technical fairness frameworks and find inherent challenges in each that make the development of a fair LLM intractable. We show that each framework either does not logically extend to the general-purpose AI context or is infeasible in practice, primarily due to the large amounts of unstructured training data and the many potential combinations of human populations, use cases, and sensitive attributes. These inherent challenges would persist for general-purpose AI, including LLMs, even if empirical challenges, such as limited participatory input and limited measurement methods, were overcome. Nonetheless, fairness will remain an important type of model evaluation, and there are still promising research directions, particularly the development of standards for the responsibility of LLM developers, context-specific evaluations, and methods of iterative, participatory, and AI-assisted evaluation that could scale fairness across the diverse contexts of modern human-AI interaction.

With ML models for tasks like sentencing criminals or screening job applicants, you might impose a constraint like "fairness through unawareness" (i.e., your model doesn't take race/gender as input), but not with LLMs or any general-purpose model built on unstructured data. (Section 3.1)
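To make the contrast concrete, here is a minimal sketch (hypothetical data and column names, not from the paper) of why "drop the sensitive column" works for tabular models but has no analogue for free text:

```python
import pandas as pd

# Hypothetical tabular hiring data: "fairness through unawareness" just means
# the sensitive columns never reach the model.
applicants = pd.DataFrame({
    "years_experience": [3, 7, 1],
    "test_score":       [82, 91, 67],
    "race":             ["A", "B", "A"],   # sensitive attribute
    "gender":           ["F", "M", "F"],   # sensitive attribute
})
X_unaware = applicants.drop(columns=["race", "gender"])  # model input excludes them

# With an LLM there is no column to drop: sensitive information is embedded,
# explicitly or implicitly, in unstructured text.
prompt = "Cover letter: As a recent graduate of a women's college and a mother of two, ..."
# No robust, general operation removes "gender" from this string without also
# destroying task-relevant content.
```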
Recommender systems scholars define fairness as equity between stakeholders, such as content creators. But if OpenAI/Google can consume the internet and serve it up through an LLM instead of redirecting users to third parties, those producers may never get their fair share! (Section 3.2)
What about "group fairness" (e.g., group parity, a hiring decision is uncorrelated with race, gender, disability, etc.)? No luck. Again, with general-purpose AI, fairness cannot be guaranteed across populations, and LLMs have no explicit target: city, industry, etc. (Section 4.1)
Worse, every LLM has a multitude of sensitive attributes at play. There are no robust techniques to excise even one from a dataset—much less all of them—and "unbiasing" for some tasks would remove essential information for other tasks like medical prediction. (Section 4.2)
But, you reply, at least we can enforce fairness task by task (e.g., a sanitized dataset for each task) and then combine those models into a general-purpose AI system! Unfortunately, as Dwork and Ilvento (2019) showed quite explicitly, fairness does not compose. (Section 4.3)
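A toy illustration of the non-composition point (our own numbers, not Dwork and Ilvento's example): two stages that each satisfy demographic parity can fail it once composed, because the correlation between stages can differ across groups.

```python
# Two hiring stages, each with a 50% pass rate in *both* groups, so each stage
# satisfies demographic parity on its own (hypothetical numbers).
p_pass_each_stage = 0.5

# Group A: the two stages are perfectly correlated (the same people pass both).
p_both_group_a = p_pass_each_stage                       # 0.50
# Group B: the two stages are independent.
p_both_group_b = p_pass_each_stage * p_pass_each_stage   # 0.25

print(p_both_group_a, p_both_group_b)  # the composed pipeline violates parity
```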
So are we morally doomed? Not quite! Our preprint dashes hopes for a silver bullet of AI ethics or safety, but the case for incremental fairness remains strong! We argue for three principles: focus on context, hold LLM developers responsible, and iterate with stakeholders. (Section 5.1)
Moreover, AI-assisted alignment may be the only path to long-term success. We conclude our big-picture discussion with implications for specific LLM practices: curating training data, instruction tuning, prompt engineering, personalization, and interpretability. (Section 5.2)
The key point is: a lot of people are just too optimistic about AI ethics and safety right now. However, there is a ton of surface area for more contextualized, adaptive approaches! You can read our HEAL #CHI2024 paper on arXiv: https://arxiv.org/abs/2406.03198 We hope you find it useful!