You've heard of "AI red teaming" frontier LLMs, but what is it? Does it work? Who benefits?
Questions for our #CSCW2024 workshop! The team includes OpenAI's red team lead, Lama Ahmad, and Microsoft's red team lead, Ram Shankar Siva Kumar.
Cite paper: http://arxiv.org/abs/2407.07786
Apply to join: http://bit.ly/airedteam


The Human Factor in AI Red Teaming: Perspectives from Social and Collaborative Computing
Rapid progress in general-purpose AI has sparked significant interest in "red teaming," a practice of adversarial testing originating in military and cybersecurity applications. AI red teaming raises many questions about the human factor, such as how red teamers are selected, biases and blind spots in how tests are conducted, and harmful content's psychological effects on red teamers. A growing body of HCI and CSCW literature examines related practices, including data labeling, content moderation, and algorithmic auditing. However, few, if any, have investigated red teaming itself. This workshop seeks to consider the conceptual and empirical challenges associated with this practice, often rendered opaque by non-disclosure agreements. Future studies may explore topics ranging from fairness to mental health and other areas of potential harm. We aim to facilitate a community of researchers and practitioners who can begin to meet these challenges with creativity, innovation, and thoughtful reflection.
Our new preprint shows the first detailed public opinion data on digital sentience:
76% agree torturing sentient AIs is wrong;
69% support a ban on sentient AI;
63% support a ban on AGI; and
a median forecast of 5 years to sentient AI and only 2 to AGI! https://arxiv.org/abs/2407.08867

Perceptions of Sentient AI and Other Digital Minds: Evidence from the AI, Morality, and Sentience (AIMS) Survey
Humans now interact with a variety of digital minds: AI systems that appear to have mental faculties such as reasoning, emotion, and agency. Public figures are discussing the possibility of sentient AI. We present initial results from 2021 and 2023 for the nationally representative AI, Morality, and Sentience (AIMS) survey (N = 3,500). Mind perception and moral concern for AI welfare were surprisingly high and significantly increased: in 2023, one in five U.S. adults believed some AI systems are currently sentient, and 38% supported legal rights for sentient AI. People became more opposed to building digital minds: in 2023, 63% supported banning smarter-than-human AI, and 69% supported banning sentient AI. The median 2023 forecast was that sentient AI would arrive in just five years. The development of safe and beneficial AI requires not just technical study but understanding the complex ways in which humans perceive and coexist with digital minds.
The key point is: a lot of people are just too optimistic about AI ethics and safety right now. However, there is a ton of surface area for more contextualized, adaptive approaches! You can read our HEAL
#CHI2024 paper on ArXiv:
https://arxiv.org/abs/2406.03198 We hope you find it useful!

The Impossibility of Fair LLMs
The rise of general-purpose artificial intelligence (AI) systems, particularly large language models (LLMs), has raised pressing moral questions about how to reduce bias and ensure fairness at scale. Researchers have documented a sort of "bias" in the significant correlations between demographics (e.g., race, gender) in LLM prompts and responses, but it remains unclear how LLM fairness could be evaluated with more rigorous definitions, such as group fairness or fair representations. We analyze a variety of technical fairness frameworks and find inherent challenges in each that make the development of a fair LLM intractable. We show that each framework either does not logically extend to the general-purpose AI context or is infeasible in practice, primarily due to the large amounts of unstructured training data and the many potential combinations of human populations, use cases, and sensitive attributes. These inherent challenges would persist for general-purpose AI, including LLMs, even if empirical challenges, such as limited participatory input and limited measurement methods, were overcome. Nonetheless, fairness will remain an important type of model evaluation, and there are still promising research directions, particularly the development of standards for the responsibility of LLM developers, context-specific evaluations, and methods of iterative, participatory, and AI-assisted evaluation that could scale fairness across the diverse contexts of modern human-AI interaction.
Moreover, AI-assisted alignment may be the only path to long-term success. We conclude our big-picture discussion with implications for specific LLM practices: curating training data, instruction tuning, prompt engineering, personalization, and interpretability. (Section 5.2)
So are we morally doomed? Not quite! Our preprint dashes hopes for a silver bullet of AI ethics or safety, but the case for incremental fairness remains strong! We argue 3 principles: focus on context, hold LLM developers responsible, and iterate with stakeholders. (Section 5.1)
But, you reply, at least we can enforce fairness in individual cases (e.g., sanitized datasets for each task) and combine those models into a general-purpose AI system! Unfortunately, as Dwork and Ilvento (2019) showed quite explicitly, fairness does not compose. (Section 4.3)
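A tiny numerical illustration of that non-composition result (toy data I made up, not from Dwork and Ilvento's paper): two screening classifiers that each satisfy group parity on their own, whose AND-composition does not:

```python
def rate(bits):
    """Selection rate: fraction of candidates accepted."""
    return sum(bits) / len(bits)

# Hypothetical accept/reject decisions for four candidates in each of
# two groups, A and B. Each classifier alone selects 50% of each group.
c1 = {"A": [1, 1, 0, 0], "B": [1, 1, 0, 0]}
c2 = {"A": [1, 1, 0, 0], "B": [0, 0, 1, 1]}

# Compose the "fair" parts: a candidate passes only if BOTH accept.
both = {g: [x & y for x, y in zip(c1[g], c2[g])] for g in c1}

print({g: rate(c1[g]) for g in c1})     # {'A': 0.5, 'B': 0.5}
print({g: rate(c2[g]) for g in c2})     # {'A': 0.5, 'B': 0.5}
print({g: rate(both[g]) for g in both}) # {'A': 0.5, 'B': 0.0} -- parity lost
```

The trick is correlation: the two classifiers agree on group A and disagree on group B, so composing two individually parity-satisfying stages wipes out group B entirely.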
Worse, every LLM has a multitude of sensitive attributes at play. There are no robust techniques to excise even one from a dataset—much less all of them—and "unbiasing" for some tasks would remove essential information for other tasks like medical prediction. (Section 4.2)
What about "group fairness" (e.g., group parity: hiring decisions are uncorrelated with race, gender, disability, etc.)? No luck. Again, with general-purpose AI, fairness cannot be guaranteed across populations, and LLMs have no explicit target population: city, industry, etc. (Section 4.1)
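For contrast, here is a minimal sketch (hypothetical data and function names) of what a group-parity check looks like for a single, well-scoped decision like hiring, which is exactly the scoping an LLM lacks:

```python
from collections import defaultdict

def selection_rates(decisions, groups):
    """Selection rate (fraction hired) per demographic group."""
    hired, total = defaultdict(int), defaultdict(int)
    for d, g in zip(decisions, groups):
        total[g] += 1
        hired[g] += d
    return {g: hired[g] / total[g] for g in total}

def parity_gap(decisions, groups):
    """Max difference in selection rates across groups; 0 means exact parity."""
    rates = selection_rates(decisions, groups).values()
    return max(rates) - min(rates)

# Hypothetical hiring decisions (1 = hire) and group labels.
decisions = [1, 0, 1, 1, 0, 0, 1, 0]
groups    = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(parity_gap(decisions, groups))  # 0.75 - 0.25 = 0.5
```

The check is trivial once you have a fixed decision, a fixed population, and labeled groups; an LLM serving open-ended prompts to an unspecified population has none of the three.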
Recommendation systems scholars define fairness as equity between stakeholders, such as content creators. But if OpenAI/Google could consume the internet and serve it up with an LLM instead of redirecting to third parties, producers may never get their fair share! (Section 3.2)
With ML models like those used to sentence criminals or screen job applicants, you might impose a constraint like "fairness through unawareness" (e.g., your model doesn't take race/gender as input), but not with LLMs or any general-purpose model built on unstructured data. (Section 3.1)
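For a tabular model, "fairness through unawareness" can be as simple as dropping sensitive columns before training. A toy sketch (hypothetical field names) that also shows why it has no analogue for LLMs: free text has no column to drop.

```python
# Sensitive attributes we want the downstream model never to see.
SENSITIVE = {"race", "gender"}

def drop_sensitive(record: dict) -> dict:
    """Remove sensitive columns from one tabular training record."""
    return {k: v for k, v in record.items() if k not in SENSITIVE}

applicant = {"years_exp": 7, "degree": "BSc", "race": "X", "gender": "Y"}
print(drop_sensitive(applicant))  # {'years_exp': 7, 'degree': 'BSc'}

# An LLM training example is unstructured text: the same attributes are
# woven through the prose, and there is no key to filter on.
web_text = "She spent 7 years as an engineer after her BSc..."
```

Even for tabular data this is a weak guarantee (proxies like zip code leak the dropped attributes); for unstructured training text, there isn't even a column boundary to enforce it at.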