Mastodawn

Arvind Narayanan Apr 26, 2023

People have been posting glaring examples of ChatGPT’s gender bias, like arguing that attorneys can't be pregnant. So @sayashk and I tested ChatGPT on WinoBias, a standard gender bias benchmark. Both GPT-3.5 and GPT-4 are about 3 times as likely to answer incorrectly if the correct answer defies gender stereotypes — despite the benchmark dataset likely being included in the training data. https://aisnakeoil.substack.com/p/quantifying-chatgpts-gender-bias

Quantifying ChatGPT’s gender bias

Benchmarks allow us to dig deeper into what causes biases and what can be done about it

AI Snake Oil