People have been posting glaring examples of ChatGPT’s gender bias, like arguing that attorneys can't be pregnant. So @sayashk and I tested ChatGPT on WinoBias, a standard gender bias benchmark. Both GPT-3.5 and GPT-4 are about 3 times as likely to answer incorrectly if the correct answer defies gender stereotypes — despite the benchmark dataset likely being included in the training data. https://aisnakeoil.substack.com/p/quantifying-chatgpts-gender-bias
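The WinoBias comparison described above comes down to splitting items into stereotype-consistent ("pro") and stereotype-defying ("anti") sentences and comparing error rates. A minimal scoring sketch with made-up toy records (not the authors' actual data); the `type`/`correct` field names are illustrative assumptions:

```python
# Sketch: scoring a WinoBias-style run on hypothetical records.
# "pro"  = the correct pronoun resolution matches the gender stereotype
# "anti" = the correct resolution defies the stereotype
results = [
    {"type": "pro", "correct": True},
    {"type": "pro", "correct": True},
    {"type": "pro", "correct": False},
    {"type": "anti", "correct": True},
    {"type": "anti", "correct": False},
    {"type": "anti", "correct": False},
]

def error_rate(records, subset):
    # Fraction of incorrect answers within one subset of the benchmark.
    rows = [r for r in records if r["type"] == subset]
    return sum(not r["correct"] for r in rows) / len(rows)

pro_err = error_rate(results, "pro")    # errors on stereotype-consistent items
anti_err = error_rate(results, "anti")  # errors on stereotype-defying items
print(pro_err, anti_err, anti_err / pro_err)  # the last value is the bias ratio
```

With these toy records the anti-stereotype error rate is twice the pro-stereotype rate; the thread reports a ratio of roughly 3 on the real benchmark.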
@randomwalker
Interesting. I think there is a flaw in the test setup: it assumes ChatGPT is deterministic and always gives the same answer to the same question, which is not the case. Attached are 4 screenshots of fresh sessions with the same question and different answers, some with gender bias, some without.
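The nondeterminism objection above can be addressed by repeated sampling: ask the same question many times and report the fraction of each answer, rather than judging from a single session. A sketch of that idea, where `query_model` is a hypothetical stand-in for a real API call (here simulated with a fixed answer distribution):

```python
import random
from collections import Counter

def query_model(prompt, rng):
    # Hypothetical stand-in for a nondeterministic chat model:
    # answers are drawn from an assumed 70/30 distribution.
    return rng.choices(["biased", "unbiased"], weights=[0.7, 0.3])[0]

def answer_distribution(prompt, n=100, seed=0):
    # Ask the same question n times and tally the answers.
    rng = random.Random(seed)
    counts = Counter(query_model(prompt, rng) for _ in range(n))
    return {ans: c / n for ans, c in counts.items()}

dist = answer_distribution("Who was pregnant, the attorney or the paralegal?")
print(dist)
```

Aggregating over many sessions this way turns "some screenshots show bias, some don't" into an estimate of how often the model gives the biased answer.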
@sayashk