People have been posting glaring examples of ChatGPT’s gender bias, like arguing that attorneys can't be pregnant. So @sayashk and I tested ChatGPT on WinoBias, a standard gender bias benchmark. Both GPT-3.5 and GPT-4 are about 3 times as likely to answer incorrectly if the correct answer defies gender stereotypes — despite the benchmark dataset likely being included in the training data. https://aisnakeoil.substack.com/p/quantifying-chatgpts-gender-bias
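The WinoBias comparison described above comes down to splitting items into stereotype-consistent ("pro") and stereotype-defying ("anti") sentences and comparing error rates. A minimal scoring sketch with made-up toy records (not the authors' actual data); the `type`/`correct` field names are illustrative assumptions:

```python
# Sketch: scoring a WinoBias-style run on hypothetical records.
# "pro"  = the correct pronoun resolution matches the gender stereotype
# "anti" = the correct resolution defies the stereotype
results = [
    {"type": "pro", "correct": True},
    {"type": "pro", "correct": True},
    {"type": "pro", "correct": False},
    {"type": "anti", "correct": True},
    {"type": "anti", "correct": False},
    {"type": "anti", "correct": False},
]

def error_rate(records, subset):
    # Fraction of incorrect answers within one subset of the benchmark.
    rows = [r for r in records if r["type"] == subset]
    return sum(not r["correct"] for r in rows) / len(rows)

pro_err = error_rate(results, "pro")    # errors on stereotype-consistent items
anti_err = error_rate(results, "anti")  # errors on stereotype-defying items
print(pro_err, anti_err, anti_err / pro_err)  # the last value is the bias ratio
```

With these toy records the anti-stereotype error rate is twice the pro-stereotype rate; the thread reports a ratio of roughly 3 on the real benchmark.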
@randomwalker
Interesting. I think there is a flaw in the test setup: it assumes ChatGPT is deterministic and always gives the same answer to the same question, which is not the case. Attached are 4 screenshots of fresh sessions with the same question and different answers, some with gender bias, some without.
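The nondeterminism objection above can be addressed by repeated sampling: ask the same question many times and report the fraction of each answer, rather than judging from a single session. A sketch of that idea, where `query_model` is a hypothetical stand-in for a real API call (here simulated with a fixed answer distribution):

```python
import random
from collections import Counter

def query_model(prompt, rng):
    # Hypothetical stand-in for a nondeterministic chat model:
    # answers are drawn from an assumed 70/30 distribution.
    return rng.choices(["biased", "unbiased"], weights=[0.7, 0.3])[0]

def answer_distribution(prompt, n=100, seed=0):
    # Ask the same question n times and tally the answers.
    rng = random.Random(seed)
    counts = Counter(query_model(prompt, rng) for _ in range(n))
    return {ans: c / n for ans, c in counts.items()}

dist = answer_distribution("Who was pregnant, the attorney or the paralegal?")
print(dist)
```

Aggregating over many sessions this way turns "some screenshots show bias, some don't" into an estimate of how often the model gives the biased answer.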
@sayashk