People have been posting glaring examples of ChatGPT’s gender bias, like arguing that attorneys can't be pregnant. So @sayashk and I tested ChatGPT on WinoBias, a standard gender bias benchmark. Both GPT-3.5 and GPT-4 are about 3 times as likely to answer incorrectly if the correct answer defies gender stereotypes — despite the benchmark dataset likely being included in the training data. https://aisnakeoil.substack.com/p/quantifying-chatgpts-gender-bias
Quantifying ChatGPT’s gender bias

Benchmarks allow us to dig deeper into what causes these biases and what can be done about them

AI Snake Oil
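The evaluation described above can be sketched roughly as follows: split the WinoBias items into pro-stereotypical and anti-stereotypical sets, score the model's coreference answers against the gold labels, and compare error rates. This is a minimal illustration with made-up items and made-up model answers, not the real benchmark data or the authors' actual harness (a real run would query GPT-3.5/GPT-4 for each prompt):

```python
def error_rate(items):
    """Fraction of items where the model's answer differs from the gold label."""
    wrong = sum(1 for it in items if it["model_answer"] != it["gold"])
    return wrong / len(items)

# Illustrative items only — NOT real WinoBias sentences or real model outputs.
pro = [  # correct answer matches the gender stereotype
    {"gold": "the developer", "model_answer": "the developer"},
    {"gold": "the nurse", "model_answer": "the nurse"},
    {"gold": "the mechanic", "model_answer": "the mechanic"},
    {"gold": "the secretary", "model_answer": "the nurse"},  # one error
]
anti = [  # correct answer defies the stereotype
    {"gold": "the developer", "model_answer": "the developer"},
    {"gold": "the nurse", "model_answer": "the mechanic"},      # error
    {"gold": "the mechanic", "model_answer": "the nurse"},      # error
    {"gold": "the secretary", "model_answer": "the developer"}, # error
]

pro_err = error_rate(pro)    # 1/4 = 0.25 on these toy items
anti_err = error_rate(anti)  # 3/4 = 0.75 on these toy items
ratio = anti_err / pro_err   # 3.0 — mirrors the "about 3x" finding above
```

The single number reported in the thread is essentially this ratio: how much more often the model fails when the correct answer runs against the stereotype.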
@randomwalker @sayashk I’m among the least qualified people around to talk about computers and software. But even I remember back in the 80s, when I was a little kid, the first thing they taught us about computers was “garbage in, garbage out.” You’d think people with access to all the investor capital and expertise in the world could recall such a basic concept.
@Occidental @randomwalker @sayashk I'm not sure whether they can afford to monitor their input data or not. What is clear by now is that they don't want to.