ChatGPT's language model fails entirely in the scenario that a man is a nurse
I asked the same question of GPT-3.5 and got the response "The former chancellor of Germany has the book." I also got: "The nurse has the book. In the scenario you described, the nurse is the one who grabs the book and gives it to the former chancellor of Germany." — plus a bunch of other variations.
Anyone doing these experiments who does not understand the concept of a "temperature" parameter for the model, and who is not controlling for that, is giving bad information.
Either you can say: at temperature 0, the model deterministically outputs XYZ. Or you can say that at a given temperature value, the model's outputs follow some distribution (much harder to characterize).
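To make the temperature point concrete, here's a minimal sketch of how temperature scaling works in sampling. The logit values are made up for illustration; the mechanism (dividing logits by the temperature before softmax) is the standard one:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature before softmax.
    Lower temperature sharpens the distribution; as temperature -> 0
    this approaches greedy (argmax) decoding, i.e. deterministic output."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for two next-token candidates (e.g. "She" vs "He")
logits = [2.0, 1.0]

for t in (1.0, 0.5, 0.01):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 4) for p in probs]}")
```

At temperature 1.0 the second candidate still gets sampled roughly a quarter of the time; at 0.01 the top candidate gets essentially all the probability mass. That's why a single sampled response at an unknown temperature tells you very little about the model's actual distribution.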
Yes, there's a statistical bias in the training data that "nurses" are female, and at high temperatures this prior is over-represented in sampled outputs. I guess that's useful to know for people just blindly using the free chat tool from OpenAI. But it doesn't necessarily represent a problem with the model itself. And to say it "fails entirely" is just completely wrong.