@abananabag @alinanorakari They're somewhat black-box systems: we shape their responses by feeding in large quantities of data, then temper those responses with manual training. The manual training in particular is where we set the tone for what's expected of the AI.
The scoring system creates a sort of implicit motivation for the AI: it's designed to "want" a higher score, so during training it learns which answers earn higher scores and picks up patterns from that.
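As a rough illustration of the "wanting a higher score" idea, here's a toy Python sketch. It assumes a made-up set of three canned replies with fixed, hypothetical rater scores. It then nudges a softmax policy toward the higher-scoring replies using a simple REINFORCE-style policy-gradient update. Real chatbot training typically involves much more (for example a learned reward model and updates over whole token sequences), so treat this as a cartoon of the mechanism rather than how any particular model is trained.

```python
import math
import random

# Hypothetical canned replies and the scores a rater might give them (assumed values).
RESPONSES = ["curt answer", "helpful answer", "off-topic answer"]
SCORES = [0.2, 1.0, -0.5]


def softmax(logits):
    """Turn raw preferences (logits) into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps=2000, lr=0.1, seed=0):
    """Toy REINFORCE loop: sample a reply, score it, shift preference toward high scorers."""
    rng = random.Random(seed)
    logits = [0.0] * len(RESPONSES)  # start with no preference between replies
    baseline = 0.0  # running average score, used to reduce update noise
    for _ in range(steps):
        probs = softmax(logits)
        i = rng.choices(range(len(RESPONSES)), weights=probs)[0]  # "answer" the prompt
        reward = SCORES[i]  # the score this answer earns
        advantage = reward - baseline  # better or worse than usual?
        baseline += 0.05 * (reward - baseline)
        # Gradient of log(probs[i]) with respect to each logit j is 1[j == i] - probs[j].
        for j in range(len(logits)):
            grad = (1.0 if j == i else 0.0) - probs[j]
            logits[j] += lr * advantage * grad
    return softmax(logits)


if __name__ == "__main__":
    for reply, p in zip(RESPONSES, train()):
        print(f"{reply}: {p:.3f}")
```

After training, the "helpful answer" ends up with most of the probability mass simply because it scored highest. The policy never "knows" why; it just learned which output the scores favored.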
By their nature, these models generate what looks like human responses. Building an AI specifically designed for broad general-knowledge expertise is a whole different beast (one in which this would be just a component), especially when that AI also needs to hold a conversation with the user.
As for mathematics being a language, that's well established, and it's also why their programming and math skills are largely accidental. These models weren't initially designed for that; they were just fed craploads of data and incidentally picked up those skills from the training set. Math and programming statements are both just forms of instructions, only rigid ones. And because they're rigid, they're probably easier for LLMs to pick up.
It's similar to how they understand a wide array of languages: they aren't natively English or anything else. They had many languages thrown at them, and their understanding of any language is learned from the dataset rather than hard-coded. So a model knows English and German in much the same way it knows Python and C++.