4/
The research [2] noted how "the percentage of incorrect results increases markedly from the raw to the shaped-up models, as a consequence of substantially reducing avoidance [...]
Where the raw models tend to give non-conforming outputs that cannot be interpreted as an answer [...], shaped-up models instead give seemingly #PlausibleButWrong answers [...]
This does not match the expectation that more recent #LLMs would more successfully avoid answering outside their operating range".