Large language models are unpredictably unpredictable, according to a new paper from Apple Research. The gist: current AI control methods assume models are steerable by default. This paper shows that's often wrong: the behaviors you want may simply be impossible, and you can't know in advance which ones. It's all very Heisenberg-ian — you can't know both capability and consequences with precision. https://machinelearning.apple.com/research/genctrl
@paul While Apple is not spending massive amounts of money to train frontier models, they do seem to be maintaining an academic-style team dedicated to understanding the nature of these models.
That feels like a good place to be at the moment.
