@enhancedscurry @eschaton @siracusa I’m not an expert, but these don’t feel like things that sufficiently good models will fix:
• hands with too many or too few fingers
• distortions in image enhancements
• ChatGPT guessing how many letters a word has (see the quick sketch below)
These are all “human looks at it for mere seconds and sees the error” mistakes that such a model doesn’t seem to grok.
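For what it’s worth, the letter-counting one has a fairly mundane mechanical side: these models are fed tokens, not characters, so a word’s spelling is never directly in front of them. A minimal sketch, assuming the tiktoken package is installed (the word and encoding name are just illustrative):

```python
# Rough illustration of why letter counting is awkward for an LLM:
# the model sees token IDs, not the individual characters of the word.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a tokenizer used by recent OpenAI models
word = "strawberry"

token_ids = enc.encode(word)
pieces = [enc.decode([t]) for t in token_ids]

print(pieces)          # multi-character chunks, e.g. something like ['str', 'aw', 'berry']
print(len(word))       # 10 letters in the actual word
print(len(token_ids))  # noticeably fewer tokens than letters
```

That doesn’t settle whether a sufficiently good model could learn to compensate, which is the open question in this thread anyway.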
@siracusa @enhancedscurry @chucker
Considering LLMs has made me reflect on how our own brain has separate modules for “understanding” (predicting, modelling) different things, such that not only is our understanding not reducible to language, but the very notion of “meaning” for a human appeals to those non-linguistic models. Language production & understanding for us consist of converting nonverbal internal states/models into language and back. Of course there are complex feedbacks where language influences nonverbal understanding, but it still involves separate systems, and understanding doesn’t reduce to language.
What I’m not sure of is whether this is an inherent feature of anything that could act on its “understanding” as well as we do (which LLMs currently can’t, whatever label we assign to their internal processes), or whether it’s just the way we happen to do it, and a system without such a separation could perform as well as we do. I can’t help feeling it’s an inherent feature, but I haven’t found a logical justification for it.