One of the decisive moments in my understanding of #LLMs and their limitations was when, last autumn, @emilymbender walked me through her Thai Library thought experiment.

She's now written it up as a Medium post, and you can read it here. The value comes from really pondering the question she poses, so take the time to think about it. What would YOU do in the situation she outlines?

https://medium.com/@emilymenonbender/thought-experiment-in-the-national-library-of-thailand-f2bf761a8a83

@ct_bergstrom I find the argument frustrating because it focuses on overwhelming the reader with a big, hard task rather than finding a smaller case with the same features.

If I were given a huge library of pure maths texts in an unknown language, I'd have no idea how to extract meaning from it. Yet given some unexplained maths-y puzzles, I could get the pattern, and I reckon an ML algorithm would get the meaning, in some sense, as much as I would, despite the lack of other context...

@ct_bergstrom Of course, this is not the point. I think the point is "Can a system derive meaningful understanding of something fundamentally experiential from a corpus lacking good representation of that experience?"

The best (still weak) small-scale analogy I can think of is "Can a blind person understand colour?"

They'll never experience colour directly, but can certainly learn optics, colour theory etc. in a way that may be useful in some domains but not others...

@ct_bergstrom And I think this allows us to examine the true constraints of these models. While a GPT can absorb enough text saying "dog" to build something that looks suspiciously like a model of a dog without ever meeting one, it has no real representation of, say, space, and must instead rely on "faking" it by manipulating language.

@ct_bergstrom It's possibly also interesting in terms of how much true understanding a human has of experiential things, and what that means. If you have an addiction expert who understands the complex biochemistry, psychology etc. but has never actually been addicted to anything, are they a fake without true understanding who should not be listened to?

@ct_bergstrom And an alternative approach is to try "continuously deforming" the Thai library into experiential learning, and decide where the important changes lie. What if the library were given to you in a curated order? If it were a video/sensory stream, rather than text? If it were interactive? At that point, is it experiential learning?