The TMLR version is here: https://openreview.net/forum?id=aY2nsgE97a
and the arXiv version (https://arxiv.org/abs/2405.05847) should be updated to match shortly. Check it out if you're interested in interpretability and its challenges!
#interpretability
Andrew Lampinen
How well can we understand an LLM by interpreting its representations? What can we learn by comparing brain and model representations? Our new paper (https://arxiv.org/abs/2405.05847) highlights intriguing biases in learned feature representations that make interpreting them more challenging! 1/9 #interpretability #deeplearning #representation #transformers