Pleased to share that our paper (https://sigmoid.social/@lampinen/112491958002918498) has been accepted at TMLR! The camera-ready version should be clearer and improved, thanks to helpful comments from the reviewers (and others). Thanks again to my co-authors @stephaniechan and @khermann
The TMLR version is here: https://openreview.net/forum?id=aY2nsgE97a
and the arXiv version (https://arxiv.org/abs/2405.05847) should be updated to match shortly. Check it out if you're interested in interpretability and its challenges!
#interpretability
Andrew Lampinen (@[email protected])
How well can we understand an LLM by interpreting its representations? What can we learn by comparing brain and model representations? Our new paper (https://arxiv.org/abs/2405.05847) highlights intriguing biases in learned feature representations that make interpreting them more challenging! 1/9 #interpretability #deeplearning #representation #transformers