How should we design a probe? Previous work in NLP asks how much information a model's representations contain. In our #emnlp2022 paper, we instead ask how much a model could extract! This leads to an answer: follow the architectural bottleneck principle!
Joint work with Josef Valvoda, @niklasstoehr and @rdc
https://arxiv.org/abs/2211.06420
In this paper, we seek to measure how much information a component in a neural network could extract from the representations fed into it. Our work stands in contrast to prior probing work, most of which investigates how much information a model's representations contain. This shift in perspective leads us to propose a new principle for probing, the architectural bottleneck principle: in order to estimate how much information a given component could extract, a probe should look exactly like the component. Relying on this principle, we estimate how much syntactic information is available to transformers through our attentional probe, which exactly resembles a transformer's self-attention head. Experimentally, we find that, in three models (BERT, ALBERT, and RoBERTa), a sentence's syntax tree is mostly extractable by our probe, suggesting these models have access to syntactic information while composing their contextual representations. Whether this information is actually used by these models, however, remains an open question.
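The probe shape the abstract describes can be sketched as follows: score every word as a candidate syntactic head of every other word using the same query-key dot product a self-attention head computes. This is a minimal illustration only; the dimensions, parameter names, and argmax head-selection readout are assumptions, not the paper's exact training setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def attentional_probe(h, W_q, W_k):
    """Score every word j as a candidate syntactic head of word i with the
    scaled query-key dot product used inside one self-attention head."""
    d_head = W_q.shape[1]
    q, k = h @ W_q, h @ W_k               # (seq_len, d_head) each
    return (q @ k.T) / np.sqrt(d_head)    # (seq_len, seq_len) head logits

# Stand-in for BERT-style contextual representations of a 10-word sentence.
seq_len, d_model, d_head = 10, 768, 64
h = rng.normal(size=(seq_len, d_model))
W_q = rng.normal(size=(d_model, d_head))  # probe parameters (trained in practice)
W_k = rng.normal(size=(d_model, d_head))

scores = attentional_probe(h, W_q, W_k)
pred_heads = scores.argmax(axis=-1)       # predicted head index for each word
```

In practice the probe's projections would be trained to predict gold dependency heads; the point of the sketch is that its functional form adds nothing beyond what one attention head can compute.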
The time we take to read a word depends on its predictability, i.e. its surprisal. However, we only know how surprising a word is after we see it. Our new paper investigates whether we anticipate words' surprisals to allocate reading times in advance :)
Joint work with Clara Meister, Ethan Wilcox, @roger_p_levy , @rdc
Paper: https://arxiv.org/abs/2211.14301
Code: https://github.com/rycolab/anticipation-on-reading-times
Over the past two decades, numerous studies have demonstrated that less predictable (i.e. higher-surprisal) words take more time to read. In general, these previous studies implicitly assumed the reading process to be purely responsive: readers observe a new word and allocate time to read it as required. These results, however, are also compatible with a reading process that is anticipatory: readers could, e.g., allocate time to a future word based on their expectations about it. In this work, we examine the anticipatory nature of reading by looking at how people's predictions about upcoming material influence reading times. Specifically, we test anticipation by looking at the effects of surprisal and contextual entropy on four reading-time datasets: two self-paced and two eye-tracking. In three of the four datasets tested, we find that contextual entropy predicts reading times as well as (or better than) surprisal. We then hypothesise four cognitive mechanisms through which contextual entropy could impact RTs, designing experiments to analyse three of them. Overall, our results support a view of reading that is both anticipatory and responsive.
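The two predictors contrasted above differ in when they become computable: surprisal requires seeing the word, while contextual entropy (the expected surprisal under the next-word distribution) is available beforehand, which is what makes it a candidate signal for anticipatory processing. A minimal numerical sketch, using a toy next-word distribution invented purely for illustration:

```python
import numpy as np

def surprisal(p_next, word_id):
    """Surprisal of the word actually read: -log2 p(word | context)."""
    return -np.log2(p_next[word_id])

def contextual_entropy(p_next):
    """Expected surprisal over the full next-word distribution, i.e. a
    quantity a reader could use BEFORE the word is observed."""
    return -np.sum(p_next * np.log2(p_next))

# Toy distribution over a 4-word vocabulary (made up for illustration).
p_next = np.array([0.7, 0.2, 0.05, 0.05])
s = surprisal(p_next, 0)        # ~0.51 bits: a predictable word
h = contextual_entropy(p_next)  # ~1.26 bits: anticipatable in advance
```

In the paper's setting these probabilities would come from a language model's next-word distribution; the sketch only makes the definitional difference between the two predictors concrete.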
I'm absolutely thrilled to have been awarded a prestigious ERC Starting Grant on 'Explainable and Robust Automatic Fact Checking (ExplainYourself)'!
Official press release: https://erc.europa.eu/news/erc-2021-starting-grants-results
More about the project & how to join the team: http://www.copenlu.com/talk/2022_11_erc/
This wouldn't have been possible without the great work of my PhD students & postdocs in CopeNLU (especially @pepa and @dustin), which this project builds on.