If you're at #ACL2023NLP ... erm ... actually, I've lost track of what that means for findings papers ... you can watch a 1 minute video? ... anyways, I've realised that I hadn't yet announced our findings papers, so here goes, in no particular order:

1/4 #newpaper #nlp #nlproc

Can (text) LLMs reason about images, if they get a textual description of them? Yes, sort of!
Says Sherzod Hakimov, in "Images in Language Space: Exploring the Suitability of Large Language Models for Vision & Language Tasks" (ACL Findings)
https://arxiv.org/abs/2305.13782

2/4

Images in Language Space: Exploring the Suitability of Large Language Models for Vision & Language Tasks

Large language models have demonstrated robust performance on various language tasks using zero-shot or few-shot learning paradigms. While being actively researched, multimodal models that can additionally handle images as input have yet to catch up in size and generality with language-only models. In this work, we ask whether language-only models can be utilised for tasks that require visual input -- but also, as we argue, often require a strong reasoning component. Similar to some recent related work, we make visual information accessible to the language model using separate verbalisation models. Specifically, we investigate the performance of open-source, open-access language models against GPT-3 on five vision-language tasks when given textually-encoded visual information. Our results suggest that language models are effective for solving vision-language tasks even with limited samples. This approach also enhances the interpretability of a model's output by providing a means of tracing the output back through the verbalised image content.

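The core recipe from the abstract ("make visual information accessible to the language model using separate verbalisation models") can be sketched like this: caption the image with a separate model, then splice that caption into a few-shot text prompt for a language-only LLM. A minimal sketch with made-up captions and a hypothetical prompt format, not the paper's actual pipeline:

```python
def build_prompt(task_instruction, examples, image_caption, question):
    """Assemble a few-shot prompt in which each image is replaced by
    a textual verbalisation produced by a separate captioning model."""
    parts = [task_instruction]
    for ex in examples:
        parts.append(
            f"Image: {ex['caption']}\nQ: {ex['question']}\nA: {ex['answer']}"
        )
    # The query item ends with an empty "A:" slot for the LLM to complete.
    parts.append(f"Image: {image_caption}\nQ: {question}\nA:")
    return "\n\n".join(parts)

prompt = build_prompt(
    "Answer the question about the described image.",
    [{"caption": "a red bus parked next to a bicycle",
      "question": "What vehicle is larger?", "answer": "the bus"}],
    "two cats sleeping on a blue sofa",
    "How many animals are there?",
)
print(prompt)
```

In the paper the verbalisation would come from a vision model and the prompt would go to GPT-3 or an open-source LLM; here everything is plain strings so the sketch stays self-contained.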

Will language learning agents learn more easily if they get supportive feedback? Yes, sort of!
Found Philipp Sadler (w/ Sherzod), in "Yes, this Way! Learning to Ground Referring Expressions into Actions with Intra-episodic Feedback from Supportive Teachers"

https://arxiv.org/abs/2305.12880

3/4

Yes, this Way! Learning to Ground Referring Expressions into Actions with Intra-episodic Feedback from Supportive Teachers

The ability to pick up on language signals in an ongoing interaction is crucial for future machine learning models to collaborate and interact with humans naturally. In this paper, we present an initial study that evaluates intra-episodic feedback given in a collaborative setting. We use a referential language game as a controllable example of a task-oriented collaborative joint activity. A teacher utters a referring expression generated by a well-known symbolic algorithm (the "Incremental Algorithm") as an initial instruction and then monitors the follower's actions to possibly intervene with intra-episodic feedback (which does not explicitly have to be requested). We frame this task as a reinforcement learning problem with sparse rewards and learn a follower policy for a heuristic teacher. Our results show that intra-episodic feedback allows the follower to generalize on aspects of scene complexity and performs better than providing only the initial statement.

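To make "intra-episodic feedback" concrete: the teacher watches each of the follower's actions and may comment without being asked. A toy 1-D stand-in for the referential game — my simplification, since the paper uses a visual scene and an RL-learned follower policy, not this hand-coded one:

```python
def teacher_feedback(pos, new_pos, target):
    """Heuristic teacher: compare distance to the target before and after
    the follower's move, and comment unprompted (intra-episodic feedback)."""
    before, after = abs(target - pos), abs(target - new_pos)
    if after < before:
        return "yes, this way!"
    if after > before:
        return "not that way!"
    return None

def episode(start, target, max_steps=20):
    """One episode on a number line: the follower keeps its direction
    and flips it whenever the teacher's feedback is negative."""
    pos, direction, feedback = start, 1, None
    for _ in range(max_steps):
        if feedback == "not that way!":
            direction = -direction  # follower reacts to the feedback
        new_pos = pos + direction
        feedback = teacher_feedback(pos, new_pos, target)
        pos = new_pos
        if pos == target:
            break
    return pos
```

Even this trivial follower reaches targets on either side (`episode(0, -3)` recovers from its initial wrong direction after one corrective signal), which is the intuition behind feedback helping generalisation.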

Can we learn when to revise in restart-incremental processing? Yes, sort of!

Found the unstoppable Patrick Kahardipraja (this is the 3rd paper out of his MSc!): "TAPIR: Learning Adaptive Revision for Incremental Natural Language Understanding with a Two-Pass Model"

https://arxiv.org/abs/2305.10845

4/4

TAPIR: Learning Adaptive Revision for Incremental Natural Language Understanding with a Two-Pass Model

Language is by its very nature incremental in how it is produced and processed. This property can be exploited by NLP systems to produce fast responses, which has been shown to be beneficial for real-time interactive applications. Recent neural network-based approaches for incremental processing mainly use RNNs or Transformers. RNNs are fast but monotonic (cannot correct earlier output, which can be necessary in incremental processing). Transformers, on the other hand, consume whole sequences, and hence are by nature non-incremental. A restart-incremental interface that repeatedly passes longer input prefixes can be used to obtain partial outputs, while providing the ability to revise. However, this method becomes costly as the sentence grows longer. In this work, we propose the Two-pass model for AdaPtIve Revision (TAPIR) and introduce a method to obtain an incremental supervision signal for learning an adaptive revision policy. Experimental results on sequence labelling show that our model has better incremental performance and faster inference speed compared to restart-incremental Transformers, while showing little degradation on full sequences.

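A back-of-envelope comparison of why restart-incrementality gets costly and what an adaptive revision policy buys. The cost model below is my simplification, not TAPIR's actual architecture: restart-incremental re-encodes the whole prefix at every new token, while a two-pass model pays once per token plus an occasional revision pass:

```python
def restart_incremental_cost(n):
    """Restart-incremental: re-encode the full prefix at every step,
    so total work is 1 + 2 + ... + n, i.e. quadratic in sentence length."""
    return sum(prefix_len for prefix_len in range(1, n + 1))

def adaptive_revision_cost(n, revise_at):
    """Two-pass sketch: a fast incremental pass touches each token once;
    a reviser re-encodes the prefix only at steps the policy flags."""
    cost = n  # one unit per token for the incremental pass
    for t in revise_at:
        cost += t  # re-encode the prefix of length t when revising
    return cost

n = 20
print(restart_incremental_cost(n))         # 210
print(adaptive_revision_cost(n, [5, 12]))  # 20 + 5 + 12 = 37
```

The gap widens with sentence length, which matches the abstract's point that repeated prefix-passing "becomes costly as the sentence grows longer" while adaptive revision keeps inference fast.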

So many findings, still so many unknowings!

Anyway, have fun at #ACL2023NLP, those of you who are going, and everybody: check out these papers if the titles sound interesting.

5/4 & fin