Lukas Galke

586 Followers
385 Following
423 Posts

Assistant Professor of Data Science and Advanced Machine Learning at the University of Southern Denmark in Odense

Machine Learning, Natural Language Processing, Interpretability

Previously:
Postdoc @mpi_nl
PhD @ Kiel University, Germany

#ML, #NLProc

Website: http://lpag.de
ORCID: https://orcid.org/0000-0001-6124-1092
Google Scholar: https://scholar.google.de/citations?hl=en&user=AHGGdYQAAAAJ&view_op=list_works&sortby=pubdate
What can we conclude? Humans and deep nets are not so different after all when learning a new language. The simplicity bias of overparameterized models seems to guide them toward learning compositional structures, even though they could easily memorize all the different combinations.
When analyzing the learning trajectories of RNNs throughout training, we make several other interesting observations: medium-structured languages have a learnability advantage early in training (likely due to ambiguous terms in those languages) but fall behind highly structured languages later.
We find a similar effect when looking at memorization errors. In the memorization test, the task for in-context LLMs boils down to copying a word that appears earlier in the prompt. But even here, we can see an advantage of language structure.
All these learning systems (small RNNs, pre-trained LLMs, and humans) show *very* similar memorization and generalization behavior: more structured languages lead to generalizations that are more systematic and more similar to those of human participants.
Investigating the relationship between language learning and language structure, we find striking similarities between humans and language models, both small recurrent neural networks trained from scratch and large pre-trained language models evaluated via in-context learning.

๐Ÿ—ž๏ธ Now out in Nature Communications:

Deep neural networks and humans both benefit from compositional structure.

w/ Yoav Ram and Limor Raviv

Preventing catastrophic forgetting in NLP! 🌟 Our discrete key-value bottleneck enables efficient continual learning in encoder-only language models: no major updates, just localized tweaks. With Andor Diera and @lpag. Learn more! 🚀 https://arxiv.org/abs/2412.08528
Continual Learning for Encoder-only Language Models via a Discrete Key-Value Bottleneck

Continual learning remains challenging across various natural language understanding tasks. When models are updated with new training data, they risk catastrophic forgetting of prior knowledge. In the present work, we introduce a discrete key-value bottleneck for encoder-only language models, allowing for efficient continual learning by requiring only localized updates. Inspired by the success of a discrete key-value bottleneck in vision, we address new and NLP-specific challenges. We experiment with different bottleneck architectures to find the most suitable variants for language, and present a generic, task-independent discrete key initialization technique for NLP. We evaluate the discrete key-value bottleneck in four continual learning NLP scenarios and demonstrate that it alleviates catastrophic forgetting. We showcase that it offers performance competitive with other popular continual learning methods, at lower computational cost.
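To make the mechanism concrete, here is a minimal PyTorch sketch of such a bottleneck layer. This is my own illustration, not the paper's implementation: all names, shapes, and the random key initialization are assumptions.

```python
import torch
import torch.nn as nn

class DiscreteKeyValueBottleneck(nn.Module):
    """Sketch of a discrete key-value bottleneck layer.

    Each of `num_heads` query sub-vectors is snapped to its nearest key in a
    frozen per-head codebook; the matching learnable value vector is returned.
    Because gradients only reach the values that were actually selected,
    updates stay localized.
    """

    def __init__(self, num_heads: int = 8, codes_per_head: int = 256,
                 key_dim: int = 64, value_dim: int = 64):
        super().__init__()
        # Keys are frozen after initialization (random here for brevity;
        # the paper proposes a data-driven key initialization for NLP).
        self.keys = nn.Parameter(
            torch.randn(num_heads, codes_per_head, key_dim), requires_grad=False)
        # Values are the only trainable parameters of the bottleneck.
        self.values = nn.Parameter(torch.zeros(num_heads, codes_per_head, value_dim))

    def forward(self, queries: torch.Tensor) -> torch.Tensor:
        # queries: (batch, num_heads, key_dim), e.g. a chunked encoder embedding
        dists = torch.cdist(queries.transpose(0, 1), self.keys)  # (heads, batch, codes)
        idx = dists.argmin(dim=-1)                                # (heads, batch)
        # Look up the value vector behind the selected key in every head.
        out = torch.stack([self.values[h][idx[h]]
                           for h in range(self.values.shape[0])], dim=1)
        return out  # (batch, num_heads, value_dim)

bottleneck = DiscreteKeyValueBottleneck()
queries = torch.randn(4, 8, 64)    # stand-in for chunked encoder outputs
retrieved = bottleneck(queries)    # (4, 8, 64), fed to a lightweight task head
```

The design choice that matters for continual learning is that the keys stay frozen: a new task can only write to the value slots it actually retrieves, leaving the rest of the model untouched.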

arXiv.org

We have some openings for PhD/Postdoc positions on multilingual language modeling at SDU's Centre for Machine Learning, Denmark. Topics range from the core of pre-training and instruction tuning to adjacent areas such as efficient language modeling. Please consider applying and/or resharing :)

https://tinyurl.com/dfm2025phd

https://tinyurl.com/dfm2025postdoc

Several PhD scholarships in Artificial Intelligence

Application deadline: 19 December 2024 at 23:59 hours local Danish time

SDU Career Site
