4. #BabyLM challenge description paper, co-authored by Lucas Georges Gabriel Charpentier

https://babylm.github.io/

How can children learn all the complex structures and patterns of language? Some theories argue they can't, at least not from the input alone: the fundamental structures of language, they claim, must be hard-wired into the human brain.

Can we use language models to understand which parts of language can be learned from the input available to children? We created a German dataset and presented a study with #BabyLM models at this year's #CoNLL2025.

Check out our paper: https://doi.org/10.18653/v1/2025.conll-1.12

👶 BabyLM will be back, but what did we learn?
Here are the best papers from this year 🍼

We know of one mechanism that learns from just 100M words (us). So what are the main boosts that let an LLM reach this?

#EMNLP #babyLM #LLMs #ML #machinelearning #NLProc #NLP

Residual connections are 🔥, right?
Wait, so why do we only use them to skip 1 layer?

Lucas Georges Gabriel Charpentier &
@davidsamuelcz checked it.

They found that letting layers skip further back provides huge gains, winning the BabyLM challenge 🍼

https://arxiv.org/abs/2311.02265
#babyLM #LLMs #LLM #GPT #NLP #ML #machinelearning #nlproc #NLP #AI #generativeAI #pretraining

Not all layers are equally as important: Every Layer Counts BERT

This paper introduces a novel modification of the transformer architecture, tailored for the data-efficient pretraining of language models. This aspect is evaluated by participating in the BabyLM challenge, where our solution won both the strict and strict-small tracks. Our approach allows each transformer layer to select which outputs of previous layers to process. The empirical results verify the potential of this simple modification and show that not all layers are equally as important.
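The idea of each layer selecting which earlier outputs to process can be sketched in a few lines of PyTorch. This is only an illustrative toy (the class names, dimensions, and the softmax-weighted mixing are my assumptions, not the paper's actual code): each block learns one scalar weight per preceding layer, including the embeddings, and feeds the weighted mix through its sublayers.

```python
import torch
import torch.nn as nn

class LayerSelectingBlock(nn.Module):
    """A toy transformer-style block: instead of a residual connection
    to only the previous layer, it takes a learned weighted sum of ALL
    earlier layers' outputs (illustrative sketch, not the paper's code)."""

    def __init__(self, dim: int, layer_index: int):
        super().__init__()
        # one learnable scalar per earlier output (embeddings count too)
        self.alphas = nn.Parameter(torch.zeros(layer_index + 1))
        self.ff = nn.Sequential(
            nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, prev_outputs):
        # prev_outputs: list of (batch, seq, dim) tensors from earlier layers
        weights = torch.softmax(self.alphas, dim=0)
        mixed = sum(w * h for w, h in zip(weights, prev_outputs))
        return mixed + self.ff(mixed)

class TinyELC(nn.Module):
    """Stack of layer-selecting blocks; each block sees every earlier output."""

    def __init__(self, dim: int = 16, n_layers: int = 4):
        super().__init__()
        self.blocks = nn.ModuleList(
            LayerSelectingBlock(dim, i) for i in range(n_layers))

    def forward(self, x):
        outputs = [x]  # the embedding output is "layer 0"
        for block in self.blocks:
            outputs.append(block(outputs))
        return outputs[-1]
```

Since the weights start uniform, training can push them toward whichever earlier layers are actually useful, which is how "not all layers are equally important" becomes directly measurable.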

Welcome the new babies!
19 pretrained models on the loose track
24 on the strict
118 on strict-small
https://dynabench.org/babylm

We are proud of >30 pretraining teams submitting papers to babyLM!

FOMO?
Get updated on CoNLL or
participate next year
https://babylm.github.io

#NLP #nlproc #babyLM #CoNLL #machinelearning #llm #llms #pretraining

TinyStories: Tiny models are coherent and understand instructions
If their data is very simple

What is simple?
Whatever a 3-4-year-old's vocabulary allows (according to LLMs...)

https://arxiv.org/abs/2305.07759
#NLProc #LLM #babyLM

TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

Language models (LMs) are powerful tools for natural language processing, but they often struggle to produce coherent and fluent text when they are small. Models with around 125M parameters such as GPT-Neo (small) or GPT-2 (small) can rarely generate coherent and consistent English text beyond a few words even after extensive training. This raises the question of whether the emergence of the ability to produce coherent English text only occurs at larger scales (with hundreds of millions of parameters or more) and complex architectures (with many layers of global attention). In this work, we introduce TinyStories, a synthetic dataset of short stories that only contain words that typical 3 to 4-year-olds usually understand, generated by GPT-3.5 and GPT-4. We show that TinyStories can be used to train and evaluate LMs that are much smaller than the state-of-the-art models (below 10 million total parameters), or have much simpler architectures (with only one transformer block), yet still produce fluent and consistent stories with several paragraphs that are diverse and have almost perfect grammar, and demonstrate reasoning capabilities. We also introduce a new paradigm for the evaluation of language models: we suggest a framework which uses GPT-4 to grade the content generated by these models as if those were stories written by students and graded by a (human) teacher. This new paradigm overcomes the flaws of standard benchmarks, which often require the model's output to be very structured, and moreover provides a multidimensional score for the model, with scores for different capabilities such as grammar, creativity and consistency. We hope that TinyStories can facilitate the development, analysis and research of LMs, especially for low-resource or specialized domains, and shed light on the emergence of language capabilities in LMs.