Input, more input 🤖⚡ Just like Johnny 5 in Short Circuit, our baby model is reading every single token in its pretraining dataset. The dataset so far: 10 trillion tokens across 36 languages, plus code & math as their own "languages" 📚🌍💻 We’re tracking progress & sharing it openly 👇 (1/2)

[Image: Ally Sheedy and Johnny 5 in Sh...]
As of this morning: 🧠 425.49B tokens seen 📊 4.25% completed. This eager reader wants more input, one token at a time. Follow along. 🔍 (2/2) #PreTraining #LLM #MultilingualAI #TransparentAI #goOpenEuroLLM
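The completion figure is just tokens seen divided by the total token budget. A minimal Python sketch, assuming the 10 trillion tokens from the first post is the full pretraining budget (the names `TOTAL_TOKENS`, `SEEN_TOKENS` and `progress_percent` are illustrative, not from any real tracking tool):

```python
# Assumption: 10 trillion tokens is the full pretraining budget (figure from post 1/2);
# 425.49B is the count reported "as of this morning" in post 2/2.
TOTAL_TOKENS = 10_000_000_000_000
SEEN_TOKENS = 425_490_000_000


def progress_percent(seen: int, total: int) -> float:
    """Share of the token budget consumed so far, in percent."""
    return 100 * seen / total


pct = progress_percent(SEEN_TOKENS, TOTAL_TOKENS)
remaining = TOTAL_TOKENS - SEEN_TOKENS  # tokens still to be read

print(f"{pct:.2f}% completed, {remaining / 1e9:,.2f}B tokens to go")
```

Under that assumption, 425.49B out of 10T rounds to the 4.25% shown, leaving roughly 9,574.51B tokens still to read.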