Olmo Hybrid matches Olmo 3's performance with 49% fewer training tokens by combining transformer and linear recurrent layers, and may offer better expressivity and scalability than models built from either layer type alone.
Released by @allenai with open weights, checkpoints, and code for community use and research #allenai #olmohybrid
https://allenai.org/blog/olmohybrid

