Can small open-source models learn advanced mathematical reasoning? And more importantly: how do you actually build them?
Great talk by Lewis Tunstall from Hugging Face on training reasoning models with smart pipelines: SFT, RL with grading rubrics, reasoning cache & inference scaffolds.
Lots of ideas for exploring similar approaches in #infosec.








