Simple self-distillation improves code generation

https://arxiv.org/abs/2604.01193

Embarrassingly Simple Self-Distillation Improves Code Generation

Can a large language model (LLM) improve at code generation using only its own raw outputs, without a verifier, a teacher model, or reinforcement learning? We answer in the affirmative with simple self-distillation (SSD): sample solutions from the model under particular temperature and truncation configurations, then fine-tune on those samples with standard supervised fine-tuning. SSD improves Qwen3-30B-Instruct from 42.4% to 55.3% pass@1 on LiveCodeBench v6, with gains concentrated on harder problems, and it generalizes across Qwen and Llama models at the 4B, 8B, and 30B scales, including both instruct and thinking variants. To understand why such a simple method can work, we trace these gains to a precision-exploration conflict in LLM decoding and show that SSD reshapes token distributions in a context-dependent way: it suppresses distractor tails where precision matters while preserving useful diversity where exploration matters. Taken together, SSD offers a complementary post-training direction for improving LLM code generation.

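Read literally, the recipe in the abstract is just two steps: sample from the model itself, then run ordinary supervised fine-tuning on the samples. Below is a minimal sketch of that loop using Hugging Face transformers. The model name, prompts, and every hyperparameter (temperature 0.8, top-p 0.95, learning rate, sequence lengths) are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal sketch of the two-step SSD recipe as the abstract describes it:
# (1) sample solutions from the model itself under a chosen temperature /
# truncation (e.g. top-p) configuration; (2) run standard supervised
# fine-tuning on those raw samples. No verifier, teacher model, or RL.
# All names and hyperparameters below are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen2.5-7B-Instruct"  # placeholder; any instruct causal LM
tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.bfloat16, device_map="auto"
)

prompts = [
    "Write a Python function that returns the nth Fibonacci number.",
    # ... your coding prompts here
]

# Step 1: self-sample solutions from the model's own distribution.
samples = []
model.eval()
for p in prompts:
    inputs = tok(p, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(
            **inputs,
            do_sample=True,
            temperature=0.8,             # assumed value
            top_p=0.95,                  # assumed truncation setting
            max_new_tokens=512,
            pad_token_id=tok.eos_token_id,
        )
    completion = tok.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    samples.append(p + completion)

# Step 2: standard SFT on the self-samples (next-token cross-entropy).
# A real run would mask prompt tokens with -100 and batch properly; this
# loop keeps only the essentials.
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for text in samples:
    batch = tok(text, return_tensors="pt", truncation=True, max_length=2048)
    batch = batch.to(model.device)
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    opt.step()
    opt.zero_grad()
```

If the abstract is right, the interesting part isn't this loop, which is textbook SFT, but which temperature and truncation settings make the self-samples worth training on.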

Haven't read the paper yet, but it is interesting how seemingly simple many breakthroughs in ML are. Even transformers are like that. Maybe it's hindsight bias.

I suppose we just don't have a deeper underlying theory to lean on, one that would help us 'design' anything.

A lot of discoveries are like that. In fact, simplicity is often the hallmark of correctness, and complexity is often a sign that our understanding is incomplete and we’re still stumbling towards the right model. Not always, but often. It’s been a good rule of thumb in my programming career.

100%. I have a guiding approach when solving problems: keep reframing and exploring until the solution becomes obvious.

I often find that if I've got a complicated solution, it's because I haven't fully examined the problem.

A designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away. -- Antoine de Saint-Exupéry