Self-Distillation

Discover how self-distillation enables continual learning with minimal overhead

https://airanked.dev/posts/self-distillation-in-ai

#SelfDistillation #ContinualLearning #AiTraining

Self-Distillation Enables Continual Learning

Continual learning, enabling models to acquire new skills and knowledge without degrading existing capabilities, remains a fundamental challenge for foundation models. While on-policy reinforcement learning can reduce forgetting, it requires explicit reward functions that are often unavailable. Learning from expert demonstrations, the primary alternative, is dominated by supervised fine-tuning (SFT), which is inherently off-policy. We introduce Self-Distillation Fine-Tuning (SDFT), a simple method that enables on-policy learning directly from demonstrations. SDFT leverages in-context learning by using a demonstration-conditioned model as its own teacher, generating on-policy training signals that preserve prior capabilities while acquiring new skills. Across skill learning and knowledge acquisition tasks, SDFT consistently outperforms SFT, achieving higher new-task accuracy while substantially reducing catastrophic forgetting. In sequential learning experiments, SDFT enables a single model to accumulate multiple skills over time without performance regression, establishing on-policy distillation as a practical path to continual learning from demonstrations.

arXiv.org

fly51fly (@fly51fly)

Self-Distillation Zero는 이진 보상만으로 학습하던 방식을 자기 수정(Self-Revision)으로 바꿔, 밀도 높은 지도 신호를 만드는 방법을 제안합니다. RL/선호학습의 보상 희소성 문제를 완화할 수 있는 중요한 연구입니다.

https://x.com/fly51fly/status/2045620305318806007

#selfdistillation #reinforcementlearning #densesupervision #llm #arxiv

fly51fly (@fly51fly)

롱컨텍스트 적응을 위한 새로운 연구로, RoPE를 변형한 self-distillation 방식인 'Shuffle the Context'를 제안합니다. 긴 문맥을 더 잘 처리하도록 LLM을 적응시키는 기술로, 장문 추론·문서 이해 성능 개선에 의미가 있습니다.

https://x.com/fly51fly/status/2045258532795428922

#longcontext #rope #selfdistillation #llm #research

fly51fly (@fly51fly) on X

[CL] Shuffle the Context: RoPE-Perturbed Self-Distillation for Long-Context Adaptation Z Li, C Liang, L Ren, T Zhao… [Georgia Institute of Technology & Microsoft] (2026) https://t.co/X3O9rBH5nf

X (formerly Twitter)

fly51fly (@fly51fly)

Apple 연구진이 아주 단순한 자기 증류(self-distillation)만으로 코드 생성 성능을 높일 수 있음을 보였다. 복잡한 기법 없이도 모델이 더 나은 코드를 생성하도록 개선할 수 있어, 코드 LLM 학습과 최적화에 유용한 실마리를 제공한다.

https://x.com/fly51fly/status/2040548383577051317

#codegeneration #selfdistillation #apple #llm #research

fly51fly (@fly51fly) on X

[CL] Embarrassingly Simple Self-Distillation Improves Code Generation R Zhang, R H Bai, H Zheng, N Jaitly… [Apple] (2026) https://t.co/6FMSB7rHdL

X (formerly Twitter)

Apple: Embarrassingly Simple Self-Distillation Improves Code Generation

https://arxiv.org/abs/2604.01193

#HackerNews #Apple #SelfDistillation #CodeGeneration #MachineLearning #TechNews

Embarrassingly Simple Self-Distillation Improves Code Generation

Can a large language model (LLM) improve at code generation using only its own raw outputs, without a verifier, a teacher model, or reinforcement learning? We answer in the affirmative with simple self-distillation (SSD): sample solutions from the model with certain temperature and truncation configurations, then fine-tune on those samples with standard supervised fine-tuning. SSD improves Qwen3-30B-Instruct from 42.4% to 55.3% pass@1 on LiveCodeBench v6, with gains concentrating on harder problems, and it generalizes across Qwen and Llama models at 4B, 8B, and 30B scale, including both instruct and thinking variants. To understand why such a simple method can work, we trace these gains to a precision-exploration conflict in LLM decoding and show that SSD reshapes token distributions in a context-dependent way, suppressing distractor tails where precision matters while preserving useful diversity where exploration matters. Taken together, SSD offers a complementary post-training direction for improving LLM code generation.

arXiv.org

97 часов на одной RTX 4090: MoE с подключаемыми экспертами, самодистилляция и почему перплексия — плохая метрика

Всё началось с простой идеи: что если подключать к языковой модели новые «навыки» как приложения к смартфону — без переобучения, без деградации, за полчаса? Я потратил 22 шага экспериментов и 97.5 GPU-часов на одной видеокарте, чтобы это проверить. Архитектура заработала идеально. А потом выяснилось, что модель, которая говорит на языке математики, совершенно не умеет решать задачи. Это история о том, как красивая метрика обманула исследователя.

https://habr.com/ru/companies/borisovai/articles/1010470/

#mixture_of_experts #moe #selfdistillation #dynamic_architecture #llm #research

97 часов на одной RTX 4090: MoE с подключаемыми экспертами, самодистилляция и почему перплексия — плохая метрика

Меня зовут Борисов Павел, занимаюсь ML-исследованиями. Последние месяцы ковырялся с архитектурой MoE, где эксперты подключаются поверх замороженной модели. 22 эксперимента на одной RTX 4090, ниже...

Хабр

97 часов на одной RTX 4090: как я учил нейросеть улучшать саму себя — и что пошло не так

Всё началось с простой идеи: что если подключать к языковой модели новые «навыки» как приложения к смартфону — без переобучения, без деградации, за полчаса? Я потратил 22 шага экспериментов и 97.5 GPU-часов на одной видеокарте, чтобы это проверить. Архитектура заработала идеально. А потом выяснилось, что модель, которая говорит на языке математики, совершенно не умеет решать задачи. Это история о том, как красивая метрика обманула исследователя, и как модель в итоге нашла выход сама.

https://habr.com/ru/articles/1005168/

#mixture_of_experts #moe #selfdistillation #dynamic_architecture #pytorch #llm #research

97 часов на одной RTX 4090: как я учил нейросеть улучшать саму себя — и что пошло не так

Всё началось с простой идеи: что если подключать к языковой модели новые «навыки» как приложения к смартфону — без переобучения, без деградации, за полчаса? Я потратил 22 шага экспериментов и 97.5...

Хабр

Jonas Hübotter (@jonashuebotter)

Latent Space 팟캐스트(@latentspacepod)에서 Ted Kyi가 진행한 self-distillation(자기증류) 토론을 발견했다는 소식입니다. 발언자는 자기증류에 대한 훌륭한 정리와 향후 연구·적용 가능성에 대한 기대감을 표명하고 있습니다. 연구자·개발자 관점에서 관심을 끌 만한 내용입니다.

https://x.com/jonashuebotter/status/2023112694015173082

#selfdistillation #research #podcast #latentspace

Jonas Hübotter (@jonashuebotter) on X

Just came across this great discussion of self-distillation on @latentspacepod! Really good run down by Ted Kyi and we’re every bit excited about what’s next as he is! https://t.co/G5LrWlOT8B

X (formerly Twitter)

fly51fly (@fly51fly)

ETH Zurich 연구진의 'Reinforcement Learning via Self-Distillation'(2026) arXiv 논문이 공개되었습니다. 논문은 강화학습에 자기증류(self-distillation)를 접목한 방법론을 제안하며 관련 링크(arXiv)가 제공됩니다. 저자로 J Hübotter, F Lübeck, L Behric, A Baumann이 표기되어 있습니다.

https://x.com/fly51fly/status/2016992229261529388

#reinforcementlearning #selfdistillation #research #arxiv

fly51fly (@fly51fly) on X

[LG] Reinforcement Learning via Self-Distillation J Hübotter, F Lübeck, L Behric, A Baumann... [ETH Zurich] (2026) https://t.co/JNuKhITPFd

X (formerly Twitter)