Simple self-distillation improves code generation

https://arxiv.org/abs/2604.01193

#arxiv

Embarrassingly Simple Self-Distillation Improves Code Generation

Can a large language model (LLM) improve at code generation using only its own raw outputs, without a verifier, a teacher model, or reinforcement learning? We answer in the affirmative with simple self-distillation (SSD): sample solutions from the model with certain temperature and truncation configurations, then fine-tune on those samples with standard supervised fine-tuning. SSD improves Qwen3-30B-Instruct from 42.4% to 55.3% pass@1 on LiveCodeBench v6, with gains concentrating on harder problems, and it generalizes across Qwen and Llama models at 4B, 8B, and 30B scale, including both instruct and thinking variants. To understand why such a simple method can work, we trace these gains to a precision-exploration conflict in LLM decoding and show that SSD reshapes token distributions in a context-dependent way, suppressing distractor tails where precision matters while preserving useful diversity where exploration matters. Taken together, SSD offers a complementary post-training direction for improving LLM code generation.

arXiv.org
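
The sampling step mentioned in the abstract ("certain temperature and truncation configurations") can be illustrated with a toy sketch. The abstract does not say which truncation rule is used, so the top-p (nucleus) rule below is an assumption, as are the function name `truncated_distribution` and the example logits. The fine-tuning step is not shown.

```python
import math


def truncated_distribution(logits, temperature=1.0, top_p=1.0):
    """Return {token: probability} after temperature scaling and top-p truncation.

    Hypothetical helper for illustration only; top-p is an assumed truncation rule.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Temperature scaling: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    # Numerically stable softmax.
    peak = max(scaled.values())
    weights = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(weights.values())
    probs = {tok: w / total for tok, w in weights.items()}
    # Keep the smallest set of most-probable tokens whose mass reaches top_p.
    kept = {}
    cumulative = 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize the surviving tokens so they sum to 1.
    norm = sum(kept.values())
    return {tok: p / norm for tok, p in kept.items()}


example_logits = {"a": 2.0, "b": 1.0, "c": -3.0}
nucleus = truncated_distribution(example_logits, temperature=1.0, top_p=0.9)
```

With these example logits, the low-probability token "c" falls outside the nucleus, which is one simple way a "distractor tail" can be cut off at sampling time.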

fly51fly (@fly51fly)

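A 2026 paper has been released that analyzes whether large language models infer the other party's mental state (mentalize) when they teach. It examines the cognitive behavior of LLMs that mimic human teaching interactions, which makes it relevant to research on model interpretability and human-like reasoning.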

https://x.com/fly51fly/status/2040187062683582641

#llm #mentalization #airesearch #languagemodels #arxiv

fly51fly (@fly51fly) on X

[AI] Do Large Language Models Mentalize When They Teach? S K. Harootonian, M K. Ho, T L. Griffiths, Y Niv… [Princeton University & New York University] (2026) https://t.co/VsoXwbYsAf

X (formerly Twitter)

fly51fly (@fly51fly)

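Researchers at MIT and NUS have proposed CORAL, an autonomous multi-agent evolution system for open-ended discovery. It is a new line of AI research in which multiple agents evolve and explore on their own, and it is notable for work on autonomous agents and open-ended exploration.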

https://x.com/fly51fly/status/2040189285471760609

#coral #multiagent #research #autonomousai #arxiv

fly51fly (@fly51fly) on X

[LG] CORAL: Towards Autonomous Multi-Agent Evolution for Open-Ended Discovery A Qu, H Zheng, Z Zhou, Y Yan… [MIT & NUS] (2026) https://t.co/9ZCSBtIwF4

The case for zero-error horizons in trustworthy LLMs

https://arxiv.org/abs/2601.15714

#arxiv #llm #llms

Even GPT-5.2 Can't Count to Five: The Case for Zero-Error Horizons in Trustworthy LLMs

We propose Zero-Error Horizon (ZEH) for trustworthy LLMs, which represents the maximum range that a model can solve without any errors. While ZEH itself is simple, we demonstrate that evaluating the ZEH of state-of-the-art LLMs yields abundant insights. For example, by evaluating the ZEH of GPT-5.2, we found that GPT-5.2 cannot even compute the parity of a short string like 11000, and GPT-5.2 cannot determine whether the parentheses in ((((()))))) are balanced. This is surprising given the excellent capabilities of GPT-5.2. The fact that LLMs make mistakes on such simple problems serves as an important lesson when applying LLMs to safety-critical domains. By applying ZEH to Qwen2.5 and conducting detailed analysis, we found that while ZEH correlates with accuracy, the detailed behaviors differ, and ZEH provides clues about the emergence of algorithmic capabilities. Finally, while computing ZEH incurs significant computational cost, we discuss how to mitigate this cost by achieving up to one order of magnitude speedup using tree structures and online softmax.
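
A minimal sketch of how a zero-error horizon could be measured on the two toy tasks the abstract names. It assumes that "range" means input length and that the horizon is the largest length at which a solver matches a reference on every input. The paper's exact definition may differ, and the names `zero_error_horizon` and `flawed_parity` are hypothetical.

```python
from itertools import product


def parity(bits: str) -> int:
    """Reference answer: 1 if the number of '1' characters is odd, else 0."""
    return bits.count("1") % 2


def is_balanced(s: str) -> bool:
    """Reference answer: True if every '(' is closed by a later ')'."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def zero_error_horizon(solver, reference, alphabet, max_len):
    """Largest n <= max_len such that solver agrees with reference on
    every string of length 1..n over the alphabet (assumed definition)."""
    horizon = 0
    for n in range(1, max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if solver(s) != reference(s):
                return horizon
        horizon = n
    return horizon


def flawed_parity(bits: str) -> int:
    """Stand-in for a model: correct only on inputs shorter than 4 characters."""
    return parity(bits) if len(bits) < 4 else 0


horizon = zero_error_horizon(flawed_parity, parity, "01", max_len=6)
```

Exhaustive enumeration grows exponentially with input length, which is consistent with the abstract's remark that computing ZEH is expensive. In practice `solver` would wrap a model call rather than a Python function.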

Today on the #arXiv:

Nolan et al. 2026, "Planetary Radar at the Arecibo Observatory" - https://arxiv.org/abs/2604.00332

@mikeynolan, @lynncarter, and @PlanetTreky review everything that was done by #TeamRadar at Arecibo.

Planetary Radar at the Arecibo Observatory

In the late 1990s, the Arecibo Observatory and its planetary radar system were upgraded to increase sensitivity by a factor of 20. This upgrade substantially improved the quality of the data and the ability to observe terrestrial planets, outer planet satellites, planetary rings, and near-Earth objects until the telescope's collapse in 2020. The higher sensitivity allowed radar observations of 889 near-Earth asteroids and comets from 1997 to 2020, compared to the 40 achieved in the previous 30 years, and showed that the population of near-Earth asteroids is heterogeneous, suggesting a wide variety of formation and evolution mechanisms. The planetary radar's ability to see through the atmospheres of Venus and Titan, into the shadows of Mercury and the Moon, and under the surface of the Moon and Mars provided a unique perspective on those bodies that has driven in-situ exploration. No other existing or planned facility matches the sensitivity that Arecibo had.

Surprisingly elaborate and actually interesting #AprilFools prank paper on #arXiv #astro-ph today, by @Tiylaya:

Where to Search For Life: Evidence from narrative sources with established predictive efficacy:
https://arxiv.org/abs/2603.28883

#academicchatter #astronomy #SciFi

Where to Search For Life: Evidence from narrative sources with established predictive efficacy

The search for habitable planets, and even for "Earth 2.0", is a major driver in contemporary astronomy. However, selecting target fields to prioritise for such searches presents a challenge. Here we establish a statistical analysis of the appearance of constellation names in science fiction magazines of the pulp era, evaluating the most commonly mentioned constellations and thus those which the science fiction community collectively identify as the most likely locations to find life. Given that the predictive power of science fiction is well established, we suggest that these locations might be prioritised by searches for extrasolar biospheres.


"Mexican Burrowing Toads as gravitational wave detectors" by Frederic V. Hessman, Christian Jooss https://arxiv.org/abs/2603.29334 #arxiv #gravitationalWave #futurenobel

Mexican Burrowing Toads as gravitational wave detectors

It is generally assumed that gravitational waves are extremely difficult to detect. However, we show that the call of the Mexican Burrowing Toad has an amazing resemblance to cosmic gravitational wave signals due to the merging of neutron stars and/or black holes. It is known that toads exhibit magnetoreception - the ability to detect magnetic fields - and that magnetic fields thus subtly affect ion channel activities in toad neurons. We speculate that gravitational strains produce phonons and magnons in a ferromagnetic substance embedded in the nervous system of the toads and that these coherent signals are exponentially amplified by a Raman laser mechanism to the point where they can be detected. The fine tuning necessary for this mechanism to work would help to explain why this species of toad shows this remarkable ability and others do not. We analyze the sound of a pond full of Mexican Burrowing Toads in the hopes of detecting slight phase shifts in their calls due to a gravitational wave event. No effect was found and the LIGO/VIRGO consortia have not reported an event during the recording, illustrating the power of this approach. We suggest the massive use of these toads would be an inexpensive way to support the operation of optical interferometric gravitational wave detector facilities.

TinyLoRA – Learning to Reason in 13 Parameters

https://arxiv.org/abs/2602.04118

#arxiv

Learning to Reason in 13 Parameters

Recent research has shown that language models can learn to reason, often via reinforcement learning. Some work even trains low-rank parameterizations for reasoning, but conventional LoRA cannot scale below the model dimension. We question whether even rank=1 LoRA is necessary for learning to reason and propose TinyLoRA, a method for scaling low-rank adapters to sizes as small as one parameter. Within our new parameterization, we are able to train the 8B parameter size of Qwen2.5 to 91% accuracy on GSM8K with only 13 trained parameters in bf16 (26 total bytes). We find this trend holds in general: we are able to recover 90% of performance improvements while training 1000x fewer parameters across a suite of more difficult learning-to-reason benchmarks such as AIME, AMC, and MATH500. Notably, we are only able to achieve such strong performance with RL: models trained using SFT require 100-1000x larger updates to reach the same performance.
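
The parameter arithmetic in the abstract can be checked with a short sketch. It uses the standard LoRA formulation (a weight update B·A with B of shape d_out × r and A of shape r × d_in), which is why a conventional adapter cannot be smaller than roughly the model dimension. The 4096 hidden size is a hypothetical value chosen for illustration, and TinyLoRA's own parameterization is not reproduced here.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of a standard LoRA adapter on one weight matrix:
    B is d_out x rank and A is rank x d_in."""
    return rank * (d_in + d_out)


def storage_bytes(n_params: int, bytes_per_param: int = 2) -> int:
    """Storage for n_params values; bf16 uses 2 bytes per value."""
    return n_params * bytes_per_param


# Hypothetical 4096 x 4096 projection with a rank-1 adapter.
rank1_params = lora_param_count(d_in=4096, d_out=4096, rank=1)

# The figure quoted in the abstract: 13 parameters in bf16.
tiny_bytes = storage_bytes(13)
```

Under these assumptions a single rank-1 adapter already carries 8192 parameters, while 13 bf16 parameters occupy the 26 bytes the abstract reports.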

Agentic AI and the next intelligence explosion

https://arxiv.org/abs/2603.20639

#ai #arxiv

The "AI singularity" is often miscast as a monolithic, godlike mind. Evolution suggests a different path: intelligence is fundamentally plural, social, and relational. Recent advances in agentic AI reveal that frontier reasoning models, such as DeepSeek-R1, do not improve simply by "thinking longer". Instead, they simulate internal "societies of thought," spontaneous cognitive debates that argue, verify, and reconcile to solve complex tasks. Moreover, we are entering an era of human-AI centaurs: hybrid actors where collective agency transcends individual control. Scaling this intelligence requires shifting from dyadic alignment (RLHF) toward institutional alignment. By designing digital protocols, modeled on organizations and markets, we can build a social infrastructure of checks and balances. The next intelligence explosion will not be a single silicon brain, but a complex, combinatorial society specializing and sprawling like a city. No mind is an island.

In these times, I think such a change in the governance of public-interest knowledge bases is an important signal.

https://tech.cornell.edu/arxiv/

#arxiv #knowledge #openAccess #scholarship #academia

Cornell Tech - arXiv

Cornell Tech