Even GPT-5.2 Can't Count to Five: Zero-Error Horizons in Trustworthy LLMs
https://arxiv.org/abs/2601.15714
#arxiv #llm #llms

Even GPT-5.2 Can't Count to Five: The Case for Zero-Error Horizons in Trustworthy LLMs
We propose Zero-Error Horizon (ZEH) for trustworthy LLMs, which represents the maximum range that a model can solve without any errors. While ZEH itself is simple, we demonstrate that evaluating the ZEH of state-of-the-art LLMs yields abundant insights. For example, by evaluating the ZEH of GPT-5.2, we found that GPT-5.2 cannot even compute the parity of a short string like 11000, and GPT-5.2 cannot determine whether the parentheses in ((((()))))) are balanced. This is surprising given the excellent capabilities of GPT-5.2. The fact that LLMs make mistakes on such simple problems serves as an important lesson when applying LLMs to safety-critical domains. By applying ZEH to Qwen2.5 and conducting detailed analysis, we found that while ZEH correlates with accuracy, the detailed behaviors differ, and ZEH provides clues about the emergence of algorithmic capabilities. Finally, while computing ZEH incurs significant computational cost, we discuss how to mitigate this cost by achieving up to one order of magnitude speedup using tree structures and online softmax.
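The two tasks named in the abstract have simple reference solutions, so a ZEH-style check can be sketched in a few lines. The snippet below is only an illustration, not the authors' code: it assumes ZEH means the largest input length n for which a solver is correct on every input of length 1 through n, and the names `zero_error_horizon` and `flawed_solver` are hypothetical. A real evaluation would call an LLM where the toy solver stands.

```python
from itertools import product


def parity(bits: str) -> int:
    """Return 1 if the string has an odd number of '1' characters, else 0."""
    return bits.count("1") % 2


def is_balanced(s: str) -> bool:
    """Return True if every ')' closes an earlier '(' and nothing stays open."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' appeared with no matching '('
                return False
    return depth == 0


def zero_error_horizon(solver, reference, alphabet: str, max_n: int) -> int:
    """Largest n <= max_n such that solver matches reference on ALL strings
    of length 1..n over the alphabet (assumed reading of ZEH).
    Exhaustive enumeration, so the cost grows as len(alphabet) ** n."""
    horizon = 0
    for n in range(1, max_n + 1):
        for chars in product(alphabet, repeat=n):
            x = "".join(chars)
            if solver(x) != reference(x):
                return horizon  # first error ends the error-free range
        horizon = n
    return horizon


def flawed_solver(bits: str) -> int:
    """Hypothetical stand-in for a model: correct up to length 3, then
    always answers 0."""
    return parity(bits) if len(bits) <= 3 else 0


if __name__ == "__main__":
    print(parity("11000"))
    print(zero_error_horizon(flawed_solver, parity, "01", max_n=6))
    print(zero_error_horizon(is_balanced, is_balanced, "()", max_n=6))
```

The exhaustive loop also shows why the abstract calls the computation costly: every additional unit of length multiplies the number of inputs by the alphabet size.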
Today on the #arXiv:
Nolan et al. 2026, "Planetary Radar at the Arecibo Observatory" - https://arxiv.org/abs/2604.00332
@mikeynolan , @lynncarter, and @PlanetTreky review everything that was done by #TeamRadar at Arecibo.

Planetary Radar at the Arecibo Observatory
In the late 1990s, the Arecibo Observatory and its planetary radar system were upgraded to increase sensitivity by a factor of 20. This upgrade substantially improved the quality of the data and the ability to observe terrestrial planets, outer planet satellites, planetary rings, and near-Earth objects until the telescope's collapse in 2020. The higher sensitivity allowed radar observations of 889 near-Earth asteroids and comets from 1997 to 2020, compared to the 40 achieved in the previous 30 years, and showed that the population of near-Earth asteroids is heterogeneous, suggesting a wide variety of formation and evolution mechanisms. The planetary radar's ability to see through the atmospheres of Venus and Titan, into the shadows of Mercury and the Moon, and under the surface of the Moon and Mars provided a unique perspective on those bodies that has driven in-situ exploration. No other existing or planned facility matches the sensitivity that Arecibo had.
Surprisingly elaborate and actually interesting #AprilFools prank paper on #arXiv #astro-ph today, by @Tiylaya:
Where to Search For Life: Evidence from narrative sources with established predictive efficacy
https://arxiv.org/abs/2603.28883
#academicchatter #astronomy #SciFi

Where to Search For Life: Evidence from narrative sources with established predictive efficacy
The search for habitable planets, and even for "Earth 2.0", is a major driver in contemporary astronomy. However, selecting target fields to prioritise for such searches presents a challenge. Here we establish a statistical analysis of the appearance of constellation names in science fiction magazines of the pulp era, evaluating the most commonly mentioned constellations and thus those which the science fiction community collectively identifies as the most likely locations to find life. Given that the predictive power of science fiction is well established, we suggest that these locations might be prioritised by searches for extrasolar biospheres.
"Mexican Burrowing Toads as gravitational wave detectors" by Frederic V. Hessman, Christian Jooss
https://arxiv.org/abs/2603.29334 #arxiv #gravitationalWave #futurenobel
Mexican Burrowing Toads as gravitational wave detectors
It is generally assumed that gravitational waves are extremely difficult to detect. However, we show that the call of the Mexican Burrowing Toad has an amazing resemblance to cosmic gravitational wave signals due to the merging of neutron stars and/or black holes. It is known that toads exhibit magnetoreception - the ability to detect magnetic fields - and that magnetic fields thus subtly affect ion channel activities in toad neurons. We speculate that gravitational strains produce phonons and magnons in a ferromagnetic substance embedded in the nervous system of the toads and that these coherent signals are exponentially amplified by a Raman laser mechanism to the point where they can be detected. The fine tuning necessary for this mechanism to work would help to explain why this species of toad shows this remarkable ability and others do not. We analyze the sound of a pond full of Mexican Burrowing Toads in the hopes of detecting slight phase shifts in their calls due to a gravitational wave event. No effect was found and the LIGO/VIRGO consortia have not reported an event during the recording, illustrating the power of this approach. We suggest the massive use of these toads would be an inexpensive way to support the operation of optical interferometric gravitational wave detector facilities.
TinyLoRA – Learning to Reason in 13 Parameters
https://arxiv.org/abs/2602.04118
#arxiv

Learning to Reason in 13 Parameters
Recent research has shown that language models can learn to reason, often via reinforcement learning. Some work even trains low-rank parameterizations for reasoning, but conventional LoRA cannot scale below the model dimension. We question whether even rank=1 LoRA is necessary for learning to reason and propose TinyLoRA, a method for scaling low-rank adapters to sizes as small as one parameter. Within our new parameterization, we are able to train the 8B parameter size of Qwen2.5 to 91% accuracy on GSM8K with only 13 trained parameters in bf16 (26 total bytes). We find this trend holds in general: we are able to recover 90% of performance improvements while training 1000x fewer parameters across a suite of more difficult learning-to-reason benchmarks such as AIME, AMC, and MATH500. Notably, we are only able to achieve such strong performance with RL: models trained using SFT require 100-1000x larger updates to reach the same performance.
Agentic AI and the next intelligence explosion
https://arxiv.org/abs/2603.20639
#ai #arxiv

Agentic AI and the next intelligence explosion
The "AI singularity" is often miscast as a monolithic, godlike mind. Evolution suggests a different path: intelligence is fundamentally plural, social, and relational. Recent advances in agentic AI reveal that frontier reasoning models, such as DeepSeek-R1, do not improve simply by "thinking longer". Instead, they simulate internal "societies of thought," spontaneous cognitive debates that argue, verify, and reconcile to solve complex tasks. Moreover, we are entering an era of human-AI centaurs: hybrid actors where collective agency transcends individual control. Scaling this intelligence requires shifting from dyadic alignment (RLHF) toward institutional alignment. By designing digital protocols, modeled on organizations and markets, we can build a social infrastructure of checks and balances. The next intelligence explosion will not be a single silicon brain, but a complex, combinatorial society specializing and sprawling like a city. No mind is an island.
In these times, I think such a change in the governance of public-interest knowledge bases is an important signal.
https://tech.cornell.edu/arxiv/
#arxiv #knowledge #openAccess #scholarship #academia
Mathematical methods and human thought in the age of AI
https://arxiv.org/abs/2603.26524
#ai #arxiv

Mathematical methods and human thought in the age of AI
Artificial intelligence (AI) is the name popularly given to a broad spectrum of computer tools designed to perform increasingly complex cognitive tasks, including many that used to solely be the province of humans. As these tools become exponentially sophisticated and pervasive, the justifications for their rapid development and integration into society are frequently called into question, particularly as they consume finite resources and pose existential risks to the livelihoods of those skilled individuals they appear to replace.
In this paper, we consider the rapidly evolving impact of AI on the traditional questions of philosophy, with an emphasis on its application in mathematics and on the broader real-world outcomes of its more general use. We assert that artificial intelligence is a natural evolution of human tools developed throughout history to facilitate the creation, organization, and dissemination of ideas, and argue that it is paramount that the development and application of AI remain fundamentally human-centered. With an eye toward innovating solutions to meet human needs, enhancing the human quality of life and expanding the capacity for human thought and understanding, we propose a pathway to integrating AI into our most challenging and intellectually rigorous fields to the benefit of all humankind.
fly51fly (@fly51fly)
A tweet sharing the paper "AVO: Agentic Variation Operators for Autonomous Evolutionary Search" from NVIDIA researchers. The work proposes agentic variation operators for autonomous evolutionary search and presents a new methodology for AI search and optimization.
https://x.com/fly51fly/status/2038379058887807158
#nvidia #arxiv #agents #optimization #research

fly51fly (@fly51fly) on X
[LG] AVO: Agentic Variation Operators for Autonomous Evolutionary Search
T Chen, Z Ye, B Xu, Z Ye… [NVIDIA] (2026)
https://t.co/AGErre448p
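fly51fly (@fly51fly)
A research paper measuring how legibly an AI's reasoning process can be expressed has been released. It examines whether one's own understanding helps in teaching a reasoning process to others, and addresses the interpretability and teachability of model reasoning traces.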
https://x.com/fly51fly/status/2036563955670458442
#reasoning #interpretability #llm #research #arxiv

fly51fly (@fly51fly) on X
[CL] Measuring Reasoning Trace Legibility: Can Those Who Understand Teach?
D Roytburg, S Sridhar, D Ippolito [CMU] (2026)
https://t.co/cGNDwYubsR