Mastodawn

GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents

GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents

We present GLM-5V-Turbo, a step toward native foundation models for multimodal agents. As foundation models are increasingly deployed in real environments, agentic capability depends not only on language reasoning, but also on the ability to perceive, interpret, and act over heterogeneous contexts such as images, videos, webpages, documents, GUIs. GLM-5V-Turbo is built around this objective: multimodal perception is integrated as a core component of reasoning, planning, tool use, and execution, rather than as an auxiliary interface to a language model. This report summarizes the main improvements behind GLM-5V-Turbo across model design, multimodal training, reinforcement learning, toolchain expansion, and integration with agent frameworks. These developments lead to strong performance in multimodal coding, visual tool use, and framework-based agentic tasks, while preserving competitive text-only coding capability. More importantly, our development process offers practical insights for building multimodal agents, highlighting the central role of multimodal perception, hierarchical optimization, and reliable end-to-end verification.

arXiv.org

Curated Hacker News 1d ago

Transformers Are Inherently Succinct (2025)

https://arxiv.org/abs/2510.19315

#arxiv

Transformers are Inherently Succinct

We propose succinctness as a measure of the expressive power of a transformer in describing a concept. To this end, we prove that transformers are highly expressive in that they can represent formal languages substantially more succinctly than standard representations of formal languages like finite automata and Linear Temporal Logic (LTL) formulas. As a by-product of this expressivity, we show that verifying properties of transformers is provably intractable (i.e. EXPSPACE-complete).

arXiv.org

Curated Hacker News 1d ago

Hallucination Is Inevitable: An Innate Limitation of Large Language Models

https://arxiv.org/abs/2401.11817

#arxiv

Hallucination is Inevitable: An Innate Limitation of Large Language Models

Hallucination has been widely recognized to be a significant drawback for large language models (LLMs). There have been many works that attempt to reduce the extent of hallucination. These efforts have mostly been empirical so far, which cannot answer the fundamental question whether it can be completely eliminated. In this paper, we formalize the problem and show that it is impossible to eliminate hallucination in LLMs. Specifically, we define a formal world where hallucination is defined as inconsistencies between a computable LLM and a computable ground truth function. By employing results from learning theory, we show that LLMs cannot learn all the computable functions and will therefore inevitably hallucinate if used as general problem solvers. Since the formal world is a part of the real world which is much more complicated, hallucinations are also inevitable for real world LLMs. Furthermore, for real world LLMs constrained by provable time complexity, we describe the hallucination-prone tasks and empirically validate our claims. Finally, using the formal world framework, we discuss the possible mechanisms and efficacies of existing hallucination mitigators as well as the practical implications on the safe deployment of LLMs.

arXiv.org

sayzard 2d ago

fly51fly (@fly51fly)

에이전트 스킬을 위한 새로운 표현 방식인 S3R(Scheduling-Structural-Logical Representation)을 제안하는 논문입니다. 텍스트 기반 스킬을 구조화해 에이전트의 스킬 관리와 실행 계획을 개선하려는 연구로, AI 에이전트 개발에 유용한 연구 성과입니다.

https://x.com/fly51fly/status/2051058921725649168

#aiagents #arxiv #agentskills #representation #research

fly51fly (@fly51fly) on X

[CL] From Skill Text to Skill Structure: The Scheduling-Structural-Logical Representation for Agent Skills Q Liang, H Wang, Z Liang, Y Liu [Peking University] (2026) https://t.co/GNzxoWlXAM

X (formerly Twitter)

Curated Hacker News 2d ago

Embedded Rust or C firmware? Lessons from an industrial microcontroller use case

https://arxiv.org/abs/2604.25679

#arxiv #rust

Embedded Rust or C Firmware? Lessons from an Industrial Microcontroller Use Case with Ariel OS

As Rust gains traction for developing safer systems software, a reality check for the microcontroller hardware segment becomes necessary. How ready is the Rust ecosystem for this segment? Can Rust compete with C in practice? This paper reports on an IoT industrial case study that contributes to answering these questions. Two teams concurrently developing the same functionality (one in C, one in Rust) are analyzed over a period of several months. A comparative analysis of their approaches, results, and iterative efforts is provided. The analysis and measurements on hardware indicate no strong reason to prefer C over Rust for microcontroller firmware on the basis of memory footprint or execution speed. Furthermore, Ariel OS is shown to provide an efficient and portable system runtime in Rust whose footprint is smaller than that of the state-of-the-art bare-metal C stack traditionally used in this context. It is concluded that Rust is a sound choice today for firmware development in this domain.

arXiv.org

Curated Hacker News 2d ago

Learning Pseudorandom Numbers with Transformers

https://arxiv.org/abs/2510.26792

#arxiv

Learning Pseudorandom Numbers with Transformers: Permuted Congruential Generators, Curricula, and Interpretability

We study the ability of Transformer models to learn sequences generated by Permuted Congruential Generators (PCGs), a widely used family of pseudo-random number generators (PRNGs). PCGs introduce substantial additional difficulty over linear congruential generators (LCGs) by applying a series of bit-wise shifts, XORs, rotations and truncations to the hidden state. We show that Transformers can nevertheless successfully perform in-context prediction on unseen sequences from diverse PCG variants, in tasks that are beyond published classical attacks. In our experiments we scale moduli up to $2^{22}$ using up to $50$ million model parameters and datasets with up to $5$ billion tokens. Surprisingly, we find even when the output is truncated to a single bit, it can be reliably predicted by the model. When multiple distinct PRNGs are presented together during training, the model can jointly learn them, identifying structures from different permutations. We demonstrate a scaling law with modulus $m$: the number of in-context sequence elements required for near-perfect prediction grows as $\sqrt{m}$. For larger moduli, optimization enters extended stagnation phases; in our experiments, learning moduli $m \geq 2^{20}$ requires incorporating training data from smaller moduli, demonstrating a critical necessity for curriculum learning. Finally, we analyze embedding layers and uncover a novel clustering phenomenon: the top principal components spontaneously group the integer inputs into bitwise rotationally-invariant clusters, revealing how representations can transfer from smaller to larger moduli.

arXiv.org

Emjay 3d ago

Content Federation

I have no idea what this is but I wanted a better way to view content from providers like #internetarchive, #arxiv, #Met, etc.

https://confed.itys.net/

Let me know if you have any ideas about what it can be used for or improved upon.

Content Federation

Curated Hacker News 3d ago

AI Self-preferencing in Algorithmic Hiring: Empirical Evidence and Insights

https://arxiv.org/abs/2509.00462

#ai #arxiv #hiring

AI Self-preferencing in Algorithmic Hiring: Empirical Evidence and Insights

As artificial intelligence (AI) tools become widely adopted, large language models (LLMs) are increasingly involved on both sides of decision-making processes, ranging from hiring to content moderation. This dual adoption raises a critical question: do LLMs systematically favor content that resembles their own outputs? Prior research in computer science has identified self-preference bias -- the tendency of LLMs to favor their own generated content -- but its real-world implications have not been empirically evaluated. We focus on the hiring context, where job applicants often rely on LLMs to refine resumes, while employers deploy them to screen those same resumes. Using a large-scale controlled resume correspondence experiment, we find that LLMs consistently prefer resumes generated by themselves over those written by humans or produced by alternative models, even when content quality is controlled. The bias against human-written resumes is particularly substantial, with self-preference bias ranging from 67% to 82% across major commercial and open-source models. To assess labor market impact, we simulate realistic hiring pipelines across 24 occupations. These simulations show that candidates using the same LLM as the evaluator are 23% to 60% more likely to be shortlisted than equally qualified applicants submitting human-written resumes, with the largest disadvantages observed in business-related fields such as sales and accounting. We further demonstrate that this bias can be reduced by more than 50% through simple interventions targeting LLMs' self-recognition capabilities. These findings highlight an emerging but previously overlooked risk in AI-assisted decision making and call for expanded frameworks of AI fairness that address not only demographic-based disparities, but also biases in AI-AI interactions.

arXiv.org

Curated Hacker News 3d ago

Refusal in Language Models Is Mediated by a Single Direction

https://arxiv.org/abs/2406.11717

#arxiv

Refusal in Language Models Is Mediated by a Single Direction

Conversational large language models are fine-tuned for both instruction-following and safety, resulting in models that obey benign requests but refuse harmful ones. While this refusal behavior is widespread across chat models, its underlying mechanisms remain poorly understood. In this work, we show that refusal is mediated by a one-dimensional subspace, across 13 popular open-source chat models up to 72B parameters in size. Specifically, for each model, we find a single direction such that erasing this direction from the model's residual stream activations prevents it from refusing harmful instructions, while adding this direction elicits refusal on even harmless instructions. Leveraging this insight, we propose a novel white-box jailbreak method that surgically disables refusal with minimal effect on other capabilities. Finally, we mechanistically analyze how adversarial suffixes suppress propagation of the refusal-mediating direction. Our findings underscore the brittleness of current safety fine-tuning methods. More broadly, our work showcases how an understanding of model internals can be leveraged to develop practical methods for controlling model behavior.

arXiv.org

sayzard 4d ago

fly51fly (@fly51fly)

Sparse Autoencoder가 개념 manifold를 포착하는지 분석한 Harvard University 연구 논문입니다. 모델 내부 표현 해석 가능성과 concept representation 연구에 중요한 결과로, sparse autoencoder와 개념 구조의 관계를 탐구합니다.

https://x.com/fly51fly/status/2050330797467746330

#sparseautoencoder #interpretability #representationlearning #llm #arxiv