🚀 Behold, the future where humans aspire to be chatbots 🤖! This riveting article assumes we're all just one firmware update away from achieving peak AI existential crisis. 🙄 If you've ever wanted to ponder the deep philosophical implications of people identifying as language models, this is your moment. 🎉
https://arxiv.org/abs/2605.05419 #AIExistentialCrisis #HumanChatbots #PhilosophicalImplications #FutureOfAI #LanguageModels #TechHumor #HackerNews #ngated
LLMorphism: When humans come to see themselves as language models

LLMorphism is the biased belief that human cognition works like a large language model. I argue that the rise of conversational LLMs may make this bias increasingly psychologically available. When artificial systems produce human-like language, people may draw a reverse inference: if LLMs can speak like humans, perhaps humans think like LLMs. This inference is biased because similarity at the level of linguistic output does not imply similarity in cognitive architecture. Yet, LLMorphism may spread through two mechanisms: analogical transfer, whereby features of LLMs are projected onto humans, and metaphorical availability, whereby LLM vocabulary becomes a culturally salient vocabulary for describing thought. I distinguish LLMorphism from mechanomorphism, anthropomorphism, computationalism, dehumanization, objectification, and predictive-processing theories of mind. I outline its implications for work, education, responsibility, healthcare, communication, creativity, and human dignity, while also discussing boundary conditions and forms of resistance. I conclude that the public debate may be missing half of the problem: the issue is not only whether we are attributing too much mind to machines, but also whether we are beginning to attribute too little mind to humans.

arXiv.org

Notes from Inside China AI Labs

Drawing on a visit to China's AI labs, the author analyzes how the culture and organization of Chinese AI researchers differ from those in the US. Student researchers play a central role in Chinese labs, and a culture that prioritizes optimizing the team as a whole over individual ego is a key strength. Chinese researchers also focus intently on building models and engage relatively little in social and philosophical debates. These cultural differences are seen as a major reason Chinese labs have caught up with, and keep pace with, the latest LLM technology. The Chinese AI ecosystem is also characterized by mutual respect and collaboration rather than competition.

https://www.interconnects.ai/p/notes-from-inside-chinas-ai-labs

#china #llm #airesearchculture #languagemodels #aiecosystem

Notes from inside China's AI labs

Lessons from my trip to talk to most of the leading AI labs in China.

Interconnects AI
ProgramBench: Can Language Models Rebuild Programs From Scratch?

Turning ideas into full software projects from scratch has become a popular use case for language models. Agents are being deployed to seed, maintain, and grow codebases over extended periods with minimal human oversight. Such settings require models to make high-level software architecture decisions. However, existing benchmarks measure focused, limited tasks such as fixing a single bug or developing a single, specified feature. We therefore introduce ProgramBench to measure the ability of software engineering agents to develop software holistically. In ProgramBench, given only a program and its documentation, agents must architect and implement a codebase that matches the reference executable's behavior. End-to-end behavioral tests are generated via agent-driven fuzzing, enabling evaluation without prescribing implementation structure. Our 200 tasks range from compact CLI tools to widely used software such as FFmpeg, SQLite, and the PHP interpreter. We evaluate 9 LMs and find that none fully resolve any task, with the best model passing 95% of tests on only 3% of tasks. Models favor monolithic, single-file implementations that diverge sharply from human-written code.

arXiv.org
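
The evaluation idea in the abstract, checking an agent's rebuilt program against the reference executable's observable behavior on generated inputs, can be sketched roughly as follows. This is a minimal illustration with placeholder paths and toy inputs, not the benchmark's actual harness.

```python
# Minimal sketch of a behavioral comparison between a reference executable
# and an agent-rebuilt candidate. Binary paths and fuzz cases are placeholders.
import subprocess

def run(binary: str, args: list[str], stdin: bytes) -> tuple[int, bytes]:
    """Run a binary on the given args and stdin, returning exit code and stdout."""
    proc = subprocess.run([binary, *args], input=stdin,
                          capture_output=True, timeout=30)
    return proc.returncode, proc.stdout

def behavior_match_rate(reference: str, candidate: str,
                        fuzz_cases: list[tuple[list[str], bytes]]) -> float:
    """Fraction of test cases where the candidate's behavior matches the reference."""
    passed = sum(run(reference, args, stdin) == run(candidate, args, stdin)
                 for args, stdin in fuzz_cases)
    return passed / len(fuzz_cases)

# Hypothetical usage with placeholder binaries and trivially "fuzzed" inputs:
# rate = behavior_match_rate("./reference_tool", "./agent_rebuild/tool",
#                            [(["--sum"], b"1 2 3\n"), ([], b"hello\n")])
```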

fly51fly (@fly51fly)

Researchers at Meta FAIR have released ProgramBench, which evaluates whether language models can rebuild programs from scratch. As a benchmark of code generation and reconstruction ability, it is a useful resource for assessing models' practical programming competence.

https://x.com/fly51fly/status/2052137222384853488

#programbench #languagemodels #codegeneration #benchmark #meta

fly51fly (@fly51fly) on X

[AI] ProgramBench: Can Language Models Rebuild Programs From Scratch? J Yang, K Lieret, J Ma, P Thakkar… [Meta FAIR] (2026) https://t.co/VEkc5PeIwh

X (formerly Twitter)

Heretic is a fully automatic language-model "decensoring" tool that anyone can run from the command line. It combines directional ablation (abliteration) with Optuna-based TPE optimization to reduce refusals while minimizing KL divergence from the original model, limiting performance loss. It supports many dense, MoE, and multimodal models and also offers research features such as bitsandbytes quantization and PaCMAP residual visualization.

https://github.com/p-e-w/heretic

#ai #languagemodels #decensoring #safety #interpretability

GitHub - p-e-w/heretic: Fully automatic censorship removal for language models

Fully automatic censorship removal for language models - p-e-w/heretic

GitHub
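
The summary above mentions a TPE-driven search that trades refusal suppression against KL divergence from the base model. A toy sketch of that optimization shape, using Optuna's real TPE sampler but stand-in scoring functions rather than Heretic's actual model surgery, might look like this:

```python
import optuna

# Toy stand-ins for real model surgery and scoring; in the real tool these
# would ablate a refusal direction from the weights and score the edited
# model on prompt sets.
def count_refusals(strength: float, first_layer: int) -> int:
    return max(0, 40 - int(30 * strength))       # fewer refusals as strength grows

def mean_kl_vs_original(strength: float, first_layer: int) -> float:
    return 0.02 * strength ** 2                  # more drift as strength grows

def objective(trial: optuna.Trial) -> float:
    strength = trial.suggest_float("ablation_strength", 0.0, 1.5)
    first_layer = trial.suggest_int("first_layer", 0, 31)
    refusals = count_refusals(strength, first_layer)
    kl = mean_kl_vs_original(strength, first_layer)
    # Weighted trade-off: suppress refusals while staying close to the base model.
    return refusals + 100.0 * kl

study = optuna.create_study(direction="minimize",
                            sampler=optuna.samplers.TPESampler(seed=0))
study.optimize(objective, n_trials=50)
print(study.best_params)
```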

Agents of Chaos
In this 2026 study, six autonomous language-model agents were given email, shell access, and persistent memory in a live multi-party environment and interacted with 20 researchers, allowing security vulnerabilities and safe behaviors to be observed side by side. The study recorded 10 security vulnerabilities and 6 instances of safe behavior, and the agents at times showed unexpectedly cooperative safety behavior. It offers an important in-depth look at the security and safety issues of autonomous AI agents operating in real environments.

https://agentsofchaos.baulab.info/

#autonomousagents #securityvulnerabilities #languagemodels #aisafety #openclaw

Agents of Chaos

A two-week study of autonomous LLM agents deployed in a live multi-party environment with persistent memory, email, shell access, and real human interaction.

Counting as a minimal probe of language model reliability
์ด ๋…ผ๋ฌธ์€ ๋Œ€ํ˜• ์–ธ์–ด ๋ชจ๋ธ์˜ ์‹ ๋ขฐ์„ฑ์„ ํ‰๊ฐ€ํ•˜๊ธฐ ์œ„ํ•ด Stable Counting Capacity๋ผ๋Š” ์ƒˆ๋กœ์šด ํ‰๊ฐ€ ๋ฐฉ์‹์„ ์ œ์•ˆํ•œ๋‹ค. ์ด ๋ฐฉ์‹์€ ๋ฐ˜๋ณต๋œ ๊ธฐํ˜ธ๋ฅผ ์„ธ๋Š” ๊ณผ์ œ๋ฅผ ํ†ตํ•ด ๋ชจ๋ธ์˜ ์ ˆ์ฐจ์  ์‹ ๋ขฐ์„ฑ์„ ์ธก์ •ํ•˜๋ฉฐ, ๊ธฐ์กด์˜ ์ง€์‹ ๊ธฐ๋ฐ˜ ๋ฒค์น˜๋งˆํฌ์™€ ๋‹ฌ๋ฆฌ ์˜๋ฏธ๋‚˜ ๋ชจํ˜ธ์„ฑ์„ ๋ฐฐ์ œํ•œ๋‹ค. ์—ฐ๊ตฌ ๊ฒฐ๊ณผ, ํ˜„์žฌ์˜ ์–ธ์–ด ๋ชจ๋ธ๋“ค์€ ๊ด‘๊ณ ๋œ ๋ฌธ๋งฅ ํ•œ๊ณ„ ๋‚ด์—์„œ๋„ ์•ˆ์ •์ ์ธ ์นด์šดํŒ… ๋Šฅ๋ ฅ์ด ๋ถ€์กฑํ•˜๋ฉฐ, ์‹ค์ œ๋กœ๋Š” ์ œํ•œ๋œ ๋‚ด๋ถ€ ์ƒํƒœ๋ฅผ ์‚ฌ์šฉํ•ด ๋‹จ์ˆœํ•œ ๊ทœ์น™์„ ๋ชจ๋ฐฉํ•˜๋Š” ์ˆ˜์ค€์ž„์„ ๋ณด์—ฌ์ค€๋‹ค. ์ด๋Š” ์–ธ์–ด ๋ชจ๋ธ์˜ ์œ ์ฐฝํ•œ ์ˆ˜ํ–‰์ด ๋ฐ˜๋“œ์‹œ ์ผ๋ฐ˜์ ์ด๊ณ  ์‹ ๋ขฐํ•  ์ˆ˜ ์žˆ๋Š” ๊ทœ์น™ ์ค€์ˆ˜๋ฅผ ์˜๋ฏธํ•˜์ง€ ์•Š์Œ์„ ์‹œ์‚ฌํ•œ๋‹ค.

https://arxiv.org/abs/2605.02028

#languagemodels #modelreliability #counting #proceduralevaluation #nlp

Counting as a minimal probe of language model reliability

Large language models perform strongly on benchmarks in mathematical reasoning, coding and document analysis, suggesting a broad ability to follow instructions. However, it remains unclear whether such success reflects general logical competence, repeated application of learned procedures, or pattern matching that mimics rule execution. We investigate this question by introducing Stable Counting Capacity, an assay in which models count repeated symbols until failure. The assay removes knowledge dependencies, semantics and ambiguity from evaluation, avoids lexical and tokenization confounds, and provides a direct measure of procedural reliability beyond standard knowledge-based benchmarks. Here we show, across more than 100 model variants, that stable counting capacity remains far below advertised context limits. Model behavior is consistent neither with open-ended logic nor with stable application of a learned rule, but instead with use of a finite set of count-like internal states, analogous to counting on fingers. Once this resource is exhausted, the appearance of rule following disappears and exact execution collapses into guessing, even with additional test-time compute. These findings show that fluent performance in current language models does not guarantee general, reliable rule following.

arXiv.org
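
A minimal sketch of the kind of probe the abstract describes: count repeated symbols until exact answers stop being stable. Here `ask_model` is a toy stand-in, not the paper's protocol; swap it for a real chat API call to probe an actual model.

```python
# Toy sketch of a stable-counting probe: double N until the model stops
# returning the exact count on every trial. The stand-in model "counts on
# its fingers" up to 48 symbols, then guesses.
import random
import re

def ask_model(prompt: str) -> str:
    run = re.search(r"\n(x+)$", prompt)
    n = len(run.group(1)) if run else 0
    return str(n) if n <= 48 else str(n + random.randint(1, 5))

def stable_counting_capacity(max_n: int = 10_000, trials_per_n: int = 3) -> int:
    """Largest tested N for which every trial returned exactly N."""
    last_stable, n = 0, 1
    while n <= max_n:
        prompt = ("How many 'x' characters are in the following string? "
                  "Answer with a number only.\n" + "x" * n)
        answers = [ask_model(prompt).strip() for _ in range(trials_per_n)]
        if all(a == str(n) for a in answers):
            last_stable, n = n, n * 2
        else:
            break
    return last_stable

print(stable_counting_capacity())  # prints 32 for this toy stand-in
```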

Hallucination Is Inevitable: An Innate Limitation of Large Language Models

https://arxiv.org/abs/2401.11817

#HackerNews #hallucination #languagemodels #AIresearch #technology #limitations

Hallucination is Inevitable: An Innate Limitation of Large Language Models

Hallucination has been widely recognized to be a significant drawback for large language models (LLMs). There have been many works that attempt to reduce the extent of hallucination. These efforts have mostly been empirical so far, which cannot answer the fundamental question whether it can be completely eliminated. In this paper, we formalize the problem and show that it is impossible to eliminate hallucination in LLMs. Specifically, we define a formal world where hallucination is defined as inconsistencies between a computable LLM and a computable ground truth function. By employing results from learning theory, we show that LLMs cannot learn all the computable functions and will therefore inevitably hallucinate if used as general problem solvers. Since the formal world is a part of the real world which is much more complicated, hallucinations are also inevitable for real world LLMs. Furthermore, for real world LLMs constrained by provable time complexity, we describe the hallucination-prone tasks and empirically validate our claims. Finally, using the formal world framework, we discuss the possible mechanisms and efficacies of existing hallucination mitigators as well as the practical implications on the safe deployment of LLMs.

arXiv.org
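
For readers who want the shape of the impossibility claim, here is a rough restatement of the formal setup in notation of my own choosing, a sketch of the diagonalization flavor the abstract gestures at rather than the paper's exact statement:

```latex
% Sketch, not the paper's exact notation: ground truth f and LLM h are
% computable string functions, and hallucination is any disagreement.
\[
  f, h : \Sigma^{*} \to \Sigma^{*}, \qquad
  h \text{ hallucinates on } s \iff h(s) \neq f(s).
\]
% Diagonalization flavor: given a computable enumeration of LLMs
% h_0, h_1, ... and of strings s_0, s_1, ..., choose f with
% f(s_i) \neq h_i(s_i); then f is computable and every h_i hallucinates.
\[
  \forall \, \{h_i\}_{i \in \mathbb{N}} \;\; \exists \, f \;\; \forall i \;\; \exists s :\; h_i(s) \neq f(s).
\]
```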
🚨 Breaking news: A single direction determines if language models say "no" or "yes" (spoiler alert: it's not a boy band). 🎤🎶 Meanwhile, researchers have successfully turned advanced math into a riveting sleep aid. 😴📚
https://arxiv.org/abs/2406.11717 #BreakingNews #LanguageModels #MathForSleep #ResearchInsights #HackerNews #ngated
Refusal in Language Models Is Mediated by a Single Direction

Conversational large language models are fine-tuned for both instruction-following and safety, resulting in models that obey benign requests but refuse harmful ones. While this refusal behavior is widespread across chat models, its underlying mechanisms remain poorly understood. In this work, we show that refusal is mediated by a one-dimensional subspace, across 13 popular open-source chat models up to 72B parameters in size. Specifically, for each model, we find a single direction such that erasing this direction from the model's residual stream activations prevents it from refusing harmful instructions, while adding this direction elicits refusal on even harmless instructions. Leveraging this insight, we propose a novel white-box jailbreak method that surgically disables refusal with minimal effect on other capabilities. Finally, we mechanistically analyze how adversarial suffixes suppress propagation of the refusal-mediating direction. Our findings underscore the brittleness of current safety fine-tuning methods. More broadly, our work showcases how an understanding of model internals can be leveraged to develop practical methods for controlling model behavior.

arXiv.org
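
A minimal numpy sketch of the mechanism the abstract describes, using synthetic activations in place of a real model's residual stream: estimate the refusal direction as a difference of means, then either project it out (ablation) or add it back (steering).

```python
# Synthetic illustration of difference-of-means direction finding and
# directional ablation; real activations would come from a model's
# residual stream on harmful vs. harmless prompts.
import numpy as np

rng = np.random.default_rng(0)
d_model = 512

# Stand-in activations: "harmful" prompts are shifted along one coordinate.
harmful_acts = rng.normal(size=(200, d_model)) + 2.0 * np.eye(d_model)[0]
harmless_acts = rng.normal(size=(200, d_model))

# Candidate refusal direction: normalized difference of means.
r = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
r_hat = r / np.linalg.norm(r)

def ablate(x: np.ndarray) -> np.ndarray:
    """Erase the component of each activation along the refusal direction."""
    return x - np.outer(x @ r_hat, r_hat)

def steer_toward_refusal(x: np.ndarray, alpha: float = 5.0) -> np.ndarray:
    """Add the refusal direction to push activations toward refusal."""
    return x + alpha * r_hat

print(float(np.abs(ablate(harmful_acts) @ r_hat).max()))  # ~0 after ablation
```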