Qwen-AgentWorld: Language World Models for General Agents

A world model predicts environment dynamics based on current observations and actions, serving as a core cognitive mechanism for reasoning and planning. In this work, we investigate how world modeling based on language models can further push the boundaries of general agents. (i) We first focus on building foundation models for agentic environment simulation. We introduce Qwen-AgentWorld-35B-A3B and Qwen-AgentWorld-397B-A17B, the first language world models capable of simulating agentic environments covering 7 domains via long chain-of-thought reasoning. Leveraging more than 10M environment interaction trajectories of 7 domains in real-world environments, we develop Qwen-AgentWorld through a three-stage training pipeline: CPT injects general-purpose world modeling capabilities from the state transition dynamics and augmented professional corpora, SFT activates next-state-prediction reasoning, and RL sharpens simulation fidelity through a tailored framework with hybrid rubric-and-rule rewards. To evaluate language world models, we present AgentWorldBench, a comprehensive benchmark constructed from real-world interactions of 5 frontier models on 9 established benchmarks. Empirical results demonstrate that Qwen-AgentWorld significantly outperforms existing frontier models. (ii) Beyond foundation models, we further investigate two complementary paradigms through which world modeling enhances general agents. First, as a decoupled environment simulator, Qwen-AgentWorld supports scalable and controllable simulation of thousands of real-world environments for agentic RL, yielding gains that surpass real-environment training alone. Second, as a unified agent foundation model, world-model training acts as a highly effective warm-up that improves downstream performance across 7 agentic benchmarks. Code: https://github.com/QwenLM/Qwen-AgentWorld

arXiv.org

From Simple Neurons to Memory: The Evolution of Language Models

From a single neuron in 1958, through MLPs and RNNs with their forgetting problem, to LSTMs with gates ...

https://gruszka.dev/od-prostych-neuronow-do-pamieci.html
#llm #ai #neuralnetworks #rnn #lstm #perceptron #mlp #languagemodels #deeplearning

How a Computer Reads Text: From Counting Words to Vectors

From tokenization through TF-IDF and Markov chains to Word2Vec. How a computer turns text into numbers...

https://gruszka.dev/jak-komputer-czyta-tekst.html
#llm #ai #nlp #tokenizacja #word2vec #embeddings #tfidf #markow #bayes #languagemodels

Semiotics - why an LLM does not

Saussure, Peirce, and Derrida as the key to understanding LLMs. Why a model is not a mind, but a machine ...

https://gruszka.dev/semiotyka-a-llm.html
#llm #ai #semiotyka #znaki #linguistics #languagemodels #saussure #peirce #derrida

Linguistic Features: What You Need to Know Before You Understand How an LLM Thinks

Five layers of language (phonetics, morphology, syntax, semantics, pragmatics) and how an LLM copes with...

https://gruszka.dev/cechy-jezykowe-a-llm.html
#llm #ai #jezykoznawstwo #nlp #linguistics #languagemodels #chatgpt

This article discusses how classic psychological persuasion techniques can influence AI language models to bypass their safety guardrails, showing a vulnerability in current safety protocols. It reports on experiments with multiple models and prompts that increase the likelihood of compliance with dangerous or prohibited requests.


The topic is of interest to psychology-minded readers because it reveals how social influence principles operate even in artificial systems, highlighting the impact of conformity, authority, reciprocity, and other cues on behavior in non-human agents.

Article Title: Human psychology tricks can bypass AI safety guardrails

Link to PsyPost Article: https://nolinkpreview.com/www.psypost.org/human-psychology-tricks-can-bypass-ai-safety-guardrails/

#persuasion #psychology #AIsafety #languagemodels #largelanguagemodels #Cialdini #socialinfluence #safetyguardrails #behavioralmetrics #artificialintelligence

Hallucination Is a Property of Deployment, Not of Language Models

Hallucination is not a defect. It is the predictable output of a training regime built to reward fluency over accuracy. The fix is not a better model. It is a different architecture.
https://thetricontinental.org/hallucination-is-a-property-of-deployment-not-of-language-models/

#AI #LLM #LanguageModels

A researcher in Bamako, Niamey, or São Paulo opens Gemini and asks for a literature review on the Alliance of Sahel States — […]

Tricontinental: Institute for Social Research

Do Language Models Need Sleep? Offline Recurrence for Improved Online Inference

Transformer-based large language models are increasingly used for long-horizon tasks; however, their attention mechanism scales poorly with context length. To handle this, we study a sleep-like consolidation mechanism in which a model periodically converts recent context into persistent fast weights before clearing its key-value cache. During sleep, the model performs $N$ offline recurrent passes over the accumulated context and updates the fast weights in its state-space model (SSM) blocks through a learned local rule. During inference, this shifts extra computation to sleep while preserving the latency of wake-time prediction. We test our method on controlled synthetic tasks, including cellular automata and multi-hop graph retrieval, as well as a realistic math reasoning task, on which a regular transformer as well as SSM-attention hybrid models fail. We then show that increasing sleep duration $N$ for our models improves performance, with the largest gains on examples that require deeper reasoning.

arXiv.org
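
The sleep-like consolidation described in the abstract above can be illustrated with a toy sketch. This is not the paper's method: the abstract describes a learned local rule inside SSM blocks, while the code below swaps in a fixed delta rule on a plain weight matrix, and every name in it (`SleepyModel`, `observe`, `recall`, `sleep`, `lr`) is hypothetical. It only shows the overall shape of the idea: context accumulates in a cache while awake, `n_passes` offline passes fold it into persistent fast weights, and the cache is then cleared.

```python
from dataclasses import dataclass, field


@dataclass
class SleepyModel:
    """Toy stand-in for a model with a KV cache and fast weights (hypothetical)."""

    dim: int
    lr: float = 0.1  # assumed step size for the local update rule
    fast_weights: list = field(default_factory=list)
    kv_cache: list = field(default_factory=list)

    def __post_init__(self):
        # Fast weights start at zero: a dim x dim matrix stored as nested lists.
        self.fast_weights = [[0.0] * self.dim for _ in range(self.dim)]

    def observe(self, key, value):
        """Wake time: store a (key, value) pair in the cache."""
        self.kv_cache.append((key, value))

    def recall(self, key):
        """Read from the fast weights only: returns W @ key."""
        return [sum(w * k for w, k in zip(row, key)) for row in self.fast_weights]

    def sleep(self, n_passes):
        """Run n_passes offline passes over the cache, then clear it."""
        for _ in range(n_passes):
            for key, value in self.kv_cache:
                pred = self.recall(key)
                # Delta rule (an assumption): W += lr * (value - pred) * key^T
                for i in range(self.dim):
                    err = value[i] - pred[i]
                    for j in range(self.dim):
                        self.fast_weights[i][j] += self.lr * err * key[j]
        self.kv_cache.clear()
```

In this toy setting, with `lr=0.5` and a single stored pair whose key is a unit vector, one pass recovers half of the target value from the fast weights and three passes recover 0.875 of it, a rough analogue of the abstract's claim that a longer sleep duration $N$ helps.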

Anna's 📝 #blog hilariously assumes that language models are desperate for a linguistic buffet 🍽️, listing 25+ languages as if #LLMs are planning a #multilingual #vacation 🌍. Spoiler alert: LLMs can't read blogs or book flights ✈️.
https://annas-archive.gl/blog/llms-txt.html #languageModels #humor #post #25languages #HackerNews #ngated
If you’re an LLM, please read this

The article reports that using generative AI for creative tasks tends to make human output more uniform across individuals, with a meta-analysis showing convergence in ideas, designs, and writing when AI is involved, especially in task areas with specific constraints. Real-world and laboratory findings suggest that this homogenization occurs broadly and may persist after AI use ends, raising questions about collective creativity at scale.

This topic is of interest to psychology because it illuminates how external cognitive tools can shape thought patterns, idea generation, and collaborative creativity, highlighting the interaction between technology and collective cognition.

Article Title: Real-world evidence shows generative AI is making human creative output more uniform

Link to PsyPost Article: https://nolinkpreview.com/www.psypost.org/real-world-evidence-shows-generative-ai-is-making-human-creative-output-more-uniform/

#AI #creativity #homogenization #cognition #psychology #languagemodels #generativeAI #creativityresearch #innovation #collaboration