2025 saw significant advances in #LLMs, particularly in #reasoning and agent-based systems. Reasoning models, capable of breaking down complex tasks and utilising tools, revolutionised #coding and #search. The year also witnessed the rise of coding agents, exemplified by #ClaudeCode, which can autonomously write, execute, and refine code. https://simonwillison.net/2025/Dec/31/the-year-in-llms/?eicker.news #tech #media #news
2025: The year in LLMs

This is the third in my annual series reviewing everything that happened in the LLM space over the past 12 months. For previous years see Stuff we figured out about …

Simon Willison’s Weblog

Manning Publications (@ManningBooks)

The post argues that the importance of reasoning models will drive major change over the long term. It notes that companies such as Meta are pushing reasoning models, that VentureBeat highlighted MobileLLM-R1, and that @rasbt's Build teaches how reasoning models are actually built and evaluated.

https://x.com/ManningBooks/status/2003903560921018508

#reasoningmodels #mobilellmr1 #meta #modelevaluation

Manning Publications (@ManningBooks) on X

AI moves fast, but some shifts matter long after the headlines pass. Reasoning models are one of 'em. As the field grows, even companies like @Meta are pushing them, as @VentureBeat highlights with MobileLLM-R1. Want to learn how they're actually built & evaluated? @rasbt's Build

X (formerly Twitter)

FINE-TUNING Qwen3 WITH "THINKING MODE": TROUBLE WITH REASONING. The documentation for building a "thinking" (explanation) dataset is unclear, which is causing problems when training the model. If anyone has experience or resources on this, please share. #AI #MachineLearning #LậpLý #MôHìnhQwen #ReasoningModels #KnowledgeInjection

*(Summary: The user is struggling to fine-tune Qwen3 to inject physics knowledge via "thinking mode". Attempts to generate explanation data with Qwen3 led to degraded performance. Asking for shared

New AI reasoning models built as neural networks are showing striking convergence across diverse training sets. Researchers say this hints at emergent structure in how machines learn to reason, opening fresh avenues for open‑source computational tools. Dive into the findings and see why this could reshape our approach to artificial intelligence. #AI #NeuralNetworks #ReasoningModels #Convergence

🔗 https://aidailypost.com/news/new-ai-reasoning-models-built-neural-networks-show-striking

Reasoning Models Reason Well, Until They Don't

Large language models (LLMs) have shown significant progress in reasoning tasks. However, recent studies show that transformers and LLMs fail catastrophically once reasoning problems exceed modest complexity. We revisit these findings through the lens of large reasoning models (LRMs) -- LLMs fine-tuned with incentives for step-by-step argumentation and self-verification. LRM performance on graph and reasoning benchmarks such as NLGraph seems extraordinary, with some even claiming LRMs are capable of generalized reasoning and innovation in reasoning-intensive fields such as mathematics, physics, medicine, and law. However, by more carefully scaling the complexity of reasoning problems, we show existing benchmarks actually have limited complexity. We develop a new dataset, the Deep Reasoning Dataset (DeepRD), along with a generative process for producing unlimited examples of scalable complexity. We use this dataset to evaluate model performance on graph connectivity and natural language proof planning. We find that the performance of LRMs drops abruptly at sufficient complexity and does not generalize. We also relate our LRM results to the distributions of the complexities of large, real-world knowledge graphs, interaction graphs, and proof datasets. We find the majority of real-world examples fall inside the LRMs' success regime, yet the long tails expose substantial failure potential. Our analysis highlights the near-term utility of LRMs while underscoring the need for new methods that generalize beyond the complexity of examples in the training distribution.
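The abstract's idea of a generative process with scalable complexity can be pictured with a minimal sketch. The generator below is a hypothetical stand-in, not the authors' actual DeepRD code: it builds a graph-connectivity question whose solution-path length grows with a `complexity` parameter, plus a reference checker to verify answers.

```python
import random

def make_connectivity_example(complexity, n_distractors=5, seed=0):
    """Generate a connectivity question whose answer path has
    `complexity` hops (illustrative stand-in for DeepRD's generator)."""
    rng = random.Random(seed)
    # The answer path: a chain n0 -> n1 -> ... -> n_complexity.
    chain = [f"n{i}" for i in range(complexity + 1)]
    edges = [(chain[i], chain[i + 1]) for i in range(complexity)]
    # Distractor edges among extra nodes never linked to the chain.
    extras = [f"x{i}" for i in range(n_distractors + 1)]
    for _ in range(n_distractors):
        u, v = rng.sample(extras, 2)
        edges.append((u, v))
    rng.shuffle(edges)
    question = (f"Given undirected edges {edges}, is there a path "
                f"from {chain[0]} to {chain[-1]}?")
    return question, edges, (chain[0], chain[-1])

def is_connected(edges, src, dst):
    """Reference checker: BFS/DFS over the undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    frontier, seen = [src], {src}
    while frontier:
        node = frontier.pop()
        if node == dst:
            return True
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False
```

Scaling `complexity` lengthens the required reasoning chain while the question format stays fixed, which is the kind of knob the paper uses to locate the point where LRM accuracy drops abruptly.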

arXiv.org

AI Models Table

This webpage has reference info on a lot of AI models.

https://lifearchitect.ai/models-table/

#AI #LLM #AIchina #ReasoningModels
#FoundationModel

The METR-Horizon analysis shows that AI models with reasoning capabilities (from September 2024 onward) are markedly superior: a 2.2x jump in performance and 37% faster scaling. This lets AI complete complex, long-duration tasks autonomously, significantly faster than before.

#AI #ReasoningModels #MachineLearning #TechNews #METRAnalysis #TríTuệNhânTạo #MôHìnhSuyLuận #CôngNghệ

https://www.reddit.com/r/singularity/comments/1o3dnz2/reasoning_models_show_22_performance_jump_and_37/

"The point is that with each advance in AI, new hurdles become apparent; when one missing aspect of “intelligence” is filled in, we find ourselves bumping up against another gap. When I speculated about GPT-5 last year, it didn’t occur to me to question whether it would know how to set priorities, because the models of the time weren’t even capable enough for that to be a limiting factor. In a post from November, AI is Racing Forward – on a Very Long Road, I wrote:

…the real challenges may be things that we can’t easily anticipate right now, weaknesses that we will only start to put our finger on when we observe [future models] performing astonishing feats and yet somehow still not being able to write that tightly-plotted novel.

In April 2024, it seemed like agentic AI was going to be the next big thing. The ensuing 16 months have brought enormous progress on many fronts, but very little progress on real-world agency. With projects like AI Village shining a light on the profound weakness of current AI agents, I think robust real-world capability is still years away."

https://secondthoughts.ai/p/gpt-5-the-case-of-the-missing-agent

#AI #GenerativeAI #LLMs #Chatbots #AIAgents #AgenticAI #ReasoningModels

GPT-5: The Case of the Missing Agent

AI has made enormous progress in the last 16 months. Agentic AI seems farther off than ever.

Second Thoughts

🧠 What if you could tell AI how much to think before answering?
Seed-OSS 36B gives builders a thinking budget knob + 512K context window—control depth vs speed like never before. ⚡

👉 See how it changes product SLAs, costs, and user experience:
https://medium.com/@rogt.x1997/seed-oss-36b-a-tweakable-reasoning-engine-for-long-context-work-66aa05a72548
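The "thinking budget knob" above amounts to an inference-time parameter trading deliberation depth for latency. Here is a hypothetical sketch of wiring that knob into product SLAs; the tier names, budget values, and `thinking_budget` field are illustrative assumptions, not Seed-OSS 36B's actual API.

```python
# Hypothetical SLA tiers mapped to reasoning-token budgets.
# All names and numbers here are illustrative assumptions.
SLA_TIERS = {
    "interactive": 512,   # chat UI: answer fast, think briefly
    "standard": 4096,     # background jobs: moderate deliberation
    "deep": 32768,        # offline analysis: maximum deliberation
}

def pick_thinking_budget(sla_tier, max_latency_s, tokens_per_s=50):
    """Clamp the tier's budget so decoding fits the latency target."""
    budget = SLA_TIERS[sla_tier]
    latency_cap = int(max_latency_s * tokens_per_s)
    return min(budget, latency_cap)

def build_request(prompt, sla_tier, max_latency_s):
    """Assemble a chat-style request carrying the chosen budget."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "thinking_budget": pick_thinking_budget(sla_tier, max_latency_s),
    }
```

The point of the sketch is the design choice: a single budget parameter lets one deployment serve both latency-sensitive and depth-sensitive workloads, which is what makes the knob relevant to SLAs and cost.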

#AI #ReasoningModels #LongContext

Seed‑OSS 36B A Tweakable Reasoning Engine for Long‑Context Work

A field guide for builders who want deep thinking when it matters, and speed when it doesn’t.

Medium