Manning Publications (@ManningBooks)
The gist: the rise of reasoning models is a long-term shift. Companies like Meta are pushing them, VentureBeat covered MobileLLM-R1 as an example, and @rasbt's Build is highlighted as a way to learn how reasoning models are actually built and evaluated.

AI moves fast, but some shifts matter long after the headlines pass. Reasoning models are one of 'em. As interest grows, even companies like @Meta are pushing them, as @VentureBeat highlights with MobileLLM-R1. Want to learn how they're actually built & evaluated? @rasbt's Build
OpenAI: GPT-5 Thinking Models Are The Most "Monitorable" Models To Date
#AI #OpenAI #AISafety #LLM #MachineLearning #GPT5 #DeepMind #AIResearch #ChainOfThought #Monitorability #AIAlignment #ReasoningModels
FINE-TUNING Qwen3 WITH "THINKING MODE": STUCK ON REASONING. The guidance on building a "thinking" (explanation) dataset is unclear, and training keeps running into trouble. If you have experience or references on this, please share. #AI #MachineLearning #LậpLý #MôHìnhQwen #ReasoningModels #KnowledgeInjection
*(Summary: the user is having trouble fine-tuning Qwen3 to inject physics knowledge via "thinking mode". Attempts to generate explanation data with Qwen3 itself led to degraded performance. They are asking for shared
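For context, here is a minimal sketch of what one such "thinking" fine-tuning record could look like, assuming Qwen3's convention of a <think>...</think> block at the start of the assistant turn and a generic chat-style JSONL format. The file name, field names, and the physics sample are illustrative, not taken from the post.

```python
import json

# Hypothetical physics Q&A pairs to be converted into "thinking" SFT examples.
# Assumes Qwen3's <think>...</think> convention inside the assistant turn;
# adjust to match the chat template you actually train with.
samples = [
    {
        "question": "A 2 kg mass accelerates at 3 m/s^2. What net force acts on it?",
        "reasoning": "Newton's second law: F = m * a = 2 kg * 3 m/s^2 = 6 N.",
        "answer": "The net force is 6 N.",
    },
]

with open("qwen3_thinking_sft.jsonl", "w", encoding="utf-8") as f:
    for s in samples:
        record = {
            "messages": [
                {"role": "user", "content": s["question"]},
                {
                    "role": "assistant",
                    # Reasoning trace first, final answer after the closing tag.
                    "content": f"<think>\n{s['reasoning']}\n</think>\n\n{s['answer']}",
                },
            ]
        }
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```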
New AI reasoning models built as neural networks are showing striking convergence across diverse training sets. Researchers say this hints at emergent structure in how machines learn to reason, opening fresh avenues for open‑source computational tools. Dive into the findings and see why this could reshape our approach to artificial intelligence. #AI #NeuralNetworks #ReasoningModels #Convergence
🔗 https://aidailypost.com/news/new-ai-reasoning-models-built-neural-networks-show-striking
Reasoning Models Reason Well, Until They Don't
https://arxiv.org/abs/2510.22371
#HackerNews #ReasoningModels #ReasonWell #AIResearch #MachineLearning

Large language models (LLMs) have shown significant progress in reasoning tasks. However, recent studies show that transformers and LLMs fail catastrophically once reasoning problems exceed modest complexity. We revisit these findings through the lens of large reasoning models (LRMs) -- LLMs fine-tuned with incentives for step-by-step argumentation and self-verification. LRM performance on graph and reasoning benchmarks such as NLGraph seems extraordinary, with some even claiming they are capable of generalized reasoning and innovation in reasoning-intensive fields such as mathematics, physics, medicine, and law. However, by more carefully scaling the complexity of reasoning problems, we show existing benchmarks actually have limited complexity. We develop a new dataset, the Deep Reasoning Dataset (DeepRD), along with a generative process for producing unlimited examples of scalable complexity. We use this dataset to evaluate model performance on graph connectivity and natural language proof planning. We find that the performance of LRMs drops abruptly at sufficient complexity and does not generalize. We also relate our LRM results to the distributions of the complexities of large, real-world knowledge graphs, interaction graphs, and proof datasets. We find the majority of real-world examples fall inside the LRMs' success regime, yet the long tails expose substantial failure potential. Our analysis highlights the near-term utility of LRMs while underscoring the need for new methods that generalize beyond the complexity of examples in the training distribution.
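The abstract describes a generative process for connectivity problems of scalable complexity. The actual DeepRD generator is not reproduced here; the following is an illustrative Python sketch of the general idea, where graph size and edge density act as rough complexity knobs and breadth-first search supplies the ground-truth label. Function names and parameters are mine, not the paper's.

```python
import random
from collections import deque

def generate_connectivity_problem(num_nodes: int, edge_prob: float, seed: int = 0):
    """Sample a random directed graph and a (source, target) query.

    Illustrative only: num_nodes and edge_prob serve as rough complexity
    knobs (larger, sparser graphs require longer reasoning chains).
    """
    rng = random.Random(seed)
    edges = [(u, v) for u in range(num_nodes) for v in range(num_nodes)
             if u != v and rng.random() < edge_prob]
    source, target = rng.sample(range(num_nodes), 2)
    return edges, source, target

def is_reachable(edges, source, target, num_nodes):
    """Ground-truth label via breadth-first search."""
    adj = {u: [] for u in range(num_nodes)}
    for u, v in edges:
        adj[u].append(v)
    seen, queue = {source}, deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return True
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False

# Example: a 50-node instance; scale num_nodes up to stress longer chains.
edges, s, t = generate_connectivity_problem(num_nodes=50, edge_prob=0.04)
print(f"Is node {t} reachable from node {s}? {is_reachable(edges, s, t, 50)}")
```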
AI Models Table
This webpage has reference info on a lot of AI models.
A METR horizon analysis finds that reasoning-capable AI models (from September 2024 onward) stand out: performance is up 2.2x and scaling is 37% faster. This lets AI autonomously complete complex, long-duration tasks considerably faster than before.
#AI #ReasoningModels #MachineLearning #TechNews #METRAnalysis #TríTuệNhânTạo #MôHìnhSuyLuận #CôngNghệ
"The point is that with each advance in AI, new hurdles become apparent; when one missing aspect of “intelligence” is filled in, we find ourselves bumping up against another gap. When I speculated about GPT-5 last year, it didn’t occur to me to question whether it would know how to set priorities, because the models of the time weren’t even capable enough for that to be a limiting factor. In a post from November, AI is Racing Forward – on a Very Long Road, I wrote:
…the real challenges may be things that we can’t easily anticipate right now, weaknesses that we will only start to put our finger on when we observe [future models] performing astonishing feats and yet somehow still not being able to write that tightly-plotted novel.
In April 2024, it seemed like agentic AI was going to be the next big thing. The ensuing 16 months have brought enormous progress on many fronts, but very little progress on real-world agency. With projects like AI Village shining a light on the profound weakness of current AI agents, I think robust real-world capability is still years away."
https://secondthoughts.ai/p/gpt-5-the-case-of-the-missing-agent
#AI #GenerativeAI #LLMs #Chatbots #AIAgents #AgenticAI #ReasoningModels
🧠 What if you could tell AI how much to think before answering?
Seed-OSS 36B gives builders a thinking budget knob + 512K context window—control depth vs speed like never before. ⚡
👉 See how it changes product SLAs, costs, and user experience:
https://medium.com/@rogt.x1997/seed-oss-36b-a-tweakable-reasoning-engine-for-long-context-work-66aa05a72548
#AI #ReasoningModels #LongContext
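A rough sketch of how a thinking-budget knob might be wired up with Hugging Face transformers. The repo id and the thinking_budget keyword are assumptions based loosely on the article's description and are not verified against the Seed-OSS model card, so treat them as placeholders to check before use.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "ByteDance-Seed/Seed-OSS-36B-Instruct"  # assumed repo id; verify on the Hub

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

messages = [{"role": "user", "content": "Summarize the trade-offs of a 512K context window."}]

# Assumption: the chat template exposes a thinking-budget control; the kwarg
# name below is hypothetical, so confirm it against the model card before use.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    thinking_budget=512,   # hypothetical knob: cap on tokens spent "thinking"
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```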