fly51fly (@fly51fly)

Announcement of the paper 'Consistency of Large Reasoning Models Under Multi-Turn Attacks' (Y Li, R Krishnan, R Padman, CMU, 2026). A research paper analyzing and reporting on the consistency of large reasoning models under multi-turn attack scenarios, offering insights into model robustness and stability against attacks (original link included).

https://x.com/fly51fly/status/2023583155425583127

#robustness #reasoningmodels #adversarial #arxiv

fly51fly (@fly51fly) on X

[LG] Consistency of Large Reasoning Models Under Multi-Turn Attacks Y Li, R Krishnan, R Padman [CMU] (2026) https://t.co/6nwEU2mzrp

X (formerly Twitter)

xAI’s co‑founder exits keep coming, while Lambda outlines a 2025 shift toward bigger context windows, multimodal reasoning models and open‑source inference for AI production. What could this mean for the future of machine learning? Read on for the full story. #AIProduction #ReasoningModels #MultimodalAI #OpenSourceInference

🔗 https://aidailypost.com/news/xai-co-founder-departures-persist-lambda-outlines-2025-ai-production

AI that thinks instead of guessing?

Reasoning models use techniques like chain of thought and tree of thought to decompose problems, explore alternatives, and choose better answers, often at the cost of more compute and latency.
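The decompose-explore-choose loop described above can be pictured with a toy tree-of-thought search. The sketch below is illustrative only: the Game-of-24-style arithmetic task, the distance-to-target scoring, and the beam width are all assumptions made for the example, not anything from the linked explainer.

```python
from itertools import combinations

def tree_of_thought_24(numbers, target=24, beam_width=5):
    """Toy tree-of-thought search: a 'thought' is a partial state
    (numbers still available + the reasoning steps taken so far).
    Expand each state by combining two numbers, score states by how
    close their best number is to the target, keep the best few."""
    frontier = [(tuple(numbers), ())]          # list of (numbers, steps)
    while frontier:
        next_frontier = []
        for nums, steps in frontier:
            if len(nums) == 1:
                if nums[0] == target:
                    return list(steps)         # solved: return the reasoning trace
                continue
            # Decompose: pick any two numbers and combine them one way.
            for i, j in combinations(range(len(nums)), 2):
                a, b = nums[i], nums[j]
                rest = tuple(n for k, n in enumerate(nums) if k not in (i, j))
                for val, op in [(a + b, f"{a}+{b}={a + b}"),
                                (a * b, f"{a}*{b}={a * b}"),
                                (a - b, f"{a}-{b}={a - b}"),
                                (b - a, f"{b}-{a}={b - a}")]:
                    next_frontier.append((rest + (val,), steps + (op,)))
        # Choose: keep only the most promising partial thoughts (beam search).
        next_frontier.sort(key=lambda s: min(abs(n - target) for n in s[0]))
        frontier = next_frontier[:beam_width]
    return None                                # search exhausted, no solution kept
```

For example, `tree_of_thought_24([2, 3, 4])` returns the trace `['3*4=12', '2*12=24']`. The extra compute and latency the post mentions shows up here directly: the frontier multiplies the work per step compared with emitting a single answer.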

A practical explainer:
🔗 https://techglimmer.io/what-is-ai-thinking-reasoning-models/

#AI #ReasoningModels #ChainOfThought #TreeOfThought #GenAI #FediTech #MachineLearning

The Pause That Changed Everything: Why AI Thinking is the Future

We are moving from chatbots to reasoning engines. Discover what AI thinking is, how Chain of Thought works, and why the future of intelligence is slow, not fast.

techglimmer.io

2025 saw significant advancements in #LLMs, particularly in the areas of #reasoning and #agent based systems. #Reasoningmodels, capable of breaking down #complextasks and utilising tools, revolutionised #coding and #search. The year witnessed the rise of #codingagents, exemplified by #ClaudeCode, which can autonomously write, execute, and refine code. https://simonwillison.net/2025/Dec/31/the-year-in-llms/?eicker.news #tech #media #news

2025: The year in LLMs

This is the third in my annual series reviewing everything that happened in the LLM space over the past 12 months. For previous years see Stuff we figured out about …

Simon Willison’s Weblog

Manning Publications (@ManningBooks)

The post argues that the rise of reasoning models is a shift that will matter long term. It notes that companies such as Meta are pushing reasoning models, that VentureBeat highlighted this with MobileLLM-R1, and that @rasbt's Build book teaches how reasoning models are actually built and evaluated.

https://x.com/ManningBooks/status/2003903560921018508

#reasoningmodels #mobilellmr1 #meta #modelevaluation

Manning Publications (@ManningBooks) on X

AI moves fast, but some shifts matter long after the headlines pass. Reasoning models are one of 'em. As it grows, even companies like @Meta are pushing them, as @VentureBeat highlights with MobileLLM-R1. Want to learn how they're actually built & evaluated? @rasbt's Build

X (formerly Twitter)

FINE-TUNING Qwen3 WITH "THINKING MODE": DIFFICULTIES WITH REASONING. Documentation on building a "thinking" (explanation) dataset is unclear, which is causing the model training to run into trouble. Anyone with experience or references on this topic, please share. #AI #MachineLearning #LậpLý #MôHìnhQwen #ReasoningModels #KnowledgeInjection

*(Summary: The user is struggling to fine-tune Qwen3 to inject physics knowledge via "thinking mode". Attempts to generate explanation data with Qwen3 itself led to degraded performance. Requesting shared
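For reference, a single "thinking" SFT record is often shaped like the sketch below. The physics question, field names, and chat schema here are illustrative assumptions; the only Qwen3-specific detail is wrapping the reasoning in `<think>...</think>`, the tag convention the model emits at inference time.

```python
import json

# Hypothetical sketch of one SFT record for "thinking mode" fine-tuning:
# the assistant turn puts its reasoning inside <think>...</think> and the
# final answer after the closing tag.
record = {
    "messages": [
        {"role": "user",
         "content": "A mass falls freely from rest for 3 s. What is its speed?"},
        {"role": "assistant",
         "content": (
             "<think>\n"
             "Free fall from rest: v = g * t.\n"
             "g is about 9.8 m/s^2 and t = 3 s, so v = 9.8 * 3 = 29.4 m/s.\n"
             "</think>\n"
             "The speed after 3 s is about 29.4 m/s."
         )},
    ]
}

line = json.dumps(record, ensure_ascii=False)  # one line of a JSONL dataset
```

One common failure mode the summary hints at: if the model that generates the `<think>` traces is the same model being tuned, low-quality or circular traces can degrade performance, so traces usually need filtering against the known final answer.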

New AI reasoning models built as neural networks are showing striking convergence across diverse training sets. Researchers say this hints at emergent structure in how machines learn to reason, opening fresh avenues for open‑source computational tools. Dive into the findings and see why this could reshape our approach to artificial intelligence. #AI #NeuralNetworks #ReasoningModels #Convergence

🔗 https://aidailypost.com/news/new-ai-reasoning-models-built-neural-networks-show-striking

Reasoning Models Reason Well, Until They Don't

Large language models (LLMs) have shown significant progress in reasoning tasks. However, recent studies show that transformers and LLMs fail catastrophically once reasoning problems exceed modest complexity. We revisit these findings through the lens of large reasoning models (LRMs) -- LLMs fine-tuned with incentives for step-by-step argumentation and self-verification. LRM performance on graph and reasoning benchmarks such as NLGraph seems extraordinary, with some even claiming they are capable of generalized reasoning and innovation in reasoning-intensive fields such as mathematics, physics, medicine, and law. However, by more carefully scaling the complexity of reasoning problems, we show that existing benchmarks actually have limited complexity. We develop a new dataset, the Deep Reasoning Dataset (DeepRD), along with a generative process for producing unlimited examples of scalable complexity. We use this dataset to evaluate model performance on graph connectivity and natural language proof planning. We find that the performance of LRMs drops abruptly at sufficient complexity and does not generalize. We also relate our LRM results to the distributions of the complexities of large, real-world knowledge graphs, interaction graphs, and proof datasets. We find that the majority of real-world examples fall inside the LRMs' success regime, yet the long tails expose substantial failure potential. Our analysis highlights the near-term utility of LRMs while underscoring the need for new methods that generalize beyond the complexity of examples in the training distribution.

arXiv.org
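The abstract's "generative process for producing unlimited examples of scalable complexity" can be pictured with a toy generator like the one below. This is not the actual DeepRD code: planting a source-to-target path of a chosen length and padding the graph with distractor edges is an assumed simplification made for illustration.

```python
import random

def make_connectivity_problem(path_len, n_distractor_nodes=20, seed=0):
    """Toy generator: difficulty scales with path_len, the number of hops
    a model must chain together to answer the connectivity question."""
    rng = random.Random(seed)
    n = path_len + 1 + n_distractor_nodes
    # Plant a guaranteed path 0 -> 1 -> ... -> path_len.
    edges = {(i, i + 1) for i in range(path_len)}
    # Add distractor edges among the remaining nodes only, so the planted
    # path is untouched and the answer is 'yes' by construction.
    while len(edges) < path_len + n_distractor_nodes:
        u, v = rng.sample(range(path_len + 1, n), 2)
        edges.add((u, v))
    question = (f"Graph with nodes 0..{n - 1} and edges {sorted(edges)}. "
                f"Is node 0 connected to node {path_len}?")
    return question, "yes"
```

Sweeping `path_len` then gives a difficulty axis along which accuracy can be plotted, mirroring the paper's setup for locating the complexity at which LRM performance drops abruptly.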

AI Models Table

This webpage has reference info on a lot of AI models.

https://lifearchitect.ai/models-table/

#AI #LLM #AIchina #ReasoningModels
#FoundationModel