Mastodawn

Prompt engineering 2026: what became obsolete with the arrival of reasoning models

Half of the prompt techniques I built up over a couple of years of working with GPT-4 and Claude 3.5 perform worse on reasoning models than a minimal prompt. Elaborate chain-of-thought, multi-step few-shot, emotional role-play: all of it is either unnecessary or harmful. Meanwhile the boring techniques (output contracts, system prompts, constraints) have become critically important. What died, what survived, and what fits which task.

https://habr.com/ru/articles/1034572/

#промптинжиниринг #reasoningмодели #gpt55 #claude_opus #llm #chainofthought

Six months ago I picked up an old prompt. The very one I had tuned over two years: with elaborate chain-of-thought, three few-shot examples, the role of an "experienced engineer with 15 years of experience", a step-by-step reasoning scheme...

Habr

sayzard May 11

Visual Generation Unlocks Human-Like Reasoning Through Multimodal World Models

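This paper studies how visual generation contributes, within multimodal world models, to human-like reasoning. It proposes the "visual superiority hypothesis", which holds that visual generation outperforms language-based reasoning on tasks grounded in the physical world, and shows experimentally that interleaved visual-verbal chain-of-thought (CoT) reasoning substantially improves performance on certain tasks. To this end, the authors build a new evaluation suite, VisWorld-Eval, and run experiments on a state-of-the-art unified multimodal model (UMM). The work suggests that multimodal world modeling can play an important role in powerful, human-like AI reasoning.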

https://arxiv.org/abs/2601.19834

#multimodal #worldmodel #chainofthought #visualgeneration #aireasoning

Humans construct internal world models and reason by manipulating the concepts within these models. Recent advances in AI, particularly chain-of-thought (CoT) reasoning, approximate such human cognitive abilities, where world models are believed to be embedded within large language models. Expert-level performance in formal and abstract domains such as mathematics and programming has been achieved in current systems by relying predominantly on verbal reasoning. However, they still lag far behind humans in domains like physical and spatial intelligence, which require richer representations and prior knowledge. The emergence of unified multimodal models (UMMs) capable of both verbal and visual generation has therefore sparked interest in more human-like reasoning grounded in complementary multimodal pathways, though their benefits remain unclear. From a world-model perspective, this paper presents the first principled study of when and how visual generation benefits reasoning. Our key position is the visual superiority hypothesis: for certain tasks--particularly those grounded in the physical world--visual generation more naturally serves as world models, whereas purely verbal world models encounter bottlenecks arising from representational limitations or insufficient prior knowledge. Theoretically, we formalize internal world modeling as a core component of CoT reasoning and analyze distinctions among different forms of world models. Empirically, we identify tasks that necessitate interleaved visual-verbal CoT reasoning, constructing a new evaluation suite, VisWorld-Eval. Controlled experiments on a state-of-the-art UMM show that interleaved CoT significantly outperforms purely verbal CoT on tasks that favor visual world modeling, but offers no clear advantage otherwise. Together, this work clarifies the potential of multimodal world modeling for more powerful, human-like multimodal AI.

arXiv.org

sayzard Apr 24

N8 Programs (@N8Programs)

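The post proposes directions for "new capabilities" that could be added to LLMs beyond scaling existing ones. In particular, it mentions research ideas that extend how models reason, such as more complex and varied CoT structures, and points to how model capabilities might evolve.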

https://x.com/N8Programs/status/2047487312498507936

#llm #reasoning #chainofthought #research #ai

N8 Programs (@N8Programs) on X

there are several qualitatively "new capabilities" that i think we could see, or at least are feasible directions to add to LLMs, that aren't simply scaling existing ones: - more exotic CoT structures - right now, LLMs excel at any reasoning task that can be solved in language.

X (formerly Twitter)

sayzard Apr 19

fly51fly (@fly51fly)

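LongCoT presents a benchmark for evaluating long-horizon Chain-of-Thought reasoning. By systematically measuring reasoning performance over long contexts, it is useful for developing next-generation reasoning LLMs and for establishing evaluation standards.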

https://x.com/fly51fly/status/2045617176393249263

#longcot #chainofthought #benchmark #reasoning #llm

sayzard Apr 9

fly51fly (@fly51fly)

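Tool-MCoT, a study on content safety moderation based on tool-augmented multimodal chain-of-thought, has been introduced. It combines multimodal input with tool use to improve safety screening and judgment performance, and is noteworthy for the AI safety and content moderation field.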

https://x.com/fly51fly/status/2042356165250859155

#ai #safetymoderation #multimodal #chainofthought #arxiv

fly51fly (@fly51fly) on X

[CL] Tool-MCoT: Tool Augmented Multimodal Chain-of-Thought for Content Safety Moderation S Zhang, D Zhou, Y Liu, Y Yang… [Google & Stanford University] (2026) https://t.co/hEaF7fDTTQ

X (formerly Twitter)

Habr Apr 6

The illusion of logic: how I proved that LLM agents ignore facts, and why Chain-of-Thought only makes things worse

Every second startup is building AI agents right now. We wrap an LLM in a Prompt -> Tool call -> Answer loop and expect the neural network to investigate an incident, find a bug, or write a feature on its own. In practice, though, autonomous agents often go in circles, ignore obvious errors, and "fall in love" with their first guess. The industry tries to treat this with crutches: it grows the context to millions of tokens or makes the model "think step by step" (Chain-of-Thought). I decided to stress-test this architecture. I built a local measurement rig, LOCK-R, armed myself with Bayes' theorem, and caught modern LLMs red-handed. In this article I will prove mathematically why single agents are structurally vulnerable, how reasoning tokens make them lie to themselves even more skillfully, and why the "Blind Judge" pattern is the only way to cure AI of bias. Tested on a local Qwen-9B and the frontier GPT-5.4.

https://habr.com/ru/articles/1020016/

#llm #ai_agents #rag #machine_learning #архитектура #chainofthought #теорема_байеса #gpt54 #qwen35 #бенчмарк

Every second startup is building AI agents right now. We wrap an LLM in a Prompt -> Tool call (API/Search) -> Read -> Answer loop and expect the neural network to investigate an incident, find a bug...

Habr

DrBob, 🧠 Mechanic Mar 24

Chain-of-Thought (CoT) Prompting

Chain-of-Thought (CoT) prompting is a technique in which asking questions, rather than issuing direct instructions, activates a model’s full internal reasoning pathway.

The key insight from the original framing is that instructions skip steps 1–3, jumping straight to synthesis, while questions force the model to work through the entire reasoning chain.

https://neurodoctor.com/2026/03/20/chain-of-thought-cot-prompting/

#chainofthought #cot #ai #llm #prompt #prompts #prompting #claude #chatgpt #gemini #ericschmidt

sayzard Mar 13

Dietrich Stein (@pixelsort)

Last month Anthropic revealed that some labs, including @deepseek_ai, had been "stealing" its models' capabilities, and as a result the Chain of Thought traces of those models are no longer visible. The author is sympathetic but saddened, and notes that Google's Gemini still exposes its CoT.

https://x.com/pixelsort/status/2032530587072741710

#anthropic #deepseek #chainofthought #gemini #aisafety

Dietrich Stein (@pixelsort) on X

Last month, @AnthropicAI revealed that @deepseek_ai and other labs have been "stealing" their capabilities. Consequently, we can no longer see the Chain of Thought traces in their models. I'm sympathetic, but saddened. At least @Gemini still has them. https://t.co/eKD4Vwil2H

X (formerly Twitter)

AI Daily Post Mar 13

New research shows TensorRT Edge‑LLM can run chain‑of‑thought reasoning directly on devices, boosting physical AI tasks like autonomous‑vehicle perception and MATH500 benchmarks. Efficient, on‑device inference means smarter, safer robots without cloud latency. Dive into the details of this breakthrough for on‑device language models. #TensorRT #EdgeLLM #ChainOfThought #PhysicalAI

🔗 https://aidailypost.com/news/tensorrt-edgellm-enables-efficient-chainofthought-processing-physical

sayzard Mar 10

fly51fly (@fly51fly)

The 2026 paper "Reasoning Models Struggle to Control their Chains of Thought" presents an analysis showing that reasoning models have difficulty controlling their own Chain of Thought. It is co-authored by C Yueh-Han, R McCarthy, B W. Lee, H He and others (NYU, UCL, OpenAI) and published on arXiv.

https://x.com/fly51fly/status/2031126438292894184

#reasoning #chainofthought #airesearch #modelbehavior

fly51fly (@fly51fly) on X

[AI] Reasoning Models Struggle to Control their Chains of Thought C Yueh-Han, R McCarthy, B W. Lee, H He… [NYU & UCL & OpenAI] (2026) https://t.co/kR3dSHR50x

X (formerly Twitter)