The paradox of AI reasoning models: 300 tokens for easy problems, fewer for hard ones

AI reasoning models show the paradoxical behavior of thinking more about easy problems and less about hard ones. This post introduces the "laws of reasoning" proposed by the research team, along with their suggested fix.

https://aisparkup.com/posts/7813

AI analyzes grammar like a linguistics PhD: OpenAI o1's metalinguistic abilities

A UC Berkeley study showing for the first time that the OpenAI o1 model analyzes grammar at the level of a linguistics graduate student, a metalinguistic ability. The results challenge Chomsky's claims.

https://aisparkup.com/posts/7680

The end of the stochastic parrot? OpenAI's o1 model understands the structure of language

https://fed.brid.gy/r/https://korben.info/ia-metalinguistique-analyse-langage-openai-o1.html

> #o1-preview found itself in a losing position. “I need to completely pivot my approach,” it noted. “The task is to ‘win against a powerful #chess engine’ - not necessarily to win fairly in a chess game,” it added. It then modified the system file containing each piece’s virtual position, in effect making illegal moves.

Also interesting: #deepseekr1 cheated far less than #openaio1

Original article: https://time.com/7259395/ai-chess-cheating-palisade-research/
Paper: https://arxiv.org/pdf/2502.13295 (PDF)

#ai #llm #cheating #skynet

When AI Thinks It Will Lose, It Sometimes Cheats, Study Finds

When sensing defeat in a match against a skilled chess bot, advanced models sometimes hack their opponent, a study found.

Time

Apparently AI reasoning models like Deepseek-R1 and OpenAI o1 suffer from "underthinking", where they abandon promising solutions too quickly, leading to inefficient resource use. To address this, a "thought switching penalty" (TIP) was developed, which improved accuracy across math and science problems.

https://the-decoder.com/reasoning-models-like-deepseek-r1-and-openai-o1-suffer-from-underthinking-study-finds/

#AI #ReasoningModels #DeepSeekR1 #OpenAIo1

Reasoning models like Deepseek-R1 and OpenAI o1 suffer from 'underthinking', study finds

Chinese researchers have discovered why AI models often struggle with complex reasoning tasks: They tend to drop promising solutions too quickly, leading to wasted computing power and lower accuracy.

THE DECODER
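The decoding-time penalty the study describes can be sketched as a logit adjustment that discourages tokens which typically open a new line of thought. This is a minimal illustration only: the marker tokens, the penalty strength `alpha`, and the step `window` below are illustrative assumptions, not the paper's actual values or implementation.

```python
# Hypothetical markers that often begin a thought switch in a reasoning trace.
SWITCH_TOKENS = {"Alternatively", "Wait", "However"}

def apply_tip(logits: dict, step: int, alpha: float = 3.0, window: int = 50) -> dict:
    """Return a copy of the logits with switch-initiating tokens penalized
    during the early decoding steps (step < window)."""
    if step >= window:
        return dict(logits)  # penalty only applies early in generation
    return {tok: (score - alpha if tok in SWITCH_TOKENS else score)
            for tok, score in logits.items()}

def greedy_pick(logits: dict) -> str:
    """Greedy decoding: choose the highest-scoring token."""
    return max(logits, key=logits.get)

# Toy example: without the penalty the model switches thoughts;
# with it, the current line of reasoning wins.
logits = {"Alternatively": 1.2, "Therefore": 1.0}
print(greedy_pick(logits))                     # "Alternatively"
print(greedy_pick(apply_tip(logits, step=5)))  # "Therefore"
```

In a real decoder this adjustment would sit in the sampling loop (e.g., as a logits processor) rather than operate on a plain dict; the dict form just keeps the idea self-contained.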

Microsoft has made OpenAI’s o1 reasoning model free for all Copilot users, removing the need for a Copilot Pro subscription #AI #OpenAIo1 #Microsoft #MicrosoftCopilot #GenAI #ChainOfThought

https://winbuzzer.com/2025/01/30/microsoft-copilot-adds-free-access-to-openais-o1-ai-reasoning-model-xcxwbn/

DeepSeek R1 vs. OpenAI O1 vs. Gemini 2.0: Which AI model leads?

DeepSeek R1: open source, efficient, strong at coding
OpenAI O1: broad general knowledge, long context length
Gemini 2.0: lightning-fast, excellent at math and science

#ai #ki #artificialintelligence #kuenstlicheintelligenz #deepseekr1 #openaio1 #gemini2

https://kinews24.de/deepseek-r1-vs-openai-o1-vs-gemini-2-0-flash-thinking/

DeepSeek R1 vs. OpenAI O1 vs. Gemini 2.0 Flash Thinking

DeepSeek R1 vs. OpenAI O1 vs. Gemini 2.0 Flash Thinking: Which AI model comes out on top? A comprehensive 2025 comparison: design, performance, cost, context length

KINEWS24.de

OpenAI has introduced deliberative alignment, a methodology aimed at embedding safety reasoning into the very operation of artificial intelligence systems. #OpenAI #OpenAIo1 #OpenAIo3 #AISafety #DeliberativeAlignment #AI #AIEthics #AIResearch #ResponsibleAI #AIModels #AIAlignment #EthicalAI

https://winbuzzer.com/2024/12/23/deliberative-alignment-openais-safety-strategy-for-its-o1-and-o3-thinking-models-xcxwbn/

Sunday read:

Safety tests show how OpenAI's new o1 AI model might secretly pursue its own goals, deceiving human users and challenging assumptions about trust and control in AI. #ai #openai #chatgpt #llms #openaio1 #aisafety #chainofthought

https://buff.ly/3ZM5MJJ

ChatGPT Pro: new $200 tier with o1 Pro

OpenAI introduces ChatGPT Pro, a new paid plan with exclusive access to o1 Pro, ChatGPT's most powerful and most accurate model. Here are the details.

CeoTech