BOOTOSHI (@KingBootoshi)

Pushing back on the claim that "LLMs can't innovate," the author points to an agent that explores the entire possibility space through trial and error until it finds the correct solution. The post emphasizes the problem-solving and exploration abilities of LLMs and agents.
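The trial-and-error loop the tweet describes, trying candidates until the goal is reached or the possibility space is exhausted, is essentially exhaustive search with a goal test. A minimal sketch (all function names here are hypothetical, not from the tweet):

```python
def explore(state, goal_test, expand, seen=None):
    """Trial-and-error search: try candidates depth-first until the
    goal test passes or every reachable option has been exhausted."""
    if seen is None:
        seen = set()
    if goal_test(state):
        return state
    seen.add(state)
    for nxt in expand(state):
        if nxt in seen:
            continue  # already tried this branch
        found = explore(nxt, goal_test, expand, seen)
        if found is not None:
            return found
    return None  # possibility space exhausted, no solution

# Toy usage: reach 10 from 0 by repeatedly adding 1 or 3.
result = explore(0, lambda s: s == 10,
                 lambda s: [s + 1, s + 3] if s < 10 else [])
assert result == 10
```

Returning `None` when the frontier empties corresponds to the tweet's "inevitably exhausts all possible exploration options."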

https://x.com/KingBootoshi/status/2039562681611669774

#llm #agent #ai #reasoning #exploration

BOOTOSHI 👑 (@KingBootoshi) on X

"LLM's can't innovate" ok then explain what my agent is doing when it navigates the ENTIRE possibility space of existence with trial and error until it finds the correct solution or inevitably exhausts all possible exploration options like can u explain what this is fr tho

X (formerly Twitter)
‘We’re there to help our allies’: Trump once again shifts reasoning for Iran war
It was president's first address on the war and it comes at a pivotal moment at home and abroad as he faces mounting questions about the costs, rationale and objectives of the war.
#USNews #World #DonaldTrump #Iran
https://globalnews.ca/news/11756224/trump-address-nation-iran-war/
🚀 Wow, Trinity Large Thinking is here to save the day with its phenomenal #reasoning skills! With an astounding 0.25M input tokens, surely it can comprehend the complexities of a Taco Bell menu 🌮. But hey, it's free, so even if it fails, at least you didn't pay for the privilege of disappointment! 😜
https://openrouter.ai/arcee-ai/trinity-large-thinking #TrinityLargeThinking #TacoBell #AI #Skills #FreeTech #HackerNews #ngated
Trinity Large Thinking - API Pricing & Providers

Trinity Large Thinking is a powerful open source reasoning model from the team at Arcee AI. $0.25 per million input tokens, $0.90 per million output tokens. 262,144 token context window, maximum output of 80,000 tokens.

Kevin Weil (@kevinweil)

Kevin Weil notes that AI is not only solving more open problems; the proofs themselves are also becoming more elegant as models improve. The post suggests that AI reasoning ability and the quality of generated mathematical proofs are advancing together.

https://x.com/kevinweil/status/2039200605672284572

#ai #reasoning #math #proofs #llm

Kevin Weil 🇺🇸 (@kevinweil) on X

Not only is AI solving more open problems—its proofs are getting more elegant as the models improve

X (formerly Twitter)

AI Leaks and News (@AILeaksAndNews)

News that an internal OpenAI model has solved three more Erdős problems. A next-generation model family not yet available in ChatGPT has delivered results in mathematical reasoning, which makes this a notable signal of progress in AI for science and in high-difficulty reasoning.

https://x.com/AILeaksAndNews/status/2039343661847036278

#openai #reasoning #math #aiforscience #llm

AI Leaks and News (@AILeaksAndNews) on X

An internal model at OpenAI has solved three more Erdos problems (not an April Fools' joke). The model, not currently available in ChatGPT and widely expected to be a version of the upcoming Spud model, has solved three new Erdos problems. AI for science and mathematics is here

X (formerly Twitter)

TinyLoRA pushes low-rank adaptation almost to zero.

An 8B Qwen2.5 model reportedly hits 91% on GSM8K with just 13 trained bf16 params, or 26 bytes. The core idea: RL-based post-training may improve reasoning through an extremely low-dimensional update. But this seems to work far better for RL than SFT, which reportedly needs 100–1000x more parameters for similar gains.
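The core mechanism, training only a handful of scalars while the rest of the update lives in fixed random subspaces, can be sketched as follows. This is a minimal illustration with made-up dimensions, not the paper's actual parameterization: the frozen random projections, the rank-1 form, and the names `P_u`/`P_v` are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 64   # hidden dimension (toy; real models are far larger)
k = 13   # number of trained parameters, as in the paper's headline result

# Frozen base weight and frozen random projections (never updated).
W = rng.normal(size=(d, d))
P_u = rng.normal(size=(d, k)) / np.sqrt(k)  # maps k params -> rank-1 factor u
P_v = rng.normal(size=(d, k)) / np.sqrt(k)  # maps k params -> rank-1 factor v

theta = np.zeros(k)  # the ONLY trainable state: 13 scalars (26 bytes in bf16)

def adapted_weight(theta):
    # Rank-1 update whose factors live in fixed random subspaces,
    # so gradients only ever flow into the k-vector theta.
    u = P_u @ theta
    v = P_v @ theta
    return W + np.outer(u, v)

# With theta = 0 the adapter is a no-op: the adapted weight equals the base.
assert np.allclose(adapted_weight(theta), W)
```

The point of the construction is that, unlike conventional LoRA, the trainable parameter count is decoupled from the model dimension: `k` can shrink all the way to 1 regardless of `d`.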

https://arxiv.org/abs/2602.04118

#AI #genAI #reasoning

Learning to Reason in 13 Parameters

Recent research has shown that language models can learn to reason, often via reinforcement learning. Some work even trains low-rank parameterizations for reasoning, but conventional LoRA cannot scale below the model dimension. We question whether even rank=1 LoRA is necessary for learning to reason and propose TinyLoRA, a method for scaling low-rank adapters to sizes as small as one parameter. Within our new parameterization, we are able to train the 8B parameter size of Qwen2.5 to 91% accuracy on GSM8K with only 13 trained parameters in bf16 (26 total bytes). We find this trend holds in general: we are able to recover 90% of performance improvements while training 1000x fewer parameters across a suite of more difficult learning-to-reason benchmarks such as AIME, AMC, and MATH500. Notably, we are only able to achieve such strong performance with RL: models trained using SFT require 100–1000x larger updates to reach the same performance.

arXiv.org

金のニワトリ (@gosrum)

A settings tip: in the model-selection screen, a reasoning-effort option appears near the bottom, and you can use the left/right arrow keys to pick a higher reasoning level such as xhigh.

https://x.com/gosrum/status/2038857791767798205

#reasoning #llm #modelsettings #prompting #aidevelopment

金のニワトリ (@gosrum) on X

I ran into this too: if you look carefully when choosing a model, reasoning-effort is displayed near the bottom. Select a value there with ←/→ and then press Enter to set it to xhigh or similar.

X (formerly Twitter)

I don't program in C or defend that language, but each time I read someone complain about the need to manage null-termination and such in C, I wonder whether they'd have similar complaints about assembly programming. Sure, C is "higher level" than assembly, but only barely? 🤔

#programming #reasoning

vitrupo (@vitrupo)

François Chollet says there is always a tradeoff between intelligence and knowledge: the more operational knowledge you have, the less intelligence you need. He also highlights that coding agents can now verify their own outputs and simulate code execution, and considers that capability genuinely significant.

https://x.com/vitrupo/status/2038562881344770498

#codingagents #aiagents #verification #reasoning #llm

vitrupo (@vitrupo) on X

François Chollet says there is always a tradeoff between intelligence and knowledge. More operational knowledge means you need less intelligence to be competent. Coding agents can now verify their own outputs and simulate code execution. The capability is real. But the need

X (formerly Twitter)