fly51fly (@fly51fly)

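A study of prompting policies for multi-step reasoning and tool use in black-box LLMs, which improves those policies through iterative distillation of experience. The topic bears directly on agent and tool-use pipeline design and has some practical potential, though it is still at the paper stage.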

https://x.com/fly51fly/status/2055400219324641370

#llm #agents #tooluse #prompting #distillation

fly51fly (@fly51fly) on X

[LG] Prompting Policies for Multi-step Reasoning and Tool-Use in Black-box LLMs with Iterative Distillation of Experience K Sayana, K Todi, A Jash [Google Research] (2026) https://t.co/Re2TGOe4sh

X (formerly Twitter)
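
To make "iterative distillation of experience" concrete, here is a minimal sketch of the general idea under loud assumptions. It is not the paper's algorithm. The model is treated as a black box (text in, text out). Each round runs the current prompt policy on a set of tasks, condenses the failures into short lessons, and appends them to the prompt. `run_episodes`, `distill`, `improve_policy` and `toy_model` are hypothetical names. The rule-based `distill` stands in for what would more plausibly be an LLM summarization call.

```python
from dataclasses import dataclass
from typing import Callable

# Black-box LLM: text in, text out. No weights or gradients are accessed.
Model = Callable[[str], str]
# Task-specific success check (hypothetical; e.g. a unit test or exact match).
Checker = Callable[[str, str], bool]


@dataclass
class Episode:
    task: str
    answer: str
    success: bool


def run_episodes(model: Model, policy: str, tasks: list[str], check: Checker) -> list[Episode]:
    """Roll out the current prompt policy on every task and record the outcome."""
    episodes = []
    for task in tasks:
        answer = model(f"{policy}\n\nTask: {task}")
        episodes.append(Episode(task, answer, check(task, answer)))
    return episodes


def distill(episodes: list[Episode]) -> list[str]:
    """Hypothetical distillation: turn each failed episode into a one-line lesson.
    A real system would more plausibly ask an LLM to summarize the trajectories."""
    return [
        f"- Failed on '{e.task}' (answered {e.answer!r}); verify before answering."
        for e in episodes
        if not e.success
    ]


def improve_policy(model: Model, base_policy: str, tasks: list[str],
                   check: Checker, rounds: int = 3) -> str:
    """Alternate rollouts and distillation, folding new lessons back into the prompt."""
    policy = base_policy
    lessons: list[str] = []
    for _ in range(rounds):
        episodes = run_episodes(model, policy, tasks, check)
        new = [lesson for lesson in distill(episodes) if lesson not in lessons]
        if not new:  # no new lessons distilled, so stop early
            break
        lessons.extend(new)
        policy = base_policy + "\n\nLessons from experience:\n" + "\n".join(lessons)
    return policy


# Toy stand-in model: it answers a task correctly only once the prompt mentions that task.
def toy_model(prompt: str) -> str:
    task = prompt.rsplit("Task: ", 1)[1]
    return "ok" if f"'{task}'" in prompt else "wrong"


if __name__ == "__main__":
    final = improve_policy(toy_model, "You are a careful agent.",
                           ["add", "search"], lambda t, a: a == "ok")
    print(final)
```

The loop stops early once a round yields no new lessons. With the toy model, both tasks fail in round one and succeed in round two, so `final` ends up holding two lessons.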

Clever Cockatoo Manufactures And Uses Tools

"Not known to manufacture or use tools in the wild, a captive cockatoo demonstrates that parrots can make tools to suit their needs"

#SciComm by @grrlscientist

#cognition #ToolUse #Cockatoos #ornithology https://grrlscientist.medium.com/clever-cockatoo-manufactures-and-uses-tools-350214a04362

Dario Cositore (@DarioCositore)

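Having evaluated the models pre-release on more than 100 real business workflows, he argues the picture is more nuanced than "newer = worse", because performance shifts differ by area. Opus 4.7 intentionally regressed on structured output but improved on multi-step tool chains, while Gemini 3.1's reasoning degraded.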

https://x.com/DarioCositore/status/2053892255438536725

#ai #llm #modelevaluation #reasoning #tooluse

Dario Cositore (@DarioCositore) on X

@bindureddy I'm one of the people who evaluated these models pre-release across 100+ real business workflows before prod. The picture is way more nuanced than "Newer = worse" Opus 4.7 intentionally regressed on structured output but improved multi-step tool chains. Gemini 3.1 lost reasoning

X (formerly Twitter)

How Agents Manage Other Agents: Four Subagents Patterns in 2026

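Introduces four main patterns for how AI agents manage subagents in 2026. First, in the inline-tool pattern, the main agent treats a subagent like a function call, with support for both synchronous and asynchronous execution. Second, the fan-out pattern spawns several subagents in parallel and collects their results in a single batch. Third, the agent-pool pattern manages stateful, long-lived subagents through messages, enabling multi-step workflows and back-and-forth interaction. The fourth, at the far end of the range, is an autonomous agent team. Each pattern has trade-offs depending on the level of control required and the use case, and the article offers practical insight for designing and operating AI agents.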

https://www.philschmid.de/subagent-patterns-2026

#aiagents #subagents #agentpatterns #tooluse #multistepworkflow

How Agents Manage Other Agents: Four Subagents Patterns in 2026

Subagents solve context pollution, but how the main agent manages them matters more than whether they run in sync or async. Four orchestration patterns, from a simple tool call to an autonomous agent team, each with different requirements for model capability and result collection.
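
The first three patterns summarized above can be sketched with Python's `asyncio`. This is a minimal illustration under stated assumptions, not code from the linked article. `run_subagent`, `PooledAgent` and `AgentPool` are hypothetical names, and each "subagent" is a stub that echoes its task instead of calling a model. The fourth pattern, the autonomous agent team, is left out because the summary gives no detail on how it works.

```python
import asyncio
from dataclasses import dataclass, field


async def run_subagent(task: str) -> str:
    """Hypothetical subagent stub: a real one would call a model with its own fresh context."""
    await asyncio.sleep(0)  # yield to the event loop, standing in for model latency
    return f"result({task})"


# Pattern 1, inline tool (sync): the main agent awaits the subagent like a function call.
async def inline_tool(task: str) -> str:
    return await run_subagent(task)


# Pattern 1, inline tool (async): start the subagent now, collect its result later.
def inline_tool_async(task: str) -> "asyncio.Task[str]":
    return asyncio.create_task(run_subagent(task))


# Pattern 2, fan-out: spawn several subagents in parallel and collect results as one batch.
async def fan_out(tasks: list[str]) -> list[str]:
    return list(await asyncio.gather(*(run_subagent(t) for t in tasks)))


# Pattern 3, agent pool: long-lived subagents keep state and are driven by messages.
@dataclass
class PooledAgent:
    name: str
    history: list[str] = field(default_factory=list)

    async def send(self, message: str) -> str:
        self.history.append(message)  # state persists across turns
        return await run_subagent(f"{self.name}:{message} (turn {len(self.history)})")


class AgentPool:
    def __init__(self, names: list[str]) -> None:
        self.agents = {name: PooledAgent(name) for name in names}

    async def message(self, name: str, text: str) -> str:
        return await self.agents[name].send(text)


async def main() -> None:
    print(await inline_tool("summarize"))
    pending = inline_tool_async("background check")  # runs while the fan-out proceeds
    print(await fan_out(["a", "b", "c"]))
    print(await pending)
    pool = AgentPool(["researcher"])
    print(await pool.message("researcher", "find sources"))
    print(await pool.message("researcher", "refine"))  # same agent, second turn


if __name__ == "__main__":
    asyncio.run(main())
```

The inline call blocks the main agent until its result arrives. The fan-out batches independent tasks. The pooled agent keeps its `history`, so the second message to `researcher` reports turn 2.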

Stometa (@stometaverse)

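Points out that even as models converge on benchmark reasoning, they still diverge in how they use tools and in how rigorously they are evaluated. Real production traces show that model families optimize different objective functions. An insightful take on the gap between research-style evaluation and real-world deployment.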

https://x.com/stometaverse/status/2052725090387972165

#llm #benchmark #tooluse #evaluation #production

Stometa (@stometaverse) on X

@OfficialLoganK agree. convergence on benchmark reasoning, divergence on tool-use surface and eval discipline. when you look at production traces, model families clearly optimize different objective functions — paper benchmarks hide this.

X (formerly Twitter)

Tool Calls in Claude: the cost you don't see coming

What tool calls really cost in the Claude API in 2026, what Tool Search is, how Programmatic Tool Calling saves 37% of tokens, and why it explodes...

https://blog.donweb.com/costos-herramientas-claude-api-tool-use/

#claudeapi #tooluse #functioncalling #anthropic #agentesia

Claude API tool costs: real-world tool use

Blog Donweb

How Bruce The Broken Beaked Kea Became King Of His Circus

"Bruce surprised researchers by turning his disability into such a successful advantage, both behaviorally and physiologically."

#SciComm by @GrrlScientist

#parrots #ToolUse #Behavior #cognition #Disability
https://www.forbes.com/sites/grrlscientist/2026/04/25/how-bruce-the-broken-beaked-kea-became-king-of-his-circus/