Reading List 6

This week’s reading list is a mix of high-level theory and low-level pragmatism. I found myself bouncing between the philosophical implications of how we build AI and the immediate satisfaction of writing a good Go component.

[article] The Century-Long Pause in Fundamental Physics. The author argues that physics has stagnated by swapping “ontology-first” theory for mathematical models that merely fit data. The debate mirrors current machine learning disputes over whether LLMs build internal world models or just pattern-match at scale, an open empirical question that mechanistic interpretability is now trying to settle.

[release] Onyx Has Released a New Remote Page Turner Called Tappy. I wish Amazon would support page turners for their Kindle line. It would be great if they supported a device as delightful as this one.

[blog] The agent principal-agent problem. This is a great look at one of the biggest problems with agentic development: code review. In my open source work, I now use a pattern where I work with an agent to make a change, test it locally, and create a pull request before having another agent review the code. This back-and-forth works well: I keep a good mental model of the codebase without giving up efficiency.

[article] ReMarkable Paper Pure wants to be the only notebook you’ll ever need. I have always liked the reMarkable tablets, but every time I try one I miss having my Kindle library alongside it. Reading and writing are deeply linked for me, which is why I recently got a Kindle Scribe Colorsoft and found it really hits the mark for what I want.

[blog] Just Fucking Use Go. I have recently been working on a project that has a Go component. This is the first time I have really looked at the language, and it inspires me to spend more time with it.

I built my 7MB Full AI Terminal in Rust & Tauri. This is a neat open source AI terminal. It feels similar to Warp but is a lot smaller.

[article] Computer Use Is 45x More Expensive Than Structured APIs. I am not surprised at all by these findings. I think computer use will remain a last resort, and a lot of apps will expose some kind of API for an agent to use instead. My guess is that this eventually becomes the way we automate unmaintained applications that need to fit into an agentic workflow.

#Agents #AI #Clippings #Developer #Hardware #OpenSource #Tools
Refero Styles — Design Systems for AI Agents

Search a curated DESIGN.md library for AI agents: colors, typography, spacing, and component patterns from top websites.

Agentic Systems

Notes and resources on building and operating agentic AI systems, covering orchestration frameworks, task routing, memory, and evaluation approaches that extend baseline LLM capabi(...)

#agents #ai #orchestration

https://taoofmac.com/space/ai/agentic?utm_content=atom&utm_source=mastodon&utm_medium=social

Faruk Guney (@farukguney)

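Explaining why he built acilabs.ai as a solo founder, he argues that LLMs get worse as context grows and that longer context is the "wrong solution". Rather than simply extending the context window, he frames the problem as needing a new approach focused on how learning and inference are managed and coordinated.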

https://x.com/farukguney/status/2056032664176455744

#llm #contextwindow #ai #agents

Faruk Guney (@farukguney) on X

@rohanpaul_ai That’s why i built https://t.co/ukEiGfLUmf as a solo founder. You don’t have to be a scientist to know and experience that llms get worse as context grows. It is the wrong solution to the problem. we need a hippo campus for the ai gigantic brain. one that governs learning and

X (formerly Twitter)

Erica (@ericavaneee)

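TERMS-Bench, a three-tier benchmark for evaluating LLM agents in real-world economic negotiation, has been released. It uses the environment itself as the verifier, with no LLM-as-judge and no outcome-based rubrics. Among frontier models, Claude Opus 4.6 is reported in first place and GLM 5.1 in second.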

https://x.com/ericavaneee/status/2055868536099381638

#llm #agents #benchmark #evaluation #anthropic

Erica (@ericavaneee) on X

We built TERMS-Bench, a three-tier benchmark for LLM agents in real-world economic negotiation. No LLM-as-judge, no outcome rubrics: the environment itself is the verifier. 🏆Among frontier models, @AnthropicAI Claude Opus 4.6 #1, @Zai_org GLM 5.1 #2. ✨Surprisingly strong:

Jacopo Nardiello (@jnardiello)

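A question to @ivanfioravanti about whether DS4 can get real work done when it follows instructions from a SOTA model. It asks whether "a strong model plus appropriate steering" is practically viable for local inference and agentic workflows, which makes it relevant to anyone using LLM agents.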

https://x.com/jnardiello/status/2055997688714346709

#llm #agents #inference #sota

Jacopo Nardiello (@jnardiello) on X

@antirez @deepseek_ai Quick question for @ivanfioravanti - what is your take on this? Can DS4 get real work done when steered by a sota model?

Bindu Reddy (@bindureddy)

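Introduces Agent Swarms, a multi-agent system that combines several top LLMs (Claude Opus 4.7, GPT-5.5, Gemini Pro, Kimi, DeepSeek) by role, such as front end, back end, vision, and efficiency, with the aim of handling complex software development and even marketing work. It can be read as an example of multi-model agent orchestration.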

https://x.com/bindureddy/status/2055698703877329014

#multiagent #agents #llm #software #automation

Bindu Reddy (@bindureddy) on X

🚨 Agent Swarms Are Multi-Agent Systems That Create Agents with Top LLMs Including - Opus 4.7 for front-end - GPT 5.5 for backend - Gemini Pro for visual understanding - Kimi and DeepSeek for efficiency With Agent Swarms, AI can build complex software, run marketing teams and

Python Trending (@pythontrending)

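The CLI-Anything project, which aims to make all software agent-native, has been released. Because CLI-Hub and a GitHub repository are mentioned together, it appears to be developer infrastructure or a framework for connecting a variety of CLI tools to agent workflows.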

https://x.com/pythontrending/status/2055599823202426895

#cli #agents #developertools #automation #opensource

Python Trending 🇺🇦 (@pythontrending) on X

CLI-Anything - "CLI-Anything: Making ALL Software Agent-Native" -- CLI-Hub: https://t.co/FMbm91idT1 https://t.co/CWpzmewjcA

Reflexion splits self-correction in two: an Evaluator that detects success or failure, and a Self-Reflection model that diagnoses what went wrong. The Evaluator's external signal (heuristic, exact-match, or test execution) gates whether diagnosis fires. When that signal misfires, as with the high false-negative rate on MBPP Python, Self-Reflection rewrites correct code into incorrect code, exactly the failure mode Cannot-Self-Correct documented.
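The gating described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `reflexion_loop`, `actor`, `evaluator`, and `reflect` are hypothetical names, and the toy functions stand in for what would be LLM calls and a real test suite.

```python
from typing import Callable, List, Tuple


def reflexion_loop(
    actor: Callable[[str, List[str]], str],
    evaluator: Callable[[str], bool],
    reflect: Callable[[str, str], str],
    task: str,
    max_trials: int = 3,
) -> Tuple[str, List[str]]:
    """Run a Reflexion-style trial loop (all names here are illustrative)."""
    memory: List[str] = []  # episodic memory of verbal reflections
    attempt = ""
    for _ in range(max_trials):
        attempt = actor(task, memory)
        if evaluator(attempt):
            break  # evaluator reports success, so diagnosis never fires
        # Evaluator reports failure: diagnose and store the reflection.
        memory.append(reflect(task, attempt))
    return attempt, memory


def toy_actor(task: str, memory: List[str]) -> str:
    # Stand-in for an LLM: returns buggy code until a reflection exists.
    if memory:
        return "def add(a, b): return a + b"
    return "def add(a, b): return a - b"


def toy_evaluator(attempt: str) -> bool:
    # Test-execution signal: run the candidate and check one case.
    scope: dict = {}
    exec(attempt, scope)
    return scope["add"](2, 3) == 5


def toy_reflect(task: str, attempt: str) -> str:
    # Stand-in for the Self-Reflection model's diagnosis.
    return "The last attempt subtracted instead of adding."


final, memory = reflexion_loop(toy_actor, toy_evaluator, toy_reflect, "write add(a, b)")

# A misfiring evaluator (always False) triggers reflection even on correct code.
_, wasted = reflexion_loop(
    toy_actor, lambda attempt: False, toy_reflect, "write add(a, b)", max_trials=2
)
```

In this sketch the second call shows the misfire case: the actor's second attempt is correct, but the false-negative evaluator still sends it to reflection, which in a real system would invite an unnecessary rewrite.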

https://benjaminhan.net/posts/20260516-reflexion/?utm_source=mastodon&utm_medium=social

#LLMs #AI #Reasoning #Agents #Metacognition

Reflexion: Language Agents with Verbal Reinforcement Learning – synesis

An LLM agent that converts environment feedback into natural-language reflections stored in episodic memory beats strong baselines on AlfWorld, HotPotQA, and HumanEval without updating any weights.
