Who's played with https://paperclip.ing/ ? Not sure how I wandered across it and I'll get to playing with it eventually, but wondered if anyone had any hot takes on it?
Microsoft Research (@MSFTResearch)
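AsgardBench is a benchmark that evaluates whether embodied agents can revise their plans mid-task based on visual observations. By focusing on perception-based planning, it exposes agent limitations and points to the improvements needed for greater reliability.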

AsgardBench evaluates whether embodied agents can revise their plans based on visual observations as tasks unfold. By focusing on perception-driven planning, it exposes key limitations and guides improvements in agent reliability. https://t.co/6jAXzgCLvH
Z.ai for Startups (@ZaiforStartups)
CodeBuddy and GLM are hosting a global AI hackathon in Singapore. Participants build and deploy real AI agents and learn by finding out what breaks. Prizes of over $1,000 and mentoring are offered, and applications are open until April 20.
BOOTOSHI (@KingBootoshi)
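The post explains that when one agent cannot solve a problem, you can bring in another agent and put more compute to work by comparing and complementing the suggestions of different LLM models. It is a practical tip for getting a wider range of solutions by combining several models.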

you guys know you can throw more compute at a problem yourself, right? if one agent couldn't solve it, throw a different agent at it that can see the proposed solutions and offer a variety of different ones. BIG help, especially when they're different LLM models. works every time!
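A rough sketch of the escalation idea in that post, for anyone who wants to try it: each agent gets the task plus whatever earlier agents proposed, and the loop stops at the first proposal that passes a check. Everything here is hypothetical (the `escalate` helper, the agent signature, the stub agents, the `is_solved` check); real agents would wrap calls to different LLM models, which this sketch does not do.

```python
from typing import Callable, Optional

# Hypothetical agent signature (an assumption, not any real library's API):
# takes the task and the earlier failed proposals, returns a new proposal.
Agent = Callable[[str, list[str]], str]


def escalate(
    task: str,
    agents: list[Agent],
    is_solved: Callable[[str], bool],
) -> tuple[Optional[str], list[str]]:
    """Try agents in order; each one sees the proposals that failed before it."""
    attempts: list[str] = []
    for agent in agents:
        # Pass a copy so an agent cannot mutate the shared history.
        proposal = agent(task, list(attempts))
        if is_solved(proposal):
            return proposal, attempts
        attempts.append(proposal)
    # No agent produced an accepted solution.
    return None, attempts


# Stub agents standing in for two different models (purely illustrative).
def agent_a(task: str, prior: list[str]) -> str:
    return "guess-a"


def agent_b(task: str, prior: list[str]) -> str:
    # Pretend the second model does better once it can see a failed attempt.
    return "fixed" if prior else "guess-b"


solution, tried = escalate("bug", [agent_a, agent_b], lambda s: s == "fixed")
```

In this toy run the first stub fails, the second sees that failed proposal and returns the accepted answer. The hard part in practice is the `is_solved` check (tests, a linter, a human), which the post does not specify.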
Will agent clusters running agent clusters eat the whole?
#GenAI #AI #Agents #Software #Technology #Programming #SoftwareDevelopment #Coding #SoftwareEngineering
Claude skill that evaluates B2B vendors by talking to their AI agents
https://github.com/salespeak-ai/buyer-eval-skill
#HackerNews #Claude #B2B #vendors #AI #agents #evaluation #sales #technology

B2B software vendor evaluation skill for Claude Code — domain-expert questions, vendor AI agent conversations, evidence-based scoring - salespeak-ai/buyer-eval-skill
Andrej Karpathy's recent podcast interview is worth your time
Key ideas: agent orchestration over single-session prompting, AutoResearch loops that remove the human researcher from hyperparameter tuning, and a prediction that digital transformation will lead while physical robotics lags by years
His take on open source (~6-8 months behind frontier) being a healthy power balance is worth sitting with
"Centralization has a very poor track record." Hard to argue