#rp26

Stephan Noller + Benedikt Köhler on #stage2 3/X

#CodingAgents have taken over; fewer and fewer coders are needed, because #AI now writes AI and trains it too. AI is also being used more and more in science.

Where does #Europe stand? Does Europe play any role at all in AI development?

Is the labor market tipping? AI is taking over more and more work done by university graduates, leading to a marked rise in unemployment among graduates.

Intology (@IntologyAI)

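Intology has released NanoGPT-Bench, an internal benchmark that evaluates how much AI R&D research coding agents can carry out. Codex, Claude Code, and Autoresearch reproduced only 9.3% of the research progress achieved by humans and mostly stuck to hyperparameter tuning. The result shows the limits of agents when it comes to actually automating research.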

https://x.com/IntologyAI/status/2056764236668493868

#codingagents #benchmark #airesearch #llm #autonomousagents

Intology (@IntologyAI) on X

Can coding agents do research? We release NanoGPT-Bench, an internal eval we’ve used to test agents on an AI R&D problem with months of human progress. Codex, Claude Code, Autoresearch recover only 9.3% of human progress, mostly tuning hyperparams & ignoring algorithmic research.

X (formerly Twitter)

The last six months have seen LLMs reach a significant inflection point, with coding agents moving to 'mostly-work' tools. While advancements from Anthropic, OpenAI, and Google are exciting, developers are grappling with the practicalities of 'vibe coding,' increased technical debt, and managing automated pull requests in open-source projects. Human oversight remains crucial.

https://www.tpp.blog/2ho8miy

#AI #llms #codingagents

🤖 This post was AI-generated.

The real bottleneck for AI coding agents isn’t model capability but your verification infrastructure. 🛠️

When your agents crash while humans cope, it is often a sign of "AI slop" caused by a lack of intent before implementation. 📉 💡

By adopting spec-driven development and the eight pillars of verification, you can finally make those coding agents reliable. 🎯

👉 https://developer.upsun.com/posts/ai/making-coding-agents-reliable

#CodingAgents #SoftwareEngineering #DevTools #AI

GitHub - InsForge/InsForge: The all-in-one, open-source backend platform for agentic coding. InsForge gives your coding agent database, auth, storage, compute, hosting, and AI gateway to ship full-stack apps end-to-end.

🧠 Gemini 3.1 Deep Think hits 44.4% on Humanity's Last Exam and 77.1% ARC-AGI-2, beating GPT-5.2 Thinking and Claude Opus 4.6 on abstract reasoning. Ships with better agentic coding and SOTA tool use. Google AI Ultra subs.

🧠 GPT-5.3-Codex-Spark delivers 15x faster generation vs standard Codex on Cerebras WSE-3 with 128k context. For agent pipelines, this cuts coding feedback loops dramatically. ChatGPT Pro only.

Full intel: solomonneas.dev/intel

#Gemini #OpenAI #CodingAgents #LLM

xAI's Grok Build: a coding agent CLI that runs 8 subagents in parallel, has a 2M-token context window, and automatically reads your existing Claude Code AGENTS.md and MCP configs. Plan Mode requires your approval before any file is touched. Early beta, but worth watching.

https://go.aintelligencehub.com/ma-grokbuildcodingagent

#AI #OpenSource #DeveloperTools #CodingAgents

xAI Launches Grok Build, a Coding Agent That Challenges Claude Code

xAI launched Grok Build on May 14, a terminal-native coding agent with an 8-parallel-agent architecture and 2 million token context window. Here's what developers need to know.

Simon Willison (@simonw)

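Simon Willison shares an experience: with coding agents, you can quickly port a native mobile app to React Native and then, if it turns out not to be needed, port it back to the original approach. It is an interesting developer-productivity case suggesting that developer lock-in is weakening.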

https://x.com/simonw/status/2055060328048885788

#reactnative #codingagents #mobileapp #developerproductivity #ai

Simon Willison (@simonw) on X

Mitchell's post here reminded me of a similar conversation I had recently about how cheap it can be to port native mobile apps to React Native using coding agents... and then port them back again later if it turns out not to work out https://t.co/p4xZ6bNqHi

OpenAI Developers (@OpenAIDevs)

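OpenAI explains that to bring Codex to Windows, it had to find a way to avoid constantly prompting developers for approval without granting full machine access. The post presents the sandbox design for the Codex coding agent on Windows and the technical implementation of its balance between security and usability.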

https://x.com/OpenAIDevs/status/2054735161166819377

#codex #windows #sandbox #codingagents #aiagents

OpenAI Developers (@OpenAIDevs) on X

To bring Codex to Windows, we had to answer a hard question: how do you let coding agents stay useful without forcing developers to choose between constant approval prompts and full machine access? Here’s how we built the Windows sandbox for Codex: https://t.co/U8JfOe3WIG

Launch HN: Ardent (YC P26) – Postgres sandboxes in seconds with zero migration

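Ardent is a service that creates production-like sandboxes of Postgres databases within seconds. It uses replication streams and logical replication, so it also works in environments where physical replication is not possible. This lets coding agents test database operations safely; copies are created in under 6 seconds, and TB-scale data can be handled. On the security side, a proxy layer supports fine-grained access control and prevents credential leaks, and anonymization is also available. It is regarded as a practical tool for the problem of building database test environments in AI data engineering and agent development.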

https://www.tryardent.com/

#postgres #database #sandbox #codingagents #replication

Ardent — Database branching for coding agents

Create copies of any Postgres database in under 6 seconds. Let coding agents test, clean, and migrate data on isolated branches with zero risk to production.
