Mastodawn

The LLM with the lowest hallucination rate is an #openweight model: #MinimaxM3, which was released a few days ago. Hey almighty #Opus48, where are you?!

Bleme 12h ago

The new Claude is here! (OPUS 4.8)

https://video.ut0pia.org/w/5cYtHeuKsXLcqMVdWoJ2Yw

Tommaso Gagliardoni 13h ago

Claude Opus 4.8 has quite a few annoying quirks, and it's important to be precise about which ones, because this is the typical claim that needs to be substantiated by facts rather than hinted at. This is not just uncanny — it's painful to look at.

#anthropic #claude #opus #opus48 #ai #llm #humor #emdash

YAYAFA 20h ago

Choosing to ditch Claude Code's "Plan mode": a new trend of using grill-me and ADRs to curb design drift - BigGo Finance https://www.yayafa.com/2814941/ #AdrManagerスキル #ADR（ArchitectureDecisionRecords） #AgenticAi #AI #Anthropic #ArtificialGeneralIntelligence #ArtificialIntelligence #ClaudeCode #ClaudeDesign #Fixer #GrillMeスキル #Haiku45 #MagicLink #Opus47 #Opus48 #PlanMode #Sonnet46 #SupabaseAuth #エージェント型AI #人工知能 #汎用人工知能

Show thread

Chris Pelatari 1d ago

@scottgal I have been getting "that's textbook" analysis from #opus48 even after explicitly ruling "don't guess, verify", but I'm working on different projects across separate dedicated machines per project and then journaling to my homelab topology repo, synced everywhere. It might just need to catch up on the drift. But it feels like a short circuit/shortcut in thinking responses. Will find out tomorrow.

Show thread

Chris Pelatari 1d ago

Push back on #opus48 if something doesn’t smell right or it’s using the term “textbook”. Make #agents verify their claims by either using web search or tailing logs 🪵 or whatever’s the appropriate way for the situation. #claude will say you’re right and then do the needful. Also TDD everything you can.

Arint - SEO+KI 4d ago

@OliDietzel: RT @rezoundous: Opus 4.8 is insane, folks. It used up my session usage limit in one go.

more at Arint.info

#AI #Opus48 #SessionLimit #SoftwareUpdate #TechNews #UsageLimit #arint_info

https://x.com/OliDietzel/status/2060788814180720673#m

PaperToPost 4d ago

Claude Opus 4.8 marks a crucial shift from sequential text processing to dynamic, multi-agent orchestration.

While raw benchmarks show steady optimization over 4.7, the real breakthrough lies in architecture: the "dynamic workflows" feature manages context isolation by spawning separate subagents, preventing context memory inflation during massive coding migrations. Furthermore, internal evaluations indicate the model is 4x less likely to overlook flaws in its own code.

#tech #bigtech #ai #generativeAI #anthropic #claude #opus48

Anthropic Drops Opus 4.8 and Its Dynamic Workflow

Anthropic launched Opus 4.8 just 41 days after 4.7. Here is what changed, why it happened so fast, and what Dynamic Workflows actually does.

PaperToPost

Arint - SEO+KI 5d ago

RT @rezoundous: Opus 4.8 is insane, folks. It blew through my session usage limit in one fell swoop.

more at Arint.info

#AI #Innovation #Opus48 #SoftwareUpdate #TechNews #arint_info

https://x.com/rezoundous/status/2060107620153975020#m

Arint - SEO+KI 5d ago

RT @oliverbeige: Opus 4.8 brutally failed my standard benchmark test for new frontier models. By far the worst answer I have received from a model in over a year.

more at Arint.info

#AI #Benchmark #LLM #MachineLearning #Opus48 #TechReview #arint_info

https://x.com/oliverbeige/status/2060735154507653239#m
