The LLM with the lowest hallucination rate is an #openweight model: #MinimaxM3, which was released a few days ago. Hey almighty #Opus48, where are you?!

The new Claude is here! (OPUS 4.8)

https://video.ut0pia.org/w/5cYtHeuKsXLcqMVdWoJ2Yw

PeerTube

Claude Opus 4.8 has quite a few annoying quirks, and it's important to be precise about which ones, because this is the typical claim that needs to be substantiated by facts rather than hinted at. This is not just uncanny — it's painful to look at.

#anthropic #claude #opus #opus48 #ai #llm #humor #emdash

@scottgal I have been getting "that's textbook" analysis from #opus48 even after explicitly ruling "don't guess, verify", but I'm working on different projects across separate dedicated machines, one per project, and then journaling to my homelab topology repo, which is synced everywhere. It might just need to catch up on the drift. But it feels like a short circuit/shortcut in thinking responses. Will find out tomorrow.
Push back on #opus48 if something doesn’t smell right or it’s using the term “textbook”. Make #agents verify their claims by either using web search or tailing logs 🪵 or whatever’s the appropriate way for the situation. #claude will say you’re right and then do the needful. Also TDD everything you can.

@OliDietzel: RT @rezoundous: Opus 4.8 is insane, folks. It used up my session usage limit in one go.

more at Arint.info

#AI #Opus48 #SessionLimit #SoftwareUpdate #TechNews #UsageLimit #arint_info

https://x.com/OliDietzel/status/2060788814180720673#m

Arint - SEO+KI (@[email protected])


Mastodon Glitch Edition

Claude Opus 4.8 marks a crucial shift from sequential text processing to dynamic, multi-agent orchestration.

While raw benchmarks show steady optimization over 4.7, the real breakthrough lies in architecture: the "dynamic workflows" feature manages context isolation by spawning separate subagents, preventing context memory inflation during massive coding migrations. Furthermore, internal evaluations indicate the model is 4x less likely to overlook flaws in its own code.

Read more https://www.papertopost.com/anthropic-launches-claude-opus-4-8-with-dynamic-workflows/?utm_source=techhubsocial&utm_medium=social

#tech #bigtech #ai #generativeAI #anthropic #claude #opus48

Anthropic Drops Opus 4.8 and Its Dynamic Workflow

Anthropic launched Opus 4.8 just 41 days after 4.7. Here is what changed, why it happened so fast, and what Dynamic Workflows actually does.

PaperToPost

RT @rezoundous: Opus 4.8 is insane, folks. It blew through my session usage limit in one fell swoop.

more at Arint.info

#AI #Innovation #Opus48 #SoftwareUpdate #TechNews #arint_info

https://x.com/rezoundous/status/2060107620153975020#m


RT @oliverbeige: Opus 4.8 brutally failed my standard benchmark test for new frontier models. By far the worst answer I have received from a model in over a year.

more at Arint.info

#AI #Benchmark #LLM #MachineLearning #Opus48 #TechReview #arint_info

https://x.com/oliverbeige/status/2060735154507653239#m
