Federico Viticci

Claude Opus 4.6 has definitely been nerfed in the last few days, especially in the Claude app.

Here's a comparison with Claude Code in Telegram, where I manually set Max reasoning effort.

(I hate that Anthropic does this regularly and keeps pretending it doesn't.)

Federico Viticci Apr 12

Also saw some folks suggest switching back to Opus 4.5 and yep, whatever it is they've done to Opus 4.6, the older version isn't affected.

Oliver Ames Apr 12

@viticci Damn, seriously? Fingers crossed code isn’t affected.

David Boyd Apr 13

@oliverames @viticci Code is definitely affected

Jeff Jessie Apr 13

@viticci had to try myself

Son of a Sailor Apr 13

@viticci Maybe it assumed you work at the car wash… 😋

Stephen Hackett Apr 12

@viticci is this like when apple slows down my iphone before a new one comes out

@ismh86 yes

Franklin Delano Stallone Apr 13

@viticci Both make assumptions as to why you're going to the car wash. It also gave me a different response. Not that it matters, I have no car so the drive option is always wrong (I have gone to a car wash despite this too).

If you ask if the reason you're going matters, then it says you obviously need to drive if you're washing your car so that seems fairly reasonable, imo.

Eshu Marneedi Apr 13

@viticci At this point, I only use Opus 4.6 when I need to search the web (since its web search tool, and generally its tool calling, are markedly improved). Opus 4.5 is such a great model *and* doesn’t get nerfed by Anthropic whenever they want.

Matt Brown Apr 13

@viticci it’s a confusing question tbh. If a user is asking the question it presupposes that the car is not a necessity for the activity. Kinda like asking “should I walk or take my bike to the velodrome”

Jorge Salvador Caffarena Apr 13

@mattbrowndev @viticci it certainly can make an actual intelligent being question back “do you need to wash your car or just go there?” But that is not what the LLM does, because it does not think.

Joe Fabisevich Apr 13

@viticci I think this test is too simple to be evidence of nerfing. I tried the car wash experiment a few months ago and got drive 60% of the time and walk 40% of the time; the results are just not deterministic.
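
To put rough numbers on that point, here is a minimal sketch that treats each run of the car wash prompt as an independent coin flip. The 40% "walk" rate is the figure quoted in the post above; the helper names and the independence assumption are illustrative only, not a description of how the model actually behaves.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k 'walk' answers in n independent runs."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_at_least(k, n, p):
    """Probability of k or more 'walk' answers in n independent runs."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))

# Assumption: the model answers "walk" 40% of the time, as reported in the post.
p_walk = 0.4

# Chance that a single run answers "walk" with no change to the model at all.
print(prob_at_least(1, 1, p_walk))

# Chance that three runs in a row all answer "walk".
print(prob_at_least(3, 3, p_walk))
```

Under these assumptions a single "walk" answer is expected four times out of ten, so one screenshot cannot distinguish a changed model from ordinary sampling variation; a comparison would need many runs per model.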

ArnoudB Apr 13

@mergesort @viticci Exactly. In this case, I ended up in the bracket where it suggests both:

Michael Brown Apr 13

@arnoudb @mergesort @viticci that answer is actually the correct one. The question doesn’t specify *why* you want to go to the car wash. It’s probably to wash the car, but it might be just to meet with someone who works there, or meet someone else who is washing their car.

Alex Rosenberg Apr 13

@mergesort @viticci Exactly. LLMs are non-deterministic.

1. ML engineers don't bother getting all the math to be identical across all runtime scenarios.
2. They intentionally add 'temperature' to make it a bit more random.
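
As a rough illustration of point 2, the sketch below shows temperature sampling over two candidate answers. The scores (`logits`) are made up, chosen so that "drive" comes out near 60% at temperature 1.0; nothing here reflects any real model's internals or Anthropic's actual settings.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; temperature must be > 0."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_answer(options, logits, temperature, rng):
    """Pick one option at random, weighted by its softmax probability."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(options, weights=probs, k=1)[0]

options = ["drive", "walk"]
logits = [1.0, 0.6]  # hypothetical scores, not taken from any real model

rng = random.Random(0)  # fixed seed so the sketch is repeatable
counts = {option: 0 for option in options}
for _ in range(1000):
    counts[sample_answer(options, logits, 1.0, rng)] += 1
print(counts)
```

Lowering the temperature toward zero concentrates the probability on the highest-scoring answer, while raising it flattens the distribution, which is why the same prompt can yield different answers from one run to the next.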

Jorge Salvador Caffarena Apr 13

@viticci Companies and individuals are, once again, shackling themselves and becoming slaves to the tech companies. I guess we don’t learn.

Carter Apr 13

@viticci read this post while listening to Sacred Realms talk about TotK 🤯