Thanks for sharing that talk, enjoyed watching it!

You said: "I would like to reach out and talk to biologists - do you find these models to be useful and capable? Can it save you time the way a highly capable colleague would?" and they said, paraphrasing: "We reached out and talked to biologists and asked them to rank the model between 0 and 4, where 4 is a world expert, and the median response was a 2, meaning it helped them save time the way a capable colleague would" - specifically: "Specific, actionable info; saves expert meaningful time; fills gaps in adjacent domains"

so I'm just telling you they did the thing you said you wanted.

> I would like to reach out and talk to biologists - do you find these models to be useful and capable? Can it save you time the way a highly capable colleague would?

Well, I would say they have done precisely that in evaluating the model, no? For example section 2.2.5.1:

> Uplift and feasibility results

> The median expert assessed the model as a force-multiplier that saves meaningful time (uplift level 2 of 4), with only two biology experts rating it comparable to consulting a knowledgeable specialist (level 3). No expert assigned the highest rating. Most experts were able to iterate with the model toward a plan they judged as having only narrow gaps, but feasibility scores reflected that substantial outside expertise remained necessary to close them.

There are other similar examples in the system card as well.

Yeah, good point, thanks for noting that, I'll correct.

There's been a section on this in nearly every system card Anthropic has published, so this isn't a new thing - and this model doesn't carry particularly higher risk than past models either:

> 2.1.3.2 On chemical and biological risks

> We believe that Mythos Preview does not pass this threshold due to its noted limitations in open-ended scientific reasoning, strategic judgment, and hypothesis triage. As such, we consider the uplift of threat actors without the ability to develop such weapons to be limited (with uncertainty about the extent to which weapons development by threat actors with existing expertise may be accelerated), even if we were to release the model for general availability. The overall picture is similar to the one from our most recent Risk Report.

The system card for Claude Mythos (PDF): https://www-cdn.anthropic.com/53566bf5440a10affd749724787c89...

Interesting to see that they will not be releasing Mythos generally. [edit: Mythos Preview generally - fair to say they may release a similar model but not this exact one]

I'm still reading the system card but here's a little highlight:

> Early indications in the training of Claude Mythos Preview suggested that the model was likely to have very strong general capabilities. We were sufficiently concerned about the potential risks of such a model that, for the first time, we arranged a 24-hour period of internal alignment review (discussed in the alignment assessment) before deploying an early version of the model for widespread internal use. This was in order to gain assurance against the model causing damage when interacting with internal infrastructure.

and interestingly:

> To be explicit, the decision not to make this model generally available does _not_ stem from Responsible Scaling Policy requirements.

Also really worth reading is section 7.2, which describes how the model "feels" to interact with. That's also what I remember from their release of Opus 4.5 in November - in a video, an Anthropic employee described how they 'trusted' Opus to do more with less supervision. I think that's a pretty valuable benchmark at a certain level of 'intelligence'. Few of my co-workers could pass SWE-bench, but I would trust quite a few of them, and it's not entirely the same set.

Also very interesting is that they believe Mythos is higher risk than past models as an autonomous saboteur, to the point they've published a separate risk report for that specific threat model: https://www-cdn.anthropic.com/79c2d46d997783b9d2fb3241de4321...

The threat model in question:

> An AI model with access to powerful affordances within an organization could use its affordances to autonomously exploit, manipulate, or tamper with that organization's systems or decision-making in a way that raises the risk of future significantly harmful outcomes (e.g. by altering the results of AI safety research).