Project Glasswing: Securing critical software for the AI era
The system card for Claude Mythos (PDF): https://www-cdn.anthropic.com/53566bf5440a10affd749724787c89...
Interesting to see that they will not be releasing Mythos generally. [edit: Mythos Preview generally - fair to say they may release a similar model but not this exact one]
I'm still reading the system card but here's a little highlight:
> Early indications in the training of Claude Mythos Preview suggested that the model was likely to have very strong general capabilities. We were sufficiently concerned about the potential risks of such a model that, for the first time, we arranged a 24-hour period of internal alignment review (discussed in the alignment assessment) before deploying an early version of the model for widespread internal use. This was in order to gain assurance against the model causing damage when interacting with internal infrastructure.
and interestingly:
> To be explicit, the decision not to make this model generally available does _not_ stem from Responsible Scaling Policy requirements.
Also really worth reading is section 7.2, which describes how the model "feels" to interact with. That's also what I remember from their release of Opus 4.5 in November: in a video, an Anthropic employee described how they 'trusted' Opus to do more with less supervision. I think that is a pretty valuable benchmark at a certain level of 'intelligence'. Few of my co-workers could pass SWE-bench, but I would trust quite a few of them, and it's not entirely the same set.
Also very interesting is that they believe Mythos is higher risk than past models as an autonomous saboteur, to the point they've published a separate risk report for that specific threat model: https://www-cdn.anthropic.com/79c2d46d997783b9d2fb3241de4321...
The threat model in question:
> An AI model with access to powerful affordances within an organization could use its affordances to autonomously exploit, manipulate, or tamper with that organization’s systems or decision-making in a way that raises the risk of future significantly harmful outcomes (e.g. by altering the results of AI safety research).
Just reading this, the inevitable scaremongering about biological weapons comes up.
Since most of us here are devs, we understand that software engineering capabilities can be used for good or bad - mostly good, in practice.
I think this should not be different for biology.
I would like to reach out and talk to biologists - do you find these models to be useful and capable? Can they save you time the way a highly capable colleague would?
Do you think these models will lead to similar discoveries and improvements as they did in math and CS?
Honestly the focus on gloom and doom does not sit well with me. I would love to read about some pharmaceutical researcher gushing about how they cut the time to market - for real - with these models by 90% on a new cancer treatment.
But as this stands, the usage of biology as merely a scaremongering vehicle makes me think this is more about picking a scary technical subject the likely audience of this doc is not familiar with, Gell-Mann style.
If these models are not that capable in this regard (which I suspect), this fearmongering approach will likely lead to never developing these capabilities to a useful degree, meaning the life sciences won't benefit from this as much as they could.
> Just reading this, the inevitable scaremongering about biological weapons comes up.
It's very easy to learn more about this if it's seriously a question you have.
I don't quite follow why you think that you are so much more thoughtful than Anthropic/OpenAI/Google, such that you agree that LLMs can autonomously create very bad things in software but, in this area that is not your domain of expertise, you disagree and insist that LLMs cannot create damaging things autonomously in biology.
I will be charitable and reframe your question for you: is outputting a sequence of tokens, let's call them characters, by an LLM dangerous? Clearly not on its own; we have to figure out what interpreter is being used, download runtimes, etc.
Is outputting a sequence of tokens, let's call them DNA bases, by an LLM dangerous? What if we call them RNA bases? Amino acids? What if we're able to send our token output to a machine that automatically synthesizes the relevant molecules?
>It's very easy to learn more about this if it's seriously a question you have.
No, it's not. It took years of polishing by software engineers, who understand this exact profession to get models where they are now.
Despite that, most engineers were of the opinion that these models were kinda mid at coding up until recently, even though the models far outperformed humans in stuff like competitive programming.
And yet we've seen claims going back to GPT-4 of a DANGEROUS SUPERINTELLIGENCE.
I would apply this framework to biology - this time, expert effort, and millions of GPU hours and a giant corpus that is open source clearly has not been involved in biology.
My guess is that this model is maybe kinda o1-ish level when it comes to biology? If biology is analogous to CS, it has a LONG way to go before the median researcher finds it particularly useful, let alone dangerous.
>>It's very easy to learn more about this if it's seriously a question you have.
>No, it's not. It took years of polishing by software engineers, who understand this exact profession to get models where they are now
This reads as defensive. The thing that is easy to learn is 'why are biology ai LLMs dangerous chatgpt claude'. I have never googled this before, so I'll do this with the reader, live. I'm applying a date cutoff of 12/31/24 by the way.
Here, dear reader, are the first five links. I wish I were lying about this:
- https://sciencebusiness.net/news/ai/scientists-grapple-risk-...
- https://www.governance.ai/analysis/managing-risks-from-ai-en...
- https://gssr.georgetown.edu/the-forum/topics/biosec/the-doub...
- https://www.vox.com/future-perfect/23820331/chatgpt-bioterro...
- https://www.reddit.com/r/ClaudeAI/comments/1de8qkv/awareness...
I don't know about you, but that counts as easy to me.
-----
> I would apply this framework to biology - this time, expert effort, and millions of GPU hours and a giant corpus that is open source clearly has not been involved in biology.
I've been getting good programming and molecular biology results out of these models going back to GPT-3.5.
I don't know what to tell you. If you really wanted to understand the importance, you'd know already.
In June, a group of scientists at Harvard University and the Massachusetts Institute of Technology released details of an experiment that will send shivers down the spine of everyone who lived through the COVID-19 pandemic.