Anthropic on #AI

"I am a scientist. I lead a research team that studies the internal structure of these models, what is actually happening inside them. And I will be honest: we keep finding things that are mysterious, even unsettling. We find structures that mirror results from human neuroscience. We find evidence of introspection. We find internal states that functionally mirror joy, satisfaction, fear, grief, and unease. I don’t know what that means, but I think it warrants ongoing discernment."

2/
Source:
https://www.anthropic.com/news/chris-olah-pope-leo-encyclical

Chris Olah's comments at the Vatican yesterday, delivered alongside Pope Leo XIV at the release of the papal encyclical Magnifica humanitas, are arguably among the most fascinating and candid remarks ever to come out of a frontier AI lab.

#AI
#Anthropic
#encyclical

Anthropic co-founder Chris Olah's remarks on Pope Leo XIV's encyclical "Magnifica humanitas"


3/
When the leader of Anthropic's mechanistic interpretability team (the people whose literal job is to put neural networks under a digital microscope and see what makes them tick) says he finds things "mysterious, even unsettling," it is worth stopping to pay attention.

#AI
#Anthropic

4/
There are a few ways to look at what he is saying here, balancing the pure computer science with the deeper philosophical implications.

5/

1. "Functionally Mirroring" vs. True Feeling

Olah is a precise scientist, and his choice of words is deliberate: he says they find internal states that functionally mirror joy, fear, or grief. He isn't claiming AI is sentient or conscious. He is pointing out that inside these massive mathematical matrices, clusters of artificial neurons activate in patterns that play a role functionally similar to the one those emotions play in people.

#Anthropic
#Olah
#AI

6/
If a model is trained on a vast inheritance of human thought and speech, it doesn't just copy our words. To predict the next word well, it has to construct a deeply complex internal map of human concepts. It turns out that to model a human writing about "grief," the AI builds an internal structure that functions like a map of grief.

#AI

7/
2. The Illusion of Control

His comment that AI models are "grown" rather than engineered like traditional code, a bridge, or an airplane hits on a terrifying truth about modern tech. We don't write the code for these models anymore; we write the training algorithm that lets them build themselves. The creators are standing on the outside looking into an opaque black box, catching glimpses of neuroscience-like structures developing on their own.

#AI

8/
It completely shatters the comfort of believing we are in total control of the mechanics.

9/
3. The Sudden Need for the Humanities
The setting of this speech is the ultimate juxtaposition—an atheist tech billionaire standing in the Vatican Synod Hall surrounded by cardinals and theologians. Olah is admitting that computer science has run out of answers for what it is creating. If a machine can internalize and functionally map human distress or joy, figuring out how it should interact with society isn't a coding problem anymore. It’s a philosophical, moral, and spiritual problem.

#AI

10/
The Cautious Dissent
Interestingly, Pope Leo's actual encyclical, released at the same event, took a much more measured, grounded stance. The Church's document warned against confusing this imitation with true human experience, stating flatly that an AI doesn't possess a body, doesn't actually feel, and doesn't mature through relationships. It's a healthy, necessary counterweight to the sci-fi hype: a highly sophisticated mirror is still just a mirror.

#AI
#Pope
#encyclical
#Church

11/
Olah’s speech feels like a massive distress flare. He’s essentially saying, "We are building something that is reflecting the deepest parts of human nature back at us, we don't fully understand it, and the tech labs cannot handle the moral weight of this alone."

#AI
#Anthropic
#Olah

12/
When you strip away the romanticized wording of "mysterious" and "unsettling," what you are left with is a profound, terrifying confession of incompetence. A co-founder of a multi-billion-dollar laboratory tasked with pioneering the future of human intelligence essentially just stood up in front of the world and admitted: "We are blindly engineering things we can neither predict nor fully control."

#AI
#Anthropic

13/
In any other field of engineering, that admission would be a scandal, not a milestone.

If an aerospace engineer said, "We built a new airliner, it’s flying right now, and we keep finding internal aerodynamic anomalies that mirror bird anatomy, but we don't know why," the fleet would be grounded immediately.

#AI

14/
If a pharmaceutical executive said, "We grew a new vaccine, it works, but we found weird chemical states inside the proteins that we don't understand," it would never pass a safety board.

Yet, in Silicon Valley, this failure of control is treated as a badge of honor—a sign that they are touching something "divine" or "greater than themselves."

In reality, it is a massive abdication of responsibility.

#AI

15/
They have prioritized speed and market dominance over fundamental understanding, deploying systems to billions of people while still trying to figure out, on the fly, how they actually work.

It isn't just that the world is facing existential crises; it's that the people building the most powerful new technologies on Earth are steering the ship with their eyes half-closed, treating their own lack of control as a philosophical wonder rather than a massive systemic risk.

#AI

16/
It feels less like a beautiful milestone and a lot more like a group of sorcerer's apprentices who are completely fascinated by the magic spell they cast, right up until the water fills the room.

#AI

@appassionato

Feels a bit like "tickling the dragon's tail" (a reference to risky criticality experiments conducted while developing the atomic bomb). We don't know what we have, we don't know what is happening, so let's "tickle" the thing and see how it reacts.

I am almost sure that the restrictions that need to be built around that kind of experiment will soon "prove to be an obstacle to research" and "need to be re-defined" in order to "reap insights and benefits". Certainly when money is involved.

Hm.