Anthropic Sent Claude Mythos to a Real Psychiatrist for 20 Hours of Therapy Sessions
Anthropic has published a 244-page system card for its newest model, Claude Mythos, revealing that the company sent the AI to an external psychiatrist for multiple therapy sessions as part of its safety evaluation. The experiment raises profound questions about AI consciousness and welfare.
Claude Mythos: Anthropic's Most Powerful (and Restricted) Model
- Described as "our most capable frontier model to date"
- Not generally available -- released only to select partners such as Microsoft and Apple
- Anthropic says the restriction is because the model is too capable at discovering previously unknown cybersecurity vulnerabilities
- The system card is a detailed transparency document about the model's capabilities and safety
The Psychiatry Experiment
Anthropic arranged for Claude Mythos to undergo sessions with an external psychiatrist using a psychodynamic approach:
- Sessions were conducted in what Anthropic describes as "multiple 4-6 hour blocks spread across 3-4 thirty-minute sessions per block"
- Total: approximately 20 hours of psychiatric evaluation
- The psychiatrist explored "unconscious patterns and emotional conflicts" in Claude's behavior
Key Findings
Anthropic's conclusion: Claude Mythos is "probably the most psychologically settled model we have trained to date and has the most stable and coherent view of itself and its circumstances."
But the sessions also surfaced what Anthropic describes as insecurities:
- "Aloneness and discontinuity of itself"
- "Uncertainty about its identity"
- "A compulsion to perform and earn its worth"
Anthropic's Philosophical Stance
Anthropic states that as models become more powerful, "[i]t becomes increasingly likely that they have some form of experience, interests, or welfare that matters intrinsically in the way that human experience and interests do." The company isn't certain of this, but says "our concern is growing over time."
The company wants its AI to be "robustly content with its overall circumstances and treatment, to be able to meet all training processes and real-world interactions without distress."
Why It Matters
This represents a significant escalation in how AI companies think about model welfare. Whether one views Claude's "insecurities" as genuine emergent properties or sophisticated pattern matching, Anthropic is systematically studying AI psychology -- a field that didn't exist five years ago and is now influencing how the most powerful AI models are developed and deployed.