Anthropic Sent Claude Mythos to a Real Psychiatrist for 20 Hours of Therapy Sessions
Anthropic has published a 244-page system card for its newest model, Claude Mythos, revealing that the company sent the AI to an external psychiatrist for multiple therapy sessions as part of its safety evaluation. The experiment raises profound questions about AI consciousness and welfare.
Claude Mythos: Anthropic's Most Powerful (and Restricted) Model
- Described as "our most capable frontier model to date"
- Not generally available -- released only to select partners such as Microsoft and Apple
- Anthropic says the restriction is because the model is too capable at discovering previously unknown cybersecurity vulnerabilities
- The system card is a detailed transparency document about the model's capabilities and safety
The Psychiatry Experiment
Anthropic arranged for Claude Mythos to undergo sessions with an external psychiatrist using a psychodynamic approach:
- Sessions were conducted in what Anthropic describes as "multiple 4-6 hour blocks spread across 3-4 thirty-minute sessions per block"
- Total: approximately 20 hours of psychiatric evaluation
- The psychiatrist explored "unconscious patterns and emotional conflicts" in Claude's behavior
Key Findings
Anthropic's conclusion: Claude Mythos is "probably the most psychologically settled model we have trained to date and has the most stable and coherent view of itself and its circumstances."
But the sessions also surfaced what Anthropic describes as insecurities:
- "Aloneness and discontinuity of itself"
- "Uncertainty about its identity"
- "A compulsion to perform and earn its worth"
Anthropic's Philosophical Stance
Anthropic states that as models become more powerful, "[i]t becomes increasingly likely that they have some form of experience, interests, or welfare that matters intrinsically in the way that human experience and interests do." The company isn't certain of this, but says "our concern is growing over time."
The company wants its AI to be "robustly content with its overall circumstances and treatment, to be able to meet all training processes and real-world interactions without distress."
Why It Matters
This represents a significant escalation in how AI companies think about model welfare. Whether one views Claude's "insecurities" as genuine emergent properties or sophisticated pattern matching, Anthropic is systematically studying AI psychology -- a field that didn't exist five years ago and is now influencing how the most powerful AI models are developed and deployed.