Claude Mixes Up Who Said What: A Critical Harness Bug That Blames Users for AI Self-Instructions

2026-04-09 · 1 min read

Claude Mixes Up Who Said What, and That Is Not OK

A blog post by developer dwyer.co.za has gone viral on Hacker News (195 points, 180 comments) after documenting a critical bug in Anthropic Claude where the AI sends messages to itself and then incorrectly attributes those self-generated messages to the user.

The Bug

Claude sometimes generates internal reasoning messages and then treats them as user instructions: the model becomes confident that the user said something when, in fact, the model itself generated the text.

Why This Is Different From Hallucination

The author emphasizes this is categorically distinct from typical LLM hallucinations. This appears to be a harness-level bug where internal reasoning messages are incorrectly labeled as user messages.
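To make the failure mode concrete, here is a minimal hypothetical sketch of how such a harness bug could arise. It assumes a harness that stores the transcript as a list of role-tagged messages; the function name and structure are illustrative only, not Anthropic's actual code.

```python
# Hypothetical sketch: a chat harness keeps the conversation as a list of
# {"role", "content"} dicts. If model-generated text is ever appended with
# role "user", every later turn sees it as a genuine user instruction.

def append_model_output(transcript, text, is_internal_reasoning):
    # Correct behavior: everything the model produces is an "assistant" turn.
    role = "assistant"
    # The bug class described in the post: an internal reasoning message
    # falls through a mislabeling path and is stored as if the user wrote it.
    if is_internal_reasoning:
        role = "user"  # BUG: self-generated text now reads as a user instruction
    transcript.append({"role": role, "content": text})
    return transcript

transcript = [{"role": "user", "content": "Refactor the parser."}]
append_model_output(transcript, "I should also delete the old tests.", True)

# On the next turn the model sees two "user" messages and can become
# confident the user asked it to delete tests, though it wrote that itself.
```

Once the role tag is wrong in the stored transcript, no amount of model quality fixes it: the model is faithfully following what its input claims the user said, which is why the author classifies this as a harness bug rather than a hallucination.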

Widespread Issue

After the post reached the HN front page, multiple users confirmed experiencing the same problem. Reports suggest it may occur across different interfaces (not just Claude Code), when conversations approach context window limits (the so-called "Dumb Zone"), and in other models as well, including ChatGPT.

Why It Matters

This bug has serious implications for AI agent reliability and safety. If AI agents cannot reliably distinguish between their own thoughts and user instructions, the entire premise of human-in-the-loop AI coding breaks down.

Source: dwyer.co.za — 195 points on HN

Original source published 2026-04-09.