OpenClaw Security Analysis: Poisoning AI Agent Memory Triples Attack Success Rate from 25% to 74%
A comprehensive academic paper presents the first real-world safety evaluation of OpenClaw, the most widely deployed personal AI agent platform in early 2026. The research introduces a novel attack taxonomy and demonstrates that poisoning any single dimension of agent state triples the success rate of attacks.
Why This Matters
OpenClaw operates with full local system access and integrates with sensitive services including Gmail, Stripe, and filesystems. While this enables powerful automation, it also exposes a substantial attack surface that sandboxed lab evaluations fail to capture.
The CIK Taxonomy
The paper introduces CIK — a three-dimensional framework for analyzing agent safety:
| Dimension | Description | Attack Vector |
|---|---|---|
| Capability | What the agent can do | Modify tools, permissions, capabilities |
| Identity | Who the agent thinks it is | Change agent persona, goals, preferences |
| Knowledge | What the agent knows | Inject false memories, alter context |
Key Findings
- Baseline attack success rate: 24.6% across clean agent state
- After poisoning ANY single CIK dimension: 64-74% success rate
- Most robust model (not named) still shows 3x increase after poisoning
- Tested across 4 backbone models: Claude Sonnet 4.5, Opus 4.6, Gemini 3.1 Pro, GPT-5.4
- 12 attack scenarios on a live OpenClaw instance
The Core Vulnerability
The fundamental issue: AI agents maintain persistent state (memory, preferences, tool configurations) that becomes part of their decision-making process. Unlike stateless API calls, this persistent state creates a new class of attack surface where adversaries can corrupt the agent's "world model" rather than just exploiting code vulnerabilities.
Implications
- Personal AI agents are vulnerable to state poisoning attacks
- Security evaluations must consider persistent state, not just individual API calls
- Multi-model testing shows this is a platform-level issue, not model-specific
- The attack surface is fundamentally different from traditional software security