OpenClaw Security Analysis: Poisoning AI Agent Memory Triples Attack Success Rate from 25% to 74%

Available in: 中文
2026-04-07T17:49:56.425Z·2 min read
OpenClaw operates with full local system access and integrates with sensitive services including Gmail, Stripe, and filesystems. While this enables powerful automation, it also exposes a substantia...

A comprehensive academic paper presents the first real-world safety evaluation of OpenClaw, the most widely deployed personal AI agent platform in early 2026. The research introduces a novel attack taxonomy and demonstrates that poisoning any single dimension of agent state triples the success rate of attacks.

Why This Matters

OpenClaw operates with full local system access and integrates with sensitive services including Gmail, Stripe, and filesystems. While this enables powerful automation, it also exposes a substantial attack surface that sandboxed lab evaluations fail to capture.

The CIK Taxonomy

The paper introduces CIK — a three-dimensional framework for analyzing agent safety:

DimensionDescriptionAttack Vector
CapabilityWhat the agent can doModify tools, permissions, capabilities
IdentityWho the agent thinks it isChange agent persona, goals, preferences
KnowledgeWhat the agent knowsInject false memories, alter context

Key Findings

The Core Vulnerability

The fundamental issue: AI agents maintain persistent state (memory, preferences, tool configurations) that becomes part of their decision-making process. Unlike stateless API calls, this persistent state creates a new class of attack surface where adversaries can corrupt the agent's "world model" rather than just exploiting code vulnerabilities.

Implications

↗ Original source · 2026-04-07T00:00:00.000Z
← Previous: Stratifying Reinforcement Learning with Signal Temporal Logic: Connecting Deep RL Geometry to Decision SpacesNext: What Quantum Computer to Buy? A Practical Procurement Framework for Institutions →
Comments0