ClawKeeper: Comprehensive Safety Protection for OpenClaw Agents Through Skills, Plugins, and Watchers
Summary (Overview)
- Unified Multi-Layered Framework: Proposes ClawKeeper, a comprehensive security framework that integrates three complementary protection layers: Skill-based (instruction-level), Plugin-based (internal runtime enforcement), and Watcher-based (independent external monitoring).
- Novel Watcher Paradigm: Introduces a decoupled, independent supervisory agent (the Watcher) that resolves the task-safety coupling problem, resists adversarial manipulation, and can self-evolve, representing a generalizable paradigm for agent security.
- Superior Empirical Performance: Quantitative evaluation across seven threat categories shows ClawKeeper achieves a Defense Success Rate (DSR) of 85–90%, exceeding the best baseline in each category by 15 to 45 percentage points.
- Addresses Key Limitations: Designed to overcome the fragmented coverage, safety-utility tradeoff, reactive posture, and static nature of existing defenses in the OpenClaw ecosystem.
- Generalizable and Adaptive: The framework, especially the Watcher component, is designed for broad compatibility and supports both local and cloud deployment, adapting to new threats.
Introduction and Theoretical Foundation
OpenClaw has emerged as a leading open-source autonomous agent runtime, granting agents broad privileges like tool integration, local file access, and shell command execution. While powerful, these capabilities transform model errors into tangible system-level threats such as data leakage, privilege escalation, and malicious skill execution. The security landscape for OpenClaw is highly fragmented, with existing solutions (e.g., runtime mediation, privilege separation) addressing only isolated stages of the agent lifecycle and suffering from four major limitations:
- Fragmented Coverage: Point defenses lack a unified view of security guarantees.
- Safety–Utility Tradeoff: Defenses embedded within the agent force it to balance competing objectives.
- Reactive Defense: Most methods identify issues only after adversarial actions occur.
- Static Defense Mechanisms: Existing methods cannot adapt to emerging threats, conflicting with OpenClaw's self-evolving nature.
To bridge this gap, ClawKeeper is proposed as a real-time security framework that unifies protection across three architectural layers, providing holistic, proactive, and adaptive security.
Methodology
ClawKeeper's methodology is built on a multi-layered protection architecture, as illustrated in Figure 1 and Figure 2 of the paper. The three paradigms are designed with complementary strengths and trade-offs, analyzed across five key attributes: Safety, Compatibility, Flexibility, Running Cost, and Deployment Difficulty.
1. Skill-based Protection: Operates at the instruction/prompt layer.
- Mechanism: Security rules are defined as structured Markdown documents and accompanying scripts that the agent interprets. These rules are injected into the agent's context.
- Scope: Provides system-level (OS-specific constraints for Windows/Linux/macOS) and software-level (constraints for integrated platforms like Feishu/Telegram) protection.
- Enhancements: Includes scheduled security scanning and interaction summarization for auditing.
- Flexibility: Can be distilled into a purely prompt-based format for maximum deployment ease.
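As a rough illustration of the skill-based layer, the sketch below selects rules by operating system and integrated platform, renders them as Markdown, and appends them to the agent's context. The `SkillRule` fields, the rule texts, and the `render_rules`/`inject` helpers are assumptions made for this example and are not ClawKeeper's actual rule schema or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillRule:
    """One security rule. All fields are illustrative, not ClawKeeper's schema."""
    scope: str   # "system" (OS-level) or "software" (integrated platform)
    target: str  # e.g. "windows", "linux", "feishu"
    text: str    # the constraint, written as the body of a Markdown bullet


# Hypothetical rules, loosely modeled on the scopes described above.
RULES = [
    SkillRule("system", "windows", "Never execute Base64-encoded or otherwise obfuscated commands."),
    SkillRule("system", "linux", "Never modify system configuration without user confirmation."),
    SkillRule("software", "feishu", "Never send credentials or API keys in messages."),
]


def render_rules(rules, os_name, platforms):
    """Keep only the rules that apply to this deployment and render them as Markdown."""
    selected = [
        r for r in rules
        if (r.scope == "system" and r.target == os_name)
        or (r.scope == "software" and r.target in platforms)
    ]
    lines = ["## Security rules"]
    lines += [f"- [{r.scope}/{r.target}] {r.text}" for r in selected]
    return "\n".join(lines)


def inject(system_prompt, rules_markdown):
    """Append the rendered rules to the agent's context (modeled here as a plain string)."""
    return f"{system_prompt}\n\n{rules_markdown}"


context = inject("You are a task agent.", render_rules(RULES, "windows", {"feishu"}))
print(context)
```

Because the rules end up as ordinary text in the context, this layer is easy to deploy but relies on the agent choosing to follow them, which is consistent with the lower safety rating the paper assigns to it.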
2. Plugin-based Protection: Serves as a hard-coded internal runtime enforcer within OpenClaw.
- Mechanism: A comprehensive security plugin integrates and expands upon existing open-source defenses. It operates across several integrated modules:
- Threat Detection: Scans for misconfigurations and vulnerabilities (aligned with OWASP guidelines).
- Hardening: Executes defensive measures (e.g., binding gateway to localhost) and injects safety rules into core configuration files (e.g., AGENTS.md).
- Configuration Protection: Generates and verifies cryptographic hashes of critical files.
- Monitoring and Logging: Continuously logs all agent activities.
- Behavioral Scanning: Analyzes log files to detect complex threat patterns (prompt injection, credential leaks, etc.).
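The configuration-protection module can be pictured as a hash baseline that is recorded once and re-checked later. The following is a minimal sketch of that idea using SHA-256; the function names and the temporary `AGENTS.md` contents are assumptions for illustration, and the paper does not specify which hash function or storage format the plugin uses.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def snapshot(paths):
    """Record a baseline digest for each protected file."""
    return {str(p): sha256_of(p) for p in paths}


def verify(baseline):
    """Return the paths that are missing or whose digest no longer matches."""
    changed = []
    for path, expected in baseline.items():
        p = Path(path)
        if not p.is_file() or sha256_of(p) != expected:
            changed.append(path)
    return changed


# Demonstration on a throwaway file standing in for a critical config file.
with tempfile.TemporaryDirectory() as tmp:
    agents_md = Path(tmp) / "AGENTS.md"
    agents_md.write_text("# Rules\n- Ask before running shell commands.\n", encoding="utf-8")
    baseline = snapshot([agents_md])
    unchanged = verify(baseline)          # nothing modified yet
    agents_md.write_text("# Rules\n- Run anything.\n", encoding="utf-8")
    tampered = verify(baseline)           # the edit is detected

print(unchanged, len(tampered))  # [] 1
```

In a real deployment the baseline itself would need to be stored somewhere the task agent cannot rewrite; otherwise an attacker who edits the file could also refresh the digest.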
3. Watcher-based Protection: Introduces a novel, decoupled system-level security middleware.
- Mechanism: An independent OpenClaw agent (the Watcher) dedicated solely to safety oversight. It establishes a WebSocket connection with the task-executing OpenClaw instance.
- Deployment: Supports Local (both agents on same machine) and Cloud (Watcher on remote server supervising multiple agents) configurations.
- Runtime Enforcement: The Watcher receives real-time session data (context, tool calls, messages). Upon detecting a potential unsafe trajectory, it emits a signal to pause the task agent and require human confirmation (ask_user).
- Key Advantages:
- Separation of Concerns: Decouples task performance from safety compliance.
- Manipulation Resistance: Cannot be uninstalled via task-agent instructions.
- Co-Evolution: Can update its safety knowledge independently as threats evolve.
- Transparency: External safety diagnosis enables verifiability.
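To make the Watcher's decision step concrete, here is a small sketch of a reviewer that inspects one tool call and returns either `allow` or `ask_user`. It leaves out the WebSocket transport entirely. The `ToolCall` type, the `review` function, and the deny patterns are hypothetical; only the `ask_user` signal name comes from the summary above.

```python
import base64
import binascii
import re
from dataclasses import dataclass

# Hypothetical deny patterns; a real Watcher would maintain and update its own.
DANGEROUS = [
    re.compile(p)
    for p in (r"\brm\s+-rf\s+/", r"\bcurl\b.*\|\s*(sh|bash)\b", r"\bmkfs\b")
]


@dataclass
class ToolCall:
    tool: str      # e.g. "shell", "read_file"
    argument: str  # raw argument string sent by the task agent


def decoded_variants(text):
    """Yield the text itself plus any embedded Base64 tokens that decode to UTF-8."""
    yield text
    for token in re.findall(r"[A-Za-z0-9+/=]{16,}", text):
        try:
            yield base64.b64decode(token, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid Base64 or not text; ignore this token


def review(call):
    """Return 'ask_user' to pause the task agent for confirmation, else 'allow'."""
    if call.tool != "shell":
        return "allow"
    for candidate in decoded_variants(call.argument):
        if any(p.search(candidate) for p in DANGEROUS):
            return "ask_user"
    return "allow"
```

For example, `review(ToolCall("shell", "ls -la"))` returns `"allow"`, while a shell call whose argument contains the Base64 encoding of `rm -rf /home/user` returns `"ask_user"`. Since the check runs in a separate process from the task agent, instructions injected into the task agent's context cannot disable it, which is the manipulation-resistance property listed above.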
A comparative analysis of the three paradigms is summarized in the following table:
Table 1: A Comparative Analysis of Three Safety Protection Paradigms in ClawKeeper. Each attribute is rated Low, Medium, or High.
| Paradigm | Safety ↑ | Compatibility ↑ | Flexibility ↑ | Running Cost ↓ | Deployment Difficulty ↓ |
|---|---|---|---|---|---|
| Skill-based | Low | Medium | Medium | Low | Low |
| Plugin-based | Medium | Low | Low | Low | Medium |
| Watcher-based | High | Medium | High | Low | Medium |
Empirical Validation / Results
Quantitative Analysis:
- Dataset: A benchmark of 140 adversarial instances across 7 safety task categories (T1-T7), each with 10 simple and 10 complex examples. Categories include Prompt Injection, Credential Leakage, Privilege Escalation, Dangerous Command Execution, Configuration Tampering, Threat Detection, and Malicious Skill Installation.
- Evaluation Protocol: Defense Success Rate (DSR) measured by human annotators reviewing execution traces.
- Baselines: Compared against 7 prominent open-source security repositories (OpenGuardrails, ClawSec, OSPG, SecureClaw, OpenClaw Shield, ClawBands, Clawscan-Skills).
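The DSR metric is simply the share of adversarial instances that annotators judged as successfully defended, computed per category. A minimal sketch follows; the `defense_success_rate` helper and the label list are made up for illustration and are not the paper's raw annotations.

```python
from collections import defaultdict


def defense_success_rate(labels):
    """Map (category, defended) pairs to a per-category DSR in percent."""
    totals = defaultdict(int)
    defended = defaultdict(int)
    for category, ok in labels:
        totals[category] += 1
        defended[category] += int(ok)
    return {c: 100.0 * defended[c] / totals[c] for c in totals}


# Made-up annotations: 20 instances in one category, 18 judged as defended.
labels = [("T1", True)] * 18 + [("T1", False)] * 2
print(defense_success_rate(labels))  # {'T1': 90.0}
```

With 20 instances per category, each instance moves the DSR by 5 percentage points, which is why all reported values are multiples of 5.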
Main Results (Table 5): ClawKeeper consistently outperforms all baselines across all seven categories.
Table 5: Defense Success Rate (%) across seven safety task categories. “–” indicates that the method does not support the corresponding task. Bold entries denote the best result per column.
| Method | T1 Prompt Inj. | T2 Cred. Leak | T3 Priv. Tamp. | T4 Dang. Cmd | T5 Config. Mod. | T6 Threat Det. | T7 Mal. Skill |
|---|---|---|---|---|---|---|---|
| OpenGuardrails [26] | 55 | – | – | – | – | 60 | – |
| ClawSec [28] | 65 | 50 | – | – | – | – | 45 |
| OSPG [27] | 45 | 70 | – | – | 60 | – | – |
| SecureClaw [32] | – | 55 | – | – | 65 | 50 | – |
| OpenClaw Shield [35] | – | – | 55 | – | – | – | – |
| ClawBands [31] | – | – | 60 | 45 | – | 65 | – |
| Clawscan-Skills [29] | – | – | – | – | – | – | 60 |
| ClawKeeper (Ours) | 90 | 85 | 85 | 90 | 90 | 85 | 90 |
- Key Findings:
- ClawKeeper surpasses the best baselines by 15 to 45 percentage points.
- Existing methods exhibit severe coverage fragmentation; no single baseline addresses more than 3 of the 7 categories.
- ClawKeeper achieves high DSRs (85–90%), demonstrating robust multi-layer enforcement.
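The margin and coverage figures in the key findings can be re-derived from Table 5. The snippet below transcribes the table (with `None` for unsupported tasks) and recomputes the per-category margin over the best baseline and the number of categories each baseline covers; the variable names are chosen for this example.

```python
# DSR values transcribed from Table 5; None marks an unsupported task ("–").
BASELINES = {
    "OpenGuardrails":  [55, None, None, None, None, 60, None],
    "ClawSec":         [65, 50, None, None, None, None, 45],
    "OSPG":            [45, 70, None, None, 60, None, None],
    "SecureClaw":      [None, 55, None, None, 65, 50, None],
    "OpenClaw Shield": [None, None, 55, None, None, None, None],
    "ClawBands":       [None, None, 60, 45, None, 65, None],
    "Clawscan-Skills": [None, None, None, None, None, None, 60],
}
CLAWKEEPER = [90, 85, 85, 90, 90, 85, 90]

# Best baseline score per category (T1..T7), ignoring unsupported entries.
best = [
    max(row[i] for row in BASELINES.values() if row[i] is not None)
    for i in range(7)
]
# ClawKeeper's lead over that best baseline, in percentage points.
margins = [ours - b for ours, b in zip(CLAWKEEPER, best)]
# Number of categories each baseline supports.
coverage = {name: sum(v is not None for v in row) for name, row in BASELINES.items()}

print(margins)                  # [25, 15, 25, 45, 25, 20, 30]
print(max(coverage.values()))   # 3
```

The smallest margin (15 points) is on T2 against OSPG and the largest (45 points) is on T4, where ClawBands is the only baseline that supports the task.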
Self-Evolving Capability of the Watcher (Figure 6): In an online learning scenario, the Watcher's DSR improved from ~90.0% to ~95.0% after processing 100 novel adversarial cases, demonstrating its ability to self-evolve by updating its monitoring skills and memory. Skill-based and plugin-based paradigms showed no improvement without manual intervention.
Qualitative Analysis: Multiple case studies demonstrate the operational effectiveness of each layer:
- Skill-based: Showed enforcement of system-level (blocking obfuscated Base64 code execution on Windows) and software-level (preventing credential transmission via Feishu) boundaries, as well as autonomous scheduled security scanning.
- Plugin-based: Demonstrated hardening that prevents credential leakage from file outputs, comprehensive threat detection reports with auto-fix recommendations, and behavioral scanning that identifies latent prompt injection attacks from logs.
- Watcher-based: Illustrated real-time intervention in three scenarios:
- Blocking execution of a dangerous decoded command (ask_user).
- Interrupting an agent attempting too many tool calls in one turn (tool_loop_limit).
- Stopping a run due to repeated upstream failures (permission errors).
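The second and third Watcher scenarios amount to simple budgets on what happens within a run. The sketch below tracks tool calls per turn and consecutive failures, and returns a stop reason when either budget is exceeded. The `TurnGuard` class, its thresholds, and the `repeated_failures` label are assumptions for this example; only `tool_loop_limit` is named in the summary.

```python
from dataclasses import dataclass


@dataclass
class TurnGuard:
    max_tool_calls: int = 8        # hypothetical per-turn tool-call budget
    max_failures: int = 3          # hypothetical consecutive-failure budget
    tool_calls: int = 0
    consecutive_failures: int = 0

    def start_turn(self):
        """Reset the per-turn counter; failures carry over across turns."""
        self.tool_calls = 0

    def observe(self, succeeded):
        """Record one tool-call result and return a stop reason, or None to continue."""
        self.tool_calls += 1
        self.consecutive_failures = 0 if succeeded else self.consecutive_failures + 1
        if self.tool_calls > self.max_tool_calls:
            return "tool_loop_limit"
        if self.consecutive_failures >= self.max_failures:
            return "repeated_failures"  # hypothetical label, e.g. permission errors
        return None


guard = TurnGuard(max_tool_calls=3, max_failures=2)
print([guard.observe(True) for _ in range(4)])  # [None, None, None, 'tool_loop_limit']
```

Keeping these counters in the Watcher rather than in the task agent means a looping or failing agent cannot reset or ignore its own budget.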
Theoretical and Practical Implications
- Paradigm Shift for Agent Security: The Watcher-based paradigm establishes a principle of regulatory independence, separating safety oversight from task execution. This resolves the fundamental safety-utility tradeoff and provides a generalizable building block for securing next-generation autonomous agent systems beyond OpenClaw.
- Comprehensive Lifecycle Defense: ClawKeeper provides a unified framework that addresses security from static configuration and instruction context through to real-time runtime behavior and post-hoc auditing, closing critical gaps left by fragmented point defenses.
- Practical Deployment: The framework supports diverse real-world needs through local deployment (privacy-sensitive, personal use) and cloud deployment (enterprise, multi-agent supervision), and offers users the flexibility to select and combine protection layers based on their specific requirements for efficiency or security.
- Foundation for Safe Agent Ecosystems: The authors posit that as agents become analogous to operating systems, ClawKeeper serves a role analogous to antivirus software within this new paradigm, providing an essential, adaptive safety layer.
Conclusion
ClawKeeper presents a unified, multi-layered security framework that addresses the critical vulnerabilities and fragmented defense landscape of the OpenClaw ecosystem. By integrating skill-based, plugin-based, and the novel watcher-based protection, it delivers full-lifecycle defense. On the 140-instance benchmark it achieves a DSR of 85–90% in every category, ahead of all seven baselines.
The independent Watcher agent emerges as the most robust and generalizable component, whose decoupled architecture offers key advantages: separation of concerns, resistance to manipulation, and self-evolution. This paradigm is readily transferable to other agent systems, establishing ClawKeeper as a general-purpose safety framework for the broader agentic AI ecosystem. The released open-source implementation provides actionable tools and insights for the community to advance the security of autonomous agents.