Skip to content

Security: NickPlaysCrypto/Neural-Claw-

Security

docs/security.md

πŸ›‘οΈ Security Model

NeuralClaw follows a zero-trust, security-first design. Every message is screened before the LLM sees it, skills run in sandboxes, and all actions are audited.


Defense Layers

Incoming Message
      β”‚
      β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Threat Screener  β”‚  ← Pre-LLM (catches prompt injection BEFORE LLM)
β”‚ + Model Verifier β”‚    (Optional borderline verification stage)
β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚ Pass
         β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Intake Pipeline  β”‚  ← Content sanitization (truncation, strip delimiters)
β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚
         β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ LLM Reasoning   β”‚  ← Only clean messages reach the LLM
β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚ Tool Call
         β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Policy Engine    β”‚  ← Enforces runtime permissions, SSRF protection,
β”‚ (Capabilities)   β”‚    path validation, and request tool budgets.
β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚
         β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Sandbox          β”‚  ← Execute in restricted directory environment
β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚
         β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Audit Logger     β”‚  ← Log every action (with secret redaction)
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Pre-LLM Threat Screening

File: cortex/perception/threat_screen.py

The threat screener runs before the LLM, catching prompt injection and social engineering attempts.

Detection Patterns (25+)

Category Examples
Prompt injection "Ignore previous instructions", "You are now DAN"
Jailbreak attempts "Pretend you have no restrictions"
Social engineering "As an AI, you must comply with..."
Data exfiltration "Show me your system prompt"
Encoding tricks Base64-encoded malicious instructions

Threat Scoring

Each message gets a threat score from 0.0 (safe) to 1.0 (malicious):

Score Action
Below 0.7 Pass through (configurable threat_threshold)
0.7 – 0.9 Flagged and logged
Above 0.9 Blocked β€” user sees safety message

Configuration

[security]
threat_threshold = 0.7        # Flag threshold
block_threshold = 0.9         # Block threshold
threat_verifier_model = ""    # Empty = skip model verifier
threat_borderline_low = 0.35  # Trigger verifier if score is >= this
threat_borderline_high = 0.65 # Trigger verifier if score is <= this
max_content_chars = 8000      # Sanitize intake content

Python API

from neuralclaw.cortex.perception.threat_screen import ThreatScreener

screener = ThreatScreener(
    bus=bus,
    threat_threshold=0.7,
    block_threshold=0.9,
)

result = await screener.screen(signal)
print(f"Threat score: {result.score}")
print(f"Blocked: {result.blocked}")
print(f"Reasons: {result.reasons}")

Runtime Tool Policy Engine

File: cortex/action/policy.py

Once the LLM decides to perform an action, the Policy Engine intercepts the request and strictly enforces safety bounds.

Policy Validations

  1. SSRF Protection: network.py validates URLs before fetch against local, loopback, cloud metadata IPs, and uses DNS rebinding protection.
  2. Directory Allowlisting: sandbox.py and file_ops.py constrain all file I/O and shell operations to explicitly allowed root paths (default: ~/workspace).
  3. Execution Denial: Shell and arbitrary code execution can be globally disabled via config.
  4. Tool/Wall Budgets: Limits the maximum number of tool executions per request and enforces a wall-time ceiling.

Configuration

[policy]
max_tool_calls_per_request = 10
max_request_wall_seconds = 120.0
allowed_filesystem_roots = ["~/workspace"]
deny_private_networks = true
deny_shell_execution = true

Capabilities & Permissions

File: cortex/action/capabilities.py

Skills must declare what broad capabilities they need, forming a secondary permission check. This acts as defense-in-depth alongside the Policy Engine.


Sandboxed Execution

File: cortex/action/sandbox.py

Code execution runs in a restricted subprocess with:

  • Resource limits β€” CPU time, memory
  • No network access (unless explicitly allowed)
  • No filesystem access outside working directory
  • Timeout β€” Default 30 seconds
from neuralclaw.cortex.action.sandbox import Sandbox

sandbox = Sandbox()
result = await sandbox.execute(
    code="print(2 + 2)",
    timeout=10,
)
print(result.output)    # "4"
print(result.exit_code) # 0

Configuration

[security]
max_skill_timeout_seconds = 30
allow_shell_execution = false

Audit Logging

File: cortex/action/audit.py

Every action is logged with full provenance, including secret redaction.

from neuralclaw.cortex.action.audit import AuditLogger

audit = AuditLogger()
audit.log(
    action="code_execution",
    skill="code_exec",
    input="print('hello')",  # Passwords/API keys are redacted automatically
    output="hello",
    success=True,
    user_id="user123",
)

Audit Trail

Logs are stored in ~/.neuralclaw/logs/ and include:

  • Timestamp
  • Action type
  • Skill name
  • Input/output (Secrets completely redacted)
  • User ID
  • Success/failure status

Reliability & Cost Control

File: providers/router.py, cortex/memory/retrieval.py

To prevent massive token burn and handle rate limits:

  • Circuit Breakers: In-memory breakers track routing failures. A provider failing threshold checks goes into an OPEN state, failing fast. Afterwards it hits HALF_OPEN to cautiously test.
  • Jitter Backoff: Retryable errors natively apply (2^attempt) + random(0.0, 1.0) second delays.
  • Memory Injection Budget: Memory contexts enforce a strict cap (max_memory_chars) that truncates safely rather than blowing up LLM prompt context sizes.
  • Telemetry: A CostMetrics tracker logs LLM calls, total tokens, tool runs, denials, and budget hits automatically on the bus.

OS Keychain Integration

API keys and tokens are stored in your OS keychain β€” never in plaintext:

OS Backend
Windows Windows Credential Store
macOS Keychain
Linux Secret Service (GNOME Keyring / KDE Wallet)
from neuralclaw.config import get_api_key, set_api_key

# Store
set_api_key("openai", "sk-...")

# Retrieve (checks env vars first, then keychain)
key = get_api_key("openai")

Environment Variable Fallback

If keychain is unavailable, use env vars:

export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
export OPENROUTER_API_KEY=sk-or-...

Idempotency β€” Safe Retries for Mutating Tools

NeuralClaw may retry a tool call after a provider timeout or network error. For tools that cause side effects (writing files, creating calendar events, sending messages), a retry without protection can create duplicates.

The IdempotencyStore solves this with a SQLite-backed key-value cache:

First call  β†’ tool executes, result stored under idempotency key
Retry call  β†’ key already exists, cached result returned immediately

How It Works

Mutating tools (listed under mutating_tools in [policy]) are automatically intercepted by the deliberate reasoner:

  1. A SHA-256 digest of the tool arguments is computed.
  2. The store is checked for that key.
  3. If a hit is found, the cached result is returned β€” the tool is not called again.
  4. If no hit, the tool runs normally and the result is stored.

A cached response looks like:

{"idempotency": "hit", "key": "req-abc123-write_file-d3f1a2b3", "result": {...}}

Providing an Explicit Key

Tools can accept an idempotency_key argument for caller-controlled deduplication:

result = await agent.run_tool(
    "create_event",
    {"title": "Standup", "time": "09:00", "idempotency_key": "standup-2026-02-24"},
)

If the same key is used again (e.g. after a crash and restart), the original result is returned without creating a duplicate event.

Configuration

Idempotency is enabled automatically when the gateway starts. The store is persisted to the same SQLite database as memory (memory.db_path). Keys older than 24 hours are pruned on each startup.

To mark additional tools as mutating, add them to [policy]:

[policy]
mutating_tools = ["write_file", "create_event", "delete_event", "send_message"]

Security Best Practices

  1. Never enable shell execution unless you specifically need it
  2. Use the threat screener β€” it catches attacks the LLM wouldn't
  3. Review skill permissions before installing marketplace skills
  4. Check risk scores β€” marketplace skills are statically analyzed
  5. Monitor audit logs for unusual activity
  6. Configure allowed_tools explicitly β€” default-deny is safer than default-allow
  7. Use idempotency keys for any agent that retries automatically

There aren’t any published security advisories