Security Model for Control Plane Gateway

Overview

Security Architecture

SystemManager implements defense-in-depth security for control plane gateway deployments:

Network Layer: Tailscale ACLs control WHO can reach the gateway
Application Layer: Bearer tokens + scopes control WHAT they can do
Policy Gate: Capability-based authorization prevents "LLM imagination" risk
Target Registry: Explicit allowlisting of managed systems
Audit Layer: Comprehensive logging tracks WHO did WHAT WHERE

Critical: Gateway access ≠ target access. All operations are explicitly authorized.

Control Plane Gateway Trust Model

One Trusted Node Per Network Segment

The control plane gateway architecture reduces operational overhead by deploying one trusted node per network segment rather than agents on every target.

graph TD
    A[Threat: Compromised Gateway] -- Network Access --> B[Tailscale ACLs]
    B -- Application Access --> C[Bearer Tokens + Scopes]
    C -- Operation Authorization --> D[Policy Gate]
    D -- Target Validation --> E[Target Registry]
    E -- Capability Enforcement --> F[Execution Layer]
    F -- All Operations --> G[Audit Trail]

    H[Threat: LLM Imagination] -- Prevented by --> I[Capability Allowlisting]
    I -- Explicit Authorization --> J[Target Registry]

    K[Benefit: Reduced Blast Radius] -- Gateway Isolation --> L[Segment-Per-Gateway]
    L -- Target Separation --> M[Limited Impact]

Security Benefits of Control Plane Gateway

Reduced Blast Radius

Segment Isolation: Compromise affects only gateway, not all targets
Target Separation: Gateways manage specific target groups
Network Segmentation: Gateways deployed per network segment

Capability-Based Authorization

Explicit Allowlisting: Only explicitly configured operations are permitted
Parameter Validation: Operation parameters validated against constraints
LLM Imagination Prevention: Prevents AI from "imagining" unauthorized operations

Operational Security

Single Point of Control: Updates and configuration managed centrally
Audit Centralization: All operations logged through gateway
Secrets Management: Credentials stored only on gateway

Control Plane Gateway Security Assumptions

⚠️ CRITICAL: Gateway Isolation is MANDATORY

SystemManager gateways must be deployed in isolated environments (Proxmox LXC containers, dedicated VMs, or secure jump hosts).

✅ Safe: Gateway running in isolated LXC container with Tailscale
❌ UNSAFE: Gateway running on production target systems
❌ UNSAFE: Gateway with broad network access to all targets
❌ UNSAFE: Gateway without proper credential isolation

Gateway-to-Target Connectivity Security

SSH Target Security

✅ Safe: SSH keys stored securely on gateway with limited permissions
❌ UNSAFE: SSH keys with root access to all targets
❌ UNSAFE: SSH keys stored insecurely or shared across gateways

Docker Target Security

✅ Safe: Docker socket access with limited container capabilities
❌ UNSAFE: Docker socket with full root-equivalent access
❌ UNSAFE: Docker API tokens with broad permissions

HTTP Target Security

✅ Safe: API tokens with least-privilege scopes
❌ UNSAFE: API tokens with administrative permissions
❌ UNSAFE: API tokens stored in plaintext configuration

Policy Gate Security Controls

Capability Allowlisting

✅ Safe: Only explicitly configured capabilities are permitted
❌ UNSAFE: Broad capabilities that allow "LLM imagination" risk
❌ UNSAFE: Capabilities without parameter validation

Target Registry Validation

✅ Safe: Targets explicitly defined in registry with constraints
❌ UNSAFE: Dynamic target discovery without validation
❌ UNSAFE: Targets without capability restrictions

Multi-Gateway Security Benefits

Segment Isolation

Reduced Blast Radius: Compromise affects only gateway segment
Target Separation: Gateways manage specific target groups
Network Segmentation: Gateways deployed per network segment

Redundancy Security

Failover Capability: Multiple gateways can manage critical targets
Load Distribution: Operations spread across gateways
Maintenance Isolation: Update gateways without affecting all targets

Policy Gate & Capability-Based Authorization

Capability-Based Authorization Model

The Policy Gate enforces security through explicit capability allowlisting, preventing "LLM imagination" risk where AI assistants might attempt unauthorized operations.

Target Registry Capabilities

Each target in the registry defines explicit capabilities:

# Example target with limited capabilities
web-server-01:
  id: "web-server-01"
  capabilities:
    - "system:read"      # View system metrics
    - "network:read"     # Check network status
    - "container:read"   # List containers
  constraints:
    sudo_policy: "none"  # No sudo access

# Example target with broader capabilities
docker-host-01:
  id: "docker-host-01"
  capabilities:
    - "system:read"
    - "container:read"
    - "container:control"  # Start/stop containers
    - "stack:deploy"       # Deploy Docker stacks
  constraints:
    sudo_policy: "limited"  # Limited sudo access

Policy Gate Enforcement

# Policy Gate validates every operation
await policy_gate.authorize(
    operation="restart_container",
    target="docker-host-01",
    tier="control",
    parameters={"container": "nginx"}
)

# Authorization checks:
# 1. Target exists in registry
# 2. Target has required capability
# 3. Operation parameters are valid
# 4. User has appropriate scopes
# 5. Operation is within constraints

Security Benefits

LLM Imagination Prevention

Explicit Allowlisting: AI can only perform explicitly configured operations
Parameter Validation: Operation parameters validated against constraints
Target Isolation: Operations limited to specific targets

Defense in Depth

Network Layer: Tailscale ACLs control gateway access
Application Layer: Bearer tokens control user access
Policy Layer: Capability allowlisting controls operation access
Target Layer: Target-specific constraints limit impact

Tool Risk Levels with Policy Gate

Risk Level	Examples	Policy Gate Controls
low	get_system_status, list_containers	Target capability validation
moderate	ping_host, dns_lookup	Network access validation
high	manage_container, file_operations	Capability + parameter validation
critical	install_package, pull_docker_image	Multi-layer approval + validation

Configuration

Environment Variables

# Authentication
SYSTEMMANAGER_REQUIRE_AUTH=true          # Enforce bearer tokens
SYSTEMMANAGER_SHARED_SECRET=your-secret  # HMAC secret for tokens
SYSTEMMANAGER_JWT_SECRET=jwt-secret      # Or use JWT

# Approval System
SYSTEMMANAGER_ENABLE_APPROVAL=true       # Enable approval gates
SYSTEMMANAGER_APPROVAL_WEBHOOK=https://approval.example.com

# Audit Logging
SYSTEMMANAGER_AUDIT_LOG=/var/log/systemmanager/audit.log

Recommended Tailscale ACL

{
  "tagOwners": {
    "tag:systemmanager-server": ["group:ops"],
    "tag:systemmanager-client": ["group:ops", "group:automation"],
  },

  "acls": [
    // Only tagged clients can reach the MCP server
    {
      "action": "accept",
      "src": ["tag:systemmanager-client"],
      "dst": ["tag:systemmanager-server:8080"],
    },

    // Deny everything else
    {
      "action": "deny",
      "src": ["*"],
      "dst": ["tag:systemmanager-server:*"],
    },
  ],

  "services": {
    "svc:systemmanager-mcp": {
      "protocol": "tcp",
      "port": 8080,
      "tags": ["tag:systemmanager-server"],
      "allowedTags": ["tag:systemmanager-client"],
    },
  },
}

Token Generation

Create tokens with scopes using scripts/mint_token.py:

# Read-only token (safe for monitoring)
python scripts/mint_token.py \
  --agent "grafana-datasource" \
  --scopes "readonly" \
  --ttl 30d

# Admin token (use sparingly, short TTL)
python scripts/mint_token.py \
  --agent "ops-automation" \
  --scopes "admin" \
  --ttl 1h

# Custom scopes (principle of least privilege)
python scripts/mint_token.py \
  --agent "container-manager" \
  --scopes "container:read,container:write,network:read" \
  --ttl 7d

Audit Log Analysis

Example Audit Record

{
  "timestamp": "2025-11-15T20:30:00Z",
  "tool": "install_package",
  "subject": "ops-automation",
  "args": {"package_name": "nginx", "auto_approve": true},
  "result_status": "success",
  "scopes": ["admin"],
  "risk_level": "critical",
  "approved": true,
  "tailscale": {
    "tailscale_node": "dev1",
    "tailscale_user": "[email protected]",
    "tailscale_tags": ["tag:systemmanager-client"],
    "tailnet": "example.ts.net"
  }
}

Monitoring Queries

# Find all critical operations
jq 'select(.risk_level == "critical")' /var/log/systemmanager/audit.log

# Find operations by a specific Tailscale user
jq 'select(.tailscale.tailscale_user == "[email protected]")' audit.log

# Find failed authorization attempts (lateral movement detection)
jq 'select(.result_status == "error" and .error | contains("Insufficient"))' audit.log

# Find unapproved critical operations
jq 'select(.risk_level == "critical" and .approved == false)' audit.log

Integration with Tailscale Flow Logs

Combine application audit logs with Tailscale network logs:

Enable Tailscale flow logs in admin console
Correlate by timestamp + source IP + tailnet user
Detect anomalies:
- Network connection without successful auth
- Multiple auth failures from same node
- Unusual source node for specific tools

Control Plane Gateway Deployment Checklist

Gateway Deployment Security

Minimum Security (Development)

Deploy gateway in isolated LXC container
Configure Tailscale ACLs for gateway access
Define target registry with limited capabilities
Use readonly scope for monitoring tools
Configure target-specific constraints

Recommended Security (Production)

Maximum Security (Critical Infrastructure)

Target Registry Security

Capability Allowlisting Best Practices

Define explicit capabilities per target
Use least-privilege principle for capabilities
Validate operation parameters against constraints
Regularly review and update target capabilities
Test capability enforcement with dry-run mode

Target Connectivity Security

Store SSH keys securely with limited permissions
Use Docker socket access with container constraints
Configure API tokens with least-privilege scopes
Rotate target credentials regularly
Monitor target connectivity health

Threat Scenarios

1. Compromised Tailnet Node

Attack: Attacker compromises a node already in your tailnet

Mitigations:

Tailscale ACL denies access (must be tagged appropriately)
Bearer token required (attacker doesn't have it)
Audit log shows connection from unexpected node

2. Leaked Bearer Token

Attack: Token leaked via logs, config file, environment variable

Mitigations:

Token has limited scopes (principle of least privilege)
Token expires quickly (short TTL)
Audit log shows usage from unexpected Tailscale user/node
Approval required for critical operations

3. Insider Threat (Legitimate User)

Attack: Authorized user attempts malicious operation

Mitigations:

Scopes limit what token can do
Critical operations require approval
Audit log records who did what with Tailscale identity
Tailscale flow logs provide additional evidence

4. SSRF via http_request_test

Attack: Use http_request_test to scan internal network

Mitigations:

Tool requires network:diag scope
Tool marked as requires_approval=true
Implement URL allowlist in production
Audit log tracks all HTTP requests made

Incident Response

If you detect suspicious activity:

Immediate: Revoke compromised token (update SYSTEMMANAGER_SHARED_SECRET)
Immediate: Review recent audit logs for scope of compromise
Within 1 hour: Check Tailscale flow logs for lateral movement
Within 4 hours: Rotate all tokens
Within 24 hours: Review and strengthen ACLs
Post-mortem: Update monitoring/alerting based on lessons learned

Future Enhancements

Planned security improvements:

mTLS client certificates
Integration with external IdP (Okta, Azure AD)
Real-time alerting on critical operations
Webhook-based approval workflows
Path-based file access controls
Rate limiting per token
Token usage analytics
Automated token rotation

Security: mdlmarkham/TailOpsMCP

Security

docs/SECURITY.md

Security Model for Control Plane Gateway

Overview

Security Architecture

Control Plane Gateway Trust Model

One Trusted Node Per Network Segment

Security Benefits of Control Plane Gateway

Reduced Blast Radius

Capability-Based Authorization

Operational Security

Control Plane Gateway Security Assumptions

⚠️ CRITICAL: Gateway Isolation is MANDATORY

Gateway-to-Target Connectivity Security

SSH Target Security

Docker Target Security

HTTP Target Security

Policy Gate Security Controls

Capability Allowlisting

Target Registry Validation

Multi-Gateway Security Benefits

Segment Isolation

Redundancy Security

Policy Gate & Capability-Based Authorization

Capability-Based Authorization Model

Target Registry Capabilities

Policy Gate Enforcement

Security Benefits

LLM Imagination Prevention

Defense in Depth

Tool Risk Levels with Policy Gate

Configuration

Environment Variables

Recommended Tailscale ACL

Token Generation

Audit Log Analysis

Example Audit Record

Monitoring Queries

Integration with Tailscale Flow Logs

Control Plane Gateway Deployment Checklist

Gateway Deployment Security

Minimum Security (Development)

Recommended Security (Production)

Maximum Security (Critical Infrastructure)

Target Registry Security

Capability Allowlisting Best Practices

Target Connectivity Security

Threat Scenarios

1. Compromised Tailnet Node

2. Leaked Bearer Token

3. Insider Threat (Legitimate User)

4. SSRF via http_request_test

Incident Response

Future Enhancements

References

There aren’t any published security advisories