Security Hardening: Docker Sandboxing & Network Policy Enforcement #146

@proffesor-for-testing

Overview

Extracted from: #51 (MCP Server Performance Optimization - Phase 3)
Priority: P2 - Medium
Type: Security & Infrastructure

These items were deferred from the original MCP optimization epic. They focus on security hardening rather than performance optimization.


📋 Tasks

SP-1: Docker-Based Agent Sandboxing

Goal: Implement actual process isolation with resource limits enforced by cgroups.

Current State:

  • Generic Dockerfile exists for deployment
  • No agent-specific sandboxing
  • No resource limit enforcement

Required Implementation:

  1. Create infrastructure/sandbox-manager.ts:
interface SandboxConfig {
  cpuLimit: number;      // CPU cores (e.g., 2)
  memoryLimit: string;   // Memory limit (e.g., "2g")
  diskLimit: string;     // Disk quota (e.g., "512m")
  networkMode: 'isolated' | 'whitelisted' | 'host';
  allowedDomains?: string[];
}

class SandboxManager {
  async createSandbox(agentId: string, config: SandboxConfig): Promise<Container>;
  async destroySandbox(containerId: string): Promise<void>;
  async getResourceUsage(containerId: string): Promise<ResourceStats>;
}
  2. Create sandboxes/agent.Dockerfile:
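FROM node:18-alpine
# Read-only root filesystem: applied at run time (docker run --read-only,
# or HostConfig.ReadonlyRootfs), not in the image itself
# Non-root user: the official node image ships an unprivileged "node" user
USER node
# Resource limits: applied at container creation via Docker HostConfig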
  3. Agent-specific resource profiles:
const AGENT_PROFILES = {
  'qe-test-generator': { cpu: 2, memory: '2g', disk: '512m' },
  'qe-coverage-analyzer': { cpu: 1, memory: '1g', disk: '256m' },
  'qe-security-scanner': { cpu: 2, memory: '4g', disk: '1g' },
};
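
As a rough illustration of how a SandboxConfig might translate into container limits, the sketch below maps it onto a few Docker Engine API HostConfig fields (NanoCpus, Memory, ReadonlyRootfs, NetworkMode). The helper names parseSize and toHostConfig are hypothetical, and the mapping of 'isolated' to the "none" network and 'whitelisted' to "bridge" is an assumption, not something this issue specifies. diskLimit is left unmapped because disk quotas depend on the storage driver in use.

```typescript
// Sketch only: maps the issue's SandboxConfig onto Docker Engine API HostConfig fields.
interface SandboxConfig {
  cpuLimit: number;      // CPU cores (e.g., 2)
  memoryLimit: string;   // Memory limit (e.g., "2g")
  diskLimit: string;     // Disk quota (e.g., "512m"); not mapped in this sketch
  networkMode: 'isolated' | 'whitelisted' | 'host';
  allowedDomains?: string[];
}

// Subset of the HostConfig object used here.
interface HostConfigSubset {
  NanoCpus: number;        // CPU limit in units of 1e-9 CPUs
  Memory: number;          // memory limit in bytes
  ReadonlyRootfs: boolean; // mount the container root filesystem read-only
  NetworkMode: string;     // "none", "bridge", or "host"
}

const UNIT_BYTES: Record<string, number> = {
  k: 1024,
  m: 1024 * 1024,
  g: 1024 * 1024 * 1024,
};

// Hypothetical helper: parses sizes such as "512m" or "2g" into bytes.
function parseSize(size: string): number {
  const match = /^(\d+)([kmg])$/i.exec(size.trim());
  if (!match) {
    throw new Error(`Unsupported size string: ${size}`);
  }
  return Number(match[1]) * UNIT_BYTES[match[2].toLowerCase()];
}

// Hypothetical helper: builds the HostConfig subset for one sandbox.
function toHostConfig(config: SandboxConfig): HostConfigSubset {
  // Assumption: 'isolated' means no network at all; 'whitelisted' uses the
  // default bridge network, with domain filtering enforced elsewhere.
  const networkMode =
    config.networkMode === 'isolated' ? 'none'
    : config.networkMode === 'host' ? 'host'
    : 'bridge';
  return {
    NanoCpus: Math.round(config.cpuLimit * 1e9),
    Memory: parseSize(config.memoryLimit),
    ReadonlyRootfs: true,
    NetworkMode: networkMode,
  };
}
```

For the 'qe-test-generator' profile (2 CPUs, '2g'), this yields NanoCpus = 2000000000 and Memory = 2147483648.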

Success Criteria:

  • Zero host-level OOM crashes (per-agent memory capped by cgroup limits)
  • 100% process isolation (enforced by Docker)
  • CPU/Memory limits verified via docker stats
  • SOC2 compliance readiness

SP-2: Dedicated Embedding Cache

Goal: Standalone embedding cache with 24-hour TTL for semantic search optimization.

Current State:

  • CachedHNSWVectorMemory.ts provides integrated caching
  • No standalone embedding cache as originally specified

Required Implementation:

Create utils/embedding-cache.ts:

interface EmbeddingCacheConfig {
  maxSize: number;        // Max cached embeddings (e.g., 10000)
  ttlMs: number;          // TTL in milliseconds (e.g., 86400000 = 24h)
  storageBackend: 'memory' | 'redis' | 'sqlite';
}

class EmbeddingCache {
  async get(contentHash: string): Promise<number[] | null>;
  async set(contentHash: string, embedding: number[]): Promise<void>;
  async getStats(): Promise<CacheStats>;
  async prune(): Promise<number>; // Returns pruned count
}
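
One possible shape for the 'memory' backend is sketched below, assuming least-recently-used eviction once maxSize is reached (the issue does not name an eviction policy). The class name InMemoryEmbeddingCache, the CacheStats fields, and the injectable clock are hypothetical choices made for illustration.

```typescript
interface EmbeddingCacheConfig {
  maxSize: number;
  ttlMs: number;
  storageBackend: 'memory' | 'redis' | 'sqlite';
}

// Hypothetical stats shape; the issue does not define CacheStats.
interface CacheStats {
  size: number;
  hits: number;
  misses: number;
  hitRate: number;
}

interface CacheEntry {
  embedding: number[];
  expiresAt: number; // epoch milliseconds
}

class InMemoryEmbeddingCache {
  private entries = new Map<string, CacheEntry>();
  private hits = 0;
  private misses = 0;
  private config: EmbeddingCacheConfig;
  private now: () => number;

  // `now` is injectable so TTL behaviour can be exercised without waiting.
  constructor(config: EmbeddingCacheConfig, now: () => number = Date.now) {
    this.config = config;
    this.now = now;
  }

  async get(contentHash: string): Promise<number[] | null> {
    const entry = this.entries.get(contentHash);
    if (!entry || entry.expiresAt <= this.now()) {
      if (entry) this.entries.delete(contentHash); // drop the expired entry
      this.misses++;
      return null;
    }
    // Re-insert to mark as most recently used (Map keeps insertion order).
    this.entries.delete(contentHash);
    this.entries.set(contentHash, entry);
    this.hits++;
    return entry.embedding;
  }

  async set(contentHash: string, embedding: number[]): Promise<void> {
    this.entries.delete(contentHash);
    if (this.entries.size >= this.config.maxSize) {
      // Assumption: evict the least recently used entry when full.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(contentHash, {
      embedding,
      expiresAt: this.now() + this.config.ttlMs,
    });
  }

  async getStats(): Promise<CacheStats> {
    const total = this.hits + this.misses;
    return {
      size: this.entries.size,
      hits: this.hits,
      misses: this.misses,
      hitRate: total === 0 ? 0 : this.hits / total,
    };
  }

  // Removes every expired entry and returns how many were removed.
  async prune(): Promise<number> {
    const cutoff = this.now();
    let pruned = 0;
    for (const [key, entry] of this.entries) {
      if (entry.expiresAt <= cutoff) {
        this.entries.delete(key);
        pruned++;
      }
    }
    return pruned;
  }
}
```

Expired entries are also dropped lazily on get, so prune only needs to run periodically to reclaim memory from keys that are never read again.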

Success Criteria:

  • 80-90% cache hit rate for repeated searches
  • Embedding latency: 500ms → 50ms on cache hit
  • Memory usage: <100MB for 10K embeddings
  • 24-hour TTL expiration working
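
Whether 10K embeddings fit in 100MB depends on the embedding dimension and element width, neither of which this issue states. The back-of-the-envelope sketch below assumes 1536-dimensional vectors purely for illustration and counts raw vector storage only; keys, Map overhead, and TTL metadata would come on top.

```typescript
// Raw vector storage in bytes: count x dimensions x bytes per value.
function embeddingBytes(count: number, dimensions: number, bytesPerValue: number): number {
  return count * dimensions * bytesPerValue;
}

const BUDGET_BYTES = 100 * 1000 * 1000; // 100 MB (decimal)
const COUNT = 10000;
const ASSUMED_DIMENSIONS = 1536; // assumption: not specified in the issue

// JavaScript numbers are 64-bit floats, so number[] costs at least 8 bytes per value.
const asFloat64 = embeddingBytes(COUNT, ASSUMED_DIMENSIONS, 8); // 122,880,000 bytes
// A Float32Array halves that, at the cost of precision.
const asFloat32 = embeddingBytes(COUNT, ASSUMED_DIMENSIONS, 4); // 61,440,000 bytes

// Largest dimension that fits the budget with 8-byte values.
const maxDimsFloat64 = Math.floor(BUDGET_BYTES / (COUNT * 8)); // 1250

const float64Fits = asFloat64 <= BUDGET_BYTES; // false under these assumptions
const float32Fits = asFloat32 <= BUDGET_BYTES; // true under these assumptions
```

Under these assumptions, storing embeddings as number[] would exceed the target, so a Float32Array representation or a lower-dimensional model may be needed to meet it.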

SP-3: Network Policy Enforcement

Goal: Agent-specific network access control with domain whitelisting and rate limiting.

Required Implementation:

Create infrastructure/network-policy.ts:

interface NetworkPolicy {
  agentType: string;
  allowedDomains: string[];
  rateLimit: {
    requestsPerMinute: number;
    requestsPerHour: number;
  };
  auditLogging: boolean;
}

const NETWORK_POLICIES: Record<string, NetworkPolicy> = {
  'qe-test-generator': {
    agentType: 'qe-test-generator',
    allowedDomains: ['api.anthropic.com', 'registry.npmjs.org'],
    rateLimit: { requestsPerMinute: 60, requestsPerHour: 1000 },
    auditLogging: true
  },
  'qe-coverage-analyzer': {
    agentType: 'qe-coverage-analyzer',
    allowedDomains: ['api.anthropic.com'],
    rateLimit: { requestsPerMinute: 30, requestsPerHour: 500 },
    auditLogging: true
  },
  // Fallback policy for agent types without a dedicated entry
  'default': {
    agentType: 'default',
    allowedDomains: ['api.anthropic.com'],
    rateLimit: { requestsPerMinute: 10, requestsPerHour: 100 },
    auditLogging: true
  }
};
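
A minimal sketch of how such policies might be enforced is shown below: each request is checked against the domain allow-list, then against sliding one-minute and one-hour windows, and every decision is appended to an audit log. The PolicyEnforcer class, the AuditEntry shape, and the reason strings are hypothetical. Counting requests per agent type rather than per agent instance is a simplifying assumption, and only two policies are repeated here to keep the example short.

```typescript
interface RateLimit {
  requestsPerMinute: number;
  requestsPerHour: number;
}

interface Policy {
  allowedDomains: string[];
  rateLimit: RateLimit;
}

// Hypothetical audit record: timestamp, agent, domain, and the decision.
interface AuditEntry {
  timestamp: string;
  agent: string;
  domain: string;
  allowed: boolean;
  reason: 'ok' | 'domain-not-allowed' | 'rate-limited';
}

const DEFAULT_POLICY: Policy = {
  allowedDomains: ['api.anthropic.com'],
  rateLimit: { requestsPerMinute: 10, requestsPerHour: 100 },
};

const POLICIES = new Map<string, Policy>([
  ['qe-coverage-analyzer', {
    allowedDomains: ['api.anthropic.com'],
    rateLimit: { requestsPerMinute: 30, requestsPerHour: 500 },
  }],
]);

const MINUTE_MS = 60 * 1000;
const HOUR_MS = 60 * MINUTE_MS;

class PolicyEnforcer {
  readonly auditLog: AuditEntry[] = [];
  private history = new Map<string, number[]>(); // allowed-request timestamps
  private now: () => number;

  // `now` is injectable so rate limits can be tested deterministically.
  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  check(agentType: string, domain: string): boolean {
    const policy = POLICIES.get(agentType) ?? DEFAULT_POLICY;
    const t = this.now();
    // Keep only timestamps inside the one-hour window.
    const lastHour = (this.history.get(agentType) ?? []).filter(ts => t - ts < HOUR_MS);
    const lastMinute = lastHour.filter(ts => t - ts < MINUTE_MS).length;

    let reason: AuditEntry['reason'] = 'ok';
    if (!policy.allowedDomains.includes(domain.toLowerCase())) {
      reason = 'domain-not-allowed';
    } else if (
      lastMinute >= policy.rateLimit.requestsPerMinute ||
      lastHour.length >= policy.rateLimit.requestsPerHour
    ) {
      reason = 'rate-limited';
    }

    const allowed = reason === 'ok';
    if (allowed) lastHour.push(t); // blocked requests do not consume quota
    this.history.set(agentType, lastHour);
    this.auditLog.push({
      timestamp: new Date(t).toISOString(),
      agent: agentType,
      domain,
      allowed,
      reason,
    });
    return allowed;
  }
}
```

Because blocked requests are logged as well as allowed ones, the audit log covers every request that passes through check.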

Success Criteria:

  • 100% network request auditing
  • 0 unauthorized domain requests allowed (all are blocked)
  • Rate limit violations logged and blocked
  • Audit logs include: timestamp, agent, domain, allowed/blocked

🎯 Acceptance Criteria

  • All three components implemented with tests
  • Documentation for security configurations
  • Integration with existing agent lifecycle
  • No regression in existing functionality

Created: 2025-12-16
Extracted From: #51 Phase 3


Labels: enhancement
