# Security Hardening: Docker Sandboxing & Network Policy Enforcement

## Overview

**Extracted from**: #51 (MCP Server Performance Optimization - Phase 3)
**Priority**: P2 - Medium
**Type**: Security & Infrastructure

These items were deferred from the original MCP optimization epic. SP-1 and SP-3 cover security hardening; SP-2 is a caching item carried over from the same epic.

---

## 📋 Tasks

### SP-1: Docker-Based Agent Sandboxing

**Goal**: Implement per-agent process isolation with resource limits enforced by cgroups.

**Current State**:
- Generic `Dockerfile` exists for deployment
- No agent-specific sandboxing
- No resource limit enforcement

**Required Implementation**:

1. Create `infrastructure/sandbox-manager.ts`:
```typescript
interface SandboxConfig {
  cpuLimit: number;      // CPU cores (e.g., 2)
  memoryLimit: string;   // Memory limit (e.g., "2g")
  diskLimit: string;     // Disk quota (e.g., "512m")
  networkMode: 'isolated' | 'whitelisted' | 'host';
  allowedDomains?: string[];
}

// API surface only; Container and ResourceStats are types still to be defined.
declare class SandboxManager {
  async createSandbox(agentId: string, config: SandboxConfig): Promise<Container>;
  async destroySandbox(containerId: string): Promise<void>;
  async getResourceUsage(containerId: string): Promise<ResourceStats>;
}
```

2. Create `sandboxes/agent.Dockerfile`:
```dockerfile
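FROM node:18-alpine
# Non-root user: the official Node image ships with a "node" user
USER node
WORKDIR /home/node/app
# Read-only root filesystem and CPU/memory limits are not Dockerfile settings;
# they are applied at container creation via Docker HostConfig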
```

3. Agent-specific resource profiles:
```typescript
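// Keys match the SandboxConfig field names so a profile can be spread into a config.
const AGENT_PROFILES: Record<string, Pick<SandboxConfig, 'cpuLimit' | 'memoryLimit' | 'diskLimit'>> = {
  'qe-test-generator': { cpuLimit: 2, memoryLimit: '2g', diskLimit: '512m' },
  'qe-coverage-analyzer': { cpuLimit: 1, memoryLimit: '1g', diskLimit: '256m' },
  'qe-security-scanner': { cpuLimit: 2, memoryLimit: '4g', diskLimit: '1g' },
};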
```
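As a minimal sketch of how a `SandboxConfig` could be translated into container limits, the function below maps it onto Docker Engine API `HostConfig` fields (`NanoCpus`, `Memory`, `MemorySwap`, `ReadonlyRootfs`, `NetworkMode`, `StorageOpt`, `CapDrop`, `SecurityOpt`). The helper names `parseSize` and `toHostConfig`, the `HostConfigSketch` type, and the mapping of `'whitelisted'` to the `bridge` network are assumptions for illustration, not part of the original plan. Disk quotas via `StorageOpt` only work on some storage drivers, so that field may need a different mechanism in practice.

```typescript
// Same shape as the SandboxConfig interface defined above, repeated so the sketch stands alone.
interface SandboxConfig {
  cpuLimit: number;
  memoryLimit: string;
  diskLimit: string;
  networkMode: 'isolated' | 'whitelisted' | 'host';
  allowedDomains?: string[];
}

// Hypothetical subset of Docker's HostConfig; field names follow the Docker Engine API.
interface HostConfigSketch {
  NanoCpus: number;                    // CPU limit in units of 1e-9 CPUs
  Memory: number;                      // memory limit in bytes
  MemorySwap: number;                  // memory + swap; equal to Memory disables swap
  ReadonlyRootfs: boolean;
  NetworkMode: string;
  StorageOpt: Record<string, string>;  // driver-dependent disk quota
  CapDrop: string[];
  SecurityOpt: string[];
}

const SIZE_MULTIPLIERS: Record<string, number> = {
  k: 1024,
  m: 1024 * 1024,
  g: 1024 * 1024 * 1024,
};

// Parses sizes such as "512m" or "2g" into bytes (binary multiples, as Docker does).
function parseSize(size: string): number {
  const match = /^(\d+(?:\.\d+)?)([kmg])$/i.exec(size.trim());
  if (!match) {
    throw new Error(`Unsupported size: ${size}`);
  }
  return Math.round(Number(match[1]) * SIZE_MULTIPLIERS[match[2].toLowerCase()]);
}

function toHostConfig(config: SandboxConfig): HostConfigSketch {
  const memoryBytes = parseSize(config.memoryLimit);
  // Assumption: 'whitelisted' uses a bridge network and relies on the SP-3
  // policy layer to filter domains; Docker itself does not filter by domain.
  const networkModes = { isolated: 'none', whitelisted: 'bridge', host: 'host' };
  return {
    NanoCpus: Math.round(config.cpuLimit * 1e9),
    Memory: memoryBytes,
    MemorySwap: memoryBytes,
    ReadonlyRootfs: true,
    NetworkMode: networkModes[config.networkMode],
    StorageOpt: { size: config.diskLimit },
    CapDrop: ['ALL'],
    SecurityOpt: ['no-new-privileges'],
  };
}

const example = toHostConfig({
  cpuLimit: 2,
  memoryLimit: '2g',
  diskLimit: '512m',
  networkMode: 'isolated',
});
console.log(example.NanoCpus, example.Memory, example.NetworkMode);
```

Keeping the translation a pure function means the limits can be unit-tested without a Docker daemon; `createSandbox` would then pass the result to whichever Docker client the project adopts.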

**Success Criteria**:
- [ ] Zero host-level OOM crashes (memory limits enforced per container by cgroups)
- [ ] Every agent process runs in its own container (isolation enforced by Docker)
- [ ] CPU/Memory limits verified via `docker stats`
- [ ] SOC2 compliance readiness
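The `docker stats` check could be automated along the following lines. This sketch assumes the output of `docker stats --no-stream --format '{{json .}}'`, where each line is a JSON object whose `MemUsage` field looks like `"512MiB / 2GiB"`. The function name `checkMemoryLimit` and the 1% tolerance (to absorb display rounding) are hypothetical choices.

```typescript
// Only the fields this sketch reads from a `docker stats` JSON line.
interface StatsLine {
  Name: string;
  CPUPerc: string;
  MemUsage: string; // e.g. "512MiB / 2GiB"
}

const UNIT_BYTES: Record<string, number> = {
  B: 1,
  KiB: 1024,
  MiB: 1024 * 1024,
  GiB: 1024 * 1024 * 1024,
};

// Converts a size such as "1.5GiB" into bytes.
function toBytes(text: string): number {
  const match = /^(\d+(?:\.\d+)?)\s*(B|KiB|MiB|GiB)$/.exec(text.trim());
  if (!match) {
    throw new Error(`Unrecognized size: ${text}`);
  }
  return Number(match[1]) * UNIT_BYTES[match[2]];
}

// Compares the limit reported by `docker stats` with the configured limit.
function checkMemoryLimit(jsonLine: string, expectedLimitBytes: number) {
  const stats = JSON.parse(jsonLine) as StatsLine;
  const [usedText, limitText] = stats.MemUsage.split('/');
  const usedBytes = toBytes(usedText);
  const limitBytes = toBytes(limitText);
  // Displayed values are rounded, so allow a 1% difference.
  const limitMatches =
    Math.abs(limitBytes - expectedLimitBytes) <= expectedLimitBytes * 0.01;
  return { name: stats.Name, usedBytes, limitBytes, limitMatches };
}

const sampleLine =
  '{"Name":"qe-test-generator","CPUPerc":"12.50%","MemUsage":"512MiB / 2GiB"}';
console.log(checkMemoryLimit(sampleLine, 2 * 1024 * 1024 * 1024));
```

A container started without a memory limit reports the host's total memory as its limit, so a mismatch here is a quick signal that the limit was not applied.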

---

### SP-2: Dedicated Embedding Cache

**Goal**: Standalone embedding cache with 24-hour TTL for semantic search optimization.

**Current State**:
- `CachedHNSWVectorMemory.ts` provides integrated caching
- No standalone embedding cache as originally specified

**Required Implementation**:

Create `utils/embedding-cache.ts`:
```typescript
interface EmbeddingCacheConfig {
  maxSize: number;        // Max cached embeddings (e.g., 10000)
  ttlMs: number;          // TTL in milliseconds (e.g., 86400000 = 24h)
  storageBackend: 'memory' | 'redis' | 'sqlite';
}

// API surface only; CacheStats is a type still to be defined.
declare class EmbeddingCache {
  async get(contentHash: string): Promise<number[] | null>;
  async set(contentHash: string, embedding: number[]): Promise<void>;
  async getStats(): Promise<CacheStats>;
  async prune(): Promise<number>; // Returns pruned count
}
```
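For the `'memory'` backend, one possible shape is a `Map` with per-entry expiry and least-recently-used eviction, sketched below. The class name `InMemoryEmbeddingCache`, the `CacheStats` fields, and the injectable clock are assumptions made for illustration; the `redis` and `sqlite` backends are not covered.

```typescript
// Hypothetical stats shape; the original spec does not define CacheStats.
interface CacheStats {
  size: number;
  hits: number;
  misses: number;
  hitRate: number;
}

interface CacheEntry {
  embedding: number[];
  expiresAt: number;
}

class InMemoryEmbeddingCache {
  private entries = new Map<string, CacheEntry>();
  private hits = 0;
  private misses = 0;
  private maxSize: number;
  private ttlMs: number;
  private now: () => number;

  // The clock is injectable so TTL behaviour can be tested without waiting.
  constructor(maxSize: number, ttlMs: number, now: () => number = () => Date.now()) {
    this.maxSize = maxSize;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  async get(contentHash: string): Promise<number[] | null> {
    const entry = this.entries.get(contentHash);
    if (!entry || entry.expiresAt <= this.now()) {
      if (entry) this.entries.delete(contentHash); // drop the expired entry
      this.misses++;
      return null;
    }
    // Re-insert so Map iteration order runs from least to most recently used.
    this.entries.delete(contentHash);
    this.entries.set(contentHash, entry);
    this.hits++;
    return entry.embedding;
  }

  async set(contentHash: string, embedding: number[]): Promise<void> {
    this.entries.delete(contentHash);
    if (this.entries.size >= this.maxSize) {
      // The first key in iteration order is the least recently used one.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(contentHash, { embedding, expiresAt: this.now() + this.ttlMs });
  }

  async getStats(): Promise<CacheStats> {
    const total = this.hits + this.misses;
    return {
      size: this.entries.size,
      hits: this.hits,
      misses: this.misses,
      hitRate: total === 0 ? 0 : this.hits / total,
    };
  }

  // Removes expired entries and returns how many were removed.
  async prune(): Promise<number> {
    const cutoff = this.now();
    let pruned = 0;
    this.entries.forEach((entry, key) => {
      if (entry.expiresAt <= cutoff) {
        this.entries.delete(key);
        pruned++;
      }
    });
    return pruned;
  }
}
```

Expired entries are also dropped lazily in `get`, so `prune` mainly serves to reclaim memory for keys that are never requested again.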

**Success Criteria**:
- [ ] 80-90% cache hit rate for repeated searches
- [ ] Embedding latency: 500ms → 50ms on cache hit
- [ ] Memory usage: <100MB for 10K embeddings
- [ ] 24-hour TTL expiration working
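Whether the 100MB target is reachable depends on the embedding dimension and storage type, neither of which is stated here. The sketch below estimates raw vector payload only (keys and per-object overhead are ignored). The dimensions 384 and 1536 are illustrative assumptions, as is the helper name `estimateEmbeddingBytes`.

```typescript
// Raw payload of the cached vectors: count x dimensions x bytes per value.
function estimateEmbeddingBytes(
  count: number,
  dimensions: number,
  bytesPerValue: 4 | 8,
): number {
  return count * dimensions * bytesPerValue;
}

const MIB = 1024 * 1024;

// JavaScript numbers are 64-bit floats, so number[] is estimated at 8 bytes per value;
// Float32Array stores 4 bytes per value.
const scenarios = [
  { label: '384 dims, number[]', dims: 384, bytes: 8 as const },
  { label: '1536 dims, number[]', dims: 1536, bytes: 8 as const },
  { label: '1536 dims, Float32Array', dims: 1536, bytes: 4 as const },
];

for (const s of scenarios) {
  const mib = estimateEmbeddingBytes(10000, s.dims, s.bytes) / MIB;
  console.log(`${s.label}: ${mib.toFixed(1)} MiB`);
}
```

Under these assumptions, 10K embeddings of 1536 dimensions stored as `number[]` come to roughly 117 MiB, which would miss the target, while the same vectors in `Float32Array` form take about half that.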

---

### SP-3: Network Policy Enforcement

**Goal**: Agent-specific network access control with domain whitelisting and rate limiting.

**Required Implementation**:

Create `infrastructure/network-policy.ts`:
```typescript
interface NetworkPolicy {
  agentType: string;
  allowedDomains: string[];
  rateLimit: {
    requestsPerMinute: number;
    requestsPerHour: number;
  };
  auditLogging: boolean;
}

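// The record key already identifies the agent type, so agentType is omitted from each entry.
const NETWORK_POLICIES: Record<string, Omit<NetworkPolicy, 'agentType'>> = {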
  'qe-test-generator': {
    allowedDomains: ['api.anthropic.com', 'registry.npmjs.org'],
    rateLimit: { requestsPerMinute: 60, requestsPerHour: 1000 },
    auditLogging: true
  },
  'qe-coverage-analyzer': {
    allowedDomains: ['api.anthropic.com'],
    rateLimit: { requestsPerMinute: 30, requestsPerHour: 500 },
    auditLogging: true
  },
  'default': {
    allowedDomains: ['api.anthropic.com'],
    rateLimit: { requestsPerMinute: 10, requestsPerHour: 100 },
    auditLogging: true
  }
};
```
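A minimal sketch of the domain check, assuming exact hostname matching (no wildcard or subdomain rules) and a fallback to the `'default'` policy for unknown agent types. The names `POLICY_DOMAINS`, `allowedDomainsFor`, and `isRequestAllowed` are hypothetical. A `Map` is used for lookup so that agent names such as `constructor` cannot resolve to inherited object properties.

```typescript
// Allowed domains per agent type, mirroring the policies above.
const POLICY_DOMAINS = new Map<string, string[]>([
  ['qe-test-generator', ['api.anthropic.com', 'registry.npmjs.org']],
  ['qe-coverage-analyzer', ['api.anthropic.com']],
  ['default', ['api.anthropic.com']],
]);

// Unknown agent types fall back to the 'default' policy.
function allowedDomainsFor(agentType: string): string[] {
  return POLICY_DOMAINS.get(agentType) ?? POLICY_DOMAINS.get('default') ?? [];
}

// Returns true only when the URL parses and its hostname is listed exactly.
function isRequestAllowed(agentType: string, url: string): boolean {
  let hostname: string;
  try {
    hostname = new URL(url).hostname;
  } catch {
    return false; // unparseable URLs are rejected
  }
  return allowedDomainsFor(agentType).includes(hostname.toLowerCase());
}

console.log(isRequestAllowed('qe-test-generator', 'https://registry.npmjs.org/typescript'));
console.log(isRequestAllowed('qe-coverage-analyzer', 'https://registry.npmjs.org/typescript'));
```

Matching on the parsed hostname rather than on a substring of the URL avoids accepting look-alike hosts such as `api.anthropic.com.evil.example`.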

**Success Criteria**:
- [ ] 100% network request auditing
- [ ] 100% of unauthorized domain requests blocked
- [ ] Rate limit violations logged and blocked
- [ ] Audit logs include: timestamp, agent, domain, allowed/blocked
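The rate-limit and audit criteria could be met with something like the sliding-window limiter below, which enforces both the per-minute and per-hour limits and produces an audit record with the four listed fields. The class name `SlidingWindowLimiter`, the `AuditEntry` shape, and the optional `reason` field are assumptions; a production version would likely need shared state if agents run in separate processes.

```typescript
interface RateLimit {
  requestsPerMinute: number;
  requestsPerHour: number;
}

// Hypothetical audit record covering timestamp, agent, domain, allowed/blocked.
interface AuditEntry {
  timestamp: string;
  agent: string;
  domain: string;
  decision: 'allowed' | 'blocked';
  reason?: string;
}

class SlidingWindowLimiter {
  private timestamps: number[] = [];
  private limit: RateLimit;

  constructor(limit: RateLimit) {
    this.limit = limit;
  }

  // Records the request and returns true if both windows have capacity.
  tryAcquire(nowMs: number): boolean {
    const hourAgo = nowMs - 3600000;
    const minuteAgo = nowMs - 60000;
    // Requests older than one hour no longer count toward either limit.
    this.timestamps = this.timestamps.filter((t) => t > hourAgo);
    const inLastMinute = this.timestamps.filter((t) => t > minuteAgo).length;
    if (
      inLastMinute >= this.limit.requestsPerMinute ||
      this.timestamps.length >= this.limit.requestsPerHour
    ) {
      return false;
    }
    this.timestamps.push(nowMs);
    return true;
  }
}

function auditEntry(
  agent: string,
  domain: string,
  allowed: boolean,
  nowMs: number,
  reason?: string,
): AuditEntry {
  const entry: AuditEntry = {
    timestamp: new Date(nowMs).toISOString(),
    agent,
    domain,
    decision: allowed ? 'allowed' : 'blocked',
  };
  if (reason !== undefined) entry.reason = reason;
  return entry;
}

// Usage with the 'default' policy limits from above.
const limiter = new SlidingWindowLimiter({ requestsPerMinute: 10, requestsPerHour: 100 });
const now = Date.now();
const permitted = limiter.tryAcquire(now);
console.log(
  JSON.stringify(
    auditEntry('default', 'api.anthropic.com', permitted, now, permitted ? undefined : 'rate_limit'),
  ),
);
```

Emitting one JSON object per request, whether allowed or blocked, is what makes the 100% auditing criterion checkable from the log alone.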

---

## 📚 References

- **Original Epic**: #51 (closed as substantially complete)
- **Implementation Plan**: `docs/planning/mcp-improvement-plan-revised.md` (SP-1, SP-2, SP-3 sections)
- **Docker Resource Limits**: https://docs.docker.com/config/containers/resource_constraints/

---

## 🎯 Acceptance Criteria

- [ ] All three components implemented with tests
- [ ] Documentation for security configurations
- [ ] Integration with existing agent lifecycle
- [ ] No regression in existing functionality

---

**Created**: 2025-12-16
**Extracted From**: #51 Phase 3

Security Hardening: Docker Sandboxing & Network Policy Enforcement #146