Security Hardening: Docker Sandboxing & Network Policy Enforcement
Overview
Extracted from: #51 (MCP Server Performance Optimization - Phase 3)
Priority: P2 - Medium
Type: Security & Infrastructure
These items were deferred from the original MCP optimization epic. They focus on security hardening rather than performance optimization.
Tasks
SP-1: Docker-Based Agent Sandboxing
Goal: Implement actual process isolation with resource limits enforced by cgroups.
Current State:
- Generic Dockerfile exists for deployment
- No agent-specific sandboxing
- No resource limit enforcement
Required Implementation:
- Create infrastructure/sandbox-manager.ts:
interface SandboxConfig {
cpuLimit: number; // CPU cores (e.g., 2)
memoryLimit: string; // Memory limit (e.g., "2g")
diskLimit: string; // Disk quota (e.g., "512m")
networkMode: 'isolated' | 'whitelisted' | 'host';
allowedDomains?: string[];
}
class SandboxManager {
async createSandbox(agentId: string, config: SandboxConfig): Promise<Container>;
async destroySandbox(containerId: string): Promise<void>;
async getResourceUsage(containerId: string): Promise<ResourceStats>;
}
- Create sandboxes/agent.Dockerfile:
FROM node:18-alpine
# Read-only root filesystem
# Non-root user
# Resource limits via Docker HostConfig
- Agent-specific resource profiles:
const AGENT_PROFILES = {
'qe-test-generator': { cpu: 2, memory: '2g', disk: '512m' },
'qe-coverage-analyzer': { cpu: 1, memory: '1g', disk: '256m' },
'qe-security-scanner': { cpu: 2, memory: '4g', disk: '1g' },
};
Success Criteria:
- Zero host-level OOM crashes (per-agent memory capped by cgroup limits)
- 100% process isolation (enforced by Docker)
- CPU/Memory limits verified via docker stats
- SOC2 compliance readiness
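One way the SandboxConfig above could map onto Docker's HostConfig is sketched below. This is an illustration under stated assumptions, not the planned implementation: parseSize, toHostConfig, HostConfigSketch, and the agent-egress network name are hypothetical. The HostConfig field names (NanoCpus, Memory, ReadonlyRootfs, NetworkMode, CapDrop) come from the Docker Engine API. diskLimit is deliberately left unmapped, since disk quota support depends on the storage driver.

```typescript
// Sketch only: translates the issue's SandboxConfig into a subset of
// Docker Engine API HostConfig fields. All helper names are hypothetical.

interface SandboxConfig {
  cpuLimit: number;      // CPU cores (e.g., 2)
  memoryLimit: string;   // Memory limit (e.g., "2g")
  diskLimit: string;     // Disk quota (e.g., "512m"), not mapped in this sketch
  networkMode: 'isolated' | 'whitelisted' | 'host';
  allowedDomains?: string[];
}

// Subset of HostConfig used here; the real API has many more fields.
interface HostConfigSketch {
  NanoCpus: number;        // CPU quota in units of 1e-9 CPUs
  Memory: number;          // Memory limit in bytes
  ReadonlyRootfs: boolean; // Mount the root filesystem read-only
  NetworkMode: string;     // "none", "host", or a user-defined network name
  CapDrop: string[];       // Linux capabilities to drop
}

const UNIT_BYTES: Record<string, number> = {
  k: 1024,
  m: 1024 * 1024,
  g: 1024 * 1024 * 1024,
};

// Converts strings such as "512m" or "2g" into a byte count.
function parseSize(size: string): number {
  const match = /^(\d+)([kmg])$/i.exec(size.trim());
  if (!match) {
    throw new Error(`Unsupported size: ${size}`);
  }
  return Number(match[1]) * UNIT_BYTES[match[2].toLowerCase()];
}

// Assumption: 'whitelisted' attaches the container to a pre-created network
// (hypothetically named "agent-egress") behind a filtering proxy, because
// Docker itself has no per-domain allowlist.
function toNetworkMode(mode: SandboxConfig['networkMode']): string {
  switch (mode) {
    case 'isolated':
      return 'none';
    case 'host':
      return 'host';
    case 'whitelisted':
      return 'agent-egress';
  }
}

function toHostConfig(config: SandboxConfig): HostConfigSketch {
  return {
    NanoCpus: Math.round(config.cpuLimit * 1e9),
    Memory: parseSize(config.memoryLimit),
    ReadonlyRootfs: true,
    NetworkMode: toNetworkMode(config.networkMode),
    CapDrop: ['ALL'],
  };
}
```

Keeping the translation a pure function means the limits can be unit-tested without a Docker daemon; the resulting object would then be passed to whichever Docker client the SandboxManager ends up using.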
SP-2: Dedicated Embedding Cache
Goal: Standalone embedding cache with 24-hour TTL for semantic search optimization.
Current State:
- CachedHNSWVectorMemory.ts provides integrated caching
- No standalone embedding cache as originally specified
Required Implementation:
Create utils/embedding-cache.ts:
interface EmbeddingCacheConfig {
maxSize: number; // Max cached embeddings (e.g., 10000)
ttlMs: number; // TTL in milliseconds (e.g., 86400000 = 24h)
storageBackend: 'memory' | 'redis' | 'sqlite';
}
class EmbeddingCache {
async get(contentHash: string): Promise<number[] | null>;
async set(contentHash: string, embedding: number[]): Promise<void>;
async getStats(): Promise<CacheStats>;
async prune(): Promise<number>; // Returns pruned count
}
Success Criteria:
- 80-90% cache hit rate for repeated searches
- Embedding latency: 500ms → 50ms on cache hit
- Memory usage: <100MB for 10K embeddings
- 24-hour TTL expiration working
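A minimal sketch of the 'memory' backend follows, to make the TTL and eviction behavior concrete. It assumes least-recently-used eviction once maxSize is reached, which the spec above does not state, and it takes an injectable clock so expiry can be tested without waiting. InMemoryEmbeddingCache, CacheEntry, and the shape of the stats object are hypothetical.

```typescript
// Sketch only: in-memory embedding cache with TTL expiry and LRU eviction.
// LRU is an assumption; the issue only specifies maxSize and ttlMs.

interface EmbeddingCacheConfig {
  maxSize: number; // Max cached embeddings (e.g., 10000)
  ttlMs: number;   // TTL in milliseconds (e.g., 86400000 = 24h)
}

interface CacheEntry {
  embedding: number[];
  expiresAt: number; // Epoch milliseconds after which the entry is stale
}

class InMemoryEmbeddingCache {
  // Map preserves insertion order, so the first key is the least recently used.
  private readonly entries = new Map<string, CacheEntry>();
  private hits = 0;
  private misses = 0;

  constructor(
    private readonly config: EmbeddingCacheConfig,
    private readonly now: () => number = () => Date.now(),
  ) {}

  async get(contentHash: string): Promise<number[] | null> {
    const entry = this.entries.get(contentHash);
    if (!entry || entry.expiresAt <= this.now()) {
      if (entry) this.entries.delete(contentHash); // drop the expired entry
      this.misses++;
      return null;
    }
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(contentHash);
    this.entries.set(contentHash, entry);
    this.hits++;
    return entry.embedding;
  }

  async set(contentHash: string, embedding: number[]): Promise<void> {
    this.entries.delete(contentHash);
    if (this.entries.size >= this.config.maxSize) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(contentHash, {
      embedding,
      expiresAt: this.now() + this.config.ttlMs,
    });
  }

  // Removes every expired entry and returns how many were removed.
  async prune(): Promise<number> {
    const cutoff = this.now();
    let pruned = 0;
    this.entries.forEach((entry, key) => {
      if (entry.expiresAt <= cutoff) {
        this.entries.delete(key);
        pruned++;
      }
    });
    return pruned;
  }

  async getStats(): Promise<{ size: number; hits: number; misses: number; hitRate: number }> {
    const total = this.hits + this.misses;
    return {
      size: this.entries.size,
      hits: this.hits,
      misses: this.misses,
      hitRate: total === 0 ? 0 : this.hits / total,
    };
  }
}
```

On the memory target: if embeddings were 1536-dimensional (an assumption, since the issue does not give a dimension), 10,000 of them stored as number[] would hold roughly 123 MB of doubles, while Float32Array storage would need roughly 61 MB. The storage type may therefore matter for the <100MB criterion.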
SP-3: Network Policy Enforcement
Goal: Agent-specific network access control with domain whitelisting and rate limiting.
Required Implementation:
Create infrastructure/network-policy.ts:
interface NetworkPolicy {
agentType: string;
allowedDomains: string[];
rateLimit: {
requestsPerMinute: number;
requestsPerHour: number;
};
auditLogging: boolean;
}
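// The record key carries the agent type, so agentType is omitted from each entry.
const NETWORK_POLICIES: Record<string, Omit<NetworkPolicy, 'agentType'>> = {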
'qe-test-generator': {
allowedDomains: ['api.anthropic.com', 'registry.npmjs.org'],
rateLimit: { requestsPerMinute: 60, requestsPerHour: 1000 },
auditLogging: true
},
'qe-coverage-analyzer': {
allowedDomains: ['api.anthropic.com'],
rateLimit: { requestsPerMinute: 30, requestsPerHour: 500 },
auditLogging: true
},
'default': {
allowedDomains: ['api.anthropic.com'],
rateLimit: { requestsPerMinute: 10, requestsPerHour: 100 },
auditLogging: true
}
};
Success Criteria:
- 100% network request auditing
- Zero unauthorized domain requests allowed (all are blocked)
- Rate limit violations logged and blocked
- Audit logs include: timestamp, agent, domain, allowed/blocked
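The enforcement path implied by these criteria could look roughly like the sketch below: an allowlist check, then sliding-window rate limits, with every decision written to an audit log. NetworkPolicyEnforcer, PolicySketch, AuditEntry, and the reason strings are hypothetical names, and the sliding window is one possible rate-limiting strategy rather than a chosen design. The check takes a bare domain to stay self-contained; real code would first extract the hostname from the request URL.

```typescript
// Sketch only: per-agent domain allowlisting, rate limiting, and auditing.
// Sliding-window counting is an assumption; the issue does not pick a strategy.

interface PolicySketch {
  allowedDomains: string[];
  rateLimit: { requestsPerMinute: number; requestsPerHour: number };
}

interface AuditEntry {
  timestamp: number;
  agent: string;
  domain: string;
  allowed: boolean;
  reason: string; // 'ok' or the rule that blocked the request
}

const MINUTE_MS = 60 * 1000;
const HOUR_MS = 60 * MINUTE_MS;

class NetworkPolicyEnforcer {
  // Timestamps of allowed requests per agent, trimmed to the last hour.
  private readonly history = new Map<string, number[]>();
  readonly auditLog: AuditEntry[] = [];

  constructor(
    private readonly policies: Record<string, PolicySketch>,
    private readonly now: () => number = () => Date.now(),
  ) {}

  check(agent: string, domain: string): boolean {
    // Fall back to the 'default' policy for unknown agent types.
    const policy = this.policies[agent] ?? this.policies['default'];
    const timestamp = this.now();
    const lastHour = (this.history.get(agent) ?? []).filter(
      (t) => timestamp - t < HOUR_MS,
    );
    const lastMinuteCount = lastHour.filter(
      (t) => timestamp - t < MINUTE_MS,
    ).length;

    let reason = 'ok';
    if (!policy) {
      reason = 'no-policy';
    } else if (!policy.allowedDomains.includes(domain)) {
      reason = 'domain-not-allowed';
    } else if (lastMinuteCount >= policy.rateLimit.requestsPerMinute) {
      reason = 'minute-limit';
    } else if (lastHour.length >= policy.rateLimit.requestsPerHour) {
      reason = 'hour-limit';
    }

    const allowed = reason === 'ok';
    if (allowed) lastHour.push(timestamp); // only allowed requests count
    this.history.set(agent, lastHour);

    // Every decision is audited, whether allowed or blocked.
    this.auditLog.push({ timestamp, agent, domain, allowed, reason });
    return allowed;
  }
}
```

Logging blocked requests alongside allowed ones is what makes the "100% network request auditing" criterion checkable: the audit log length should equal the number of calls to check.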
References
- Original Epic: [EPIC] MCP Server Performance Optimization - 3 Month Implementation #51 (closed as substantially complete)
- Implementation Plan: docs/planning/mcp-improvement-plan-revised.md (SP-1, SP-2, SP-3 sections)
- Docker Resource Limits: https://docs.docker.com/config/containers/resource_constraints/
Acceptance Criteria
- All three components implemented with tests
- Documentation for security configurations
- Integration with existing agent lifecycle
- No regression in existing functionality
Created: 2025-12-16
Extracted From: #51 Phase 3