- **Configuration Management**: Environment-based configuration with Pydantic Settings
- **Structured Logging**: JSON-formatted logging for production observability
- **Docker Integration**: Secure Docker daemon communication with connection pooling
- **Audit Logging**: Complete audit trail for all operations with sensitive data redaction
- **Prometheus Metrics**: Built-in metrics collection for monitoring and alerting
- **Admin Tools**: System health status, container/exec listing, garbage collection, and reconciliation
- **Graceful Shutdown**: Drains active operations before shutting down
- **Automatic Recovery**: Reconciles Docker state with the database on startup

## Requirements


## Project Status

This project has completed **Epic 1: Foundation Layer**, **Epic 2: Command Execution Engine**, **Epic 3: Filesystem Operations**, **Epic 4: MCP Protocol Integration**, **Epic 5: Image & Security Management**, **Epic 6: State Management & Recovery**, and **Epic 7: Observability & Operations**:

### Epic 1: Foundation Layer ✅
- [x] Feature 1.1: Project Scaffold & Configuration
  - Workspace cleanup between uses
  - Configurable via `MCP_WARM_POOL_ENABLED`

### Epic 6: State Management & Recovery ✅
- [x] Feature 6.1: Graceful Shutdown
  - `ShutdownCoordinator` for handling `SIGTERM`/`SIGINT`
  - Drains active operations within a configurable grace period (`MCP_DRAIN_GRACE_S`)
  - Stops transient containers while preserving persistent ones
  - Ensures state is flushed to disk
  - Integrated into the server lifespan

- [x] Feature 6.2: Boot Recovery & Reconciliation
  - `ReconciliationManager` for container discovery and adoption
  - Discovers containers with the `com.mcp.devbench` label on startup
  - Adopts running containers not in the database
  - Cleans up orphaned transient containers based on `MCP_TRANSIENT_GC_DAYS`
  - `reconcile` tool for manual reconciliation
  - Handles Docker daemon restarts gracefully

- [x] Feature 6.3: Background Maintenance
  - `MaintenanceManager` for periodic tasks
  - Hourly garbage collection of old transient containers
  - Cleanup of completed execs older than 24 hours
  - Periodic state sync with Docker
  - Database vacuuming for optimization
  - Health monitoring and metrics collection
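
The drain step of Feature 6.1 can be illustrated with a minimal asyncio sketch. It is not the `ShutdownCoordinator` implementation. `DrainCoordinator`, `containers_to_stop`, and `install_signal_handlers` are hypothetical names. `grace_s` stands in for `MCP_DRAIN_GRACE_S`, and container records are assumed to be plain dicts with `id` and `persistent` keys.

```python
from __future__ import annotations

import asyncio
import signal


class DrainCoordinator:
    """Hypothetical drain helper; not the real ShutdownCoordinator API."""

    def __init__(self, grace_s: float) -> None:
        self.grace_s = grace_s  # stands in for MCP_DRAIN_GRACE_S
        self.shutting_down = False
        self._active = 0
        self._idle = asyncio.Event()
        self._idle.set()  # nothing in flight yet

    def begin(self) -> None:
        """Register an in-flight operation; refused once shutdown starts."""
        if self.shutting_down:
            raise RuntimeError("server is shutting down")
        self._active += 1
        self._idle.clear()

    def end(self) -> None:
        """Mark an operation finished and wake drain() when none remain."""
        self._active -= 1
        if self._active == 0:
            self._idle.set()

    async def drain(self) -> bool:
        """Refuse new work, then wait up to grace_s for in-flight operations.

        Returns True if everything finished, False if the grace period expired.
        """
        self.shutting_down = True
        try:
            await asyncio.wait_for(self._idle.wait(), timeout=self.grace_s)
            return True
        except asyncio.TimeoutError:
            return False


def containers_to_stop(containers: list[dict]) -> list[str]:
    """Pick transient containers to stop; persistent ones keep running."""
    return [c["id"] for c in containers if not c.get("persistent", False)]


_pending: set[asyncio.Task] = set()  # keep references so drain tasks are not GC'd


def install_signal_handlers(coord: DrainCoordinator) -> None:
    """Start draining on SIGTERM/SIGINT. Must run inside a Unix event loop."""
    loop = asyncio.get_running_loop()

    def on_signal() -> None:
        task = loop.create_task(coord.drain())
        _pending.add(task)
        task.add_done_callback(_pending.discard)

    for sig in (signal.SIGTERM, signal.SIGINT):
        loop.add_signal_handler(sig, on_signal)
```

The sketch refuses new work *before* it starts waiting. If it did not, a steady stream of incoming requests could keep the drain from ever completing within the grace period.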

### Epic 7: Observability & Operations ✅
- [x] Feature 7.1: Structured Audit Logging
  - `AuditLogger` with JSON structured logging for all operations
  - Complete audit trail for container, exec, filesystem, security, and transfer events
  - Automatic sensitive data redaction (passwords, tokens, keys, secrets)
  - ISO 8601 timestamps and correlation IDs
  - Configurable detail level
  - 17 unit tests covering audit functionality

- [x] Feature 7.2: Metrics & Monitoring
  - Prometheus metrics collection via `MetricsCollector`
  - Counter metrics: `container_spawns_total`, `exec_total`, `fs_operations_total`
  - Histogram metrics: `exec_duration_seconds`, `output_bytes`
  - Gauge metrics: `active_containers`, `active_attachments`, `memory_usage_bytes`
  - `metrics` tool to expose Prometheus-formatted metrics
  - 14 unit tests covering metrics collection

- [x] Feature 7.3: Debug & Admin Tools
  - `system_status` tool for overall system health
  - `list_containers` tool for detailed container information
  - `list_execs` tool for active execution listing
  - `garbage_collect` tool for manual cleanup
  - `reconcile` tool with audit logging (from Epic 6)
  - Docker connectivity and database status monitoring
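
Here is a minimal sketch of the redaction and JSON formatting described in Feature 7.1. It assumes audit events are dicts and that any key containing `password`, `token`, `key`, or `secret` should be masked. `redact`, `audit_record`, and the `***REDACTED***` placeholder are hypothetical, and the actual `AuditLogger` fields and redaction rules may differ.

```python
from __future__ import annotations

import json
import uuid
from datetime import datetime, timezone
from typing import Any

# Assumed key fragments; broad substring matching errs on the side of hiding.
SENSITIVE_FRAGMENTS = ("password", "token", "key", "secret")
REDACTED = "***REDACTED***"


def redact(value: Any) -> Any:
    """Recursively mask values stored under sensitive-looking keys."""
    if isinstance(value, dict):
        return {
            k: REDACTED
            if any(frag in str(k).lower() for frag in SENSITIVE_FRAGMENTS)
            else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, (list, tuple)):
        return [redact(v) for v in value]
    return value


def audit_record(event: str, details: dict, correlation_id: str | None = None) -> str:
    """Serialize one audit event as a single JSON line."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),  # ISO 8601, UTC
        "event": event,
        "correlation_id": correlation_id or str(uuid.uuid4()),
        "details": redact(details),
    }
    return json.dumps(record, sort_keys=True)
```

Redacting by key name rather than by value pattern is a simple, predictable choice. Its trade-off is that a secret passed under an innocuous key (for example, inside a raw command string) would not be caught.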

### Current Status
The project now has:
- Full container lifecycle management with image policy enforcement
- Image allow-list validation and resolution with digest pinning
- Comprehensive security hardening (capability dropping, resource limits, audit logging)
- Warm container pool for fast provisioning (<1s attach time)
- Graceful shutdown with operation draining
- Boot recovery and automatic reconciliation
- Background maintenance and health monitoring
- **Complete audit logging for all operations with sensitive data redaction**
- **Prometheus metrics collection and exposure**
- **Admin tools for system status, container/exec listing, and manual operations**
- 201 unit and integration tests passing (100% success rate)
- Comprehensive error handling and resource management

## MCP Tools Reference
- `path` (string): Listed directory
- `entries` (array): File/directory entries with metadata

### Maintenance Tools

#### `reconcile`
Run container reconciliation to sync Docker state with the database.

This tool performs:
- Discovery of containers with the `com.mcp.devbench` label
- Adoption of running containers not in the database
- Cleanup of stopped containers
- Removal of orphaned transient containers
- Cleanup of incomplete exec entries

**Input:** None

**Output:**
- `discovered` (integer): Containers found with the MCP label
- `adopted` (integer): Containers added to the database
- `cleaned_up` (integer): Missing containers marked as stopped
- `orphaned` (integer): Old transient containers removed
- `errors` (integer): Errors encountered

**Example:**
```json
{
  "discovered": 5,
  "adopted": 1,
  "cleaned_up": 2,
  "orphaned": 1,
  "errors": 0
}
```
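
The bookkeeping behind these output counters can be sketched as a pure function over a snapshot of Docker and the database's view of it. This is an illustration under assumptions, not the `ReconciliationManager` code. `DockerContainer`, `reconcile`, and the status strings are hypothetical. A transient container is treated as orphaned when it is stopped and older than `MCP_TRANSIENT_GC_DAYS`.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta

LABEL = "com.mcp.devbench"


@dataclass
class DockerContainer:
    docker_id: str
    labels: dict[str, str]
    running: bool
    persistent: bool
    created_at: datetime


def reconcile(
    docker: list[DockerContainer],
    db_status: dict[str, str],
    transient_gc_days: int,
    now: datetime,
) -> tuple[dict[str, int], dict[str, str]]:
    """Return (counts, updated copy of db_status); the input is not mutated."""
    db = dict(db_status)
    counts = {"discovered": 0, "adopted": 0, "cleaned_up": 0, "orphaned": 0, "errors": 0}
    managed = [c for c in docker if LABEL in c.labels]  # only labelled containers
    counts["discovered"] = len(managed)
    cutoff = now - timedelta(days=transient_gc_days)

    for c in managed:
        if not c.running and not c.persistent and c.created_at < cutoff:
            # Old, stopped transient: a real implementation would remove it.
            counts["orphaned"] += 1
            db.pop(c.docker_id, None)
        elif c.running and c.docker_id not in db:
            # Running but unknown to the database: adopt it.
            db[c.docker_id] = "running"
            counts["adopted"] += 1

    seen = {c.docker_id for c in managed}
    for docker_id, status in list(db.items()):
        if docker_id not in seen and status != "stopped":
            # Tracked in the database but gone from Docker: mark it stopped.
            db[docker_id] = "stopped"
            counts["cleaned_up"] += 1

    # "errors" stays 0 here because the sketch makes no Docker API calls.
    return counts, db
```

Keeping the decision logic separate from the Docker and database calls makes each counter easy to test against a fixed snapshot.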

### Observability & Admin Tools

#### `metrics`
Get Prometheus metrics for monitoring.

Returns current metrics, including:
- Container spawn counts by image
- Execution counts and durations
- Filesystem operation counts
- Active container and attachment gauges
- Memory usage by container

**Input:** None

**Output:**
- `metrics` (string): Prometheus-formatted metrics

**Example metrics output:**
```text
# HELP mcp_devbench_container_spawns_total Total number of container spawns
# TYPE mcp_devbench_container_spawns_total counter
mcp_devbench_container_spawns_total{image="python:3.11"} 5.0
# HELP mcp_devbench_exec_total Total number of command executions
# TYPE mcp_devbench_exec_total counter
mcp_devbench_exec_total{container_id="c_123",status="success"} 10.0
# HELP mcp_devbench_active_containers Number of active containers
# TYPE mcp_devbench_active_containers gauge
mcp_devbench_active_containers 3.0
```

#### `system_status`
Get system health and status information.

**Input:** None

**Output:**
- `status` (string): Overall system status (`healthy` or `degraded`)
- `docker_connected` (boolean): Docker daemon connectivity
- `database_initialized` (boolean): Database initialization status
- `active_containers` (integer): Number of active containers
- `active_attachments` (integer): Number of active client attachments
- `version` (string): Server version

**Example:**
```json
{
  "status": "healthy",
  "docker_connected": true,
  "database_initialized": true,
  "active_containers": 3,
  "active_attachments": 2,
  "version": "0.1.0"
}
```
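
A monitoring probe can turn a `system_status` payload into a pass/fail decision. The sketch relies only on the output fields documented above and leaves out how the JSON is fetched through an MCP client. `check_status` and its failure rules are hypothetical.

```python
from __future__ import annotations

import json

# Fields documented for the system_status tool.
REQUIRED_FIELDS = {
    "status", "docker_connected", "database_initialized",
    "active_containers", "active_attachments", "version",
}


def check_status(payload: str) -> tuple[bool, list[str]]:
    """Return (ok, problems) for a system_status JSON payload."""
    data = json.loads(payload)
    problems: list[str] = []
    missing = REQUIRED_FIELDS.difference(data)  # iterating a dict yields its keys
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if data.get("status") != "healthy":
        problems.append(f"status is {data.get('status')!r}")
    if not data.get("docker_connected", False):
        problems.append("docker daemon unreachable")
    if not data.get("database_initialized", False):
        problems.append("database not initialized")
    return not problems, problems
```

Returning the list of problems, rather than only a boolean, lets a health check log why it failed before exiting with a nonzero status.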

#### `garbage_collect`
Trigger manual garbage collection.

Cleans up:
- Orphaned transient containers
- Old completed exec records (older than 24 hours)
- Abandoned attachments

**Input:** None

**Output:**
- `containers_removed` (integer): Number of containers removed
- `execs_cleaned` (integer): Number of exec records cleaned
- `attachments_cleaned` (integer): Number of attachments cleaned
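
The age-based selection behind `garbage_collect` can be sketched as follows. The sketch assumes exec records carry a completion time and that transient containers are judged by `last_seen` against `MCP_TRANSIENT_GC_DAYS`. `ExecRecord`, `ContainerRecord`, and `select_garbage` are hypothetical. The sketch only chooses candidates. It makes no Docker calls and leaves out attachment cleanup.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta

EXEC_RETENTION = timedelta(hours=24)  # the documented 24-hour rule for exec records


@dataclass
class ExecRecord:
    exec_id: str
    completed_at: datetime | None  # None while the exec is still running


@dataclass
class ContainerRecord:
    container_id: str
    persistent: bool
    last_seen: datetime


def select_garbage(
    execs: list[ExecRecord],
    containers: list[ContainerRecord],
    now: datetime,
    transient_gc_days: int,
) -> tuple[list[str], list[str]]:
    """Return (exec_ids, container_ids) that are old enough to clean up."""
    exec_cutoff = now - EXEC_RETENTION
    container_cutoff = now - timedelta(days=transient_gc_days)
    stale_execs = [
        e.exec_id for e in execs
        if e.completed_at is not None and e.completed_at < exec_cutoff
    ]
    # Persistent containers are never collected, regardless of age.
    stale_containers = [
        c.container_id for c in containers
        if not c.persistent and c.last_seen < container_cutoff
    ]
    return stale_execs, stale_containers
```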

#### `list_containers`
List all containers with detailed information.

**Input:** None

**Output:**
- `containers` (array): Container objects with `id`, `docker_id`, `alias`, `image`, `status`, `persistent`, `created_at`, and `last_seen`

#### `list_execs`
List active command executions.

**Input:** None

**Output:**
- `execs` (array): Execution objects with `exec_id`, `container_id`, `cmd`, `as_root`, `started_at`, and `status`

See [mcp-devbench-work-breakdown.md](mcp-devbench-work-breakdown.md) for the complete implementation roadmap.

## License