Commit beeb23e

Merge branch 'main' of https://github.com/pvliesdonk/mcp-devbench into copilot/add-authentication-feature
2 parents 71a610b + 12f06af commit beeb23e

18 files changed, +2880 -5 lines changed

README.md

Lines changed: 180 additions & 2 deletions
@@ -15,6 +15,11 @@ MCP DevBench is a Docker container management server that implements the Model C
 - **Configuration Management**: Environment-based configuration with Pydantic Settings
 - **Structured Logging**: JSON-formatted logging for production observability
 - **Docker Integration**: Secure Docker daemon communication with connection pooling
+- **Audit Logging**: Complete audit trail for all operations with sensitive data redaction
+- **Prometheus Metrics**: Built-in metrics collection for monitoring and alerting
+- **Admin Tools**: System health status, container/exec listing, garbage collection, and reconciliation
+- **Graceful Shutdown**: Drains active operations before shutdown
+- **Automatic Recovery**: Reconciles Docker state with database on startup
 
 ## Requirements
 
@@ -228,7 +233,7 @@ src/mcp_devbench/
 
 ## Project Status
 
-This project has completed **Epic 1: Foundation Layer**, **Epic 2: Command Execution Engine**, **Epic 3: Filesystem Operations**, **Epic 4: MCP Protocol Integration**, and **Epic 5: Image & Security Management**:
+This project has completed **Epic 1: Foundation Layer**, **Epic 2: Command Execution Engine**, **Epic 3: Filesystem Operations**, **Epic 4: MCP Protocol Integration**, **Epic 5: Image & Security Management**, and **Epic 6: State Management & Recovery**:
 
 ### Epic 1: Foundation Layer ✅
 - [x] Feature 1.1: Project Scaffold & Configuration
@@ -324,6 +329,55 @@ This project has completed **Epic 1: Foundation Layer**, **Epic 2: Command Execu
   - Workspace cleanup between uses
   - Configurable via MCP_WARM_POOL_ENABLED
 
+### Epic 6: State Management & Recovery ✅
+- [x] Feature 6.1: Graceful Shutdown
+  - ShutdownCoordinator for handling SIGTERM/SIGINT
+  - Drains active operations with configurable grace period (MCP_DRAIN_GRACE_S)
+  - Stops transient containers while preserving persistent ones
+  - Ensures state is flushed to disk
+  - Integrated into server lifespan
+
+- [x] Feature 6.2: Boot Recovery & Reconciliation
+  - ReconciliationManager for container discovery and adoption
+  - Discovers containers with com.mcp.devbench label on startup
+  - Adopts running containers not in database
+  - Cleans up orphaned transient containers based on MCP_TRANSIENT_GC_DAYS
+  - `reconcile` tool for manual reconciliation
+  - Handles Docker daemon restarts gracefully
+
+- [x] Feature 6.3: Background Maintenance
+  - MaintenanceManager for periodic tasks
+  - Hourly garbage collection of old transients
+  - Cleanup of completed execs older than 24h
+  - Periodic state sync with Docker
+  - Database vacuuming for optimization
+  - Health monitoring and metrics collection
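
Feature 6.1 above describes draining active operations within a configurable grace period before shutdown. The following is a minimal sketch of that idea, assuming an asyncio-based server. `DrainCoordinator`, `operation`, `drain`, and `demo` are hypothetical names chosen for illustration; they are not taken from the project's `ShutdownCoordinator`.

```python
import asyncio
from contextlib import asynccontextmanager


class DrainCoordinator:
    """Hypothetical helper: tracks in-flight operations and drains them on shutdown."""

    def __init__(self) -> None:
        self._active = 0
        self._idle = asyncio.Event()
        self._idle.set()  # nothing is in flight yet
        self.shutting_down = False

    @asynccontextmanager
    async def operation(self):
        """Wrap one unit of work so drain() can wait for it to finish."""
        if self.shutting_down:
            raise RuntimeError("server is shutting down")
        self._active += 1
        self._idle.clear()
        try:
            yield
        finally:
            self._active -= 1
            if self._active == 0:
                self._idle.set()

    async def drain(self, grace_s: float) -> bool:
        """Refuse new work, then wait up to grace_s seconds for active work."""
        self.shutting_down = True
        try:
            await asyncio.wait_for(self._idle.wait(), timeout=grace_s)
            return True
        except asyncio.TimeoutError:
            return False  # grace period expired with work still running


async def demo(work_s: float, grace_s: float) -> bool:
    """Start one job that takes work_s seconds, then drain with grace_s."""
    coordinator = DrainCoordinator()

    async def job() -> None:
        async with coordinator.operation():
            await asyncio.sleep(work_s)

    task = asyncio.create_task(job())
    await asyncio.sleep(0)  # give the job a chance to register itself
    drained = await coordinator.drain(grace_s)
    task.cancel()  # no-op if the job already finished
    await asyncio.gather(task, return_exceptions=True)
    return drained
```

In a real server, `drain` would presumably be called from a SIGTERM/SIGINT handler with the grace period read from `MCP_DRAIN_GRACE_S`. Here `asyncio.run(demo(0.01, 1.0))` returns `True` because the job finishes inside the grace period, while `asyncio.run(demo(1.0, 0.01))` returns `False`.
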
+
+### Epic 7: Observability & Operations ✅
+- [x] Feature 7.1: Structured Audit Logging
+  - AuditLogger with JSON structured logging for all operations
+  - Complete audit trail for container, exec, filesystem, security, and transfer events
+  - Automatic sensitive data redaction (passwords, tokens, keys, secrets)
+  - ISO8601 timestamps and correlation IDs
+  - Configurable detail level
+  - 17 unit tests covering audit functionality
+
+- [x] Feature 7.2: Metrics & Monitoring
+  - Prometheus metrics collection via MetricsCollector
+  - Counter metrics: container_spawns_total, exec_total, fs_operations_total
+  - Histogram metrics: exec_duration_seconds, output_bytes
+  - Gauge metrics: active_containers, active_attachments, memory_usage_bytes
+  - `metrics` tool to expose Prometheus-formatted metrics
+  - 14 unit tests covering metrics collection
+
+- [x] Feature 7.3: Debug & Admin Tools
+  - `system_status` tool for overall system health
+  - `list_containers` tool for detailed container information
+  - `list_execs` tool for active execution listing
+  - `garbage_collect` tool for manual cleanup
+  - `reconcile` tool with audit logging (from Epic 6)
+  - Docker connectivity and database status monitoring
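
Feature 7.1 above mentions automatic redaction of passwords, tokens, keys, and secrets from audit events. One plausible way to do this is to mask any dictionary entry whose key name looks sensitive. The sketch below is an assumption about the approach, not the project's `AuditLogger`; `redact`, `SENSITIVE_MARKERS`, and the `[REDACTED]` placeholder are hypothetical.

```python
from typing import Any

# Substrings treated as sensitive, mirroring the README's list (assumed matching rule).
SENSITIVE_MARKERS = ("password", "token", "key", "secret")
REDACTED = "[REDACTED]"  # placeholder text is an assumption


def _is_sensitive(field_name: Any) -> bool:
    """Case-insensitive substring match on the field name."""
    name = str(field_name).lower()
    return any(marker in name for marker in SENSITIVE_MARKERS)


def redact(value: Any) -> Any:
    """Return a copy of value with sensitive-looking dict entries masked."""
    if isinstance(value, dict):
        return {
            key: REDACTED if _is_sensitive(key) else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value  # scalars pass through unchanged
```

Substring matching is deliberately broad: a field such as `monkey_count` would also be masked because it contains `key`. An exact-name allow-list would avoid that at the cost of missing unexpected field names.
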
+
 ### Current Status
 The project now has:
 - Full container lifecycle management with image policy enforcement
@@ -333,7 +387,13 @@ The project now has:
 - Image allow-list validation and resolution with digest pinning
 - Comprehensive security hardening (capability dropping, resource limits, audit logging)
 - Warm container pool for fast provisioning (<1s attach time)
-- 150 unit and integration tests passing (100% success rate)
+- Graceful shutdown with operation draining
+- Boot recovery and automatic reconciliation
+- Background maintenance and health monitoring
+- **Complete audit logging for all operations with sensitive data redaction**
+- **Prometheus metrics collection and exposure**
+- **Admin tools for system status, container/exec listing, and manual operations**
+- 201 unit and integration tests passing (100% success rate)
 - Comprehensive error handling and resource management
 
 ## MCP Tools Reference
@@ -502,6 +562,124 @@ List directory contents.
 - `path` (string): Listed directory
 - `entries` (array): File/directory entries with metadata
 
+### Maintenance Tools
+
+#### `reconcile`
+Run container reconciliation to sync Docker state with database.
+
+This tool performs:
+- Discovery of containers with com.mcp.devbench label
+- Adoption of running containers not in database
+- Cleanup of stopped containers
+- Removal of orphaned transient containers
+- Cleanup of incomplete exec entries
+
+**Input:** None
+
+**Output:**
+- `discovered` (integer): Containers found with MCP label
+- `adopted` (integer): Containers added to database
+- `cleaned_up` (integer): Missing containers marked stopped
+- `orphaned` (integer): Old transients removed
+- `errors` (integer): Errors encountered
+
+**Example:**
+```json
+{
+  "discovered": 5,
+  "adopted": 1,
+  "cleaned_up": 2,
+  "orphaned": 1,
+  "errors": 0
+}
+```
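
The counters in the `reconcile` output above can be read as the result of comparing two views of the world: the labelled containers Docker reports and the rows in the database. The sketch below illustrates that comparison on in-memory data only. `DockerContainer`, `DbRecord`, and `reconcile_counts` are hypothetical names, and the choice to adopt unknown containers as transient is an assumption rather than the project's documented behaviour.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class DockerContainer:
    """Hypothetical view of a container carrying the MCP label."""
    docker_id: str
    running: bool


@dataclass
class DbRecord:
    """Hypothetical database row; field names are assumptions."""
    docker_id: str
    persistent: bool
    last_seen: datetime
    status: str = "running"


def reconcile_counts(docker: list[DockerContainer], db: dict[str, DbRecord],
                     now: datetime, transient_gc_days: int) -> dict[str, int]:
    """Update db in place and return counters shaped like the tool's output."""
    result = {"discovered": len(docker), "adopted": 0,
              "cleaned_up": 0, "orphaned": 0, "errors": 0}
    live_ids = {c.docker_id for c in docker}

    # Adopt running containers that the database does not know about.
    for container in docker:
        if container.running and container.docker_id not in db:
            db[container.docker_id] = DbRecord(container.docker_id, False, now)
            result["adopted"] += 1

    cutoff = now - timedelta(days=transient_gc_days)
    for record in list(db.values()):
        if record.docker_id not in live_ids and record.status != "stopped":
            record.status = "stopped"  # container vanished from Docker
            result["cleaned_up"] += 1
        elif not record.persistent and record.last_seen < cutoff:
            del db[record.docker_id]  # a real server would also remove the container
            result["orphaned"] += 1
    return result
```

A real implementation would talk to the Docker daemon and increment `errors` when a call fails; this sketch never does, so `errors` stays at zero.
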
+
+### Observability & Admin Tools
+
+#### `metrics`
+Get Prometheus metrics for monitoring.
+
+Returns current metrics including:
+- Container spawn counts by image
+- Execution counts and durations
+- Filesystem operation counts
+- Active container and attachment gauges
+- Memory usage by container
+
+**Input:** None
+
+**Output:**
+- `metrics` (string): Prometheus-formatted metrics
+
+**Example metrics output:**
+```
+# HELP mcp_devbench_container_spawns_total Total number of container spawns
+# TYPE mcp_devbench_container_spawns_total counter
+mcp_devbench_container_spawns_total{image="python:3.11"} 5.0
+# HELP mcp_devbench_exec_total Total number of command executions
+# TYPE mcp_devbench_exec_total counter
+mcp_devbench_exec_total{container_id="c_123",status="success"} 10.0
+# HELP mcp_devbench_active_containers Number of active containers
+# TYPE mcp_devbench_active_containers gauge
+mcp_devbench_active_containers 3.0
+```
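
Because the `metrics` tool returns a single string, a client that wants individual values has to parse the Prometheus text format itself. The following standard-library sketch handles the simple shape shown in the example output above; `parse_metrics` and `EXAMPLE` are hypothetical names. It skips comment lines and any sample line with a trailing timestamp, and it does not unescape label values, so it is not a complete parser for the exposition format.

```python
from __future__ import annotations

import re

# One sample line: metric name, optional {labels}, then a single numeric value.
SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?:\{(?P<labels>.*)\})?"
    r"\s+(?P<value>\S+)$"
)
# One label pair inside the braces, e.g. image="python:3.11".
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

EXAMPLE = """\
# HELP mcp_devbench_container_spawns_total Total number of container spawns
# TYPE mcp_devbench_container_spawns_total counter
mcp_devbench_container_spawns_total{image="python:3.11"} 5.0
# HELP mcp_devbench_exec_total Total number of command executions
# TYPE mcp_devbench_exec_total counter
mcp_devbench_exec_total{container_id="c_123",status="success"} 10.0
# HELP mcp_devbench_active_containers Number of active containers
# TYPE mcp_devbench_active_containers gauge
mcp_devbench_active_containers 3.0
"""


def parse_metrics(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Return (name, labels, value) for each sample line in the text."""
    samples = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or HELP/TYPE comment
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # unsupported shape, e.g. a trailing timestamp
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples
```

Calling `parse_metrics(EXAMPLE)` yields three samples, the last being `("mcp_devbench_active_containers", {}, 3.0)`.
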
+
+#### `system_status`
+Get system health and status information.
+
+**Input:** None
+
+**Output:**
+- `status` (string): Overall system status (healthy, degraded)
+- `docker_connected` (boolean): Docker daemon connectivity
+- `database_initialized` (boolean): Database initialization status
+- `active_containers` (integer): Number of active containers
+- `active_attachments` (integer): Number of active client attachments
+- `version` (string): Server version
+
+**Example:**
+```json
+{
+  "status": "healthy",
+  "docker_connected": true,
+  "database_initialized": true,
+  "active_containers": 3,
+  "active_attachments": 2,
+  "version": "0.1.0"
+}
+```
+
+#### `garbage_collect`
+Trigger manual garbage collection.
+
+Cleans up:
+- Orphaned transient containers
+- Old completed exec records (>24h)
+- Abandoned attachments
+
+**Input:** None
+
+**Output:**
+- `containers_removed` (integer): Number of containers removed
+- `execs_cleaned` (integer): Number of exec records cleaned
+- `attachments_cleaned` (integer): Number of attachments cleaned
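
The "old completed exec records (>24h)" rule above amounts to a cutoff comparison on each record's completion time. A minimal sketch of that selection follows; `ExecRecord`, its field names, and `expired_exec_ids` are hypothetical, and records that are still running are assumed to have no end time.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class ExecRecord:
    """Hypothetical exec row; the field names are assumptions."""
    exec_id: str
    ended_at: Optional[datetime]  # None while the command is still running


def expired_exec_ids(execs: list[ExecRecord], now: datetime,
                     max_age: timedelta = timedelta(hours=24)) -> list[str]:
    """Return IDs of finished execs whose end time is older than max_age."""
    cutoff = now - max_age
    return [
        record.exec_id
        for record in execs
        if record.ended_at is not None and record.ended_at < cutoff
    ]
```

Passing `now` explicitly rather than calling `datetime.now()` inside the function keeps the rule easy to test with fixed timestamps.
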
+
+#### `list_containers`
+List all containers with detailed information.
+
+**Input:** None
+
+**Output:**
+- `containers` (array): List of container objects with id, docker_id, alias, image, status, persistent, created_at, last_seen
+
+#### `list_execs`
+List active command executions.
+
+**Input:** None
+
+**Output:**
+- `execs` (array): List of execution objects with exec_id, container_id, cmd, as_root, started_at, status
+
 See [mcp-devbench-work-breakdown.md](mcp-devbench-work-breakdown.md) for the complete implementation roadmap.
 
 ## License

pyproject.toml

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ dependencies = [
     "alembic>=1.13.0",
     "aiosqlite>=0.19.0",
     "python-json-logger>=2.0.7",
+    "prometheus-client>=0.20.0",
 ]
 
 [project.optional-dependencies]
