pvliesdonk
diff --git a/README.md b/README.md
Lines changed: 123 additions & 4 deletions
diff --git a/pyproject.toml b/pyproject.toml
Lines changed: 1 addition & 0 deletions
diff --git a/src/mcp_devbench/mcp_tools.py b/src/mcp_devbench/mcp_tools.py
Lines changed: 56 additions & 0 deletions
diff --git a/src/mcp_devbench/repositories/execs.py b/src/mcp_devbench/repositories/execs.py
Lines changed: 2 additions & 2 deletions
@@ -15,6 +15,11 @@ MCP DevBench is a Docker container management server that implements the Model C
 - **Configuration Management**: Environment-based configuration with Pydantic Settings
 - **Structured Logging**: JSON-formatted logging for production observability
 - **Docker Integration**: Secure Docker daemon communication with connection pooling
+- **Audit Logging**: Complete audit trail for all operations with sensitive data redaction
+- **Prometheus Metrics**: Built-in metrics collection for monitoring and alerting
+- **Admin Tools**: System health status, container/exec listing, garbage collection, and reconciliation
+- **Graceful Shutdown**: Drains active operations before shutdown
+- **Automatic Recovery**: Reconciles Docker state with database on startup
 
 ## Requirements
 
@@ -247,6 +252,31 @@ This project has completed **Epic 1: Foundation Layer**, **Epic 2: Command Execu
   - Database vacuuming for optimization
   - Health monitoring and metrics collection
 
+### Epic 7: Observability & Operations ✅
+- [x] Feature 7.1: Structured Audit Logging
+  - AuditLogger with JSON structured logging for all operations
+  - Complete audit trail for container, exec, filesystem, security, and transfer events
+  - Automatic sensitive data redaction (passwords, tokens, keys, secrets)
+  - ISO8601 timestamps and correlation IDs
+  - Configurable detail level
+  - 17 unit tests covering audit functionality
+
+- [x] Feature 7.2: Metrics & Monitoring
+  - Prometheus metrics collection via MetricsCollector
+  - Counter metrics: container_spawns_total, exec_total, fs_operations_total
+  - Histogram metrics: exec_duration_seconds, output_bytes
+  - Gauge metrics: active_containers, active_attachments, memory_usage_bytes
+  - `metrics` tool to expose Prometheus-formatted metrics
+  - 14 unit tests covering metrics collection
+
+- [x] Feature 7.3: Debug & Admin Tools
+  - `system_status` tool for overall system health
+  - `list_containers` tool for detailed container information
+  - `list_execs` tool for active execution listing
+  - `garbage_collect` tool for manual cleanup
+  - `reconcile` tool with audit logging (from Epic 6)
+  - Docker connectivity and database status monitoring
+
 ### Current Status
 The project now has:
 - Full container lifecycle management with image policy enforcement
@@ -256,10 +286,13 @@ The project now has:
 - Image allow-list validation and resolution with digest pinning
 - Comprehensive security hardening (capability dropping, resource limits, audit logging)
 - Warm container pool for fast provisioning (<1s attach time)
-- **Graceful shutdown with operation draining**
-- **Boot recovery and automatic reconciliation**
-- **Background maintenance and health monitoring**
-- 170 unit and integration tests passing (100% success rate)
+- Graceful shutdown with operation draining
+- Boot recovery and automatic reconciliation
+- Background maintenance and health monitoring
+- **Complete audit logging for all operations with sensitive data redaction**
+- **Prometheus metrics collection and exposure**
+- **Admin tools for system status, container/exec listing, and manual operations**
+- 201 unit and integration tests passing (100% success rate)
 - Comprehensive error handling and resource management
 
 ## MCP Tools Reference
@@ -460,6 +493,92 @@ This tool performs:
 }
 ```
 
+### Observability & Admin Tools
+
+#### `metrics`
+Get Prometheus metrics for monitoring.
+
+Returns current metrics including:
+- Container spawn counts by image
+- Execution counts and durations
+- Filesystem operation counts
+- Active container and attachment gauges
+- Memory usage by container
+
+**Input:** None
+
+**Output:**
+- `metrics` (string): Prometheus-formatted metrics
+
+**Example metrics output:**
+```
+# HELP mcp_devbench_container_spawns_total Total number of container spawns
+# TYPE mcp_devbench_container_spawns_total counter
+mcp_devbench_container_spawns_total{image="python:3.11"} 5.0
+# HELP mcp_devbench_exec_total Total number of command executions
+# TYPE mcp_devbench_exec_total counter
+mcp_devbench_exec_total{container_id="c_123",status="success"} 10.0
+# HELP mcp_devbench_active_containers Number of active containers
+# TYPE mcp_devbench_active_containers gauge
+mcp_devbench_active_containers 3.0
+```
+
+#### `system_status`
+Get system health and status information.
+
+**Input:** None
+
+**Output:**
+- `status` (string): Overall system status (healthy, degraded)
+- `docker_connected` (boolean): Docker daemon connectivity
+- `database_initialized` (boolean): Database initialization status
+- `active_containers` (integer): Number of active containers
+- `active_attachments` (integer): Number of active client attachments
+- `version` (string): Server version
+
+**Example:**
+```json
+{
+  "status": "healthy",
+  "docker_connected": true,
+  "database_initialized": true,
+  "active_containers": 3,
+  "active_attachments": 2,
+  "version": "0.1.0"
+}
+```
+
+#### `garbage_collect`
+Trigger manual garbage collection.
+
+Cleans up:
+- Orphaned transient containers
+- Old completed exec records (>24h)
+- Abandoned attachments
+
+**Input:** None
+
+**Output:**
+- `containers_removed` (integer): Number of containers removed
+- `execs_cleaned` (integer): Number of exec records cleaned
+- `attachments_cleaned` (integer): Number of attachments cleaned
+
+#### `list_containers`
+List all containers with detailed information.
+
+**Input:** None
+
+**Output:**
+- `containers` (array): List of container objects with id, docker_id, alias, image, status, persistent, created_at, last_seen
+
+#### `list_execs`
+List active command executions.
+
+**Input:** None
+
+**Output:**
+- `execs` (array): List of execution objects with exec_id, container_id, cmd, as_root, started_at, status
+
 See [mcp-devbench-work-breakdown.md](mcp-devbench-work-breakdown.md) for the complete implementation roadmap.
 
 ## License
 
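Reviewer note: the README hunk above documents the `metrics` tool as returning a single Prometheus-formatted string. A client that wants numeric values has to parse that text. The following is a minimal sketch of such a parser, not part of this change. The helper name `parse_metrics` is hypothetical, and it assumes sample lines carry no trailing timestamp, as in the README example.

```python
from typing import Dict


def parse_metrics(text: str) -> Dict[str, float]:
    """Parse Prometheus text exposition lines into {series: value}.

    Hypothetical client-side helper. Assumes each sample line has the form
    '<name>{<labels>} <value>' or '<name> <value>' with no timestamp.
    """
    samples: Dict[str, float] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        # Split on the last space so label values may contain spaces.
        series, value = line.rsplit(" ", 1)
        samples[series] = float(value)
    return samples


# Sample text copied from the README example in this diff.
example = """\
# HELP mcp_devbench_container_spawns_total Total number of container spawns
# TYPE mcp_devbench_container_spawns_total counter
mcp_devbench_container_spawns_total{image="python:3.11"} 5.0
# HELP mcp_devbench_active_containers Number of active containers
# TYPE mcp_devbench_active_containers gauge
mcp_devbench_active_containers 3.0
"""
parsed = parse_metrics(example)
```

The series key keeps its label set verbatim, so the two README samples are stored under `mcp_devbench_container_spawns_total{image="python:3.11"}` and `mcp_devbench_active_containers`. A production client would more likely rely on a dedicated Prometheus parsing library.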
@@ -16,6 +16,7 @@ dependencies = [
     "alembic>=1.13.0",
     "aiosqlite>=0.19.0",
     "python-json-logger>=2.0.7",
+    "prometheus-client>=0.20.0",
 ]
 
 [project.optional-dependencies]
 
@@ -198,3 +198,59 @@ class ExecPollOutput(BaseModel):
 
     messages: List[ExecStreamMessage] = Field(..., description="Stream messages")
     complete: bool = Field(..., description="Whether execution is complete")
+
+
+# Admin and Monitoring Tools for Feature 7.2 and 7.3
+
+
+class MetricsOutput(BaseModel):
+    """Output model for metrics tool."""
+
+    metrics: str = Field(..., description="Prometheus metrics in text format")
+
+
+class SystemStatusOutput(BaseModel):
+    """Output model for system status tool."""
+
+    status: str = Field(..., description="Overall system status")
+    docker_connected: bool = Field(..., description="Docker daemon connectivity")
+    database_initialized: bool = Field(..., description="Database initialization status")
+    active_containers: int = Field(..., description="Number of active containers")
+    active_attachments: int = Field(..., description="Number of active attachments")
+    version: str = Field(..., description="Server version")
+
+
+class ReconcileInput(BaseModel):
+    """Input model for reconcile tool."""
+
+    force: bool = Field(default=False, description="Force reconciliation even if recently run")
+
+
+class ReconcileOutput(BaseModel):
+    """Output model for reconcile tool."""
+
+    discovered: int = Field(..., description="Number of containers discovered")
+    adopted: int = Field(..., description="Number of containers adopted into state")
+    cleaned_up: int = Field(..., description="Number of containers cleaned up")
+    orphaned: int = Field(..., description="Number of orphaned containers found")
+    errors: int = Field(..., description="Number of errors encountered")
+
+
+class GarbageCollectOutput(BaseModel):
+    """Output model for garbage collection tool."""
+
+    containers_removed: int = Field(..., description="Number of containers removed")
+    execs_cleaned: int = Field(..., description="Number of exec records cleaned")
+    attachments_cleaned: int = Field(..., description="Number of attachments cleaned")
+
+
+class ContainerListOutput(BaseModel):
+    """Output model for container list tool."""
+
+    containers: List[Dict[str, Any]] = Field(..., description="List of container information")
+
+
+class ExecListOutput(BaseModel):
+    """Output model for exec list tool."""
+
+    execs: List[Dict[str, Any]] = Field(..., description="List of active executions")
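Reviewer note: `SystemStatusOutput` declares `status` as a free-form string, while the README lists the values `healthy` and `degraded`. The diff does not show how the value is derived. The sketch below illustrates one plausible rule, using a standard-library dataclass as a stand-in for the Pydantic model so that it runs without dependencies. The function `build_status` and the rule "healthy only when both checks pass" are assumptions, not the project's implementation.

```python
from dataclasses import asdict, dataclass


@dataclass
class SystemStatus:
    """Stand-in mirroring the fields of SystemStatusOutput in this diff."""

    status: str
    docker_connected: bool
    database_initialized: bool
    active_containers: int
    active_attachments: int
    version: str


def build_status(
    docker_connected: bool,
    database_initialized: bool,
    active_containers: int,
    active_attachments: int,
    version: str,
) -> SystemStatus:
    """Hypothetical helper that derives the overall status string."""
    # Assumption: the system is "healthy" only if Docker and the database
    # are both available; any failed check downgrades it to "degraded".
    healthy = docker_connected and database_initialized
    return SystemStatus(
        status="healthy" if healthy else "degraded",
        docker_connected=docker_connected,
        database_initialized=database_initialized,
        active_containers=active_containers,
        active_attachments=active_attachments,
        version=version,
    )


# Values taken from the README example for system_status.
report = asdict(build_status(True, True, 3, 2, "0.1.0"))
```

If the set of status values is fixed, typing the field as `Literal["healthy", "degraded"]` in the Pydantic model would let validation reject anything else.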
@@ -85,9 +85,9 @@ async def get_old_completed(self, hours: int = 24) -> List[Exec]:
         Returns:
             List of old completed execs
         """
-        from datetime import timedelta
+        from datetime import timedelta, timezone
 
-        cutoff = datetime.utcnow() - timedelta(hours=hours)
+        cutoff = datetime.now(timezone.utc) - timedelta(hours=hours)
         stmt = select(Exec).where(Exec.ended_at.is_not(None), Exec.ended_at < cutoff)
         result = await self.session.execute(stmt)
         return list(result.scalars().all())
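Reviewer note: replacing `datetime.utcnow()` with `datetime.now(timezone.utc)` makes `cutoff` timezone-aware. `utcnow()` returns a naive value and is deprecated as of Python 3.12. In plain Python, comparing an aware datetime with a naive one raises `TypeError`, so any in-memory comparison against `ended_at` needs both sides to agree. The sketch below shows one way to handle that. The helper `is_older_than` is hypothetical, and it assumes that naive timestamps were written in UTC. How the `ended_at` column is actually stored is not visible in this diff.

```python
from datetime import datetime, timedelta, timezone


def is_older_than(ended_at: datetime, hours: int = 24) -> bool:
    """Return True if ended_at lies before the aware UTC cutoff.

    Hypothetical helper mirroring the cutoff logic of get_old_completed.
    """
    cutoff = datetime.now(timezone.utc) - timedelta(hours=hours)
    if ended_at.tzinfo is None:
        # Assumption: naive values represent UTC, so attach the UTC zone
        # instead of converting from local time.
        ended_at = ended_at.replace(tzinfo=timezone.utc)
    return ended_at < cutoff


# Both naive and aware inputs are accepted.
old_naive = datetime(2000, 1, 1)
recent_aware = datetime.now(timezone.utc)
results = (is_older_than(old_naive), is_older_than(recent_aware))
```

In the repository method itself the comparison runs inside the SQL statement, so the behaviour there depends on the column type and database driver rather than on Python's comparison rules.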