rustyrazorblade
diff --git a/docs/integrations/server.md b/docs/integrations/server.md
Lines changed: 16 additions & 0 deletions
diff --git a/openspec/changes/server-auto-shutdown-on-infra-removal/.openspec.yaml b/openspec/changes/server-auto-shutdown-on-infra-removal/.openspec.yaml
Lines changed: 2 additions & 0 deletions
diff --git a/openspec/changes/server-auto-shutdown-on-infra-removal/design.md b/openspec/changes/server-auto-shutdown-on-infra-removal/design.md
Lines changed: 60 additions & 0 deletions
diff --git a/openspec/changes/server-auto-shutdown-on-infra-removal/proposal.md b/openspec/changes/server-auto-shutdown-on-infra-removal/proposal.md
Lines changed: 27 additions & 0 deletions
diff --git a/openspec/changes/server-auto-shutdown-on-infra-removal/specs/server-infra-watchdog/spec.md b/openspec/changes/server-auto-shutdown-on-infra-removal/specs/server-infra-watchdog/spec.md
Lines changed: 34 additions & 0 deletions
diff --git a/openspec/changes/server-auto-shutdown-on-infra-removal/specs/server/spec.md b/openspec/changes/server-auto-shutdown-on-infra-removal/specs/server/spec.md
Lines changed: 13 additions & 0 deletions
diff --git a/openspec/changes/server-auto-shutdown-on-infra-removal/tasks.md b/openspec/changes/server-auto-shutdown-on-infra-removal/tasks.md
Lines changed: 30 additions & 0 deletions
diff --git a/src/main/kotlin/com/rustyrazorblade/easydblab/commands/Server.kt b/src/main/kotlin/com/rustyrazorblade/easydblab/commands/Server.kt
Lines changed: 14 additions & 1 deletion
diff --git a/src/main/kotlin/com/rustyrazorblade/easydblab/commands/cassandra/Restart.kt b/src/main/kotlin/com/rustyrazorblade/easydblab/commands/cassandra/Restart.kt
Lines changed: 1 addition & 1 deletion
diff --git a/src/main/kotlin/com/rustyrazorblade/easydblab/commands/cassandra/Start.kt b/src/main/kotlin/com/rustyrazorblade/easydblab/commands/cassandra/Start.kt
Lines changed: 1 addition & 1 deletion
@@ -236,6 +236,22 @@ Published every 5 seconds when the cluster is running Cassandra:
 | `compactionCompletedPerSec` | double | Compactions completed per second |
 | `compactionBytesWrittenPerSec` | double | Compaction write throughput (bytes/sec) |
 
+## Auto-Shutdown on Infrastructure Removal
+
+When running the server in unattended or automated scenarios, you can enable automatic shutdown if the cluster's AWS infrastructure is torn down:
+
+```bash
+easy-db-lab server --auto-shutdown
+```
+
+When `--auto-shutdown` is set, the server checks whether the cluster VPC still exists on each status refresh cycle (controlled by `--refresh`). If the VPC is no longer found, the server emits a shutdown event and exits cleanly with code 0.
+
+This is useful when:
+- Running the server alongside an automated test workflow that tears down infrastructure when done
+- Leaving the server running overnight and wanting it to stop automatically after `easy-db-lab down`
+
+**Note:** The check is skipped if no cluster state exists or the VPC name cannot be determined. AWS API errors during the check are logged and ignored — only a confirmed "VPC not found" result triggers shutdown.
+
 ## Notes
 
 - The server requires Docker to be installed
 
@@ -0,0 +1,2 @@
+schema: spec-driven
+created: 2026-03-26
@@ -0,0 +1,60 @@
+## Context
+
+The `server` command starts a long-lived process that exposes MCP and REST endpoints for cluster management. It is often started in automated or unattended workflows (e.g., running alongside an AI assistant session). When the underlying AWS infrastructure — specifically the VPC — is torn down (via `easy-db-lab down` or external deletion), the server process has no meaningful work to do but continues running indefinitely.
+
+The VPC is used as the sentinel because it is the root of the cluster's AWS infrastructure. If the VPC is gone, all associated EC2 instances, subnets, and security groups are also gone — the cluster no longer exists.
+
+The VPC ID for the current cluster is stored in `ClusterState` and accessible via `ClusterStateManager`.
+
+## Goals / Non-Goals
+
+**Goals:**
+- Provide an opt-in `--auto-shutdown` flag on the `server` command
+- Check whether the cluster's VPC still exists on each status refresh cycle (`--refresh`)
+- Emit a domain event and exit cleanly when the VPC is no longer found
+
+**Non-Goals:**
+- Automatic shutdown without the flag (opt-in only)
+- Monitoring resources other than the VPC (instances, subnets, etc.)
+- Reacting to partial infrastructure removal (only full VPC deletion triggers shutdown)
+- Persistent state or restart behavior after shutdown
+
+## Decisions
+
+### Decision 1: Integrate VPC check into `StatusCache` refresh cycle
+
+The VPC existence check runs inside `StatusCache` on each refresh, reusing the existing `--refresh` interval. When `autoShutdown` is enabled and the VPC is not found, `StatusCache` emits `Event.Server.InfrastructureGone` and calls `exitProcess(0)`. The `Server` command passes `autoShutdown` and the cluster VPC name directly to `StatusCache`.
+
+**Alternative considered**: A separate background service on its own timer. Rejected — `StatusCache` already polls AWS on a schedule; a second thread for VPC checks is redundant overhead with no benefit.
+
+### Decision 2: Opt-in via `--auto-shutdown` flag, not default behavior
+
+Auto-shutdown should not happen unexpectedly during interactive use. An explicit flag makes the intent clear and prevents surprises.
+
+**Alternative considered**: Always-on with a `--no-auto-shutdown` flag. Rejected — fail-fast-on-default would surprise users who run the server interactively.
+
+### Decision 3: Use `VpcDiscoveryOperations.findVpcByName()` as the existence check
+
+`findVpcByName()` returns `null` when the VPC doesn't exist. This is already implemented in `VpcService` / `VpcInfrastructure`. The cluster's VPC name is derivable from `ClusterState`.
+
+**Alternative considered**: Describe VPC by ID. Also valid, but the name-based lookup already has the right semantics (null = not found) and is used elsewhere.
+
+### Decision 4: A single miss triggers shutdown (no retry dampening)
+
+If the VPC is gone, it is gone. VPC deletion in AWS is not transient. There is no need to wait for N consecutive failures before shutting down.
+
+**Alternative considered**: Require 2–3 consecutive misses. Rejected — AWS VPC existence checks are reliable; the complexity of dampening is not warranted.
+
+### Decision 5: Emit a domain event, then exit the JVM
+
+The watchdog emits `Event.Server.InfrastructureGone` and then calls `exitProcess(0)`. This ensures MCP clients and REST consumers see a structured event before the process ends.
+
+## Risks / Trade-offs
+
+- **False positive on transient AWS API error** → The check should distinguish "not found" (null) from an AWS API exception. API exceptions should be logged and the watchdog should continue polling rather than triggering shutdown. Only a confirmed null result (VPC not found) causes shutdown.
+- **VPC ID not present in ClusterState** → If the cluster was provisioned before VPC tracking was introduced, `ClusterState.vpcId` may be null. The watchdog should skip polling and log a warning rather than crashing or shutting down.
+- **No clean shutdown hook** → The `exitProcess(0)` approach bypasses Ktor's graceful shutdown. This is acceptable for an infra-gone scenario since the cluster is already gone. A future improvement could use a coroutine cancellation signal instead.
+
+## Open Questions
+
+- Should the watchdog also check instance existence as a secondary signal, or is VPC-only sufficient? (Current proposal: VPC-only)
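Reviewer sketch (not part of the diff): the shutdown rules described in design.md can be read as a small classification step. The version below is a minimal illustration under stated assumptions. `VpcCheckResult` and `checkVpc` are hypothetical names, and the lookup is passed in as a plain lambda standing in for `findVpcByName()`; the actual `StatusCache` code is not shown in this PR excerpt and may be structured differently.

```kotlin
// Hypothetical types and names; only the behaviour is taken from design.md:
// a null lookup result means the VPC is gone, an exception means keep running,
// and a disabled flag or a null/blank name means the check is skipped.
sealed interface VpcCheckResult {
    object Present : VpcCheckResult
    object Gone : VpcCheckResult
    object Skipped : VpcCheckResult
    data class Error(val cause: Exception) : VpcCheckResult
}

fun checkVpc(
    autoShutdown: Boolean,
    vpcName: String?,
    findVpcByName: (String) -> Any?, // stand-in for the real lookup
): VpcCheckResult {
    if (!autoShutdown) return VpcCheckResult.Skipped
    if (vpcName.isNullOrBlank()) return VpcCheckResult.Skipped
    return try {
        // Only a confirmed "not found" (null) counts as infrastructure removal.
        if (findVpcByName(vpcName) == null) VpcCheckResult.Gone else VpcCheckResult.Present
    } catch (e: Exception) {
        // API failures are reported to the caller, which would log and keep polling.
        VpcCheckResult.Error(e)
    }
}
```

Under this reading, only `Gone` would lead to emitting `Event.Server.InfrastructureGone` and exiting; every other result leaves the server running until the next refresh.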
@@ -0,0 +1,27 @@
+## Why
+
+When `easy-db-lab server` is running and the underlying AWS infrastructure is torn down (e.g., the VPC is deleted), the server process continues running indefinitely with no meaningful work to do. This wastes resources and leaves the user with a stale, misleading server process — especially important in automated or unattended scenarios where no human is watching the terminal.
+
+## What Changes
+
+- Add an optional `--auto-shutdown` flag to the `server` command that enables infrastructure watchdog behavior.
+- When enabled, a background service checks whether the cluster's VPC still exists in AWS on each status refresh cycle.
+- If the VPC is no longer found, the server logs a final shutdown event and exits cleanly.
+
+## Capabilities
+
+### New Capabilities
+
+- `server-infra-watchdog`: Background watchdog that monitors AWS infrastructure health (VPC existence) while the server is running and triggers a clean shutdown if the infrastructure is gone.
+
+### Modified Capabilities
+
+- `server`: The `server` command gains a new `--auto-shutdown` flag and a new background service lifecycle hook.
+
+## Impact
+
+- `commands/Server.kt` — new CLI options
+- New background service class (e.g., `InfraWatchdogService`) in `services/` or similar
+- AWS VPC existence check via existing EC2 provider
+- `events/Event.kt` — new domain event for watchdog shutdown
+- `openspec/specs/server/spec.md` — new requirements for auto-shutdown behavior
@@ -0,0 +1,34 @@
+# Server Infrastructure Watchdog
+
+A background service that monitors whether the cluster's AWS VPC still exists while the server is running, and triggers a clean shutdown if the infrastructure has been removed.
+
+## Requirements
+
+### Requirement: Watchdog monitors VPC existence
+The watchdog service SHALL periodically check whether the cluster's VPC still exists in AWS and trigger server shutdown when it is no longer found.
+
+#### Scenario: VPC is present
+- **WHEN** the watchdog polls AWS and the cluster VPC is found
+- **THEN** the server continues running normally
+
+#### Scenario: VPC is gone
+- **WHEN** the watchdog polls AWS and the cluster VPC is not found
+- **THEN** the server emits an `Event.Server.InfrastructureGone` event and exits cleanly
+
+#### Scenario: AWS API error during check
+- **WHEN** the watchdog poll encounters an AWS API exception (not a not-found result)
+- **THEN** the exception is logged, the watchdog continues polling on the next interval, and the server does not shut down
+
+### Requirement: VPC ID not available
+The watchdog SHALL gracefully handle the case where no VPC ID is recorded in cluster state.
+
+#### Scenario: No VPC ID in cluster state
+- **WHEN** the watchdog starts and `ClusterState.vpcId` is null or blank
+- **THEN** the watchdog logs a warning and skips all polling without triggering shutdown
+
+### Requirement: Check runs on status refresh cycle
+The VPC existence check SHALL run on the same cadence as the existing status cache refresh (`--refresh` interval).
+
+#### Scenario: Check frequency matches refresh
+- **WHEN** the server is running with `--auto-shutdown` and `--refresh 30`
+- **THEN** the VPC check runs every 30 seconds alongside the status cache refresh
@@ -0,0 +1,13 @@
+## ADDED Requirements
+
+### Requirement: Auto-shutdown CLI option
+The server command SHALL accept an `--auto-shutdown` flag that enables infrastructure watchdog behavior.
+
+#### Scenario: Flag not provided
+- **WHEN** the user starts the server without `--auto-shutdown`
+- **THEN** no watchdog is started and the server runs indefinitely
+
+#### Scenario: Flag provided
+- **WHEN** the user starts the server with `--auto-shutdown`
+- **THEN** the infrastructure watchdog service is started as a background service
+
@@ -0,0 +1,30 @@
+## 1. Events
+
+- [x] 1.1 Add `Event.Server` sealed interface to `Event.kt` with `InfrastructureGone` data class (includes vpcId and a message field)
+
+## 2. StatusCache Integration
+
+- [x] 2.1 Add `autoShutdown: Boolean` and `vpcName: String?` parameters to `StatusCache`
+- [x] 2.2 On each refresh, if `autoShutdown` is true and `vpcName` is non-null, call `findVpcByName()`
+- [x] 2.3 If VPC is not found (null result), emit `Event.Server.InfrastructureGone` and call `exitProcess(0)`
+- [x] 2.4 Handle AWS API exceptions by logging and continuing (no shutdown on exception)
+- [x] 2.5 Skip the check with a logged warning if `vpcName` is null or blank
+
+## 3. Server Command
+
+- [x] 3.1 Add `--auto-shutdown` boolean flag to `Server.kt`
+- [x] 3.2 Pass `autoShutdown` and the cluster VPC name through to `StatusCache`
+
+## 4. Console Output
+
+- [x] 4.1 Add a `ConsoleEventListener` handler for `Event.Server.InfrastructureGone` that prints a message before exit
+
+## 5. Tests
+
+- [x] 5.1 Unit test `StatusCache`: verify it emits `InfrastructureGone` and calls `exitProcess` when VPC not found
+- [x] 5.2 Unit test: verify no shutdown when `findVpcByName` throws an exception
+- [x] 5.3 Unit test: verify check is skipped when `vpcName` is null or blank
+
+## 6. Documentation
+
+- [x] 6.1 Update `docs/` server reference page to document the `--auto-shutdown` option
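Reviewer sketch (not part of the diff): the unit tests in section 5 of tasks.md have to observe a call to `exitProcess`, which would normally terminate the test JVM. One possible way to make that testable is to inject the exit action. This is an assumption for illustration only, not necessarily how the PR's tests work; `RefreshStep`, `emit`, and `exit` are hypothetical names, and events are simplified to strings.

```kotlin
// Hypothetical seam: the exit action and the event sink are passed in,
// so a test can record them instead of terminating the JVM.
class RefreshStep(
    private val vpcName: String?,
    private val findVpcByName: (String) -> Any?, // stand-in for the real lookup
    private val emit: (String) -> Unit,          // stand-in for the event bus
    private val exit: (Int) -> Unit,             // production code could pass ::exitProcess-like behaviour
) {
    fun run() {
        val name = vpcName
        if (name.isNullOrBlank()) return // task 5.3: check skipped
        val vpc = try {
            findVpcByName(name)
        } catch (e: Exception) {
            return // task 5.2: no shutdown on exception (real code would log here)
        }
        if (vpc == null) {
            emit("InfrastructureGone") // task 5.1: event first, then exit
            exit(0)
        }
    }
}
```

A test can then pass lambdas that append to lists and assert on what was recorded, for example that a null lookup yields exactly one event and one exit with code 0.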
@@ -41,6 +41,12 @@ class Server : PicoBaseCommand() {
     )
     var refreshInterval: Long = Constants.Time.DEFAULT_STATUS_REFRESH_SECONDS
 
+    @Option(
+        names = ["--auto-shutdown"],
+        description = ["Shut down the server if the cluster VPC is removed"],
+    )
+    var autoShutdown: Boolean = false
+
     companion object {
         private val log = KotlinLogging.logger {}
     }
@@ -62,8 +68,15 @@ class Server : PicoBaseCommand() {
 
         log.info { "Starting easy-db-lab server..." }
 
+        val vpcName =
+            if (autoShutdown && clusterStateManager.exists()) {
+                "easy-db-lab-${clusterStateManager.load().name}"
+            } else {
+                null
+            }
+
         try {
-            val server = McpServer(refreshInterval)
+            val server = McpServer(refreshInterval, autoShutdown, vpcName)
             server.start(port, bind) { actualPort ->
                 // Generate .mcp.json with actual port
                 val config =
 
@@ -48,7 +48,7 @@ class Restart : PicoBaseCommand() {
         sidecarService
             .restart(controlHost)
             .onFailure { e ->
-                eventBus.emit(Event.Cassandra.SidecarRestartFailed("", "${e.message}"))
+                eventBus.emit(Event.Cassandra.SidecarRestartFailed("${e.message}"))
             }
         eventBus.emit(Event.Cassandra.SidecarRestarted)
     }
 
@@ -66,7 +66,7 @@ class Start : PicoBaseCommand() {
             sidecarService
                 .deploy(controlHost, sidecarImage)
                 .onFailure { e ->
-                    eventBus.emit(Event.Cassandra.SidecarStartFailed("", "${e.message}"))
+                    eventBus.emit(Event.Cassandra.SidecarStartFailed("${e.message}"))
                 }
         }