
Commit c8b017c

Spark refactor (#554)
* Add live metrics streaming via Redis pub/sub and archive completed changes

* Fix all 68 detekt violations and split DefaultK8sService into focused classes
  - Rename metrics events: SystemSnapshot→System, CassandraSnapshot→Cassandra, NodeMetrics→Node
  - Split DefaultK8sService (1138 lines) into K8sClientProvider plus 4 delegation classes following SRP: DefaultK8sManifestOperations, DefaultK8sJobOperations, DefaultK8sNamespaceOperations, DefaultK8sStorageOperations
  - Extract extension functions: AWSIamExtensions, AWSS3Extensions, ClusterStateExtensions, ClusterS3PathExtensions, ClusterS3PathConfigExtensions
  - Introduce data classes to reduce parameter counts: EmrClusterProvisioningConfig, ProvisioningCallbacks, InfrastructureContext, SparkJobRequest, StressJobConfig, RequirementCheckDeps, BackupTargetResult
  - Extract helpers to reduce method length/complexity across McpServer, Up, K3sClusterService, EMRSparkService, OpenSearchStart, StatusCache, and others
  - Tune detekt config: ignorePrivate/ignoreOverridden for TooManyFunctions, ignoreDefaultParameters for LongParameterList
  - Fix EmptyFunctionBlock, MagicNumber, ComplexCondition, InstanceOfCheckForException, NestedBlockDepth, LoopWithTooManyJumpStatements violations

* Simplify: consolidate HTTP_OK constant, fix redundant null-check, optimize metrics collection
  - Move duplicated HTTP_OK=200 from 3 Victoria services into Constants.HttpStatus.OK
  - Fix redundant ?. operator after guaranteed non-null assignment in McpServer
  - Convert per-call buildSectionExtractors() into a reusable sectionSerializers property in StatusCache
  - Replace O(n×m) findValueForHost() linear search with O(1) indexByHost() map lookups in MetricsCollector

* Structural improvements: shared K8sClientProvider, extract K8sPodUtils, consolidate delete pattern
  - Share a single K8sClientProvider across all 4 K8s operation classes via Koin singleton
  - Extract checkForPodFailure into K8sPodUtils, removing a circular dependency from DefaultK8sService
  - Replace 120 lines of copy-paste K8s resource deletion with a generic deleteMatchingResources helper
  - Collect system metrics independently of db node presence in MetricsCollector
  - Replace KoinComponent service-locator with constructor injection in StressJobService

* Rename "MCP Server" to "Server" across docs, specs, and code comments
  The server command provides more than just MCP — it includes REST status endpoints, a background StatusCache, and an optional MetricsCollector. Renamed all user-facing references from "MCP Server" to "Server" to reflect the full scope. Kotlin class names (McpServer, McpToolRegistry) and the mcp/ package stay unchanged since they specifically implement the MCP protocol.

* Reorganize Spark modules under unified spark/ parent with shared config
  Move 4 scattered Spark submodules (spark-shared, bulk-writer, connector-writer, spark-connector-test1) into spark/ with a nested Gradle structure. Split bulk-writer into bulk-writer-sidecar and bulk-writer-s3. Extract shared SparkJobConfig into spark/common for unified spark.easydblab.* configuration across all modules. Delete AbstractBulkWriter in favor of standalone main classes. Update all bin/ scripts, e2e tests, the CI workflow, and documentation for the new module paths.

* Deduplicate Spark module config and fix compiler warnings
  - Gradle: extract shared deps and shadow config into parent spark/build.gradle.kts using scoped configure() blocks for bulk-writer-* and connector-* modules. Each subproject build file now only declares mainClass and archiveBaseName. Fail fast at compileJava if cassandra-analytics is not built.
  - Java: add helper methods to SparkJobConfig (generateTestData, buildBulkWriteOptions, configureCassandraConnector) and constants for write option keys, connector format, and connector config strings. All writers now use shared helpers instead of duplicated inline code.
  - Fix 3 compiler warnings: redundant elvis operator in AWSResourceSetupService, redundant toString() calls on ClusterS3Path.toUri() in EMRSparkService

* Archive spark-refactor change and sync specs
  Archive the completed spark-refactor change (37/37 tasks). Add a new spark-modules spec (unified config, module organization, deployable JARs). Merge the reorganized JAR path scenario into the existing spark-emr spec.

* Add Spark integration test harness, skip S3 bulk writer, patch storage_compatibility_mode
  - Add TestContainers-based SparkWriterIntegrationTest for local verification
  - Skip the bulk-writer-s3 e2e step until DATA_TRANSPORT_EXTENSION_CLASS is implemented
  - Patch storage_compatibility_mode: NONE before starting Cassandra 5

* Archive rename-mcp-server-to-server change

* Address code review: replace System.exit with exceptions, add unit tests, fail-fast on skipDdl
  - SparkJobConfig throws IllegalArgumentException instead of System.exit(1)
  - S3BulkWriter does the same for a missing s3.bucket
  - Add SparkJobConfigTest covering defaults, parsing, and missing-property errors
  - setupSchema validates the table exists when skipDdl=true to fail fast
  - Add a storage_compatibility_mode comment explaining why NONE is needed
  - Expand the comment on unpublished cassandra-analytics JARs
  - Remove unused LocalStackContainer import

* Address code review: remove reflection, add isRunningCassandra(), fix e2e test gaps
  - Remove step_bulk_writer_s3 from the e2e step list (the stub gives false confidence)
  - Replace reflection in MetricsCollectorTest with an internal collect() method
  - Add ClusterState.isRunningCassandra() to replace a fragile clickHouseConfig null check
  - Document detekt.yml threshold relaxations with the triggering classes
  - Add a JAR existence check to submit_spark_writer before submission
  - Deduplicate hostNames with .toSet() in collectSystemMetrics()
  - Update skipDdl docs to mention the validation behavior

* Verify spark writer row counts at LOCAL_QUORUM in e2e test
  - Parse COUNT(*) output and assert it matches the expected 10000 rows
  - Use CONSISTENCY LOCAL_QUORUM for verification queries
  - Bump replicationFactor from 1 to 3 to support LOCAL_QUORUM reads

* Address code review: strengthen test assertions, thread-safe MetricsCollector, portable grep
  - SparkSubmitTest: replace bare any() with argThat to verify clusterId, jarPath, and mainClass are correctly forwarded through execute()
  - MetricsCollector: add @Synchronized to start()/stop() to prevent double-start from concurrent calls
  - bin/end-to-end-test: replace grep -oP (Linux PCRE) with portable grep -Eo
  - bin/end-to-end-test: add a design doc reference to the step_bulk_writer_s3 TODO

* Widen CI artifact paths to capture all modules, add jacoco for spark
  - Test results: use ** globs to capture root + spark submodule JUnit XML
  - Quote the dorny/test-reporter path to fix the "No test report files found" error
  - Upload jacoco reports from spark modules alongside Kover coverage
  - Add the jacoco plugin to spark subprojects for Java coverage reporting

* Clean up tests: remove mock-echo test, fix container lifecycle, use project.root
  - Delete SparkSubmitTest `command validates required parameters` (it only checked defaults)
  - SparkWriterIntegrationTest: make the spark container @Container-managed so TestContainers handles the full lifecycle, preventing leaks on setup failure
  - Replace fragile getParent().getParent() path resolution with a Gradle project.root system property (with a fallback for IDE runs)
  - Add dependsOn(:spark:connector-writer:shadowJar) so the integration test JAR is always built before tests run

* Use LOCAL_QUORUM via Java driver instead of cqlsh CONSISTENCY syntax
  - DefaultCqlSessionService: execute all queries at LOCAL_QUORUM via SimpleStatement.setConsistencyLevel() instead of the default CL
  - Remove the invalid cqlsh CONSISTENCY syntax from the e2e test shell script (the cql command uses the Java driver, not cqlsh)

* Formatting fix
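The O(n×m) → O(1) MetricsCollector change above can be sketched as follows. This is a minimal illustration of the pattern only: `HostMetric`, the field names, and the method bodies are hypothetical stand-ins, not the actual Kotlin code from the commit.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-in for one collected metric sample.
record HostMetric(String host, double value) {}

class MetricsIndex {
    // Before: a linear scan per host. With n metrics and m hosts this is O(n*m).
    static Double findValueForHost(List<HostMetric> metrics, String host) {
        for (HostMetric m : metrics) {
            if (m.host().equals(host)) {
                return m.value();
            }
        }
        return null;
    }

    // After: build a host -> value index once, then each lookup is O(1).
    static Map<String, Double> indexByHost(List<HostMetric> metrics) {
        return metrics.stream()
                .collect(Collectors.toMap(HostMetric::host, HostMetric::value, (a, b) -> b));
    }
}
```

The merge function `(a, b) -> b` keeps the latest sample if a host appears twice; whether the real collector needs that depends on how samples are deduplicated upstream.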
1 parent 4b4622f commit c8b017c

202 files changed (+7,714 / -5,582 lines)


.claude/commands/test-workflow.md (5 additions, 5 deletions)

@@ -1,12 +1,12 @@
 ---
 command: "/test-workflow"
 category: "Testing & Demonstration"
-purpose: "End-to-end Cassandra cluster workflow demonstration using MCP servers"
+purpose: "End-to-end Cassandra cluster workflow demonstration using server tools"
 ---

 # Cassandra Cluster Test Workflow

-Execute a complete end-to-end workflow demonstrating easy-db-lab capabilities using only MCP server tools.
+Execute a complete end-to-end workflow demonstrating easy-db-lab capabilities using only server tools.

 ## Workflow Steps

@@ -38,7 +38,7 @@ Use `mcp__easy-db-lab__use` to set Cassandra version:
 Use `mcp__easy-db-lab__start` to start all services:
 - Starts Cassandra on database nodes
 - Starts cassandra-easy-stress on stress nodes
-- Starts monitoring and MCP servers on control nodes
+- Starts monitoring and servers on control nodes
 - Report when services are ready

 ### 5. Get Cassandra Host IP
@@ -49,8 +49,8 @@ Use `mcp__easy-db-lab__hosts` to retrieve host information:
 ### 6. Wait for Cluster Readiness
 Wait ~30 seconds for Cassandra to fully initialize and be ready to accept connections.

-### 7. Check MCP Server Status
-Use `mcp__easy-db-lab__get_server_status` to check if MCP servers are running:
+### 7. Check Server Status
+Use `mcp__easy-db-lab__get_server_status` to check if servers are running:
 - Verify easy-cass-mcp is accessible
 - Verify cassandra-easy-stress is accessible
 - If any are disconnected, report but continue (manual reconnection may be needed)

.github/workflows/pr-checks.yml (6 additions, 5 deletions)

@@ -27,25 +27,25 @@ jobs:
          cache-read-only: ${{ github.ref != 'refs/heads/main' }}

      - name: Run tests
-       # Exclude bulk-writer module - requires cassandra-analytics SNAPSHOTs from local Maven
-       run: ./gradlew test koverXmlReport --no-daemon -x :bulk-writer:compileJava -x :bulk-writer:test
+       # Exclude bulk-writer modules - require cassandra-analytics SNAPSHOTs from local Maven
+       run: ./gradlew test koverXmlReport --no-daemon -x :spark:bulk-writer-sidecar:compileJava -x :spark:bulk-writer-sidecar:test -x :spark:bulk-writer-s3:compileJava -x :spark:bulk-writer-s3:test

      - name: Upload test results
        if: always()
        uses: actions/upload-artifact@v6
        with:
          name: test-results
          path: |
-           build/reports/tests/test/
-           build/test-results/test/
+           **/build/reports/tests/test/
+           **/build/test-results/test/
          retention-days: 30

      - name: Publish Test Results
        uses: dorny/test-reporter@v2
        if: always()
        with:
          name: Test Results
-         path: build/test-results/test/*.xml
+         path: '**/build/test-results/test/*.xml'
          reporter: java-junit
          fail-on-error: false

@@ -66,6 +66,7 @@ jobs:
          name: coverage-reports
          path: |
            build/reports/kover/
+           **/build/reports/jacoco/
          retention-days: 30

      # Run linting and static analysis in parallel with tests
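The widened `**` globs above can be sanity-checked locally. As a rough stand-in, this uses the JDK's glob `PathMatcher`; note this is only an approximation, since GitHub's `@actions/glob` has slightly different `**` semantics (for example, around zero-directory matches at the repository root).

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;

class GlobCheck {
    // The same pattern string the workflow now passes to dorny/test-reporter.
    private static final PathMatcher TEST_XML = FileSystems.getDefault()
            .getPathMatcher("glob:**/build/test-results/test/*.xml");

    // Returns true if a relative path would be picked up as a JUnit XML result.
    static boolean matchesTestXml(String path) {
        return TEST_XML.matches(Path.of(path));
    }
}
```

With the nested spark submodules, a path like `spark/common/build/test-results/test/TEST-Foo.xml` now matches, which is exactly what the unquoted single-level `build/test-results/test/*.xml` pattern missed.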

.gitignore (2 additions, 1 deletion)

@@ -155,7 +155,8 @@ site/

 # Cassandra Analytics build artifacts (see bin/build-cassandra-analytics)
 .cassandra-analytics/
-bulk-writer/libs/*.jar
+spark/bulk-writer-sidecar/libs/*.jar
+spark/bulk-writer-s3/libs/*.jar
 clusters/

 # Claude Code config (mounted in devcontainer)

CLAUDE.md (9 additions, 6 deletions)

@@ -13,14 +13,17 @@ The project follows a layered architecture:

 - **Commands (PicoCLI)** delegate to **Services**, which interact with **External Systems** (K8s, AWS, Filesystem)
 - Commands and Services emit events via the **EventBus**
-- **Listeners** consume events: `ConsoleEventListener` (stdout/stderr), `McpEventListener` (MCP server status), `RedisEventListener` (pub/sub, optional)
+- **Listeners** consume events: `ConsoleEventListener` (stdout/stderr), `McpEventListener` (server MCP status), `RedisEventListener` (pub/sub, optional)

 ### Project Modules

 The Gradle project has multiple modules:
 - **Root module** (`:`) — the main CLI application
-- **`bulk-writer`** — Cassandra bulk writer (requires cassandra-analytics built with JDK 11)
-- **`spark-shared`** — shared Spark utilities
+- **`spark/common`** — shared Spark config (`SparkJobConfig`), data generation, CQL setup
+- **`spark/bulk-writer-sidecar`** — Cassandra Analytics bulk writer, direct sidecar transport (requires cassandra-analytics built with JDK 11)
+- **`spark/bulk-writer-s3`** — Cassandra Analytics bulk writer, S3 staging transport (requires cassandra-analytics built with JDK 11)
+- **`spark/connector-writer`** — Standard Spark Cassandra Connector writer
+- **`spark/connector-read-write`** — Read→transform→write example using Spark Cassandra Connector

 ### Layer Responsibilities

@@ -40,10 +43,10 @@ See [`commands/CLAUDE.md`](src/main/kotlin/com/rustyrazorblade/easydblab/command

 See [`providers/CLAUDE.md`](src/main/kotlin/com/rustyrazorblade/easydblab/providers/CLAUDE.md) for AWS/SSH/Docker patterns and retry logic.

-### MCP Server & REPL
+### Server & REPL

 Two commands run as long-lived processes instead of the typical run-and-exit pattern:
-- **`Server`** — starts an MCP server (Ktor + SSE) for AI agent integration. See [`mcp/CLAUDE.md`](src/main/kotlin/com/rustyrazorblade/easydblab/mcp/CLAUDE.md).
+- **`Server`** — starts a hybrid HTTP server with MCP protocol support (Ktor + SSE), REST status endpoints, and background services. See [`mcp/CLAUDE.md`](src/main/kotlin/com/rustyrazorblade/easydblab/mcp/CLAUDE.md).
 - **`Repl`** — starts an interactive REPL to reduce typing for repeated commands.

 ### Dependency Injection
@@ -297,6 +300,6 @@ Detailed patterns live in package-level CLAUDE.md files:
 - [`services/aws/CLAUDE.md`](src/main/kotlin/com/rustyrazorblade/easydblab/services/aws/CLAUDE.md) — AWS service classes (AMI, EC2, EMR, OpenSearch, S3)
 - [`providers/CLAUDE.md`](src/main/kotlin/com/rustyrazorblade/easydblab/providers/CLAUDE.md) — AWS SDK wrappers, SSH/Docker patterns, retry logic
 - [`configuration/CLAUDE.md`](src/main/kotlin/com/rustyrazorblade/easydblab/configuration/CLAUDE.md) — cluster state, templates, K8s manifest builders, observability stack details
-- [`mcp/CLAUDE.md`](src/main/kotlin/com/rustyrazorblade/easydblab/mcp/CLAUDE.md) — MCP server architecture
+- [`mcp/CLAUDE.md`](src/main/kotlin/com/rustyrazorblade/easydblab/mcp/CLAUDE.md) — Server architecture (MCP, REST, background services)
 - [`kubernetes/CLAUDE.md`](src/main/kotlin/com/rustyrazorblade/easydblab/kubernetes/CLAUDE.md) — K8s client patterns
 - [`src/test/.../CLAUDE.md`](src/test/kotlin/com/rustyrazorblade/easydblab/CLAUDE.md) — test infrastructure
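The commit message also mentions replacing a KoinComponent service-locator with constructor injection in StressJobService, which fits the dependency-injection conventions described in CLAUDE.md. A minimal sketch of the before/after shape, in Java with hypothetical names (`JobRunner` and `StressJobServiceSketch` are illustrative, not the real Kotlin/Koin classes):

```java
// Hypothetical dependency; the real service depends on K8s/job operation classes.
interface JobRunner {
    String run(String jobName);
}

class StressJobServiceSketch {
    private final JobRunner runner;

    // Constructor injection: the dependency is visible in the signature and easy
    // to fake in tests, unlike a service-locator-style get<JobRunner>() lookup
    // hidden inside the method body.
    StressJobServiceSketch(JobRunner runner) {
        this.runner = runner;
    }

    String submit(String jobName) {
        return runner.run(jobName);
    }
}
```

In a test, a fake can be passed directly: `new StressJobServiceSketch(job -> "ran:" + job)`, with no DI container needed.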

README.md (10 additions, 10 deletions)

@@ -28,7 +28,7 @@ A tool for creating database lab environments in AWS. Designed for testing, benc

 ### Developer Experience

-- **MCP Server** - AI assistant integration for cluster management
+- **Server** - AI assistant integration, REST status endpoints, live metrics
 - **Interactive CLI** - REPL mode for reduced typing
 - **Homebrew and Docker** - Multiple installation options

@@ -470,33 +470,33 @@ bcc-tools is a useful package of tools

 https://rustyrazorblade.com/post/2023/2023-11-14-bcc-tools/

-## MCP Server Integration
+## Server Integration

-easy-db-lab includes a Model Context Protocol (MCP) server that enables AI assistants like Claude Code to interact directly with your database clusters.
+easy-db-lab includes a server mode that enables AI assistants like Claude Code to interact directly with your database clusters via MCP (Model Context Protocol), and provides REST status endpoints for programmatic access.

-### Starting the MCP Server
+### Starting the Server

-To start the MCP server:
+To start the server:

 ```shell
 easy-db-lab server --port 8888
 ```

-This starts the MCP server on port 8888 (you can use any available port).
+This starts the server on port 8888 (you can use any available port).

 ### Integrating with Claude Code

-Once the MCP server is running, add it to Claude Code:
+Once the server is running, add it to Claude Code:

 ```shell
 claude mcp add --transport sse easy-db-lab http://127.0.0.1:8888/sse
 ```

-This establishes a Server-Sent Events (SSE) connection between Claude Code and your easy-db-lab MCP server.
+This establishes a Server-Sent Events (SSE) connection between Claude Code and the server.

 ### What You Can Do

-With MCP integration, Claude Code can:
+With the server integration, Claude Code can:

 * Manage and provision clusters directly
 * Configure and deploy Cassandra, ClickHouse, and OpenSearch
@@ -505,7 +505,7 @@ With MCP integration, Claude Code can:
 * Troubleshoot issues by analyzing logs and metrics
 * Automate complex multi-step cluster operations

-For detailed documentation, see the [MCP Integration section in the user manual](https://rustyrazorblade.github.io/easy-db-lab/integrations/mcp-server/).
+For detailed documentation, see the [Server section in the user manual](https://rustyrazorblade.github.io/easy-db-lab/integrations/server/).

 ## Sanity Check Test

bin/dev (5 additions, 16 deletions)

@@ -60,22 +60,15 @@ SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 PROJECT_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
 WORKSPACE_FOLDER="$PROJECT_ROOT"

-# Build mount args for Claude credentials if ENABLE_CLAUDE=1
-CLAUDE_MOUNT_ARGS=()
-if [[ "${ENABLE_CLAUDE:-}" == "1" ]]; then
-  if [[ -f "$HOME/.claude/.credentials.json" ]]; then
-    CLAUDE_MOUNT_ARGS=(--mount "type=bind,source=$HOME/.claude/.credentials.json,target=/home/node/.claude/.credentials.json")
-  else
-    echo -e "${YELLOW}Warning: ~/.claude/.credentials.json not found. Claude auth will not persist.${NC}"
-  fi
-fi
-
 # Colors for output
 RED='\033[0;31m'
 GREEN='\033[0;32m'
 YELLOW='\033[0;33m'
 NC='\033[0m' # No Color

+# Ensure Claude config directory exists (mounted by devcontainer.json)
+mkdir -p "$PROJECT_ROOT/.devcontainer/claude"
+
 usage() {
   cat <<EOF
 Usage: $(basename "$0") <command> [args...]
@@ -101,12 +94,8 @@ Commands:
   clean          Full cleanup (containers, volumes)
   help           Show this help message

-Environment Variables:
-  ENABLE_CLAUDE=1   Mount ~/.claude config directory (for Claude Code users)
-
 Examples:
   $(basename "$0") start
-  ENABLE_CLAUDE=1 $(basename "$0") start
   $(basename "$0") claude
   $(basename "$0") shell
   $(basename "$0") test
@@ -156,7 +145,7 @@ get_compose_project() {
 cmd_start() {
   check_devcontainer
   info "Starting dev container..."
-  devcontainer up --workspace-folder "$WORKSPACE_FOLDER" ${CLAUDE_MOUNT_ARGS[@]+"${CLAUDE_MOUNT_ARGS[@]}"}
+  devcontainer up --workspace-folder "$WORKSPACE_FOLDER"
 }

 cmd_stop() {
@@ -174,7 +163,7 @@ cmd_stop() {
 cmd_rebuild() {
   check_devcontainer
   info "Rebuilding dev container..."
-  devcontainer up --workspace-folder "$WORKSPACE_FOLDER" --remove-existing-container ${CLAUDE_MOUNT_ARGS[@]+"${CLAUDE_MOUNT_ARGS[@]}"}
+  devcontainer up --workspace-folder "$WORKSPACE_FOLDER" --remove-existing-container
 }

 cmd_build() {
