Commit c8b017c
authored
Spark refactor (#554)
* Add live metrics streaming via Redis pub/sub and archive completed changes
* Fix all 68 detekt violations and split DefaultK8sService into focused classes
- Rename metrics events: SystemSnapshot→System, CassandraSnapshot→Cassandra, NodeMetrics→Node
- Split DefaultK8sService (1138 lines) into 4 delegation classes following SRP:
K8sClientProvider, DefaultK8sManifestOperations, DefaultK8sJobOperations,
DefaultK8sNamespaceOperations, DefaultK8sStorageOperations
- Extract extension functions: AWSIamExtensions, AWSS3Extensions,
ClusterStateExtensions, ClusterS3PathExtensions, ClusterS3PathConfigExtensions
- Introduce data classes to reduce parameter counts: EmrClusterProvisioningConfig,
ProvisioningCallbacks, InfrastructureContext, SparkJobRequest, StressJobConfig,
RequirementCheckDeps, BackupTargetResult
- Extract helpers to reduce method length/complexity across McpServer, Up,
K3sClusterService, EMRSparkService, OpenSearchStart, StatusCache, and others
- Tune detekt config: ignorePrivate/ignoreOverridden for TooManyFunctions,
ignoreDefaultParameters for LongParameterList
- Fix EmptyFunctionBlock, MagicNumber, ComplexCondition, InstanceOfCheckForException,
NestedBlockDepth, LoopWithTooManyJumpStatements violations
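The DefaultK8sService split above follows a common Kotlin pattern: a facade keeps its public API but delegates each concern to a focused collaborator via the `by` clause. This is a minimal hypothetical sketch of that pattern, not the actual easydblab classes; all names here are illustrative.

```kotlin
// Two focused concerns, each behind its own interface.
interface JobOperations {
    fun submitJob(name: String): String
}

interface NamespaceOperations {
    fun ensureNamespace(name: String): String
}

class DefaultJobOperations : JobOperations {
    override fun submitJob(name: String) = "job/$name submitted"
}

class DefaultNamespaceOperations : NamespaceOperations {
    override fun ensureNamespace(name: String) = "namespace/$name ready"
}

// The facade delegates each concern with Kotlin's `by` clause instead of
// implementing everything in one 1000+ line class.
class K8sService(
    jobs: JobOperations = DefaultJobOperations(),
    namespaces: NamespaceOperations = DefaultNamespaceOperations(),
) : JobOperations by jobs, NamespaceOperations by namespaces

fun main() {
    val service = K8sService()
    println(service.submitJob("stress"))      // job/stress submitted
    println(service.ensureNamespace("spark")) // namespace/spark ready
}
```

Callers keep depending on one service, while each delegate stays small enough to test and to satisfy detekt's size thresholds.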
* Simplify: consolidate HTTP_OK constant, fix redundant null-check, optimize metrics collection
- Move duplicated HTTP_OK=200 from 3 Victoria services into Constants.HttpStatus.OK
- Fix redundant ?. operator after guaranteed non-null assignment in McpServer
- Convert per-call buildSectionExtractors() into reusable sectionSerializers property in StatusCache
- Replace O(n×m) findValueForHost() linear search with O(1) indexByHost() map lookups in MetricsCollector
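The `findValueForHost` change above is the classic scan-to-index refactor. A sketch under assumed data shapes (the real MetricsCollector types differ):

```kotlin
// Illustrative data shape, not the actual MetricsCollector model.
data class HostMetric(val host: String, val value: Double)

// Before: each per-host query scans the whole list, O(n×m) across m hosts.
fun findValueForHost(metrics: List<HostMetric>, host: String): Double? =
    metrics.firstOrNull { it.host == host }?.value

// After: build the index once, then every lookup is O(1).
fun indexByHost(metrics: List<HostMetric>): Map<String, Double> =
    metrics.associateBy({ it.host }, { it.value })

fun main() {
    val metrics = listOf(HostMetric("db0", 0.42), HostMetric("db1", 0.17))
    val index = indexByHost(metrics)
    println(index["db1"]) // 0.17
}
```

`associateBy` keeps the last entry on duplicate hosts, which is usually the desired behavior when the newest sample wins.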
* Structural improvements: shared K8sClientProvider, extract K8sPodUtils, consolidate delete pattern
- Share single K8sClientProvider across all 4 K8s operation classes via Koin singleton
- Extract checkForPodFailure into K8sPodUtils, removing circular dependency from DefaultK8sService
- Replace 120 lines of copy-paste K8s resource deletion with generic deleteMatchingResources helper
- Collect system metrics independently of db node presence in MetricsCollector
- Replace KoinComponent service-locator with constructor injection in StressJobService
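The generic `deleteMatchingResources` idea — one helper replacing a copy-pasted loop per resource type — can be sketched as below. This is a hypothetical stand-in: the real helper would call the Kubernetes client, whereas here "delete" is whatever action the caller passes in.

```kotlin
// Generic over the resource type T: the caller supplies how to read a name,
// which names match, and how to delete one resource.
fun <T> deleteMatchingResources(
    resources: List<T>,
    nameOf: (T) -> String,
    matches: (String) -> Boolean,
    delete: (T) -> Unit,
): Int {
    val victims = resources.filter { matches(nameOf(it)) }
    victims.forEach(delete)
    return victims.size
}

fun main() {
    val deleted = mutableListOf<String>()
    val count = deleteMatchingResources(
        resources = listOf("spark-job-1", "grafana", "spark-job-2"),
        nameOf = { it },
        matches = { it.startsWith("spark-job") },
        delete = { deleted += it },
    )
    println("$count deleted: $deleted") // 2 deleted: [spark-job-1, spark-job-2]
}
```

Each former copy of the deletion loop collapses into one call with a different `matches` predicate and `delete` action.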
* Rename "MCP Server" to "Server" across docs, specs, and code comments
The server command provides more than just MCP — it includes REST status
endpoints, background StatusCache, and optional MetricsCollector. Renamed
all user-facing references from "MCP Server" to "Server" to reflect the
full scope. Kotlin class names (McpServer, McpToolRegistry) and the mcp/
package stay unchanged since they specifically implement the MCP protocol.
* Reorganize Spark modules under unified spark/ parent with shared config
Move 4 scattered Spark submodules (spark-shared, bulk-writer, connector-writer,
spark-connector-test1) into spark/ with nested Gradle structure. Split bulk-writer
into bulk-writer-sidecar and bulk-writer-s3. Extract shared SparkJobConfig into
spark/common for unified spark.easydblab.* configuration across all modules.
Delete AbstractBulkWriter in favor of standalone main classes. Update all bin/
scripts, e2e tests, CI workflow, and documentation for new module paths.
* Deduplicate Spark module config and fix compiler warnings
Gradle: Extract shared deps and shadow config into parent spark/build.gradle.kts
using scoped configure() blocks for bulk-writer-* and connector-* modules. Each
subproject build file now only declares mainClass and archiveBaseName. Fail fast
at compileJava if cassandra-analytics is not built.
Java: Add helper methods to SparkJobConfig (generateTestData, buildBulkWriteOptions,
configureCassandraConnector) and constants for write option keys, connector format,
and connector config strings. All writers now use shared helpers instead of
duplicated inline code.
Fix 3 compiler warnings: redundant elvis operator in AWSResourceSetupService,
redundant toString() calls on ClusterS3Path.toUri() in EMRSparkService.
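The scoped `configure()` approach described above might look roughly like this in the parent `spark/build.gradle.kts`. The module-name filters follow the commit; the plugin and dependency details are illustrative assumptions, not the actual build file.

```kotlin
// Hypothetical parent spark/build.gradle.kts sketch.
subprojects {
    apply(plugin = "java")
}

// Shared shadow/deps config scoped to the writer modules only; other
// subprojects (e.g. spark/common) are untouched.
configure(subprojects.filter {
    it.name.startsWith("bulk-writer") || it.name.startsWith("connector")
}) {
    apply(plugin = "com.github.johnrengelman.shadow")
    dependencies {
        "implementation"(project(":spark:common"))
    }
}
```

With the shared config hoisted here, each subproject's build file shrinks to its `mainClass` and `archiveBaseName` declarations, as the commit describes.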
* Archive spark-refactor change and sync specs
Archive completed spark-refactor change (37/37 tasks).
Add new spark-modules spec (unified config, module organization, deployable JARs).
Merge reorganized JAR path scenario into existing spark-emr spec.
* Add Spark integration test harness, skip S3 bulk writer, patch storage_compatibility_mode
- Add TestContainers-based SparkWriterIntegrationTest for local verification
- Skip bulk-writer-s3 e2e step until DATA_TRANSPORT_EXTENSION_CLASS is implemented
- Patch storage_compatibility_mode: NONE before starting Cassandra 5
* Archive rename-mcp-server-to-server change
* Address code review: replace System.exit with exceptions, add unit tests, fail-fast on skipDdl
- SparkJobConfig throws IllegalArgumentException instead of System.exit(1)
- S3BulkWriter does the same for missing s3.bucket
- Add SparkJobConfigTest covering defaults, parsing, and missing property errors
- setupSchema validates table exists when skipDdl=true to fail fast
- Add storage_compatibility_mode comment explaining why NONE is needed
- Expand comment on unpublished cassandra-analytics JARs
- Remove unused LocalStackContainer import
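The exit-to-exception change above makes failures observable to callers and unit tests. A minimal sketch of the fail-fast idea, with an illustrative property name (the real SparkJobConfig keys differ):

```kotlin
// Throw instead of System.exit(1): the JVM stays up, the caller decides,
// and tests can assert on the exception.
fun requireProperty(props: Map<String, String>, key: String): String =
    props[key] ?: throw IllegalArgumentException("Missing required property: $key")

fun main() {
    val props = mapOf("s3.bucket" to "my-bucket")
    println(requireProperty(props, "s3.bucket")) // my-bucket
    try {
        requireProperty(props, "keyspace")
    } catch (e: IllegalArgumentException) {
        println(e.message) // Missing required property: keyspace
    }
}
```

This is also what makes a test like SparkJobConfigTest possible: `System.exit` would kill the test JVM, while an exception is a plain assertable outcome.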
* Address code review: remove reflection, add isRunningCassandra(), fix e2e test gaps
- Remove step_bulk_writer_s3 from e2e step list (stub gives false confidence)
- Replace reflection in MetricsCollectorTest with internal collect() method
- Add ClusterState.isRunningCassandra() to replace fragile clickHouseConfig null check
- Document detekt.yml threshold relaxations with triggering classes
- Add JAR existence check to submit_spark_writer before submission
- Deduplicate hostNames with .toSet() in collectSystemMetrics()
- Update skipDdl docs to mention validation behavior
* Verify spark writer row counts at LOCAL_QUORUM in e2e test
- Parse COUNT(*) output and assert it matches expected 10000 rows
- Use CONSISTENCY LOCAL_QUORUM for verification queries
- Bump replicationFactor from 1 to 3 to support LOCAL_QUORUM reads
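The replication-factor bump follows from quorum arithmetic: LOCAL_QUORUM requires floor(RF/2) + 1 replicas in the local datacenter, so RF=1 degenerates to a single replica while RF=3 gives a real majority.

```kotlin
// LOCAL_QUORUM size as a function of the local replication factor.
fun localQuorum(replicationFactor: Int): Int = replicationFactor / 2 + 1

fun main() {
    println(localQuorum(1)) // 1: "quorum" is just one replica
    println(localQuorum(3)) // 2: a majority that tolerates one replica down
}
```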
* Address code review: strengthen test assertions, thread-safe MetricsCollector, portable grep
- SparkSubmitTest: replace bare any() with argThat to verify clusterId, jarPath,
and mainClass are correctly forwarded through execute()
- MetricsCollector: add @Synchronized to start()/stop() to prevent double-start
from concurrent calls
- bin/end-to-end-test: replace grep -oP (Linux PCRE) with portable grep -Eo
- bin/end-to-end-test: add design doc reference to step_bulk_writer_s3 TODO
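The double-start guard above combines `@Synchronized` (serializing concurrent callers) with a running flag (making repeated calls no-ops). A self-contained sketch, assuming a collector that owns some background work; the real MetricsCollector differs:

```kotlin
class Collector {
    private var running = false
    var startCount = 0
        private set

    @Synchronized
    fun start() {
        if (running) return // concurrent or repeated start() is ignored
        running = true
        startCount++       // stands in for spawning the background work
    }

    @Synchronized
    fun stop() {
        running = false    // stands in for tearing the background work down
    }
}

fun main() {
    val c = Collector()
    c.start()
    c.start() // ignored: already running
    println(c.startCount) // 1
}
```

Without the annotation, two threads could both observe `running == false` and both start the background work; `@Synchronized` makes the check-then-set atomic.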
* Widen CI artifact paths to capture all modules, add jacoco for spark
- Test results: use ** globs to capture root + spark submodule JUnit XML
- Quote dorny/test-reporter path to fix "No test report files found" error
- Upload jacoco reports from spark modules alongside Kover coverage
- Add jacoco plugin to spark subprojects for Java coverage reporting
* Clean up tests: remove mock-echo test, fix container lifecycle, use project.root
- Delete SparkSubmitTest `command validates required parameters` (only checked defaults)
- SparkWriterIntegrationTest: make spark container @Container-managed so
TestContainers handles full lifecycle, preventing leaks on setup failure
- Replace fragile getParent().getParent() path resolution with Gradle
project.root system property (with fallback for IDE runs)
- Add dependsOn(:spark:connector-writer:shadowJar) so integration test
JAR is always built before tests run
* Use LOCAL_QUORUM via Java driver instead of cqlsh CONSISTENCY syntax
- DefaultCqlSessionService: execute all queries at LOCAL_QUORUM via
SimpleStatement.setConsistencyLevel() instead of default CL
- Remove invalid cqlsh CONSISTENCY syntax from e2e test shell script
(the cql command uses the Java driver, not cqlsh)
* Formatting fix

1 parent 4b4622f commit c8b017c
File tree
202 files changed
+7714
-5582
lines changed
- .claude/commands
- .github/workflows
- bin
- bulk-writer
- src/main/java/com/rustyrazorblade/easydblab/spark
- config/detekt
- connector-writer
- src/main/java/com/rustyrazorblade/easydblab/spark
- docs
- development
- integrations
- plans
- reference
- user-guide
- openspec
- changes
- archive
- 2026-03-08-live-stream-metrics
- specs/live-stream-metrics
- 2026-03-08-log-investigation-dashboard
- specs/log-investigation-dashboard
- 2026-03-08-spark-stop-job
- specs
- spark-emr
- spark-job-cancellation
- 2026-03-10-rename-mcp-server-to-server
- specs/server
- 2026-03-10-spark-refactor
- specs
- spark-emr
- spark-modules
- otel-log-parsing
- specs
- observability
- otel-log-parsing
- specs
- cassandra
- end-to-end-testing
- live-stream-metrics
- log-investigation-dashboard
- mcp-server
- networking
- server
- spark-emr
- spark-modules
- spark-connector-test1
- src/main/java/com/rustyrazorblade/easydblab/spark
- spark
- bulk-writer-s3
- src/main/java/com/rustyrazorblade/easydblab/spark
- bulk-writer-sidecar
- src/main/java/com/rustyrazorblade/easydblab/spark
- common
- src
- main/java/com/rustyrazorblade/easydblab/spark
- test/java/com/rustyrazorblade/easydblab/spark
- connector-read-write
- src/main/java/com/rustyrazorblade/easydblab/spark
- connector-writer
- src/main/java/com/rustyrazorblade/easydblab/spark
- src
- main
- kotlin/com/rustyrazorblade/easydblab
- annotations
- commands
- cassandra/stress
- clickhouse
- exec
- grafana
- logs
- metrics
- opensearch
- spark
- tailscale
- configuration
- clickhouse
- di
- events
- mcp
- output
- providers/aws
- services
- aws
- resources/com/rustyrazorblade/mcp
- test/kotlin/com/rustyrazorblade/easydblab
- commands
- cassandra/stress
- clickhouse
- logs
- metrics
- events
- mcp
- providers
- services
- aws