Add benchmark tests for ScyllaDB migrator #300
Draft
dkropachev wants to merge 16 commits into master
Conversation
Introduce JMH microbenchmarks for CPU-bound transformations (explodeRow, convertValue, createSelection) and integration throughput benchmarks for end-to-end migration paths (Cassandra→Scylla, Scylla→Scylla) at 100K and 500K row scales.

- Refactor the convertRowTypes closure into a public Cassandra.convertValue
- Add the sbt-jmh plugin and a benchmarks module
- Add a Benchmark munit tag, excluded from regular test-integration runs
- Add Makefile targets: benchmark-jmh, benchmark-jmh-quick, benchmark-integration, benchmark
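A minimal JMH microbenchmark in this style could look like the sketch below. The class, fixture, and measured logic are illustrative stand-ins, not the PR's actual code; a real benchmark would call Cassandra.convertValue on Spark row fixtures. It assumes sbt-jmh is enabled on the benchmarks module.

```scala
package com.scylladb.migrator.benchmarks

import java.util.concurrent.TimeUnit
import org.openjdk.jmh.annotations._

// Illustrative sketch: benchmark a CPU-bound per-row transformation,
// with input prepared once in @Setup so fixture cost is excluded
// from the measurement.
@State(Scope.Benchmark)
@BenchmarkMode(Array(Mode.Throughput))
@OutputTimeUnit(TimeUnit.SECONDS)
class ConvertValueBenchmark {

  // Hypothetical input; real benchmarks would build Spark Row fixtures.
  var input: Map[String, Any] = _

  @Setup
  def setup(): Unit =
    input = Map("id" -> 1L, "tags" -> List("a", "b", "c"))

  @Benchmark
  def convertMap(): Any =
    // Stand-in for the measured transformation; returning the result
    // keeps JMH from dead-code-eliminating the work.
    input.map { case (k, v) => k -> v.toString }
}
```

With sbt-jmh, such a class would typically run via something like `sbt "benchmarks/Jmh/run ConvertValueBenchmark"`.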
Force-pushed from 361f2e6 to 420ee28.
Set Jmh/baseDirectory to the project root so that relative output paths in Makefile targets resolve correctly from the forked JVM.
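In build.sbt terms, this setting is roughly the fragment below (a sketch; the project name and surrounding settings are assumptions):

```scala
// build.sbt fragment (sketch): run forked JMH JVMs from the repository
// root so relative output paths passed from Makefile targets resolve
// the same way as when invoked directly.
lazy val benchmarks = project
  .enablePlugins(JmhPlugin)
  .settings(
    Jmh / baseDirectory := (ThisBuild / baseDirectory).value
  )
```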
Cover per-row hot paths that were missing benchmark coverage:

- DdbValue.from (flat, nested, set items)
- AttributeValueUtils.fromV1 (SDK v1→v2 conversion)
- DynamoDBS3Export.itemDecoder (simple through wide/deeply-nested JSON)
- compareCassandraRows and compareDynamoDBRows (with and without timestamps)
- stripTrailingZeros mapping (BigDecimal and mixed-type rows)
- DdbValue Java serialization roundtrip (serialize, deserialize, roundtrip)
- Cassandra.convertValue per type (UTF8String, Map, List, Set, ArrayBuffer)
- Wide-row explodeRow (50 columns vs. the existing 3-column benchmarks)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
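As an illustration of the stripTrailingZeros mapping being benchmarked, a minimal sketch could look as follows. The helper names are hypothetical, not the migrator's actual implementation:

```scala
import java.math.BigDecimal

object DecimalNormalization {
  // Normalize BigDecimal cells so values like 1.2300 and 1.23 compare
  // equal between source and target; leave other cell types untouched.
  def stripCell(v: Any): Any = v match {
    case d: BigDecimal => d.stripTrailingZeros()
    case other         => other
  }

  // Apply the mapping across a whole mixed-type row.
  def stripRow(row: Seq[Any]): Seq[Any] = row.map(stripCell)
}
```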
Add TargetSettings.Parquet config (path + compression), a thin writers.Parquet wrapper around Spark's native df.write.parquet, and wire up the Cassandra → Parquet route in Migrator.scala.
Add TargetSettings.DynamoDBS3Export config (path), a writer that produces gzipped DynamoDB JSON files with manifest-summary.json and manifest-files.json compatible with the existing S3 Export reader, and wire up the DynamoDB → S3 Export route in Migrator.scala. Includes 14 roundtrip unit tests verifying all DynamoDB attribute types (S, N, B, BOOL, NULL, SS, NS, BS, L, M) encode correctly and decode back to identical values via the existing itemDecoder.
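The gzipped JSON-lines file shape such a writer produces can be sketched with only the JDK, assuming pre-encoded DynamoDB JSON strings (one document per line); the object and method names here are hypothetical:

```scala
import java.io.{BufferedReader, InputStreamReader}
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

object GzipJsonLines {
  // Write one DynamoDB-JSON document per line, gzip-compressed, which is
  // the data-file shape an S3 Export style reader consumes.
  def write(path: Path, lines: Seq[String]): Unit = {
    val out = new GZIPOutputStream(Files.newOutputStream(path))
    try lines.foreach(l => out.write((l + "\n").getBytes(StandardCharsets.UTF_8)))
    finally out.close()
  }

  // Read the lines back, e.g. for a roundtrip check.
  def read(path: Path): Seq[String] = {
    val in = new BufferedReader(
      new InputStreamReader(
        new GZIPInputStream(Files.newInputStream(path)), StandardCharsets.UTF_8))
    try Iterator.continually(in.readLine()).takeWhile(_ != null).toList
    finally in.close()
  }
}
```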
Two-leg benchmark: Scylla -> Parquet (export) then Parquet -> Scylla (import) using the same dataset. Row count configurable via E2E_CQL_ROWS (default 5M). Parquet files written to Docker volume at /app/parquet/bench_e2e.
Split ParquetE2EBenchmark into ScyllaToParquetE2EBenchmark and ParquetToScyllaE2EBenchmark (each runnable independently). Rename all Makefile targets to test-benchmark-e2e-{source}-{target}:
test-benchmark-e2e-cassandra-scylla
test-benchmark-e2e-scylla-scylla
test-benchmark-e2e-dynamodb-alternator
test-benchmark-e2e-scylla-parquet
test-benchmark-e2e-parquet-scylla
Remove self-seeding fallback from ParquetToScyllaE2EBenchmark. The test now fails with a clear message if Parquet files are missing, requiring test-benchmark-e2e-scylla-parquet to run first.
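A fail-fast precondition of this kind might be sketched as below; the object name, message wording, and ".parquet" suffix check are assumptions, not the PR's actual code:

```scala
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

object ParquetPrecondition {
  // Fail fast with an actionable message instead of silently self-seeding:
  // the import leg requires the export leg to have produced Parquet files.
  def requireParquetFiles(dir: Path): Unit = {
    val hasParquet = Files.isDirectory(dir) && {
      val stream = Files.list(dir)
      try stream.iterator().asScala.exists(_.toString.endsWith(".parquet"))
      finally stream.close()
    }
    if (!hasParquet)
      throw new IllegalStateException(
        s"No Parquet files found under $dir; run test-benchmark-e2e-scylla-parquet first.")
  }
}
```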
Two-leg benchmark mirroring the Parquet pattern:

- test-benchmark-e2e-dynamodb-s3export: seeds DynamoDB Local, exports to S3 Export format on the local filesystem, and verifies the files exist.
- test-benchmark-e2e-s3export-alternator: uploads the export files to LocalStack S3, imports to Alternator, and verifies the row count. Requires running dynamodb-s3export first.

Row count configurable via E2E_DDB_ROWS (default 500K).
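The environment-driven row-count convention (E2E_DDB_ROWS, E2E_CQL_ROWS) can be sketched as a small helper; the function name is hypothetical, and the real code would likely read sys.env or system properties directly:

```scala
object BenchmarkRows {
  // Resolve the benchmark row count from an environment map, falling
  // back to the documented default when the variable is unset.
  def rowCount(env: Map[String, String], key: String, default: Int): Int =
    env.get(key).map(_.trim.toInt).getOrElse(default)
}
```

For example, `rowCount(sys.env, "E2E_DDB_ROWS", 500000)` would yield the default 500K unless the variable is exported.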
…ucture

- Add E2E benchmarks: Cassandra->Scylla, Scylla->Scylla, DynamoDB->Alternator, Cassandra->Parquet
- Refactor benchmark utilities into shared traits (E2EBenchmarkSuite, ThroughputBenchmarkSupport)
- Add E2E test category and TestFileUtils for config file management
- Add DynamoDBBenchmarkDataGenerator for Alternator E2E tests
- Add unit tests for Parquet and DynamoDB S3Export writers
- Add ParquetTargetValidationTest for config validation
- Refactor Makefile: sequential E2E execution, dependency targets, remove old benchmark-integration
- Extract version constants in build.sbt, forward e2e system properties to the test JVM
- Refactor DynamoDB S3Export writer for improved encoding
- Remove old non-E2E benchmark infrastructure (BenchmarkSuite, Benchmark category)
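A shared throughput trait of this shape might expose a helper along these lines (a sketch; the actual ThroughputBenchmarkSupport presumably also handles timing and row-count verification):

```scala
trait ThroughputSupport {
  // Compute rows/second from a row count and a wall-clock duration, so
  // every E2E migration path reports the same throughput metric.
  def rowsPerSecond(rows: Long, elapsedMillis: Long): Double = {
    require(elapsedMillis > 0, "elapsed time must be positive")
    rows * 1000.0 / elapsedMillis
  }
}
```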
- Add a 5-minute timeout to COUNT(*) queries in ThroughputBenchmarkSupport and ParquetToScyllaE2EBenchmark to prevent read timeouts on large tables
- Set Parquet write mode to 'overwrite' in benchmark configs to handle pre-existing output directories from previous runs
- Add a docker compose exec fallback in TestFileUtils.deleteRecursive for cleaning up root-owned files created by Docker containers
Add test-benchmark-e2e-sanity Makefile target that runs all E2E migration path tests with minimal row counts (1000 CQL, 100 DynamoDB) for fast CI validation (~2 min). Integrated into the existing integration test job in the GitHub Actions workflow. Also fix stop-services to run unconditionally (if: always()) so Docker containers are cleaned up even when tests fail.
Rewrite the testing section to cover all test categories (unit, integration, AWS, E2E benchmarks, JMH), migration paths, row count configuration, CI pipeline, and the new E2E sanity suite.
Summary
- JMH microbenchmarks for CPU-bound transformations (explodeRow, convertValue, createSelection) in a new benchmarks sbt module
- convertRowTypes closure refactored into the public Cassandra.convertValue so JMH can call it directly
- Benchmark munit tag excluded from regular test-integration runs
- Makefile targets: benchmark-jmh, benchmark-jmh-quick, benchmark-integration, benchmark

Test plan
- sbt migrator/compile: migrator compiles after the refactor
- sbt benchmarks/compile: JMH benchmarks compile
- sbt scalafmtCheckAll: formatting passes
- make test-integration: existing integration tests still pass, benchmarks excluded
- make benchmark-integration: integration benchmarks run against Docker services