
Add benchmark tests for ScyllaDB migrator #300

Draft

dkropachev wants to merge 16 commits into master from add-benchmark-tests

Conversation


@dkropachev dkropachev commented Feb 27, 2026

Summary

  • Add JMH microbenchmarks for CPU-bound transformations (explodeRow, convertValue, createSelection) in a new benchmarks sbt module
  • Add integration throughput benchmarks for Cassandra→Scylla and Scylla→Scylla migration paths at 100K and 500K row scales
  • Refactor convertRowTypes closure into public Cassandra.convertValue so JMH can call it directly
  • Add Benchmark munit tag excluded from regular test-integration runs
  • Add Makefile targets: benchmark-jmh, benchmark-jmh-quick, benchmark-integration, benchmark
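For orientation, a JMH microbenchmark of this shape might look as follows. This is an illustrative sketch only: the class name echoes the ExplodeRowBenchmark mentioned in the test plan, but the state, setup, and measured body are assumptions, not the actual sources.

```scala
import org.openjdk.jmh.annotations._
import java.util.concurrent.TimeUnit

@State(Scope.Benchmark)
@BenchmarkMode(Array(Mode.Throughput))
@OutputTimeUnit(TimeUnit.SECONDS)
class ExplodeRowBenchmarkSketch {

  // Input prepared once per benchmark run, so the measured region
  // covers only the transformation itself, not data construction.
  var row: Map[String, Any] = _

  @Setup
  def setup(): Unit =
    row = Map("id" -> 1, "tags" -> List("a", "b", "c"))

  @Benchmark
  def explodeRow(): Seq[Map[String, Any]] =
    // Stand-in for the real explodeRow: fan a collection column out
    // into one row per element.
    row("tags").asInstanceOf[List[String]].map(t => row.updated("tags", t))
}
```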

Test plan

  • sbt migrator/compile — migrator compiles after refactor
  • sbt benchmarks/compile — JMH benchmarks compile
  • sbt scalafmtCheckAll — formatting passes
  • JMH smoke test runs successfully (ExplodeRowBenchmark)
  • All 91 unit tests pass (no regression)
  • make test-integration — existing integration tests still pass, benchmarks excluded
  • make benchmark-integration — integration benchmarks run against Docker services

Introduce JMH microbenchmarks for CPU-bound transformations (explodeRow,
convertValue, createSelection) and integration throughput benchmarks for
end-to-end migration paths (Cassandra→Scylla, Scylla→Scylla) at 100K
and 500K row scales.

- Refactor convertRowTypes closure into public Cassandra.convertValue
- Add sbt-jmh plugin and benchmarks module
- Add Benchmark munit tag, excluded from regular test-integration runs
- Add Makefile targets: benchmark-jmh, benchmark-jmh-quick,
  benchmark-integration, benchmark
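The sbt wiring for the points above could look roughly like this. Module and tag names come from the bullets; the exact settings, and the use of munit's tag-exclusion flag, are assumptions about the build rather than the actual definition.

```scala
// build.sbt (sketch, not the actual build definition)
lazy val benchmarks = (project in file("benchmarks"))
  .enablePlugins(JmhPlugin) // provided by sbt-jmh
  .dependsOn(migrator)

// Keep Benchmark-tagged munit suites out of regular test-integration runs.
Test / testOptions +=
  Tests.Argument(new TestFramework("munit.Framework"), "--exclude-tags=Benchmark")
```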

dkropachev and others added 14 commits February 26, 2026 21:28

Set Jmh/baseDirectory to the project root so that relative output paths
in Makefile targets resolve correctly from the forked JVM.
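In build.sbt terms this is presumably a one-line scoped setting; the exact form below is an assumption:

```scala
// build.sbt (sketch): run the forked JMH JVM from the repository root
// so relative output paths passed by Makefile targets resolve correctly.
Jmh / baseDirectory := (ThisBuild / baseDirectory).value
```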

Cover per-row hot paths that were missing benchmark coverage:
- DdbValue.from (flat, nested, set items)
- AttributeValueUtils.fromV1 (SDK v1→v2 conversion)
- DynamoDBS3Export.itemDecoder (simple through wide/deeply-nested JSON)
- compareCassandraRows and compareDynamoDBRows (with and without timestamps)
- stripTrailingZeros mapping (BigDecimal and mixed-type rows)
- DdbValue Java serialization roundtrip (serialize, deserialize, roundtrip)
- Cassandra.convertValue per type (UTF8String, Map, List, Set, ArrayBuffer)
- Wide-row explodeRow (50 columns vs existing 3-column benchmarks)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
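The serialization-roundtrip measurements presumably exercise plain Java serialization. A self-contained sketch of that roundtrip, using a stand-in case class since DdbValue's definition is not shown here:

```scala
import java.io._

// Stand-in for DdbValue; the real type lives in the migrator sources.
case class DemoValue(name: String, n: Long) extends Serializable

def roundtrip[A <: Serializable](a: A): A = {
  val bytes = new ByteArrayOutputStream()
  val out   = new ObjectOutputStream(bytes)
  out.writeObject(a)
  out.close()
  val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
  in.readObject().asInstanceOf[A]
}

// A benchmark would time serialize, deserialize, and the full roundtrip
// separately; here we only show the roundtrip is identity-preserving.
assert(roundtrip(DemoValue("row", 42L)) == DemoValue("row", 42L))
```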

Add TargetSettings.Parquet config (path + compression), a thin
writers.Parquet wrapper around Spark's native df.write.parquet, and
wire up the Cassandra → Parquet route in Migrator.scala.
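A thin wrapper of this kind could be as small as the sketch below. The config field names (path, compression) come from the commit message; the object name and signature are assumptions, and `mode("overwrite")` is one way to tolerate pre-existing output directories.

```scala
import org.apache.spark.sql.DataFrame

// Sketch in the spirit of writers.Parquet, not the actual implementation.
object ParquetWriterSketch {
  def write(df: DataFrame, path: String, compression: String): Unit =
    df.write
      .mode("overwrite")                  // tolerate leftover output dirs
      .option("compression", compression) // e.g. "snappy", "zstd"
      .parquet(path)
}
```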

Add TargetSettings.DynamoDBS3Export config (path), a writer that
produces gzipped DynamoDB JSON files with manifest-summary.json and
manifest-files.json compatible with the existing S3 Export reader,
and wire up the DynamoDB → S3 Export route in Migrator.scala.

Includes 14 roundtrip unit tests verifying all DynamoDB attribute
types (S, N, B, BOOL, NULL, SS, NS, BS, L, M) encode correctly
and decode back to identical values via the existing itemDecoder.
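For reference, DynamoDB's S3 export data files store one JSON object per line under an Item key, with type-tagged attribute values; a line of the gzipped output looks roughly like this (attribute names here are illustrative):

```json
{"Item": {"id": {"S": "user-1"}, "score": {"N": "42"}, "tags": {"SS": ["a", "b"]}}}
```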

Two-leg benchmark: Scylla -> Parquet (export) then Parquet -> Scylla
(import) using the same dataset. Row count configurable via
E2E_CQL_ROWS (default 5M). Parquet files written to Docker volume
at /app/parquet/bench_e2e.

Split ParquetE2EBenchmark into ScyllaToParquetE2EBenchmark and
ParquetToScyllaE2EBenchmark (each runnable independently). Rename
all Makefile targets to test-benchmark-e2e-{source}-{target}:

  test-benchmark-e2e-cassandra-scylla
  test-benchmark-e2e-scylla-scylla
  test-benchmark-e2e-dynamodb-alternator
  test-benchmark-e2e-scylla-parquet
  test-benchmark-e2e-parquet-scylla

Remove self-seeding fallback from ParquetToScyllaE2EBenchmark.
The test now fails with a clear message if Parquet files are
missing, requiring test-benchmark-e2e-scylla-parquet to run first.
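The fail-fast behaviour can be pictured as a precondition check like the one below. This is a sketch only: the directory default and variable names are assumptions, not taken from the Makefile.

```shell
# Sketch: refuse to import when the export leg has not produced files,
# instead of silently re-seeding data.
PARQUET_DIR="${PARQUET_DIR:-/tmp/bench_e2e_parquet_demo}"
if ls "$PARQUET_DIR"/*.parquet >/dev/null 2>&1; then
  PARQUET_READY=yes
else
  PARQUET_READY=no
  echo "No Parquet files under $PARQUET_DIR; run test-benchmark-e2e-scylla-parquet first" >&2
fi
```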

Two-leg benchmark mirroring the Parquet pattern:

  test-benchmark-e2e-dynamodb-s3export: Seeds DynamoDB Local, exports
  to S3 Export format on local filesystem, verifies files exist.

  test-benchmark-e2e-s3export-alternator: Uploads export files to
  LocalStack S3, imports to Alternator, verifies row count.
  Requires running dynamodb-s3export first.

Row count configurable via E2E_DDB_ROWS (default 500K).
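In Makefile terms, the overridable row counts might be wired like this. The variable names and defaults are the documented ones; the exact Makefile form is an assumption.

```make
# Sketch: row counts are env-overridable, e.g.
#   E2E_DDB_ROWS=1000 make test-benchmark-e2e-dynamodb-s3export
E2E_CQL_ROWS ?= 5000000
E2E_DDB_ROWS ?= 500000
```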

…ucture

- Add E2E benchmarks: Cassandra->Scylla, Scylla->Scylla, DynamoDB->Alternator, Cassandra->Parquet
- Refactor benchmark utilities into shared traits (E2EBenchmarkSuite, ThroughputBenchmarkSupport)
- Add E2E test category and TestFileUtils for config file management
- Add DynamoDBBenchmarkDataGenerator for Alternator E2E tests
- Add unit tests for Parquet and DynamoDB S3Export writers
- Add ParquetTargetValidationTest for config validation
- Refactor Makefile: sequential E2E execution, dependency targets, remove old benchmark-integration
- Extract version constants in build.sbt, forward e2e system properties to test JVM
- Refactor DynamoDB S3Export writer for improved encoding
- Remove old non-E2E benchmark infrastructure (BenchmarkSuite, Benchmark category)
- Add 5-minute timeout to COUNT(*) queries in ThroughputBenchmarkSupport
  and ParquetToScyllaE2EBenchmark to prevent read timeouts on large tables
- Set Parquet write mode to 'overwrite' in benchmark configs to handle
  pre-existing output directories from previous runs
- Add docker compose exec fallback in TestFileUtils.deleteRecursive for
  cleaning up root-owned files created by Docker containers
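The per-statement timeout on COUNT(*) presumably looks something like the sketch below, using the DataStax Java driver's statement-level timeout; the method name and keyspace/table handling are assumptions.

```scala
import com.datastax.oss.driver.api.core.CqlSession
import com.datastax.oss.driver.api.core.cql.SimpleStatement
import java.time.Duration

// Sketch (session setup elided): a per-statement timeout keeps counts
// over large tables from tripping the driver's default request timeout.
def countRows(session: CqlSession, keyspace: String, table: String): Long = {
  val stmt = SimpleStatement
    .newInstance(s"SELECT COUNT(*) FROM $keyspace.$table")
    .setTimeout(Duration.ofMinutes(5))
  session.execute(stmt).one().getLong(0)
}
```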

Add test-benchmark-e2e-sanity Makefile target that runs all E2E
migration path tests with minimal row counts (1000 CQL, 100 DynamoDB)
for fast CI validation (~2 min). Integrated into the existing
integration test job in the GitHub Actions workflow.

Also fix stop-services to run unconditionally (if: always()) so
Docker containers are cleaned up even when tests fail.
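In the workflow file, that fix is a one-line condition on the cleanup step; the step name and compose file below are assumptions, only `if: always()` is from the commit message.

```yaml
# .github/workflows (sketch): run cleanup even when a previous step failed.
- name: Stop services
  if: always()
  run: docker compose -f docker-compose-tests.yml down
```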

Rewrite the testing section to cover all test categories (unit,
integration, AWS, E2E benchmarks, JMH), migration paths, row count
configuration, CI pipeline, and the new E2E sanity suite.