Mysql to ScyllaDB Migrator and validator #307

Open
vigneshkumarak wants to merge 4 commits into scylladb:master from vigneshkumarak:master

Add MySQL as a migration and validation source

Summary

Adds support for migrating data from MySQL to ScyllaDB and validating the migrated data. This enables using the scylla-migrator for MySQL-to-ScyllaDB migrations with the same Spark-based workflow already supported for Cassandra and DynamoDB sources.

Migration

  • New MySQL source type that reads tables via Spark JDBC with configurable partitioned parallel reads, cursor-based fetching, and optional WHERE filtering
  • MySQLSchemaMapper handles MySQL-specific type conversions (e.g., TINYINT(1) / BIT(1) to Boolean, unsigned BIGINT to Decimal)
  • Wired into Migrator to write to ScyllaDB via the existing Spark ScyllaDB connector
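
The type conversions named above can be illustrated with a minimal sketch. This is not the PR's MySQLSchemaMapper: the function name `map_mysql_type`, the returned type labels, and the choice of `decimal(20,0)` for unsigned BIGINT are assumptions made for illustration only.

```python
import re


def map_mysql_type(column_type: str) -> str:
    """Return an illustrative target type label for a MySQL column type string.

    Hypothetical helper: the labels are placeholders, not the mapper's real output.
    """
    t = column_type.strip().lower()
    # TINYINT(1) and BIT(1) are conventionally used as booleans in MySQL.
    if re.fullmatch(r"(tinyint|bit)\(1\)", t):
        return "boolean"
    # Unsigned BIGINT can exceed the signed 64-bit range, so widen it.
    # 20 digits are enough for the maximum value 18446744073709551615.
    if re.fullmatch(r"bigint(\(\d+\))?\s+unsigned", t):
        return "decimal(20,0)"
    if re.fullmatch(r"bigint(\(\d+\))?", t):
        return "bigint"
    # Everything else is outside the scope of this sketch.
    return "unmapped"
```

Calling `map_mysql_type("BIGINT UNSIGNED")` returns the widened decimal label, while a plain `BIGINT` keeps a 64-bit integer label.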

Validation

  • MySQLToScyllaValidator performs row-by-row comparison between MySQL source and ScyllaDB target by joining on the configured primary key
  • Supports hash-based comparison via a new hashColumns config option: large text/blob columns are replaced by an MD5 hash, computed server-side in MySQL and client-side in Spark for ScyllaDB, so MySQL returns a fixed-size digest instead of the full column value during validation
  • Comparison supports configurable tolerances for floats, timestamps, and BigDecimal values
  • Validation summary now reports a breakdown by failure type (e.g., "1 missing target row(s), 1 differing field value(s)")
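
The validation flow described above (join on primary key, hash selected columns, compare with a tolerance, report a breakdown by failure type) can be sketched in plain Python. This is an illustration under stated assumptions, not the PR's MySQLToScyllaValidator: rows are modeled as dictionaries keyed by primary key, and the names `validate`, `summarize`, `values_match`, and `float_tol` are hypothetical.

```python
import hashlib
import math
from collections import Counter


def md5_hex(value: bytes) -> str:
    """Lowercase hex MD5 digest, the same format MySQL's MD5() returns."""
    return hashlib.md5(value).hexdigest()


def values_match(a, b, float_tol=1e-6):
    """Compare two values, allowing an absolute tolerance for floats."""
    if isinstance(a, float) and isinstance(b, float):
        return math.isclose(a, b, abs_tol=float_tol)
    return a == b


def validate(source_rows, target_rows, hash_columns=(), float_tol=1e-6):
    """Compare rows joined on primary key and count failures by type.

    source_rows / target_rows: dict mapping primary key -> {column: value}.
    Assumption: source values in hash_columns are already MD5 hex digests
    (hashed server-side), while target values are raw bytes hashed here.
    """
    failures = Counter()
    for pk, src in source_rows.items():
        tgt = target_rows.get(pk)
        if tgt is None:
            failures["missing target row(s)"] += 1
            continue
        for col, src_val in src.items():
            tgt_val = tgt.get(col)
            if col in hash_columns and tgt_val is not None:
                tgt_val = md5_hex(tgt_val)
            if not values_match(src_val, tgt_val, float_tol):
                failures["differing field value(s)"] += 1
    return failures


def summarize(failures):
    """Render the per-type counts as a single comma-separated line."""
    return ", ".join(f"{count} {kind}" for kind, count in failures.items())
```

Passing the result of `validate` to `summarize` yields a line in the same shape as the summary quoted in the list above. Timestamp and BigDecimal tolerances would follow the same pattern as the float case and are left out to keep the sketch short.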

Configuration

  • New SourceSettings.MySQL with fields: host, port, database, table, credentials, primaryKey, partitionColumn, numPartitions, lowerBound, upperBound, fetchSize, where, connectionProperties
  • New optional hashColumns field in Validation config (backward-compatible: it defaults to None, so existing configs are unaffected)
  • Added mysql-connector-j 8.3.0 dependency
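
To show how partitionColumn, numPartitions, lowerBound, and upperBound interact in a partitioned JDBC read, here is a simplified sketch that turns them into one WHERE predicate per partition. It approximates the general idea behind Spark's JDBC range partitioning rather than reproducing Spark's exact stride arithmetic, and the function name `partition_predicates` is hypothetical. As in Spark, the bounds only determine the stride: the first and last partitions are open-ended, so rows outside the bounds are still read.

```python
def partition_predicates(column, lower_bound, upper_bound, num_partitions):
    """Build one SQL predicate per partition for an integer partition column.

    Simplified illustration: integer bounds only, evenly sized strides.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if num_partitions == 1 or upper_bound <= lower_bound:
        # A single partition reads the whole table.
        return ["1 = 1"]
    stride = max((upper_bound - lower_bound) // num_partitions, 1)
    predicates = []
    for i in range(num_partitions):
        start = lower_bound + i * stride
        end = start + stride
        if i == 0:
            # First partition is open below and also picks up NULL keys.
            predicates.append(f"{column} < {end} OR {column} IS NULL")
        elif i == num_partitions - 1:
            # Last partition is open above.
            predicates.append(f"{column} >= {start}")
        else:
            predicates.append(f"{column} >= {start} AND {column} < {end}")
    return predicates
```

With bounds 0 and 100 and four partitions, the sketch produces four contiguous ranges of width 25. Bounds that poorly match the real key distribution still return every row but leave the partitions unevenly sized.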

Test configs

  • mysql-to-scylla-basic.yaml — basic MySQL migration with validation
  • mysql-to-scylla-large-blobs.yaml — large BLOB migration with hash-based validation
