`README.md`: 36 additions & 28 deletions
````diff
@@ -6,28 +6,21 @@ Migrate and Validate Tables between Origin and Target Cassandra Clusters.
 
 > :warning: Please note this job has been tested with spark version [3.3.1](https://archive.apache.org/dist/spark/spark-3.3.1/)
 
-## Container Image
+## Install as a Container
 - Get the latest image that includes all dependencies from [DockerHub](https://hub.docker.com/r/datastax/cassandra-data-migrator)
-- If you use this route, all migration tools (`cassandra-data-migrator` + `dsbulk` + `cqlsh`) would be available in the `/assets/` folder of the container
-- OR follow the below build steps (and Prerequisite) to build the jar locally
+- All migration tools (`cassandra-data-migrator` + `dsbulk` + `cqlsh`) would be available in the `/assets/` folder of the container
 
-### Prerequisite
+## Install as a JAR file
+- Download the latest jar file from the GitHub [packages area here](https://github.com/orgs/datastax/packages?repo_name=cassandra-data-migrator)
 
+### Prerequisite
 - Install Java8 as spark binaries are compiled with it.
-- Install Maven 3.8.x
-- Install single instance of spark on a node where you want to run this job. Spark can be installed by running the following: -
-
+- Install Spark version [3.3.1](https://archive.apache.org/dist/spark/spark-3.3.1/) on a single VM (no cluster necessary) where you want to run this job. Spark can be installed by running the following: -
````
````diff
@@ -64,19 +63,20 @@ Note: Above command also generates a log file `logfile_name.txt` to avoid log ou
 ```
 
 - Please grep for all `ERROR` from the output log files to get the list of missing and mismatched records.
-- Note that it lists differences by partition key values.
+- Note that it lists differences by primary-key values.
````
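The advice to grep the output logs for `ERROR` can also be scripted, for example when several log files need to be collected at once. The following is a minimal Python sketch that behaves like `grep ERROR logfile_name.txt` (the file name comes from the hunk header above). The function names are hypothetical, and the sketch deliberately does not parse primary-key values out of each line, because the log line format is not shown in this README.

```python
import sys
from pathlib import Path
from typing import Iterable, List


def error_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines containing the literal string 'ERROR' (like `grep ERROR`)."""
    return [line.rstrip("\n") for line in lines if "ERROR" in line]


def errors_in_logs(paths: Iterable[Path]) -> List[str]:
    """Collect ERROR lines from each log file, prefixed with the file name."""
    found: List[str] = []
    for path in paths:
        # errors="replace" keeps going if a log contains undecodable bytes.
        with path.open(encoding="utf-8", errors="replace") as fh:
            found.extend(f"{path.name}: {line}" for line in error_lines(fh))
    return found


if __name__ == "__main__":
    # Usage: python find_errors.py logfile_name.txt [more_logs ...]
    if len(sys.argv) < 2:
        print("usage: find_errors.py LOGFILE [LOGFILE ...]")
    else:
        for entry in errors_in_logs(Path(p) for p in sys.argv[1:]):
            print(entry)
```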
````diff
 - The Validation job can also be run in an AutoCorrect mode. This mode can
 - Add any missing records from origin to target
--Fix any inconsistencies between origin and target (makes target same as origin).
+-Update any mismatched records between origin and target (makes target same as origin).
 - Enable/disable this feature using one or both of the below setting in the config file
-
 ```
 spark.target.autocorrect.missing true|false
 spark.target.autocorrect.mismatch true|false
 ```
+Note:
+- The validation job will never delete records from target i.e. it only adds or updates data on target
````
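To make the AutoCorrect semantics concrete, here is a toy in-memory model in Python. It is not the tool's implementation (which runs on Spark against CQL tables); it only restates the documented rules. Rows are assumed to be a dict keyed by primary key, and the two hypothetical boolean parameters stand in for `spark.target.autocorrect.missing` and `spark.target.autocorrect.mismatch`. As the added note says, rows that exist only on target are never deleted.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, List, Mapping


@dataclass
class ValidationReport:
    missing: List[Hashable] = field(default_factory=list)     # in origin, absent from target
    mismatched: List[Hashable] = field(default_factory=list)  # present in both, values differ


def validate(origin: Mapping[Hashable, object],
             target: Dict[Hashable, object],
             autocorrect_missing: bool = False,
             autocorrect_mismatch: bool = False) -> ValidationReport:
    """Compare origin against target by primary key, optionally fixing target in place."""
    report = ValidationReport()
    for key, origin_row in origin.items():
        if key not in target:
            report.missing.append(key)
            if autocorrect_missing:
                target[key] = origin_row      # add the missing record
        elif target[key] != origin_row:
            report.mismatched.append(key)
            if autocorrect_mismatch:
                target[key] = origin_row      # make target the same as origin
    # Keys that exist only in target are intentionally left untouched (never deleted).
    return report


if __name__ == "__main__":
    origin = {1: "a", 2: "b", 3: "c"}
    target = {1: "a", 2: "stale", 4: "only-in-target"}
    print(validate(origin, target, autocorrect_missing=True, autocorrect_mismatch=True))
    # ValidationReport(missing=[3], mismatched=[2])
    print(target)
    # {1: 'a', 2: 'b', 4: 'only-in-target', 3: 'c'}
```

With both flags off the function only reports differences, which corresponds to plain validation mode.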
````diff
 
 # Migrating specific partition ranges
-- You can also use the tool to migrate specific partition ranges, use class option `--class datastax.astra.migrate.MigratePartitionsFromFile` as shown below
+- You can also use the tool to migrate specific partition ranges using class option `--class datastax.astra.migrate.MigratePartitionsFromFile` as shown below
````
````diff
@@ -90,18 +90,26 @@ When running in above mode the tool assumes a `partitions.csv` file to be presen
 2637884402540451982,4638499294009575633
 798869613692279889,8699484505161403540
 ```
-This mode is specifically useful to processes a subset of partition-ranges that may have generated errors as a result of a previous long-running job to migrate a large table.
-- Preserve [writetimes](https://docs.datastax.com/en/dse/6.8/cql/cql/cql_reference/cql_commands/cqlSelect.html#cqlSelect__retrieving-the-datetime-a-write-occurred-p) and [TTL](https://docs.datastax.com/en/dse/6.8/cql/cql/cql_reference/cql_commands/cqlSelect.html#cqlSelect__ref-select-ttl-p)
-- Filter records from origin using writetimes, CQL conditions, token-ranges
+This mode is specifically useful to processes a subset of partition-ranges that may have failed during a previous run.
````
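The `partitions.csv` file holds one `min,max` token pair per line. To split a table into chunks that can be (re)run independently, a minimal Python sketch can generate such pairs. It assumes the cluster uses Murmur3Partitioner, whose tokens are signed 64-bit integers. It also assumes the tool reads each pair as start-exclusive and end-inclusive, the usual Cassandra token-range convention; this README does not say which bounds are inclusive. The function names are hypothetical.

```python
from typing import Iterable, List, Tuple

# Assumption: Murmur3Partitioner, whose tokens are signed 64-bit integers.
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def split_token_range(parts: int, lo: int = MIN_TOKEN, hi: int = MAX_TOKEN) -> List[Tuple[int, int]]:
    """Split the token span lo..hi into `parts` contiguous (min, max) pairs.

    Each range ends exactly where the next one starts, so under a
    start-exclusive / end-inclusive reading the ranges neither overlap
    nor leave gaps.
    """
    if parts < 1:
        raise ValueError("parts must be >= 1")
    span = hi - lo
    # Python ints are arbitrary precision, so span * i cannot overflow.
    bounds = [lo + span * i // parts for i in range(parts + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def to_csv_lines(ranges: Iterable[Tuple[int, int]]) -> List[str]:
    """Format ranges as 'min,max' lines, matching the partitions.csv example."""
    return [f"{lo},{hi}" for lo, hi in ranges]


if __name__ == "__main__":
    # Usage: python make_partitions.py > partitions.csv  (adjust PARTS as needed)
    PARTS = 4
    print("\n".join(to_csv_lines(split_token_range(PARTS))))
```

To re-run only the ranges that failed, keep just those lines in `partitions.csv` rather than the full list.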
````diff
+
+# Features
+- Supports migration/validation of [Counter tables](https://docs.datastax.com/en/dse/6.8/cql/cql/cql_using/useCountersConcept.html)
+- Preserve [writetimes](https://docs.datastax.com/en/dse/6.8/cql/cql/cql_reference/cql_commands/cqlSelect.html#cqlSelect__retrieving-the-datetime-a-write-occurred-p) and [TTLs](https://docs.datastax.com/en/dse/6.8/cql/cql/cql_reference/cql_commands/cqlSelect.html#cqlSelect__ref-select-ttl-p)
+- Supports migration/validation of advanced DataTypes ([Sets](https://docs.datastax.com/en/dse/6.8/cql/cql/cql_reference/refDataTypes.html#refDataTypes__set), [Lists](https://docs.datastax.com/en/dse/6.8/cql/cql/cql_reference/refDataTypes.html#refDataTypes__list), [Maps](https://docs.datastax.com/en/dse/6.8/cql/cql/cql_reference/refDataTypes.html#refDataTypes__map), [UDTs](https://docs.datastax.com/en/dse/6.8/cql/cql/cql_reference/refDataTypes.html#refDataTypes__udt))
+- Filter records from `Origin` using `writetimes` and/or CQL conditions and/or min/max token-range
+- Supports adding `constants` as new columns on `Target`
 - Fully containerized (Docker and K8s friendly)
 - SSL Support (including custom cipher algorithms)
-- Migrate from any Cassandra origin ([Apache Cassandra®](https://cassandra.apache.org) / [DataStax Enterprise™](https://www.datastax.com/products/datastax-enterprise) / [DataStax Astra DB™](https://www.datastax.com/products/datastax-astra)) to any Cassandra target ([Apache Cassandra®](https://cassandra.apache.org) / [DataStax Enterprise™](https://www.datastax.com/products/datastax-enterprise) / [DataStax Astra DB™](https://www.datastax.com/products/datastax-astra))
+- Migrate from any Cassandra `Origin` ([Apache Cassandra®](https://cassandra.apache.org) / [DataStax Enterprise™](https://www.datastax.com/products/datastax-enterprise) / [DataStax Astra DB™](https://www.datastax.com/products/datastax-astra)) to any Cassandra `Target` ([Apache Cassandra®](https://cassandra.apache.org) / [DataStax Enterprise™](https://www.datastax.com/products/datastax-enterprise) / [DataStax Astra DB™](https://www.datastax.com/products/datastax-astra))
+- Supports migration/validation from and to [Azure Cosmos Cassandra](https://learn.microsoft.com/en-us/azure/cosmos-db/cassandra)
 - Validate migration accuracy and performance using a smaller randomized data-set
-- Custom writetime
+- Supports adding custom fixed `writetime`
+
+# Building Jar for local development
+1. Clone this repo
+2. Move to the repo folder `cd cassandra-data-migrator`
+3. Run the build `mvn clean package` (Needs Maven 3.8.x)
+4. The fat jar (`cassandra-data-migrator-3.x.x.jar`) file should now be present in the `target` folder
````
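Step 4 leaves the fat jar in the `target` folder. If later steps are scripted, a small Python helper can locate it instead of hard-coding the version. This is a sketch: the glob follows the `cassandra-data-migrator-3.x.x.jar` name from step 4, while skipping `-sources.jar`/`-javadoc.jar` files and any `original-...` copy left by a shading step are assumptions about what else Maven may write there, not something this README states. The function name is hypothetical.

```python
from pathlib import Path

JAR_GLOB = "cassandra-data-migrator-*.jar"          # e.g. cassandra-data-migrator-3.x.x.jar
SKIP_SUFFIXES = ("-sources.jar", "-javadoc.jar")   # common Maven side artifacts (assumption)


def find_fat_jar(target_dir: Path = Path("target")) -> Path:
    """Return the single fat jar produced by `mvn clean package`.

    The glob is anchored on the artifact name, so a file such as
    'original-cassandra-data-migrator-3.x.x.jar' is not matched.
    """
    matches = sorted(
        p for p in target_dir.glob(JAR_GLOB)
        if not p.name.endswith(SKIP_SUFFIXES)
    )
    if not matches:
        raise FileNotFoundError(f"no {JAR_GLOB} in {target_dir}; did the build run?")
    if len(matches) > 1:
        names = ", ".join(p.name for p in matches)
        raise RuntimeError(f"expected one jar in {target_dir}, found: {names}")
    return matches[0]


if __name__ == "__main__":
    # Run from the repo root after step 3 (`mvn clean package`).
    try:
        print(find_fat_jar())
    except (FileNotFoundError, RuntimeError) as exc:
        print(exc)
```

Raising on more than one match, rather than guessing the newest file, avoids silently picking a stale jar when `mvn clean` was skipped.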
````diff
 
 # Contributors
 Checkout all our wonderful contributors [here](./CONTRIBUTING.md#contributors).
````