Commit 0338a9e

Merge pull request #47 from datastax/feature/cdm-docker-image
Follow semver format
2 parents 4f25eaa + b9773f4 commit 0338a9e

3 files changed (+24, -17 lines)

README.md

Lines changed: 19 additions & 12 deletions
@@ -4,22 +4,28 @@ Migrate and Validate Tables between Origin and Target Cassandra Clusters.
 
 > :warning: Please note this job has been tested with spark version [2.4.8](https://archive.apache.org/dist/spark/spark-2.4.8/)
 
-## Build
-1. Clone this repo
-2. Move to the repo folder `cd cassandra-data-migrator`
-3. Run the build `mvn clean package`
-4. The fat jar (`cassandra-data-migrator-2.x.jar`) file should now be present in the `target` folder
+## Container Image
+- Get the latest image that includes all dependencies from [DockerHub](https://hub.docker.com/r/datastax/cassandra-data-migrator)
+- If you use this route, all migration tools (`cassandra-data-migrator` + `dsbulk` + `cqlsh`) would be available in the `/assets/` folder of the container
+- OR follow the below build steps (and Prerequisite) to build the jar locally
 
-## Prerequisite
+### Prerequisite
 
-Install Java8 as spark binaries are compiled with it.
-Install single instance of spark on a node where you want to run this job. Spark can be installed by running the following: -
+- Install Java8 as spark binaries are compiled with it.
+- Install Maven 3.8.x
+- Install single instance of spark on a node where you want to run this job. Spark can be installed by running the following: -
 
 ```
 wget https://downloads.apache.org/spark/spark-2.4.8/
 tar -xvzf <spark downloaded file name>
 ```
 
+### Build
+1. Clone this repo
+2. Move to the repo folder `cd cassandra-data-migrator`
+3. Run the build `mvn clean package`
+4. The fat jar (`cassandra-data-migrator-2.x.x.jar`) file should now be present in the `target` folder
+
 # Steps for Data-Migration:
 
 1. `sparkConf.properties` file needs to be configured as applicable for the environment
@@ -30,7 +36,7 @@ tar -xvzf <spark downloaded file name>
 ```
 ./spark-submit --properties-file sparkConf.properties /
 --master "local[*]" /
---class datastax.astra.migrate.Migrate cassandra-data-migrator-2.x.jar &> logfile_name.txt
+--class datastax.astra.migrate.Migrate cassandra-data-migrator-2.x.x.jar &> logfile_name.txt
 ```
 
 Note: Above command also generates a log file `logfile_name.txt` to avoid log output on the console.
@@ -43,7 +49,7 @@ Note: Above command also generates a log file `logfile_name.txt` to avoid log ou
 ```
 ./spark-submit --properties-file sparkConf.properties /
 --master "local[*]" /
---class datastax.astra.migrate.DiffData cassandra-data-migrator-2.x.jar &> logfile_name.txt
+--class datastax.astra.migrate.DiffData cassandra-data-migrator-2.x.x.jar &> logfile_name.txt
 ```
 
 - Validation job will report differences as “ERRORS” in the log file as shown below
@@ -72,7 +78,7 @@ spark.target.autocorrect.mismatch true|false
 ```
 ./spark-submit --properties-file sparkConf.properties /
 --master "local[*]" /
---class datastax.astra.migrate.MigratePartitionsFromFile cassandra-data-migrator-2.x.jar &> logfile_name.txt
+--class datastax.astra.migrate.MigratePartitionsFromFile cassandra-data-migrator-2.x.x.jar &> logfile_name.txt
 ```
 
 When running in above mode the tool assumes a `partitions.csv` file to be present in the current folder in the below format, where each line (`min,max`) represents a partition-range
@@ -88,7 +94,8 @@ This mode is specifically useful to processes a subset of partition-ranges that
 - [Counter tables](https://docs.datastax.com/en/dse/6.8/cql/cql/cql_using/useCountersConcept.html)
 - Preserve [writetimes](https://docs.datastax.com/en/dse/6.8/cql/cql/cql_reference/cql_commands/cqlSelect.html#cqlSelect__retrieving-the-datetime-a-write-occurred-p) and [TTL](https://docs.datastax.com/en/dse/6.8/cql/cql/cql_reference/cql_commands/cqlSelect.html#cqlSelect__ref-select-ttl-p)
 - Advanced DataTypes ([Sets](https://docs.datastax.com/en/dse/6.8/cql/cql/cql_reference/refDataTypes.html#refDataTypes__set), [Lists](https://docs.datastax.com/en/dse/6.8/cql/cql/cql_reference/refDataTypes.html#refDataTypes__list), [Maps](https://docs.datastax.com/en/dse/6.8/cql/cql/cql_reference/refDataTypes.html#refDataTypes__map), [UDTs](https://docs.datastax.com/en/dse/6.8/cql/cql/cql_reference/refDataTypes.html#refDataTypes__udt))
-- Filter records from origin using writetime
+- Filter records from origin using writetimes, CQL conditions, token-ranges
+- Fully containerized (Docker and K8s friendly)
 - SSL Support (including custom cipher algorithms)
 - Migrate from any Cassandra origin ([Apache Cassandra](https://cassandra.apache.org) / [DataStax Enterprise](https://www.datastax.com/products/datastax-enterprise) / [DataStax Astra DB](https://www.datastax.com/products/datastax-astra)) to any Cassandra target ([Apache Cassandra](https://cassandra.apache.org) / [DataStax Enterprise](https://www.datastax.com/products/datastax-enterprise) / [DataStax Astra DB](https://www.datastax.com/products/datastax-astra))
 - Validate migration accuracy and performance using a smaller randomized data-set
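The README's run commands use `cassandra-data-migrator-2.x.x.jar` as a placeholder for the semver-named jar, and end their continued lines with `/`, whereas a POSIX shell only joins a line that ends in a backslash (`\`). A minimal sketch of a wrapper, assuming the jar is named `cassandra-data-migrator-<version>.jar` under `target/` (the name the README describes, and what Maven's default `${artifactId}-${version}` naming would produce), that `spark-submit` is on `PATH`, and that `sparkConf.properties` sits in the working directory. `run_cdm_job` is a hypothetical helper, not part of the project:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of cassandra-data-migrator): run one job class
# with the jar built for a given semver version, logging to logfile_name.txt.
# Assumes: repo root as working directory, spark-submit on PATH.
run_cdm_job() {
  local class="$1"    # e.g. datastax.astra.migrate.Migrate or datastax.astra.migrate.DiffData
  local version="$2"  # MAJOR.MINOR.PATCH, e.g. the <version> from pom.xml
  local jar="target/cassandra-data-migrator-${version}.jar"

  if [[ ! -f "$jar" ]]; then
    echo "jar not found: $jar (run 'mvn clean package' first?)" >&2
    return 1
  fi

  # Trailing backslashes continue the command onto the next line.
  spark-submit --properties-file sparkConf.properties \
    --master "local[*]" \
    --class "$class" "$jar" &> logfile_name.txt
}

# Example: run_cdm_job datastax.astra.migrate.Migrate 2.10.1
```

Passing the version explicitly keeps the jar path exact, so a stale jar from an earlier build cannot be picked up by accident.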

pom.xml

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
 
 <groupId>datastax.astra.migrate</groupId>
 <artifactId>cassandra-data-migrator</artifactId>
-<version>2.10</version>
+<version>2.10.1</version>
 <packaging>jar</packaging>
 
 <properties>
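The commit title, "Follow semver format", matches the version bump from `2.10` to `2.10.1`: a Semantic Versioning 2.0.0 version core is three dot-separated non-negative integers (MAJOR.MINOR.PATCH) without leading zeros. A minimal sketch of a check for that core form; `is_semver_core` is a hypothetical helper, and it deliberately rejects the pre-release (`-rc.1`) and build-metadata (`+build`) suffixes that SemVer also permits:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if $1 is a SemVer core version MAJOR.MINOR.PATCH.
# Each part is 0 or a number without leading zeros; suffixes such as -rc.1 are rejected.
is_semver_core() {
  # Keeping the regex in a variable avoids quoting pitfalls inside [[ =~ ]].
  local re='^(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)$'
  [[ "$1" =~ $re ]]
}

# Example: is_semver_core 2.10.1 succeeds; is_semver_core 2.10 fails.
```

To check the version actually declared in `pom.xml`, one option (assuming Maven Help Plugin 3.1.0 or later, which supports `-DforceStdout`) is `is_semver_core "$(mvn -q help:evaluate -Dexpression=project.version -DforceStdout)"`.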

src/resources/runCommands.txt

Lines changed: 4 additions & 4 deletions
@@ -4,11 +4,11 @@ curl -OL https://downloads.datastax.com/enterprise/cqlsh-astra.tar.gz
 wget https://archive.apache.org/dist/spark/spark-2.4.8/spark-2.4.8-bin-hadoop2.7.tgz
 
 // Migrate
-spark-submit --properties-file /<path>/sparkConf.properties --verbose --master "local[8]" --class datastax.astra.migrate.Migrate /<path>/cassandra-data-migrator-2.x.jar
-spark-submit --properties-file /<path>/sparkConf.properties --master "local[8]" --driver-memory 25G --executor-memory 25G --class datastax.astra.migrate.Migrate /<path>/cassandra-data-migrator-2.x.jar &> table_out.log
+spark-submit --properties-file /<path>/sparkConf.properties --verbose --master "local[8]" --class datastax.astra.migrate.Migrate /<path>/cassandra-data-migrator-2.*.jar
+spark-submit --properties-file /<path>/sparkConf.properties --master "local[8]" --driver-memory 25G --executor-memory 25G --class datastax.astra.migrate.Migrate /<path>/cassandra-data-migrator-2.*.jar &> table_out.log
 
 // Random Partitioner Run Command
-spark-submit --properties-file /<path>/sparkConf.properties --verbose --master "local[8]" --conf spark.origin.minPartition=-1 --conf spark.origin.maxPartition=170141183460469231731687303715884105728 --class datastax.astra.migrate.Migrate /<path>/cassandra-data-migrator-2.x.jar
+spark-submit --properties-file /<path>/sparkConf.properties --verbose --master "local[8]" --conf spark.origin.minPartition=-1 --conf spark.origin.maxPartition=170141183460469231731687303715884105728 --class datastax.astra.migrate.Migrate /<path>/cassandra-data-migrator-2.*.jar
 
 // Validate
-spark-submit --properties-file /<path>/sparkConf.properties --master "local[8]" --driver-memory 25G --executor-memory 25G --class datastax.astra.migrate.DiffData /<path>/cassandra-data-migrator-2.x.jar &> table_out.log
+spark-submit --properties-file /<path>/sparkConf.properties --master "local[8]" --driver-memory 25G --executor-memory 25G --class datastax.astra.migrate.DiffData /<path>/cassandra-data-migrator-2.*.jar &> table_out.log
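The updated commands pass `/<path>/cassandra-data-migrator-2.*.jar` and let the shell expand the glob, so they keep working across 2.x releases. The glob is only safe when exactly one jar matches: with several matches (say a `2.10` and a `2.10.1` jar side by side) `spark-submit` treats the first as the application jar and the rest as application arguments, and with no match bash passes the literal pattern through. A minimal sketch of a guard, assuming bash; `resolve_cdm_jar` is a hypothetical helper and `/<path>/` stays a placeholder:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the one jar matching cassandra-data-migrator-2.*.jar
# in directory $1, or fail if zero or several jars match.
resolve_cdm_jar() {
  local dir="$1" restore
  local -a jars
  restore=$(shopt -p nullglob) || true  # remember the caller's nullglob setting
  shopt -s nullglob                     # an unmatched glob now expands to nothing
  jars=("$dir"/cassandra-data-migrator-2.*.jar)
  eval "$restore"
  if (( ${#jars[@]} != 1 )); then
    echo "expected 1 jar matching cassandra-data-migrator-2.*.jar in $dir, found ${#jars[@]}" >&2
    return 1
  fi
  printf '%s\n' "${jars[0]}"
}

# Example (/<path>/ is a placeholder, as in runCommands.txt):
# jar=$(resolve_cdm_jar /<path>) || exit 1
# spark-submit --properties-file /<path>/sparkConf.properties --master "local[8]" \
#   --class datastax.astra.migrate.Migrate "$jar"
```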
