@@ -4,6 +4,12 @@ Spark jobs in this repo can be used for data migration and data validation.

> :warning: Please note this job has been tested with spark version [2.4.8](https://archive.apache.org/dist/spark/spark-2.4.8/)

+ ## Build
+ 1. Clone this repo
+ 2. Move to the repo folder `cd cassandra-data-migrator`
+ 3. Run the build `mvn clean package`
+ 4. The fat jar (`cassandra-data-migrator-2.x.jar`) file should now be present in the `target` folder
+
## Prerequisite
Install Java8 as spark binaries are compiled with it.
@@ -19,13 +25,12 @@ tar -xvzf <spark downloaded file name>
1. `sparkConf.properties` file needs to be configured as applicable for the environment
> A sample Spark conf file configuration can be [found here](./src/resources/sparkConf.properties)
2. Place the conf file where it can be accessed while running the job via spark-submit.
- 3. Generate a fat jar (`cassandra-data-migrator-1.x.jar`) using command `mvn clean package`
- 4. Run the 'Data Migration' job using `spark-submit` command as shown below:
+ 3. Run the 'Data Migration' job using `spark-submit` command as shown below:

```
./spark-submit --properties-file sparkConf.properties /
--master "local[*]" /
- --class datastax.astra.migrate.Migrate cassandra-data-migrator-1.x.jar &> logfile_name.txt
+ --class datastax.astra.migrate.Migrate cassandra-data-migrator-2.x.jar &> logfile_name.txt
```

Note: Above command also generates a log file `logfile_name.txt` to avoid log output on the console.
@@ -38,7 +43,7 @@ Note: Above command also generates a log file `logfile_name.txt` to avoid log ou
```
./spark-submit --properties-file sparkConf.properties /
--master "local[*]" /
- --class datastax.astra.migrate.DiffData cassandra-data-migrator-1.x.jar &> logfile_name.txt
+ --class datastax.astra.migrate.DiffData cassandra-data-migrator-2.x.jar &> logfile_name.txt
```
- Validation job will report differences as “ERRORS” in the log file as shown below
@@ -67,7 +72,7 @@ spark.target.autocorrect.mismatch true|false
```
./spark-submit --properties-file sparkConf.properties /
--master "local[*]" /
- --class datastax.astra.migrate.MigratePartitionsFromFile cassandra-data-migrator-1.x.jar &> logfile_name.txt
+ --class datastax.astra.migrate.MigratePartitionsFromFile cassandra-data-migrator-2.x.jar &> logfile_name.txt
```

When running in above mode the tool assumes a `partitions.csv` file to be present in the current folder in the below format, where each line (`min,max`) represents a partition-range