README.md: 17 additions & 14 deletions

@@ -24,15 +24,17 @@ tar -xvzf spark-3.3.1-bin-hadoop3.tgz

# Steps for Data-Migration:

-1. `sparkConf.properties` file needs to be configured as applicable for the environment
+> :warning: Note that Version 4 of the tool is not backward-compatible with .properties files created in previous versions, and that package names have changed.
+
+1. `sparkConf.properties` file needs to be configured as applicable for the environment. Parameter descriptions and defaults are described in the file.
> A sample Spark conf file configuration can be [found here](./src/resources/sparkConf.properties)
2. Place the conf file where it can be accessed while running the job via spark-submit.
3. Run the job using the `spark-submit` command as shown below:
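The actual `spark-submit` command sits in a collapsed portion of this diff and is not reproduced here. Purely as a sketch, a local run might look like the following; the `datastax.cdm.job.Migrate` class name is an assumption inferred from the `datastax.cdm.job` package named later in this diff, and the jar and properties file names are taken from elsewhere in the README:

```
# Sketch only: the datastax.cdm.job.Migrate class name is an assumption, not confirmed by this diff.
./spark-submit --properties-file sparkConf.properties \
  --master "local[*]" \
  --class datastax.cdm.job.Migrate cassandra-data-migrator-4.x.x.jar \
  &> cdm_migrate_$(date +%Y%m%d_%H%M%S).log
```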

-22/10/27 23:25:30 ERROR DiffJobSession: Mismatch row found for key: Grapes %% 1 %% 2020-05-22 %% 2020-05-23T00:05:09.353Z %% skuid %% augue odio at quam Data: (Index: 8 Origin: Hello 3 Target: Hello 2 )
-22/10/27 23:25:30 ERROR DiffJobSession: Updated mismatch row in target: Grapes %% 1 %% 2020-05-22 %% 2020-05-23T00:05:09.353Z %% skuid %% augue odio at quam
+23/04/06 08:43:06 ERROR DiffJobSession: Mismatch row found for key: [key3] Mismatch: Target Index: 1 Origin: valueC Target: value999)
+23/04/06 08:43:06 ERROR DiffJobSession: Corrected mismatch row in target: [key3]
+23/04/06 08:43:06 ERROR DiffJobSession: Missing target row found for key: [key2]
+23/04/06 08:43:06 ERROR DiffJobSession: Inserted missing row in target: [key2]
```

- Please grep for `ERROR` in the output log files to get the list of missing and mismatched records.
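For instance, assuming the job output was redirected to a log file (the file name below is only a placeholder for whatever you redirected `spark-submit` output to), the relevant lines can be pulled out with a plain grep:

```
# Placeholder log file name; use the file you redirected the job output to.
grep 'ERROR DiffJobSession' logfile_name.txt
```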

@@ -70,18 +72,18 @@ Note:
- Update any mismatched records between origin and target (makes target same as origin).
- Enable/disable this feature using one or both of the settings below in the config file
```
-spark.target.autocorrect.missing true|false
-spark.target.autocorrect.mismatch true|false
+spark.cdm.autocorrect.missing false|true
+spark.cdm.autocorrect.mismatch false|true
```
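For example, to have the validation job fix both missing and mismatched rows, the corresponding entries in `sparkConf.properties` would be set as follows; the `true` values here are illustrative, and the parameter descriptions and defaults are documented in the file itself:

```
# Illustrative values; see sparkConf.properties for descriptions and defaults.
spark.cdm.autocorrect.missing true
spark.cdm.autocorrect.mismatch true
```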
Note:
- The validation job will never delete records from target, i.e., it only adds or updates data on target.

# Migrating specific partition ranges
-- You can also use the tool to migrate specific partition ranges using class option `--class datastax.astra.migrate.MigratePartitionsFromFile` as shown below
+- You can also use the tool to migrate specific partition ranges using class option `--class datastax.cdm.job.MigratePartitionsFromFile` as shown below

When running in the above mode, the tool assumes a `partitions.csv` file to be present in the current folder in the below format, where each line (`min,max`) represents a partition-range:
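Purely as an illustration of that format, such a file could be created as shown below; the token values are made up, and the file simply needs to sit in the working directory when the job is launched with the class option shown above:

```
# Illustrative token values; the min,max-per-line format comes from the description above.
cat > partitions.csv <<'EOF'
-507900353496146534,-107285462027022883
2637884402540451982,4638499294009575633
EOF
```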

@@ -99,6 +101,7 @@ This mode is specifically useful to process a subset of partition-ranges that
- Supports migration/validation of advanced DataTypes ([Sets](https://docs.datastax.com/en/dse/6.8/cql/cql/cql_reference/refDataTypes.html#refDataTypes__set), [Lists](https://docs.datastax.com/en/dse/6.8/cql/cql/cql_reference/refDataTypes.html#refDataTypes__list), [Maps](https://docs.datastax.com/en/dse/6.8/cql/cql/cql_reference/refDataTypes.html#refDataTypes__map), [UDTs](https://docs.datastax.com/en/dse/6.8/cql/cql/cql_reference/refDataTypes.html#refDataTypes__udt))
- Filter records from `Origin` using `writetimes` and/or CQL conditions and/or min/max token-range
- Supports adding `constants` as new columns on `Target`
+- Supports expanding `Map` columns on `Origin` into multiple records on `Target`
- Fully containerized (Docker and K8s friendly)
- SSL Support (including custom cipher algorithms)
- Migrate from any Cassandra `Origin` ([Apache Cassandra®](https://cassandra.apache.org) / [DataStax Enterprise™](https://www.datastax.com/products/datastax-enterprise) / [DataStax Astra DB™](https://www.datastax.com/products/datastax-astra)) to any Cassandra `Target` ([Apache Cassandra®](https://cassandra.apache.org) / [DataStax Enterprise™](https://www.datastax.com/products/datastax-enterprise) / [DataStax Astra DB™](https://www.datastax.com/products/datastax-astra))

@@ -110,7 +113,7 @@ This mode is specifically useful to process a subset of partition-ranges that
1. Clone this repo
2. Move to the repo folder `cd cassandra-data-migrator`
3. Run the build `mvn clean package` (Needs Maven 3.8.x)
-4. The fat jar (`cassandra-data-migrator-3.x.x.jar`) file should now be present in the `target` folder
+4. The fat jar (`cassandra-data-migrator-4.x.x.jar`) file should now be present in the `target` folder
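Put together, the build boils down to the shell commands below; the clone URL is an assumption based on the repo folder name used in step 2, so substitute your own fork or remote if it differs:

```
# Clone URL assumed from the repo folder name above; adjust if your remote differs.
git clone https://github.com/datastax/cassandra-data-migrator.git
cd cassandra-data-migrator
mvn clean package                              # needs Maven 3.8.x
ls target/cassandra-data-migrator-4.*.jar      # the fat jar produced by the build
```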

# Contributors
Check out all our wonderful contributors [here](./CONTRIBUTING.md#contributors).