
Commit 8140df0

Merge pull request #20 from datastax/feature/zdm-complaint-naming
Make config property naming ZDM compliant
2 parents 9fd1e22 + 989e3db commit 8140df0

File tree: 13 files changed (+176, -155 lines)


README.md

Lines changed: 10 additions & 10 deletions
@@ -44,22 +44,22 @@ Note: Above command also generates a log file `logfile_name.txt` to avoid log ou
 - Validation job will report differences as “ERRORS” in the log file as shown below
 
 ```
-22/09/27 11:21:24 ERROR DiffJobSession: Data mismatch found - Key: ek-1 %% mn1 %% c1 %% true Data: (Index: 4 Source: 30 Astra: 20 )
-22/09/27 11:21:24 ERROR DiffJobSession: Corrected mismatch data in Astra: ek-1 %% mn1 %% c1 %% true
-22/09/27 11:21:24 ERROR DiffJobSession: Data is missing in Astra: ek-2 %% mn2 %% c2 %% true
-22/09/27 11:21:24 ERROR DiffJobSession: Corrected missing data in Astra: ek-2 %% mn2 %% c2 %% true
+22/10/27 23:25:29 ERROR DiffJobSession: Missing target row found for key: Grapes %% 1 %% 2020-05-22 %% 2020-05-23T00:05:09.353Z %% skuid %% Aliquam faucibus
+22/10/27 23:25:29 ERROR DiffJobSession: Inserted missing row in target: Grapes %% 1 %% 2020-05-22 %% 2020-05-23T00:05:09.353Z %% skuid %% Aliquam faucibus
+22/10/27 23:25:30 ERROR DiffJobSession: Mismatch row found for key: Grapes %% 1 %% 2020-05-22 %% 2020-05-23T00:05:09.353Z %% skuid %% augue odio at quam Data: (Index: 8 Origin: Hello 3 Target: Hello 2 )
+22/10/27 23:25:30 ERROR DiffJobSession: Updated mismatch row in target: Grapes %% 1 %% 2020-05-22 %% 2020-05-23T00:05:09.353Z %% skuid %% augue odio at quam
 ```
 
 - Please grep for all `ERROR` from the output log files to get the list of missing and mismatched records.
 - Note that it lists differences by partition key values.
 - The Validation job can also be run in an AutoCorrect mode. This mode can
-  - Add any missing records from source to target
-  - Fix any inconsistencies between source and target (makes target same as source).
+  - Add any missing records from origin to target
+  - Fix any inconsistencies between origin and target (makes target same as origin).
 - Enable/disable this feature using one or both of the below setting in the config file
 
 ```
-spark.destination.autocorrect.missing true|false
-spark.destination.autocorrect.mismatch true|false
+spark.target.autocorrect.missing true|false
+spark.target.autocorrect.mismatch true|false
 ```
 
 # Migrating specific partition ranges
@@ -83,8 +83,8 @@ This mode is specifically useful to processes a subset of partition-ranges that
 - [Counter tables](https://docs.datastax.com/en/dse/6.8/cql/cql/cql_using/useCountersConcept.html)
 - Preserve [writetimes](https://docs.datastax.com/en/dse/6.8/cql/cql/cql_reference/cql_commands/cqlSelect.html#cqlSelect__retrieving-the-datetime-a-write-occurred-p) and [TTL](https://docs.datastax.com/en/dse/6.8/cql/cql/cql_reference/cql_commands/cqlSelect.html#cqlSelect__ref-select-ttl-p)
 - Advanced DataTypes ([Sets](https://docs.datastax.com/en/dse/6.8/cql/cql/cql_reference/refDataTypes.html#refDataTypes__set), [Lists](https://docs.datastax.com/en/dse/6.8/cql/cql/cql_reference/refDataTypes.html#refDataTypes__list), [Maps](https://docs.datastax.com/en/dse/6.8/cql/cql/cql_reference/refDataTypes.html#refDataTypes__map), [UDTs](https://docs.datastax.com/en/dse/6.8/cql/cql/cql_reference/refDataTypes.html#refDataTypes__udt))
-- Filter records from source using writetime
+- Filter records from origin using writetime
 - SSL Support (including custom cipher algorithms)
-- Migrate from any Cassandra source ([Apache Cassandra](https://cassandra.apache.org)/[DataStax Enterprise (DSE)](https://www.datastax.com/products/datastax-enterprise)/[DataStax Astra DB](https://www.datastax.com/products/datastax-astra)) to any Cassandra target ([Apache Cassandra](https://cassandra.apache.org)/[DataStax Enterprise (DSE)](https://www.datastax.com/products/datastax-enterprise)/[DataStax Astra DB](https://www.datastax.com/products/datastax-astra))
+- Migrate from any Cassandra origin ([Apache Cassandra](https://cassandra.apache.org)/[DataStax Enterprise (DSE)](https://www.datastax.com/products/datastax-enterprise)/[DataStax Astra DB](https://www.datastax.com/products/datastax-astra)) to any Cassandra target ([Apache Cassandra](https://cassandra.apache.org)/[DataStax Enterprise (DSE)](https://www.datastax.com/products/datastax-enterprise)/[DataStax Astra DB](https://www.datastax.com/products/datastax-astra))
 - Validate migration accuracy and performance using a smaller randomized data-set
 - Custom writetime
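
For reference, this commit renames the config properties from the old `source`/`destination` prefixes to the ZDM-style `origin`/`target` names. The mapping below is compiled from the code changes that follow; per the new `Util` helper further down, the old names are still accepted as a fallback.

```
spark.source.keyspaceTable             -> spark.origin.keyspaceTable
spark.destination.keyspaceTable        -> spark.target.keyspaceTable
spark.query.source                     -> spark.query.origin
spark.query.source.partitionKey        -> spark.query.origin.partitionKey
spark.query.destination                -> spark.query.target
spark.query.destination.id             -> spark.query.target.id
spark.source.writeTimeStampFilter      -> spark.origin.writeTimeStampFilter
spark.source.minWriteTimeStampFilter   -> spark.origin.minWriteTimeStampFilter
spark.source.maxWriteTimeStampFilter   -> spark.origin.maxWriteTimeStampFilter
spark.source.hasRandomPartitioner      -> spark.origin.hasRandomPartitioner
spark.destination.custom.writeTime     -> spark.target.custom.writeTime
spark.destination.autocorrect.missing  -> spark.target.autocorrect.missing
spark.destination.autocorrect.mismatch -> spark.target.autocorrect.mismatch
```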

pom.xml

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
 
     <groupId>datastax.astra.migrate</groupId>
     <artifactId>cassandra-data-migrator</artifactId>
-    <version>2.0</version>
+    <version>2.1</version>
     <packaging>jar</packaging>
 
     <properties>

src/main/java/datastax/astra/migrate/AbstractJobSession.java

Lines changed: 25 additions & 25 deletions
@@ -20,57 +20,57 @@ public class AbstractJobSession extends BaseJobSession {
 
     public Logger logger = LoggerFactory.getLogger(this.getClass().getName());
 
-    protected AbstractJobSession(CqlSession sourceSession, CqlSession astraSession, SparkConf sparkConf) {
+    protected AbstractJobSession(CqlSession sourceSession, CqlSession astraSession, SparkConf sc) {
         this.sourceSession = sourceSession;
         this.astraSession = astraSession;
 
-        batchSize = new Integer(sparkConf.get("spark.batchSize", "1"));
-        printStatsAfter = new Integer(sparkConf.get("spark.printStatsAfter", "100000"));
+        batchSize = new Integer(Util.getSparkPropOr(sc, "spark.batchSize", "1"));
+        printStatsAfter = new Integer(Util.getSparkPropOr(sc, "spark.printStatsAfter", "100000"));
         if (printStatsAfter < 1) {
             printStatsAfter = 100000;
         }
 
-        readLimiter = RateLimiter.create(new Integer(sparkConf.get("spark.readRateLimit", "20000")));
-        writeLimiter = RateLimiter.create(new Integer(sparkConf.get("spark.writeRateLimit", "40000")));
-        maxRetries = Integer.parseInt(sparkConf.get("spark.maxRetries", "10"));
+        readLimiter = RateLimiter.create(new Integer(Util.getSparkPropOr(sc, "spark.readRateLimit", "20000")));
+        writeLimiter = RateLimiter.create(new Integer(Util.getSparkPropOr(sc, "spark.writeRateLimit", "40000")));
+        maxRetries = Integer.parseInt(sc.get("spark.maxRetries", "10"));
 
-        sourceKeyspaceTable = sparkConf.get("spark.source.keyspaceTable");
-        astraKeyspaceTable = sparkConf.get("spark.destination.keyspaceTable");
+        sourceKeyspaceTable = Util.getSparkProp(sc, "spark.origin.keyspaceTable");
+        astraKeyspaceTable = Util.getSparkProp(sc, "spark.target.keyspaceTable");
 
-        String ttlColsStr = sparkConf.get("spark.query.ttl.cols", "");
+        String ttlColsStr = Util.getSparkPropOrEmpty(sc, "spark.query.ttl.cols");
         if (null != ttlColsStr && ttlColsStr.trim().length() > 0) {
             for (String ttlCol : ttlColsStr.split(",")) {
                 ttlCols.add(Integer.parseInt(ttlCol));
             }
         }
 
-        String writeTimestampColsStr = sparkConf.get("spark.query.writetime.cols", "");
+        String writeTimestampColsStr = Util.getSparkPropOrEmpty(sc, "spark.query.writetime.cols");
         if (null != writeTimestampColsStr && writeTimestampColsStr.trim().length() > 0) {
             for (String writeTimeStampCol : writeTimestampColsStr.split(",")) {
                 writeTimeStampCols.add(Integer.parseInt(writeTimeStampCol));
             }
         }
 
         writeTimeStampFilter = Boolean
-                .parseBoolean(sparkConf.get("spark.source.writeTimeStampFilter", "false"));
+                .parseBoolean(Util.getSparkPropOr(sc, "spark.origin.writeTimeStampFilter", "false"));
         // batchsize set to 1 if there is a writeFilter
         if (writeTimeStampFilter) {
             batchSize = 1;
         }
 
         String minWriteTimeStampFilterStr =
-                sparkConf.get("spark.source.minWriteTimeStampFilter", "0");
+                Util.getSparkPropOr(sc, "spark.origin.minWriteTimeStampFilter", "0");
         if (null != minWriteTimeStampFilterStr && minWriteTimeStampFilterStr.trim().length() > 1) {
             minWriteTimeStampFilter = Long.parseLong(minWriteTimeStampFilterStr);
         }
         String maxWriteTimeStampFilterStr =
-                sparkConf.get("spark.source.maxWriteTimeStampFilter", "0");
+                Util.getSparkPropOr(sc, "spark.origin.maxWriteTimeStampFilter", "0");
         if (null != maxWriteTimeStampFilterStr && maxWriteTimeStampFilterStr.trim().length() > 1) {
             maxWriteTimeStampFilter = Long.parseLong(maxWriteTimeStampFilterStr);
         }
 
         String customWriteTimeStr =
-                sparkConf.get("spark.destination.custom.writeTime", "0");
+                Util.getSparkPropOr(sc, "spark.target.custom.writeTime", "0");
         if (null != customWriteTimeStr && customWriteTimeStr.trim().length() > 1 && StringUtils.isNumeric(customWriteTimeStr.trim())) {
             customWritetime = Long.parseLong(customWriteTimeStr);
         }
@@ -84,9 +84,9 @@ protected AbstractJobSession(CqlSession sourceSession, CqlSession astraSession,
         logger.info("PARAM -- WriteTimestampFilterCols: " + writeTimeStampCols);
         logger.info("PARAM -- WriteTimestampFilter: " + writeTimeStampFilter);
 
-        String selectCols = sparkConf.get("spark.query.source");
-        String partionKey = sparkConf.get("spark.query.source.partitionKey");
-        String sourceSelectCondition = sparkConf.get("spark.query.condition", "");
+        String selectCols = Util.getSparkProp(sc, "spark.query.origin");
+        String partionKey = Util.getSparkProp(sc, "spark.query.origin.partitionKey");
+        String sourceSelectCondition = Util.getSparkPropOrEmpty(sc, "spark.query.condition");
 
         final StringBuilder selectTTLWriteTimeCols = new StringBuilder();
         String[] allCols = selectCols.split(",");
@@ -96,16 +96,16 @@ protected AbstractJobSession(CqlSession sourceSession, CqlSession astraSession,
         writeTimeStampCols.forEach(col -> {
             selectTTLWriteTimeCols.append(",writetime(" + allCols[col] + ")");
         });
-        String fullSelectQuery = "select " + selectCols + selectTTLWriteTimeCols.toString() + " from " + sourceKeyspaceTable + " where token(" + partionKey.trim()
+        String fullSelectQuery = "select " + selectCols + selectTTLWriteTimeCols + " from " + sourceKeyspaceTable + " where token(" + partionKey.trim()
                 + ") >= ? and token(" + partionKey.trim() + ") <= ? " + sourceSelectCondition + " ALLOW FILTERING";
         sourceSelectStatement = sourceSession.prepare(fullSelectQuery);
         logger.info("PARAM -- Query used: " + fullSelectQuery);
 
-        selectColTypes = getTypes(sparkConf.get("spark.query.types"));
-        String idCols = sparkConf.get("spark.query.destination.id", "");
+        selectColTypes = getTypes(Util.getSparkProp(sc, "spark.query.types"));
+        String idCols = Util.getSparkPropOrEmpty(sc, "spark.query.target.id");
         idColTypes = selectColTypes.subList(0, idCols.split(",").length);
 
-        String insertCols = sparkConf.get("spark.query.destination", "");
+        String insertCols = Util.getSparkPropOrEmpty(sc, "spark.query.target");
         if (null == insertCols || insertCols.trim().isEmpty()) {
             insertCols = selectCols;
         }
@@ -121,15 +121,15 @@ protected AbstractJobSession(CqlSession sourceSession, CqlSession astraSession,
                 "select " + insertCols + " from " + astraKeyspaceTable
                         + " where " + insertBinds);
 
-        hasRandomPartitioner = Boolean.parseBoolean(sparkConf.get("spark.source.hasRandomPartitioner", "false"));
-        isCounterTable = Boolean.parseBoolean(sparkConf.get("spark.counterTable", "false"));
+        hasRandomPartitioner = Boolean.parseBoolean(Util.getSparkPropOr(sc, "spark.origin.hasRandomPartitioner", "false"));
+        isCounterTable = Boolean.parseBoolean(Util.getSparkPropOr(sc, "spark.counterTable", "false"));
         if (isCounterTable) {
-            String updateSelectMappingStr = sparkConf.get("spark.counterTable.cql.index", "0");
+            String updateSelectMappingStr = Util.getSparkPropOr(sc, "spark.counterTable.cql.index", "0");
             for (String updateSelectIndex : updateSelectMappingStr.split(",")) {
                 updateSelectMapping.add(Integer.parseInt(updateSelectIndex));
             }
 
-            String counterTableUpdate = sparkConf.get("spark.counterTable.cql");
+            String counterTableUpdate = Util.getSparkProp(sc, "spark.counterTable.cql");
             astraInsertStatement = astraSession.prepare(counterTableUpdate);
         } else {
             insertBinds = "";

src/main/java/datastax/astra/migrate/CopyJobSession.java

Lines changed: 4 additions & 4 deletions
@@ -20,15 +20,15 @@ public class CopyJobSession extends AbstractJobSession {
     protected AtomicLong readCounter = new AtomicLong(0);
     protected AtomicLong writeCounter = new AtomicLong(0);
 
-    protected CopyJobSession(CqlSession sourceSession, CqlSession astraSession, SparkConf sparkConf) {
-        super(sourceSession, astraSession, sparkConf);
+    protected CopyJobSession(CqlSession sourceSession, CqlSession astraSession, SparkConf sc) {
+        super(sourceSession, astraSession, sc);
     }
 
-    public static CopyJobSession getInstance(CqlSession sourceSession, CqlSession astraSession, SparkConf sparkConf) {
+    public static CopyJobSession getInstance(CqlSession sourceSession, CqlSession astraSession, SparkConf sc) {
         if (copyJobSession == null) {
             synchronized (CopyJobSession.class) {
                 if (copyJobSession == null) {
-                    copyJobSession = new CopyJobSession(sourceSession, astraSession, sparkConf);
+                    copyJobSession = new CopyJobSession(sourceSession, astraSession, sc);
                 }
             }
         }

src/main/java/datastax/astra/migrate/DiffJobSession.java

Lines changed: 9 additions & 9 deletions
@@ -31,13 +31,13 @@ public class DiffJobSession extends CopyJobSession {
     private AtomicLong validCounter = new AtomicLong(0);
     private AtomicLong skippedCounter = new AtomicLong(0);
 
-    private DiffJobSession(CqlSession sourceSession, CqlSession astraSession, SparkConf sparkConf) {
-        super(sourceSession, astraSession, sparkConf);
+    private DiffJobSession(CqlSession sourceSession, CqlSession astraSession, SparkConf sc) {
+        super(sourceSession, astraSession, sc);
 
-        autoCorrectMissing = Boolean.parseBoolean(sparkConf.get("spark.destination.autocorrect.missing", "false"));
+        autoCorrectMissing = Boolean.parseBoolean(Util.getSparkPropOr(sc, "spark.target.autocorrect.missing", "false"));
         logger.info("PARAM -- Autocorrect Missing: " + autoCorrectMissing);
 
-        autoCorrectMismatch = Boolean.parseBoolean(sparkConf.get("spark.destination.autocorrect.mismatch", "false"));
+        autoCorrectMismatch = Boolean.parseBoolean(Util.getSparkPropOr(sc, "spark.target.autocorrect.mismatch", "false"));
         logger.info("PARAM -- Autocorrect Mismatch: " + autoCorrectMismatch);
     }
 
@@ -130,13 +130,13 @@ public void printCounts(String finalStr) {
     private void diff(Row sourceRow, Row astraRow) {
         if (astraRow == null) {
             missingCounter.incrementAndGet();
-            logger.error("Data is missing in Astra: " + getKey(sourceRow));
+            logger.error("Missing target row found for key: " + getKey(sourceRow));
             //correct data
 
             if (autoCorrectMissing) {
                 astraSession.execute(bindInsert(astraInsertStatement, sourceRow, null));
                 correctedMissingCounter.incrementAndGet();
-                logger.error("Corrected missing data in Astra: " + getKey(sourceRow));
+                logger.error("Inserted missing row in target: " + getKey(sourceRow));
             }
 
             return;
@@ -145,7 +145,7 @@ private void diff(Row sourceRow, Row astraRow) {
         String diffData = isDifferent(sourceRow, astraRow);
         if (!diffData.isEmpty()) {
             mismatchCounter.incrementAndGet();
-            logger.error("Data mismatch found - Key: " + getKey(sourceRow) + " Data: " + diffData);
+            logger.error("Mismatch row found for key: " + getKey(sourceRow) + " Mismatch: " + diffData);
 
             if (autoCorrectMismatch) {
                 if (isCounterTable) {
@@ -154,7 +154,7 @@ private void diff(Row sourceRow, Row astraRow) {
                     astraSession.execute(bindInsert(astraInsertStatement, sourceRow, null));
                 }
                 correctedMismatchCounter.incrementAndGet();
-                logger.error("Corrected mismatch data in Astra: " + getKey(sourceRow));
+                logger.error("Updated mismatch row in target: " + getKey(sourceRow));
             }
 
             return;
@@ -172,7 +172,7 @@ private String isDifferent(Row sourceRow, Row astraRow) {
 
             boolean isDiff = dataType.diff(source, astra);
             if (isDiff) {
-                diffData.append(" (Index: " + index + " Source: " + source + " Astra: " + astra + " ) ");
+                diffData.append("(Index: " + index + " Origin: " + source + " Target: " + astra + " ) ");
             }
         });
 
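
The fragment appended per differing column in the last hunk is exactly what surfaces in the README's sample mismatch log line. A tiny sketch with hypothetical column values (`equals()` stands in for the `dataType.diff(...)` call):

```java
public class MismatchFormatSketch {
    public static void main(String[] args) {
        // Hypothetical values; index 8 is the differing column.
        StringBuffer diffData = new StringBuffer();
        int index = 8;
        Object source = "Hello 3"; // origin value
        Object astra = "Hello 2";  // target value
        if (!source.equals(astra)) {
            diffData.append("(Index: " + index + " Origin: " + source + " Target: " + astra + " ) ");
        }
        System.out.println(diffData);
        // -> (Index: 8 Origin: Hello 3 Target: Hello 2 )
        // which appears in the log as:
        // Mismatch row found for key: <key> Mismatch: (Index: 8 Origin: Hello 3 Target: Hello 2 )
    }
}
```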

src/main/java/datastax/astra/migrate/Util.java

Lines changed: 31 additions & 0 deletions

@@ -0,0 +1,31 @@
+package datastax.astra.migrate;
+
+import org.apache.spark.SparkConf;
+
+import java.util.NoSuchElementException;
+
+public class Util {
+
+    public static String getSparkProp(SparkConf sc, String prop) {
+        try {
+            return sc.get(prop);
+        } catch (NoSuchElementException nse) {
+            String newProp = prop.replace("origin", "source").replace("target", "destination");
+            return sc.get(newProp);
+        }
+    }
+
+    public static String getSparkPropOr(SparkConf sc, String prop, String defaultVal) {
+        try {
+            return sc.get(prop);
+        } catch (NoSuchElementException nse) {
+            String newProp = prop.replace("origin", "source").replace("target", "destination");
+            return sc.get(newProp, defaultVal);
+        }
+    }
+
+    public static String getSparkPropOrEmpty(SparkConf sc, String prop) {
+        return getSparkPropOr(sc, prop, "");
+    }
+
+}
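
A usage sketch of the fallback behavior this class provides (property values are hypothetical; `SparkConf.set`/`get` are the standard Spark API):

```java
import org.apache.spark.SparkConf;

public class UtilUsageSketch {
    public static void main(String[] args) {
        // Config written against the old (pre-ZDM) property name.
        SparkConf sc = new SparkConf()
                .set("spark.source.keyspaceTable", "test.fruit");

        // Lookup by the new name: "spark.origin.keyspaceTable" is absent,
        // so Util rewrites origin -> source and retries.
        String kt = Util.getSparkProp(sc, "spark.origin.keyspaceTable");
        System.out.println(kt); // test.fruit

        // Absent under both names: the supplied default is returned.
        String retries = Util.getSparkPropOr(sc, "spark.maxRetries", "10");
        System.out.println(retries); // 10
    }
}
```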

src/main/scala/datastax/astra/migrate/AbstractJob.scala

Lines changed: 3 additions & 3 deletions
@@ -26,7 +26,7 @@ class AbstractJob extends BaseJob {
     if ("true".equals(isAstra)) {
       abstractLogger.info(connType + ": Connected to Astra using SCB: " + scbPath);
 
-      return CassandraConnector(sc.getConf
+      return CassandraConnector(sc
         .set("spark.cassandra.auth.username", username)
         .set("spark.cassandra.auth.password", password)
         .set("spark.cassandra.input.consistency.level", readConsistencyLevel)
@@ -40,7 +40,7 @@ class AbstractJob extends BaseJob {
         enabledAlgorithmsVar = "TLS_RSA_WITH_AES_128_CBC_SHA, TLS_RSA_WITH_AES_256_CBC_SHA"
       }
 
-      return CassandraConnector(sc.getConf
+      return CassandraConnector(sc
         .set("spark.cassandra.auth.username", username)
         .set("spark.cassandra.auth.password", password)
         .set("spark.cassandra.input.consistency.level", readConsistencyLevel)
@@ -57,7 +57,7 @@ class AbstractJob extends BaseJob {
     } else {
       abstractLogger.info(connType + ": Connected to Cassandra (or DSE) host: " + host);
 
-      return CassandraConnector(sc.getConf.set("spark.cassandra.auth.username", username)
+      return CassandraConnector(sc.set("spark.cassandra.auth.username", username)
         .set("spark.cassandra.auth.password", password)
         .set("spark.cassandra.input.consistency.level", readConsistencyLevel)
         .set("spark.cassandra.connection.host", host))
