Commit 27af3fe

Merge pull request #289 from datastax/feature/transform-fields
Feature/transform fields
2 parents 49d7bb4 + d7a543b

File tree

17 files changed: +792 -312 lines changed

Dockerfile

Lines changed: 4 additions & 4 deletions

@@ -9,9 +9,9 @@ RUN mkdir -p /assets/ && cd /assets && \
     curl -OL https://downloads.datastax.com/enterprise/cqlsh-astra.tar.gz && \
     tar -xzf ./cqlsh-astra.tar.gz && \
     rm ./cqlsh-astra.tar.gz && \
-    curl -OL https://archive.apache.org/dist/spark/spark-3.5.1/spark-3.5.1-bin-hadoop3-scala2.13.tgz && \
-    tar -xzf ./spark-3.5.1-bin-hadoop3-scala2.13.tgz && \
-    rm ./spark-3.5.1-bin-hadoop3-scala2.13.tgz
+    curl -OL https://archive.apache.org/dist/spark/spark-3.5.2/spark-3.5.2-bin-hadoop3-scala2.13.tgz && \
+    tar -xzf ./spark-3.5.2-bin-hadoop3-scala2.13.tgz && \
+    rm ./spark-3.5.2-bin-hadoop3-scala2.13.tgz
 
 RUN apt-get update && apt-get install -y openssh-server vim python3 --no-install-recommends && \
     rm -rf /var/lib/apt/lists/* && \
@@ -44,7 +44,7 @@ RUN chmod +x ./get-latest-maven-version.sh && \
     rm -rf "$USER_HOME_DIR/.m2"
 
 # Add all migration tools to path
-ENV PATH="${PATH}:/assets/dsbulk/bin/:/assets/cqlsh-astra/bin/:/assets/spark-3.5.1-bin-hadoop3-scala2.13/bin/"
+ENV PATH="${PATH}:/assets/dsbulk/bin/:/assets/cqlsh-astra/bin/:/assets/spark-3.5.2-bin-hadoop3-scala2.13/bin/"
 
 EXPOSE 22

README.md

Lines changed: 5 additions & 4 deletions

@@ -7,7 +7,7 @@
 
 Migrate and Validate Tables between Origin and Target Cassandra Clusters.
 
-> :warning: Please note this job has been tested with spark version [3.5.1](https://archive.apache.org/dist/spark/spark-3.5.1/)
+> :warning: Please note this job has been tested with spark version [3.5.2](https://archive.apache.org/dist/spark/spark-3.5.2/)
 
 ## Install as a Container
 - Get the latest image that includes all dependencies from [DockerHub](https://hub.docker.com/r/datastax/cassandra-data-migrator)
@@ -18,10 +18,10 @@ Migrate and Validate Tables between Origin and Target Cassandra Clusters.
 
 ### Prerequisite
 - Install **Java11** (minimum) as Spark binaries are compiled with it.
-- Install Spark version [`3.5.1`](https://archive.apache.org/dist/spark/spark-3.5.1/spark-3.5.1-bin-hadoop3-scala2.13.tgz) on a single VM (no cluster necessary) where you want to run this job. Spark can be installed by running the following: -
+- Install Spark version [`3.5.2`](https://archive.apache.org/dist/spark/spark-3.5.2/spark-3.5.2-bin-hadoop3-scala2.13.tgz) on a single VM (no cluster necessary) where you want to run this job. Spark can be installed by running the following: -
 ```
-wget https://archive.apache.org/dist/spark/spark-3.5.1/spark-3.5.1-bin-hadoop3-scala2.13.tgz
-tar -xvzf spark-3.5.1-bin-hadoop3-scala2.13.tgz
+wget https://archive.apache.org/dist/spark/spark-3.5.2/spark-3.5.2-bin-hadoop3-scala2.13.tgz
+tar -xvzf spark-3.5.2-bin-hadoop3-scala2.13.tgz
 ```
 
 > :warning: If the above Spark and Scala version is not properly installed, you'll then see a similar exception like below when running the CDM jobs,
@@ -123,6 +123,7 @@ Note:
 - Perform guardrail checks (identify large fields)
 - Supports adding `constants` as new columns on `Target`
 - Supports expanding `Map` columns on `Origin` into multiple records on `Target`
+- Supports extracting value from a JSON column in `Origin` and map it to a specific field on `Target`
 - Fully containerized (Docker and K8s friendly)
 - SSL Support (including custom cipher algorithms)
 - Migrate from any Cassandra `Origin` ([Apache Cassandra®](https://cassandra.apache.org) / [DataStax Enterprise™](https://www.datastax.com/products/datastax-enterprise) / [DataStax Astra DB™](https://www.datastax.com/products/datastax-astra)) to any Cassandra `Target` ([Apache Cassandra®](https://cassandra.apache.org) / [DataStax Enterprise™](https://www.datastax.com/products/datastax-enterprise) / [DataStax Astra DB™](https://www.datastax.com/products/datastax-astra))

RELEASE.md

Lines changed: 4 additions & 0 deletions

@@ -1,4 +1,8 @@
 # Release Notes
+## [4.3.5] - 2024-08-23
+- Added feature `spark.cdm.feature.extractJson` which allows you to extract a json value from a column with json content in an Origin table and map it to a column in the Target table.
+- Upgraded to use Spark `3.5.2`.
+
 ## [4.3.4] - 2024-07-31
 - Use `spark.cdm.schema.origin.keyspaceTable` when `spark.cdm.schema.target.keyspaceTable` is missing. Fixes [bug introduced in prior version](https://github.com/datastax/cassandra-data-migrator/issues/284).
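As a hedged illustration of the `spark.cdm.feature.extractJson` behavior described in these release notes (the keyspace, table, and column names below are invented for the example; the exact sub-property keys of the feature are not shown in this commit), the feature reads a text column containing JSON on the Origin side and binds one field of it to a Target column:

```sql
-- Hypothetical Origin table: one text column holds a JSON document
-- CREATE TABLE ks.person (id int PRIMARY KEY, details text);
--   e.g. details = '{"name":"Alice","age":41}'
-- Hypothetical Target table: the extracted field gets its own column
-- CREATE TABLE ks.person (id int PRIMARY KEY, age int);
-- With a mapping of "age:age" (jsonField:targetColumn, per the new ExtractJson
-- feature class), CDM parses the JSON in `details` and binds 41 to target `age`.
```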

pom.xml

Lines changed: 16 additions & 3 deletions

@@ -1,4 +1,6 @@
-<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
     <modelVersion>4.0.0</modelVersion>
 
     <groupId>datastax.cdm</groupId>
@@ -10,9 +12,9 @@
     <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
     <scala.version>2.13.14</scala.version>
     <scala.main.version>2.13</scala.main.version>
-    <spark.version>3.5.1</spark.version>
+    <spark.version>3.5.2</spark.version>
     <connector.version>3.5.1</connector.version>
-    <cassandra.version>5.0-beta1</cassandra.version>
+    <cassandra.version>5.0-rc1</cassandra.version>
     <junit.version>5.9.1</junit.version>
     <mockito.version>4.11.0</mockito.version>
     <java-driver.version>4.18.1</java-driver.version>
@@ -102,6 +104,11 @@
         <artifactId>java-driver-query-builder</artifactId>
         <version>${java-driver.version}</version>
     </dependency>
+    <dependency>
+        <groupId>com.fasterxml.jackson.core</groupId>
+        <artifactId>jackson-databind</artifactId>
+        <version>2.15.2</version>
+    </dependency>
 
     <dependency>
         <groupId>org.apache.logging.log4j</groupId>
@@ -123,6 +130,12 @@
         <groupId>com.esri.geometry</groupId>
         <artifactId>esri-geometry-api</artifactId>
         <version>2.2.4</version>
+        <exclusions>
+            <exclusion>
+                <groupId>com.fasterxml.jackson.core</groupId>
+                <artifactId>jackson-core</artifactId>
+            </exclusion>
+        </exclusions>
     </dependency>
 
     <!-- Test Dependencies -->

src/main/java/com/datastax/cdm/cql/statement/TargetInsertStatement.java

Lines changed: 20 additions & 14 deletions

@@ -15,18 +15,19 @@
  */
 package com.datastax.cdm.cql.statement;
 
+import java.time.Duration;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
 import com.datastax.cdm.cql.EnhancedSession;
 import com.datastax.cdm.properties.IPropertyHelper;
 import com.datastax.cdm.properties.KnownProperties;
 import com.datastax.cdm.properties.PropertyHelper;
 import com.datastax.oss.driver.api.core.cql.BoundStatement;
 import com.datastax.oss.driver.api.core.cql.Row;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-import java.util.ArrayList;
-import java.util.List;
-import java.time.Duration;
 
 public class TargetInsertStatement extends TargetUpsertStatement {
     public final Logger logger = LoggerFactory.getLogger(this.getClass().getName());
@@ -61,22 +62,27 @@ protected BoundStatement bind(Row originRow, Row targetRow, Integer ttl, Long wr
             try {
                 if (targetIndex== explodeMapKeyIndex) {
                     bindValue = explodeMapKey;
-                }
-                else if (targetIndex== explodeMapValueIndex) {
+                } else if (targetIndex== explodeMapValueIndex) {
                     bindValue = explodeMapValue;
-                }
-                else {
+                } else if (targetIndex == extractJsonFeature.getTargetColumnIndex()) {
+                    int originIndex = extractJsonFeature.getOriginColumnIndex();
+                    bindValue = extractJsonFeature.extract(originRow.getString(originIndex));
+                } else {
                     int originIndex = cqlTable.getCorrespondingIndex(targetIndex);
                     if (originIndex < 0) // we don't have data to bind for this column; continue to the next targetIndex
                         continue;
                     bindValue = cqlTable.getOtherCqlTable().getAndConvertData(originIndex, originRow);
                 }
 
                 boundStatement = boundStatement.set(currentBindIndex++, bindValue, cqlTable.getBindClass(targetIndex));
-            }
-            catch (Exception e) {
-                logger.error("Error trying to bind value:" + bindValue + " of class:" +(null==bindValue?"unknown":bindValue.getClass().getName())+ " to column:" + targetColumnNames.get(targetIndex) + " of targetDataType:" + targetColumnTypes.get(targetIndex)+ "/" + cqlTable.getBindClass(targetIndex).getName() + " at column index:" + targetIndex + " and bind index: "+ (currentBindIndex-1) + " of statement:" + this.getCQL());
-                throw e;
+            } catch (Exception e) {
+                logger.error(
+                        "Error trying to bind value: {} of class: {} to column: {} of targetDataType: {}/{} at column index: {} and bind index: {} of statement: {}",
+                        bindValue, (null == bindValue ? "unknown" : bindValue.getClass().getName()),
+                        targetColumnNames.get(targetIndex), targetColumnTypes.get(targetIndex),
+                        cqlTable.getBindClass(targetIndex).getName(), targetIndex, (currentBindIndex - 1),
+                        this.getCQL());
+                throw new RuntimeException("Error trying to bind value: ", e);
             }
         }
 

src/main/java/com/datastax/cdm/cql/statement/TargetUpdateStatement.java

Lines changed: 5 additions & 3 deletions

@@ -74,9 +74,11 @@ protected BoundStatement bind(Row originRow, Row targetRow, Integer ttl, Long wr
                 }
                 else if (targetIndex== explodeMapKeyIndex) {
                     bindValueTarget = explodeMapKey;
-                }
-                else if (targetIndex== explodeMapValueIndex) {
+                } else if (targetIndex== explodeMapValueIndex) {
                     bindValueTarget = explodeMapValue;
+                } else if (targetIndex == extractJsonFeature.getTargetColumnIndex()) {
+                    originIndex = extractJsonFeature.getOriginColumnIndex();
+                    bindValueTarget = extractJsonFeature.extract(originRow.getString(originIndex));
                 } else {
                     if (originIndex < 0)
                         // we don't have data to bind for this column; continue to the next targetIndex
@@ -89,7 +91,7 @@ else if (targetIndex== explodeMapValueIndex) {
                 logger.error("Error trying to bind value:" + bindValueTarget + " to column:" +
                         targetColumnNames.get(targetIndex) + " of targetDataType:" + targetColumnTypes.get(targetIndex) + "/"
                         + cqlTable.getBindClass(targetIndex).getName() + " at column index:" + targetIndex);
-                throw e;
+                throw new RuntimeException("Error trying to bind value: ", e);
             }
         }
 
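Both `bind()` methods now wrap bind-time failures in an unchecked exception rather than rethrowing the original. A minimal sketch of that wrap-and-rethrow pattern (class and method names here are invented for illustration, not part of the commit): callers see a single unchecked type while the root cause stays reachable via `getCause()`, which also lets the new `extractJsonFeature.extract(...)` call site avoid declaring Jackson's checked exceptions.

```java
public class WrapSketch {
    // Hypothetical stand-in for a bind step whose conversion can fail.
    static int parseOrFail(String s) {
        try {
            return Integer.parseInt(s.trim());
        } catch (Exception e) {
            // Same pattern as the commit: wrap, preserving the cause for callers.
            throw new RuntimeException("Error trying to bind value: ", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrFail(" 42 ")); // prints 42
        try {
            parseOrFail("oops");
        } catch (RuntimeException e) {
            // The root cause is still available through the exception chain.
            System.out.println(e.getCause().getClass().getSimpleName()); // prints NumberFormatException
        }
    }
}
```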

src/main/java/com/datastax/cdm/cql/statement/TargetUpsertStatement.java

Lines changed: 4 additions & 0 deletions

@@ -20,6 +20,7 @@
 import com.datastax.cdm.data.Record;
 import com.datastax.cdm.feature.ConstantColumns;
 import com.datastax.cdm.feature.ExplodeMap;
+import com.datastax.cdm.feature.ExtractJson;
 import com.datastax.cdm.feature.Featureset;
 import com.datastax.cdm.feature.WritetimeTTL;
 import com.datastax.cdm.properties.IPropertyHelper;
@@ -53,6 +54,8 @@ public abstract class TargetUpsertStatement extends BaseCdmStatement {
     protected int explodeMapValueIndex = -1;
     private Boolean haveCheckedBindInputsOnce = false;
 
+    protected ExtractJson extractJsonFeature;
+
     protected abstract String buildStatement();
     protected abstract BoundStatement bind(Row originRow, Row targetRow, Integer ttl, Long writeTime, Object explodeMapKey, Object explodeMapValue);
 
@@ -61,6 +64,7 @@ public TargetUpsertStatement(IPropertyHelper propertyHelper, EnhancedSession ses
 
         constantColumnFeature = (ConstantColumns) cqlTable.getFeature(Featureset.CONSTANT_COLUMNS);
         explodeMapFeature = (ExplodeMap) cqlTable.getFeature(Featureset.EXPLODE_MAP);
+        extractJsonFeature = (ExtractJson) cqlTable.getFeature(Featureset.EXTRACT_JSON);
 
         setTTLAndWriteTimeBooleans();
         targetColumnNames.addAll(cqlTable.getColumnNames(true));
src/main/java/com/datastax/cdm/feature/ExtractJson.java

Lines changed: 153 additions & 0 deletions

@@ -0,0 +1,153 @@
+/*
+ * Copyright DataStax, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package com.datastax.cdm.feature;
+
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.commons.lang3.StringUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.datastax.cdm.properties.IPropertyHelper;
+import com.datastax.cdm.properties.KnownProperties;
+import com.datastax.cdm.schema.CqlTable;
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.databind.JsonMappingException;
+import com.fasterxml.jackson.databind.ObjectMapper;
+
+public class ExtractJson extends AbstractFeature {
+    public Logger logger = LoggerFactory.getLogger(this.getClass().getName());
+    private ObjectMapper mapper = new ObjectMapper();
+
+    private String originColumnName = "";
+    private String originJsonFieldName = "";
+    private Integer originColumnIndex = -1;
+
+    private String targetColumnName = "";
+    private Integer targetColumnIndex = -1;
+
+    @Override
+    public boolean loadProperties(IPropertyHelper helper) {
+        if (null == helper) {
+            throw new IllegalArgumentException("helper is null");
+        }
+
+        originColumnName = getColumnName(helper, KnownProperties.EXTRACT_JSON_ORIGIN_COLUMN_NAME);
+        targetColumnName = getColumnName(helper, KnownProperties.EXTRACT_JSON_TARGET_COLUMN_MAPPING);
+        // Convert columnToFieldMapping to targetColumnName and originJsonFieldName
+        if (!targetColumnName.isBlank()) {
+            String[] parts = targetColumnName.split("\\:");
+            if (parts.length == 2) {
+                originJsonFieldName = parts[0];
+                targetColumnName = parts[1];
+            } else {
+                originJsonFieldName = targetColumnName;
+            }
+        }
+
+        isValid = validateProperties();
+        isEnabled = isValid && !originColumnName.isEmpty() && !targetColumnName.isEmpty();
+        isLoaded = true;
+
+        return isLoaded && isValid;
+    }
+
+    @Override
+    protected boolean validateProperties() {
+        if (StringUtils.isBlank(originColumnName) && StringUtils.isBlank(targetColumnName))
+            return true;
+
+        if (StringUtils.isBlank(originColumnName)) {
+            logger.error("Origin column name is not set when Target ({}) is set", targetColumnName);
+            return false;
+        }
+
+        if (StringUtils.isBlank(targetColumnName)) {
+            logger.error("Target column name is not set when Origin ({}) is set", originColumnName);
+            return false;
+        }
+
+        return true;
+    }
+
+    @Override
+    public boolean initializeAndValidate(CqlTable originTable, CqlTable targetTable) {
+        if (null == originTable || null == targetTable) {
+            throw new IllegalArgumentException("Origin table and/or Target table is null");
+        }
+        if (!originTable.isOrigin()) {
+            throw new IllegalArgumentException(originTable.getKeyspaceTable() + " is not an origin table");
+        }
+        if (targetTable.isOrigin()) {
+            throw new IllegalArgumentException(targetTable.getKeyspaceTable() + " is not a target table");
+        }
+
+        if (!validateProperties()) {
+            isEnabled = false;
+            return false;
+        }
+        if (!isEnabled)
+            return true;
+
+        // Initialize Origin variables
+        List<Class> originBindClasses = originTable.extendColumns(Collections.singletonList(originColumnName));
+        if (null == originBindClasses || originBindClasses.size() != 1 || null == originBindClasses.get(0)) {
+            throw new IllegalArgumentException("Origin column " + originColumnName
+                    + " is not found on the origin table " + originTable.getKeyspaceTable());
+        } else {
+            this.originColumnIndex = originTable.indexOf(originColumnName);
+        }
+
+        // Initialize Target variables
+        List<Class> targetBindClasses = targetTable.extendColumns(Collections.singletonList(targetColumnName));
+        if (null == targetBindClasses || targetBindClasses.size() != 1 || null == targetBindClasses.get(0)) {
+            throw new IllegalArgumentException("Target column " + targetColumnName
+                    + " is not found on the target table " + targetTable.getKeyspaceTable());
+        } else {
+            this.targetColumnIndex = targetTable.indexOf(targetColumnName);
+        }
+
+        logger.info("Feature {} is {}", this.getClass().getSimpleName(), isEnabled ? "enabled" : "disabled");
+        return true;
+    }
+
+    public Object extract(String jsonString) throws JsonMappingException, JsonProcessingException {
+        if (StringUtils.isNotBlank(jsonString)) {
+            return mapper.readValue(jsonString, Map.class).get(originJsonFieldName);
+        }
+
+        return null;
+    }
+
+    public Integer getOriginColumnIndex() {
+        return isEnabled ? originColumnIndex : -1;
+    }
+
+    public Integer getTargetColumnIndex() {
+        return isEnabled ? targetColumnIndex : -1;
+    }
+
+    public String getTargetColumnName() {
+        return isEnabled ? targetColumnName : "";
+    }
+
+    private String getColumnName(IPropertyHelper helper, String colName) {
+        String columnName = CqlTable.unFormatName(helper.getString(colName));
+        return (null == columnName) ? "" : columnName;
+    }
+}
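A small standalone sketch (not part of the commit; class and example names are invented) of the mapping rule `loadProperties` applies to the configured mapping string: a `jsonField:targetColumn` pair is split on `:`, while a bare name serves as both the JSON field name and the target column name.

```java
public class MappingParseSketch {
    // Returns { originJsonFieldName, targetColumnName }, mirroring ExtractJson.loadProperties.
    static String[] parseMapping(String mapping) {
        String[] parts = mapping.split("\\:");
        if (parts.length == 2) {
            // "personAge:age" -> extract JSON field "personAge" into target column "age"
            return new String[] { parts[0], parts[1] };
        }
        // "age" -> JSON field and target column share the same name
        return new String[] { mapping, mapping };
    }

    public static void main(String[] args) {
        String[] mapped = parseMapping("personAge:age");
        System.out.println(mapped[0] + " -> " + mapped[1]); // prints personAge -> age
    }
}
```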

src/main/java/com/datastax/cdm/feature/FeatureFactory.java

Lines changed: 1 addition & 0 deletions

@@ -21,6 +21,7 @@ public static Feature getFeature(Featureset feature) {
             case ORIGIN_FILTER: return new OriginFilterCondition();
             case CONSTANT_COLUMNS: return new ConstantColumns();
             case EXPLODE_MAP: return new ExplodeMap();
+            case EXTRACT_JSON: return new ExtractJson();
             case WRITETIME_TTL: return new WritetimeTTL();
             case GUARDRAIL_CHECK: return new Guardrail();
             default:

src/main/java/com/datastax/cdm/feature/Featureset.java

Lines changed: 1 addition & 0 deletions

@@ -19,6 +19,7 @@ public enum Featureset {
     ORIGIN_FILTER,
     CONSTANT_COLUMNS,
     EXPLODE_MAP,
+    EXTRACT_JSON,
     WRITETIME_TTL,
     GUARDRAIL_CHECK,
     TEST_UNIMPLEMENTED_FEATURE
