Commit 0f12303

Upgrade constant column feature to support remove & replace column and also resolve any bugs (#268)
* Fixed the mapping issue for the constant-column feature with new PK columns added as part of constant column. Also fixed the associated SIT tests to test this feature correctly.
* More refactoring to streamline constant-column changes
* SIT tests for remove and replace functions of constant-column feature
* Fix SIT tests
1 parent 18f26a3 commit 0f12303

41 files changed: +812 -535 lines changed

.classpath

Lines changed: 3 additions & 6 deletions
@@ -36,18 +36,15 @@
 <attribute name="optional" value="true"/>
 </attributes>
 </classpathentry>
-<classpathentry kind="src" output="target/test-classes" path="target/generated-test-sources/test-annotations">
+<classpathentry kind="src" path="target/generated-sources/annotations">
 <attributes>
 <attribute name="optional" value="true"/>
-<attribute name="test" value="true"/>
-<attribute name="maven.pomderived" value="true"/>
-<attribute name="ignore_optional_problems" value="true"/>
-<attribute name="m2e-apt" value="true"/>
 </attributes>
 </classpathentry>
-<classpathentry kind="src" path="target/generated-sources/annotations">
+<classpathentry kind="src" output="target/test-classes" path="target/generated-test-sources/test-annotations">
 <attributes>
 <attribute name="optional" value="true"/>
+<attribute name="test" value="true"/>
 </attributes>
 </classpathentry>
 <classpathentry kind="output" path="target/classes"/>

README.md

Lines changed: 7 additions & 3 deletions
@@ -3,7 +3,7 @@
 ![GitHub release (with filter)](https://img.shields.io/github/v/release/datastax/cassandra-data-migrator?label=latest%20release&color=green&link=!%5BGitHub%20release%20(with%20filter)%5D(https%3A%2F%2Fimg.shields.io%2Fgithub%2Fv%2Frelease%2Fdatastax%2Fcassandra-data-migrator%3Flabel%3Dlatest%2520release%26color%3Dgreen))
 ![Docker Pulls](https://img.shields.io/docker/pulls/datastax/cassandra-data-migrator)

-# cassandra-data-migrator
+# cassandra-data-migrator (also known as CDM)

 Migrate and Validate Tables between Origin and Target Cassandra Clusters.

@@ -152,8 +152,12 @@ If `spark.cdm.tokenrange.partitionFile.input` or `spark.cdm.tokenrange.partition
 - Supports adding custom fixed `writetime`
 - Validation - Log partitions range level exceptions, use the exceptions file as input for rerun

-# Known Limitations
-- This tool does not migrate `ttl` & `writetime` at the field-level (for optimization reasons). It instead finds the field with the highest `ttl` & the field with the highest `writetime` within an `origin` row and uses those values on the entire `target` row.
+# Things to know
+- CDM does not migrate `ttl` & `writetime` at the field-level (for optimization reasons). It instead finds the field with the highest `ttl` & the field with the highest `writetime` within an `origin` row and uses those values on the entire `target` row.
+- CDM ignores `ttl` & `writetime` on collection and UDT fields while computing the highest value
+- If a table has only collection and/or UDT non-key columns and not table-level `ttl` configuration, the target will have no `ttl`, which can lead to inconsistencies between `origin` and `target` as rows expire on `origin` due to `ttl` expiry.
+- If a table has only collection and/or UDT non-key columns, the `writetime` used on target will be time the job was run. Alternatively if needed, the param `spark.cdm.transform.custom.writetime` can be used to set a static custom value for `writetime`.
+- When CDM migration (or validation with autocorrect) is run multiple times on the same table (for whatever reasons), it could lead to duplicate entries in `list` type columns. Note this is [due to a Cassandra/DSE bug](https://issues.apache.org/jira/browse/CASSANDRA-11368) and not a CDM issue. This issue can be addressed by enabling and setting a positive value for `spark.cdm.transform.custom.writetime.incrementBy` param. This param was specifically added to address this issue.

 # Building Jar for local development
 1. Clone this repo
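
The row-level `ttl` / `writetime` rule described in the new "Things to know" bullets can be illustrated with a small sketch. This is not CDM's code: the column representation, the `COMPLEX_TYPES` set, and the function name `row_level_value` are all hypothetical, chosen only to show "take the highest value across non-collection, non-UDT fields".

```python
from typing import Dict, List, Optional

# Hypothetical type tags for columns that the README says are ignored
# when computing the row-level ttl/writetime.
COMPLEX_TYPES = {"list", "set", "map", "udt"}


def row_level_value(columns: List[Dict], field: str) -> Optional[int]:
    """Return the highest `field` ('ttl' or 'writetime') across simple columns.

    Returns None when no simple column carries the value, which corresponds
    to the README cases where only collection/UDT non-key columns exist.
    """
    candidates = [
        col[field]
        for col in columns
        if col["type"] not in COMPLEX_TYPES and col.get(field) is not None
    ]
    return max(candidates) if candidates else None


# Illustrative origin row: two simple columns and one list column.
origin_columns = [
    {"name": "value", "type": "text", "ttl": 300, "writetime": 1000},
    {"name": "score", "type": "int", "ttl": 600, "writetime": 900},
    {"name": "tags", "type": "list", "ttl": 9999, "writetime": 5000},
]

row_ttl = row_level_value(origin_columns, "ttl")
row_writetime = row_level_value(origin_columns, "writetime")
only_complex = row_level_value(origin_columns[2:], "writetime")
```

In this sketch the `list` column is skipped, so the row-level values come from `score` (`ttl`) and `value` (`writetime`). The `None` result for a row with only collection columns mirrors the README note that such rows get no `ttl` and a `writetime` equal to the job run time unless `spark.cdm.transform.custom.writetime` is set.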

RELEASE.md

Lines changed: 5 additions & 0 deletions
@@ -1,4 +1,9 @@
 # Release Notes
+## [4.2.0] - 2024-07-09
+- Upgraded `constant-column` feature to support `replace` and `remove` of constant columns
+- Fixed `constant-column` feature to support any data-types within the PK columns
+- Added `Things to know` in docs
+
 ## [4.1.16] - 2024-05-31
 - Added property to manage null values in Map fields
 - Allow separate input and output partition CSV files

SIT/features/01_constant_column/breakData.cql

Lines changed: 2 additions & 2 deletions
@@ -12,8 +12,8 @@
 limitations under the License.
 */

-DELETE FROM target.feature_constant_column WHERE key='key2' AND const1='abcd';
-UPDATE target.feature_constant_column SET value='value999' WHERE key='key3' AND const1='abcd';
+DELETE FROM target.feature_constant_column WHERE key='key2' AND const1=1;
+UPDATE target.feature_constant_column SET value='value999' WHERE key='key3' AND const1=1;

 # This upsert to origin will update the writetime on origin to be newer than target
 INSERT INTO origin.feature_constant_column(key,value) VALUES ('key1','valueA');
Lines changed: 3 additions & 3 deletions
@@ -1,8 +1,8 @@

  const1 | key | const2 | value
 --------+------+--------+--------
-abcd | key1 | 1234 | valueA
-abcd | key2 | 1234 | valueB
-abcd | key3 | 1234 | valueC
+1 | key1 | 1234 | valueA
+1 | key2 | 1234 | valueB
+1 | key3 | 1234 | valueC

 (3 rows)

SIT/features/01_constant_column/fix.properties

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ spark.cdm.schema.target.keyspaceTable target.feature_constant_column
 spark.cdm.perfops.numParts 1

 spark.cdm.feature.constantColumns.names const1,const2
-spark.cdm.feature.constantColumns.values 'abcd',1234
+spark.cdm.feature.constantColumns.values 1,1234

 spark.cdm.autocorrect.missing true
 spark.cdm.autocorrect.mismatch true

SIT/features/01_constant_column/migrate.properties

Lines changed: 1 addition & 1 deletion
@@ -20,5 +20,5 @@ spark.cdm.schema.target.keyspaceTable target.feature_constant_column
 spark.cdm.perfops.numParts 1

 spark.cdm.feature.constantColumns.names const1,const2
-spark.cdm.feature.constantColumns.values 'abcd',1234
+spark.cdm.feature.constantColumns.values 1,1234

SIT/features/01_constant_column/setup.cql

Lines changed: 1 addition & 1 deletion
@@ -19,4 +19,4 @@ INSERT INTO origin.feature_constant_column(key,value) VALUES ('key2','valueB');
 INSERT INTO origin.feature_constant_column(key,value) VALUES ('key3','valueC');

 DROP TABLE IF EXISTS target.feature_constant_column;
-CREATE TABLE target.feature_constant_column(const1 text, key text, value text, const2 int, PRIMARY KEY (const1, key));
+CREATE TABLE target.feature_constant_column(const1 int, key text, value text, const2 int, PRIMARY KEY (const1, key));
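
As a rough illustration of what the test now exercises, the sketch below pairs the `spark.cdm.feature.constantColumns.names` and `spark.cdm.feature.constantColumns.values` strings from the properties files and adds them to an origin row, so that `const1` becomes an `int` partition-key value on the target. The helpers `parse_literal` and `apply_constant_columns` are hypothetical and only handle quoted text and integer literals; CDM's actual parsing and type handling are not shown in this diff and may differ.

```python
from typing import Dict, Union

Literal = Union[str, int]


def parse_literal(text: str) -> Literal:
    """Parse a single-quoted text literal or an integer literal (sketch only)."""
    text = text.strip()
    if len(text) >= 2 and text[0] == "'" and text[-1] == "'":
        return text[1:-1]
    return int(text)


def apply_constant_columns(origin_row: Dict, names: str, values: str) -> Dict:
    """Return a copy of origin_row with the configured constant columns added."""
    name_list = [n.strip() for n in names.split(",")]
    value_list = [parse_literal(v) for v in values.split(",")]
    if len(name_list) != len(value_list):
        raise ValueError("constant column names and values must pair up")
    target_row = dict(origin_row)
    target_row.update(zip(name_list, value_list))
    return target_row


origin_row = {"key": "key1", "value": "valueA"}

# Values as configured after this commit: const1 is now an int.
new_row = apply_constant_columns(origin_row, "const1,const2", "1,1234")

# Values as configured before this commit: const1 was a text literal.
old_row = apply_constant_columns(origin_row, "const1,const2", "'abcd',1234")
```

The naive comma split would break on text values that themselves contain commas; that case is deliberately left out of this sketch.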
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+/*
+Licensed under the Apache License, Version 2.0 (the "License"); you
+may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+*/
+
+DELETE FROM target.feature_constant_column_remove WHERE key='key2';
+UPDATE target.feature_constant_column_remove SET value='value999' WHERE key='key3';
+
+# This upsert to origin will update the writetime on origin to be newer than target
+INSERT INTO origin.feature_constant_column_remove(const1, key, value, const2) VALUES (1, 'key1','valueA', 21);
+INSERT INTO origin.feature_constant_column_remove(const1, key, value, const2) VALUES (1, 'key2','valueB', 22);
+INSERT INTO origin.feature_constant_column_remove(const1, key, value, const2) VALUES (1, 'key3','valueC', 23);
+
+SELECT * FROM target.feature_constant_column_remove;
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+Read Record Count: 3
+Mismatch Record Count: 1
+Corrected Mismatch Record Count: 1
+Missing Record Count: 1
+Corrected Missing Record Count: 1
+Valid Record Count: 1
+Skipped Record Count: 0
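
These counts appear consistent with the break script for `feature_constant_column_remove` shown above: one target row deleted (`key2`), one target value changed (`key3`), and one row left intact (`key1`). The sketch below reproduces that classification on plain dictionaries. It is an assumption-laden illustration, not CDM's validation logic: the `classify` function and the key-to-value row model are hypothetical.

```python
from collections import Counter
from typing import Dict


def classify(origin: Dict[str, str], target: Dict[str, str]) -> Counter:
    """Count origin rows that are missing, mismatched, or valid on target."""
    counts = Counter(read=0, mismatch=0, missing=0, valid=0)
    for key, value in origin.items():
        counts["read"] += 1
        if key not in target:
            counts["missing"] += 1
        elif target[key] != value:
            counts["mismatch"] += 1
        else:
            counts["valid"] += 1
    return counts


# Origin rows after the upserts in the break script (key -> value only).
origin = {"key1": "valueA", "key2": "valueB", "key3": "valueC"}

# Target after DELETE of key2 and UPDATE of key3 to 'value999'.
target = {"key1": "valueA", "key3": "value999"}

counts = classify(origin, target)
```

With autocorrect enabled, as in the `fix.properties` file for the original constant-column test, each missing or mismatched row would then be rewritten on the target, which would account for the two "Corrected" lines.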
