* Made Partition into its own class & refactored stuff to make that work
* Made CounterUnit its own class & refactored JobCounter to work with it.
* Made JobType (Migrate, Validate & Guardrail) independent of track-run feature and renamed slices/partitions to PartitionRanges. Also provided actual jobs access to PartitionRange class.
* Refactored code to be Spark Native, fixed metrics reporting & also improved trackRun feature.
* Updated readme
* Fixed metrics issue when trackRun was disabled.
- If a table has only collection and/or UDT non-key columns, the `writetime` used on target will be the time the job was run. If you want to avoid this, we recommend setting `spark.cdm.schema.ttlwritetime.calc.useCollections` param to `true` in such scenarios.
- When CDM migration (or validation with autocorrect) is run multiple times on the same table (for whatever reasons), it could lead to duplicate entries in `list` type columns. Note this is [due to a Cassandra/DSE bug](https://issues.apache.org/jira/browse/CASSANDRA-11368) and not a CDM issue. This issue can be addressed by enabling and setting a positive value for `spark.cdm.transform.custom.writetime.incrementBy` param. This param was specifically added to address this issue.
- When you rerun a job to resume from a previous run, the run metrics (read, write, skipped, etc.) captured in table `cdm_run_info` will be only for the current run. If the previous run was killed for some reason, its run metrics may not have been saved. If the previous run did complete (not killed) but with errors, then you will have all run metrics from the previous run as well.
-
- The Spark Cluster based deployment currently has a bug. It reports '0' for all count metrics, while doing underlying tasks (Migration, Validation, etc.). We are working to address this in the upcoming releases. Also note that this issue is only with the Spark cluster deployment and not with the single VM run.
+
- When running on a Spark Cluster (and not a single VM), the rate-limit values (`spark.cdm.perfops.ratelimit.origin` & `spark.cdm.perfops.ratelimit.target`) apply to individual Spark worker nodes. Hence each value should be set to `effective-rate-limit-you-need`/`number-of-spark-worker-nodes`. E.g. if you need an effective rate-limit of 10000 and the number of Spark worker nodes is 4, then you should set the above rate-limit params to a value of 2500.
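
The per-worker arithmetic in the rate-limit note above can be sketched as follows. This is a minimal illustration and not part of CDM: the helper names `per_worker_rate_limit` and `rate_limit_properties` are hypothetical, and rounding down when the division is not exact is an assumption of this sketch. Only the two property names come from the README text.

```python
# Property names taken from the README note; everything else here is illustrative.
ORIGIN_KEY = "spark.cdm.perfops.ratelimit.origin"
TARGET_KEY = "spark.cdm.perfops.ratelimit.target"


def per_worker_rate_limit(effective_rate_limit: int, worker_nodes: int) -> int:
    """Hypothetical helper: value to configure so that all workers together
    approximate the desired effective rate limit."""
    if effective_rate_limit <= 0 or worker_nodes <= 0:
        raise ValueError("rate limit and worker count must be positive")
    # Assumption: round down so the combined rate stays at or below the goal.
    # A floor of 0 is replaced by 1 so the configured value stays positive.
    return max(1, effective_rate_limit // worker_nodes)


def rate_limit_properties(origin_limit: int, target_limit: int, worker_nodes: int) -> dict:
    """Hypothetical helper: build the two rate-limit settings for a cluster run."""
    return {
        ORIGIN_KEY: per_worker_rate_limit(origin_limit, worker_nodes),
        TARGET_KEY: per_worker_rate_limit(target_limit, worker_nodes),
    }


if __name__ == "__main__":
    # The README example: effective limit 10000 spread over 4 worker nodes.
    for key, value in rate_limit_properties(10000, 10000, 4).items():
        print(f"{key}={value}")
```

For the README example (an effective limit of 10000 on 4 worker nodes), both properties come out as 2500, matching the value stated in the note.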
# Performance recommendations
The recommendations below may only be useful when migrating large tables where the default performance is not good enough.
RELEASE.md (6 additions & 1 deletion)
@@ -1,6 +1,11 @@
# Release Notes
+
## [5.0.0] - 2024-11-08
+
- CDM refactored to be fully Spark Native and more performant when deployed on a multi-node Spark Cluster
+
- `trackRun` feature has been expanded to record `run-info` for each part in the `CDM_RUN_DETAILS` table. Along with granular metrics, this information can be used to troubleshoot any unbalanced problematic partitions.
+
- This release has feature parity with the 4.x release and is also backward compatible while adding the above-mentioned improvements. However, we are upgrading it to 5.x as it is a major rewrite of the code to make it Spark native.
+
## [4.7.0] - 2024-10-25
-
- CDM refractored to work when deployed on a Spark Cluster
+
- CDM refactored to work when deployed on a Spark Cluster
- More performant for large migration efforts (multi-terabyte clusters with several billion rows) using Spark Cluster (instead of individual VMs)
- No functional changes and fully backward compatible, just refactor to support Spark cluster deployment