ZDM-602, ZDM-614, ZDM-583, ZDM-576, DOC-5267, and more - Grab bag of small migration doc updates, old tickets, start detailed cleanup of some topics (#201)
* cluster compatibility updates
* add legacy app migration webinar link
* zdm proxy scaling
* system.peers and system.local troubleshooting tip
* token aware routing
* importance of timestamp preservation
* improved wording and some notes
* edit some text
* zdm-583
* ZDM-605
* zdm-613
* remove comment
* fix issues
* sme review round 1
* some async dual read edits
* async dual reads rewrite
* fix tabs
* latency statement
* cqlsh/credential rewrite
modules/ROOT/pages/cassandra-data-migrator.adoc (31 additions, 16 deletions)
@@ -6,25 +6,33 @@
 //This page was an exact duplicate of cdm-overview.adoc and the (now deleted) cdm-steps.adoc, they are just in different parts of the nav.

 // tag::body[]
-You can use {cass-migrator} ({cass-migrator-short}) to migrate and validate tables between {cass-short}-based clusters.
-It is designed to connect to your target cluster, compare it with the origin cluster, log any differences, and, optionally, automatically reconcile inconsistencies and missing data.
+You can use {cass-migrator} ({cass-migrator-short}) for data migration and validation between {cass-reg}-based databases.
+It supports important {cass} features and offers extensive configuration options:

-{cass-migrator-short} facilitates data transfer by creating multiple jobs that access the {cass-short} cluster concurrently, making it an ideal choice for migrating large datasets.
-It offers extensive configuration options, including logging, reconciliation, performance optimization, and more.
+* Logging and run tracking
+* Automatic reconciliation
+* Performance tuning
+* Record filtering
+* Support for advanced data types, including sets, lists, maps, and UDTs
+* Support for SSL, including custom cipher algorithms
+* Use `writetime` timestamps to maintain chronological write history
+* Use Time To Live (TTL) values to maintain data lifecycles

-{cass-migrator-short}features include the following:
+For more information and a complete list of features, see the {cass-migrator-repo}?tab=readme-ov-file#features[{cass-migrator-short} GitHub repository].

-* Validate migration accuracy and performance using examples that provide a smaller, randomized data set.
-* Preserve internal `writetime` timestamps and Time To Live (TTL) values.
-* Use advanced data types, including sets, lists, maps, and UDTs.
-* Filter records from the origin cluster's data, using {cass-short}'s internal `writetime` timestamp.
-* Use SSL Support, including custom cipher algorithms.
+== {cass-migrator} requirements

-For more features and information, see the {cass-migrator-repo}?tab=readme-ov-file#features[{cass-migrator-short} GitHub repository].
+To use {cass-migrator-short} successfully, your origin and target clusters must be {cass-short}-based databases with matching schemas.

-== {cass-migrator} requirements
+== {cass-migrator-short} with {product-proxy}
+
+You can use {cass-migrator-short} alone or with {product-proxy}.
+
+When using {cass-migrator-short} with {product-proxy}, {cass-short}'s last-write-wins semantics ensure that new, real-time writes accurately take precedence over historical writes.
+
+Last-write-wins compares the `writetime` of conflicting records, and then retains the most recent write.

-To use {cass-migrator-short} successfully, your origin and target clusters must have matching schemas.
+For example, if a new write occurs in your target cluster with a `writetime` of `2023-10-01T12:05:00Z`, and then {cass-migrator-short} migrates a record against the same row with a `writetime` of `2023-10-01T12:00:00Z`, the target cluster retains the data from the new write because it has the most recent `writetime`.

 == Install {cass-migrator}

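As a quick illustration of the last-write-wins comparison described in the new text, the following CQL sketch shows how to inspect the `writetime` values that are compared. The keyspace, table, and column names are hypothetical.

[source,cql]
----
-- Hypothetical keyspace, table, and column names, for illustration only.
-- WRITETIME() returns the write timestamp (microseconds since the epoch)
-- that last-write-wins uses to decide which value is retained.
SELECT id, status, WRITETIME(status) AS status_writetime
FROM my_keyspace.my_table
WHERE id = 1;
----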
@@ -124,6 +132,10 @@ For example, the 4.x series of {cass-migrator-short} isn't backwards compatible
 [#migrate]
 == Run a {cass-migrator-short} data migration job

+A data migration job copies data from a table in your origin cluster to a table with the same schema in your target cluster.
+
+To optimize large-scale migrations, {cass-migrator-short} can run multiple concurrent migration jobs on the same table.
+
 The following `spark-submit` command migrates one table from the origin to the target cluster, using the configuration in your properties file.
 The migration job is specified in the `--class` argument.

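For context on the command described above, here is a minimal `spark-submit` sketch. It assumes the `com.datastax.cdm.job.Migrate` class from the {cass-migrator-short} 4.x series; the properties file name, keyspace and table, and JAR version are placeholders to adapt to your environment.

[source,bash]
----
# Sketch only: cdm.properties, my_keyspace.my_table, and the JAR version
# are placeholders -- substitute the values for your environment.
spark-submit \
  --properties-file cdm.properties \
  --conf spark.cdm.schema.origin.keyspaceTable="my_keyspace.my_table" \
  --master "local[*]" \
  --class com.datastax.cdm.job.Migrate \
  cassandra-data-migrator-4.x.x.jar
----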
@@ -189,7 +201,9 @@ For additional modifications to this command, see <<advanced>>.
 [#cdm-validation-steps]
 == Run a {cass-migrator-short} data validation job

-After you migrate data, you can use {cass-migrator-short}'s data validation mode to find inconsistencies between the origin and target tables.
+After migrating data, use {cass-migrator-short}'s data validation mode to identify any inconsistencies between the origin and target tables, such as missing or mismatched records.
+
+Optionally, {cass-migrator-short} can automatically correct discrepancies in the target cluster during validation.

 . Use the following `spark-submit` command to run a data validation job using the configuration in your properties file.
 The data validation job is specified in the `--class` argument.
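A minimal sketch of the validation command follows, assuming the `com.datastax.cdm.job.DiffData` class from the {cass-migrator-short} 4.x series; file names, the keyspace and table, and the JAR version are placeholders.

[source,bash]
----
# Sketch only: placeholder file names and table. The DiffData class runs
# CDM in validation mode and logs mismatched or missing records.
spark-submit \
  --properties-file cdm.properties \
  --conf spark.cdm.schema.origin.keyspaceTable="my_keyspace.my_table" \
  --master "local[*]" \
  --class com.datastax.cdm.job.DiffData \
  cassandra-data-migrator-4.x.x.jar
----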
@@ -276,9 +290,10 @@ Optionally, you can run {cass-migrator-short} validation jobs in **AutoCorrect**
 +
 [IMPORTANT]
 ====
-`TIMESTAMP` has an effect on this function.
+Timestamps have an effect on this function.
+
+If the `writetime` of the origin record (determined with `.writetime.names`) is before the `writetime` of the corresponding target record, then the original write won't appear in the target cluster.

-If the `WRITETIME` of the origin record (determined with `.writetime.names`) is earlier than the `WRITETIME` of the target record, then the change doesn't appear in the target cluster.
 This comparative state can be challenging to troubleshoot if individual columns or cells were modified in the target cluster.
This topic explains how you can configure the {product-proxy} to route all reads to the target cluster instead of the origin cluster.
@@ -15,16 +15,12 @@ This operation is a configuration change that can be carried out as explained xr

 [TIP]
 ====
-If you performed the optional steps described in the prior topic, xref:enable-async-dual-reads.adoc[] -- to verify that your target cluster was ready and tuned appropriately to handle the production read load -- be sure to disable async dual reads when you're done testing.
-If you haven't already, revert `read_mode` in `vars/zdm_proxy_core_config.yml` to `PRIMARY_ONLY` when switching sync reads to the target cluster.
-Example:
+If you xref:enable-async-dual-reads.adoc[enabled asynchronous dual reads] to test your target cluster's performance, make sure that you disable asynchronous dual reads when you're done testing.

-[source,yml]
-----
-read_mode: PRIMARY_ONLY
-----
+To do this, edit the `vars/zdm_proxy_core_config.yml` file, and then set the `read_mode` variable to `PRIMARY_ONLY`.

-If you don't disable async dual reads, {product-proxy} instances continue to send async reads to the origin, which, although harmless, is unnecessary.
+If you don't disable asynchronous dual reads, {product-proxy} instances send asynchronous, duplicate read requests to your origin cluster.
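For reference, the end state of `vars/zdm_proxy_core_config.yml` after this edit matches the YAML block removed above:

[source,yml]
----
# Route reads only to the primary cluster; asynchronous dual reads are disabled.
read_mode: PRIMARY_ONLY
----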
modules/ROOT/pages/components.adoc (42 additions, 37 deletions)
@@ -17,63 +17,68 @@ The main component of the {company} {product} toolkit is {product-proxy}, which
 {product-proxy} is open-source software that is available from the {product-proxy-repo}[zdm-proxy GitHub repo].
 This project is open for public contributions.

-The {product-proxy} is an orchestrator for monitoring application activity and keeping multiple clusters in sync through dual writes.
+The {product-proxy} is an orchestrator for monitoring application activity and keeping multiple clusters (databases) in sync through dual writes.
 {product-proxy} isn't linked to the actual migration process.
 It doesn't perform data migrations and it doesn't have awareness of ongoing migrations.
 Instead, you use a data migration tool, like {sstable-sideloader}, {cass-migrator}, or {dsbulk-migrator}, to perform the data migration and validate migrated data.

-=== How {product-proxy} works
+{product-proxy} reduces risks to upgrades and migrations by decoupling the origin cluster from the target cluster and maintaining consistency between both clusters.
+You decide when you want to switch permanently to the target cluster.

-{company} created {product-proxy} to function between the application and both the origin and target databases.
-The databases can be any CQL-compatible data store, such as {cass-reg}, {dse}, and {astra-db}.
-The proxy always sends every write operation (Insert, Update, Delete) synchronously to both clusters at the desired Consistency Level:
+After migrating your data, changes to your application code are usually minimal, depending on your client's compatibility with the origin and target clusters.
+Typically, you only need to update the connection string.

-* If the write is successful in both clusters, it returns a successful acknowledgement to the client application.
-* If the write fails on either cluster, the failure is passed back to the client application so that it can retry it as appropriate, based on its own retry policy.
+[#how-zdm-proxy-handles-reads-and-writes]
+=== How {product-proxy} handles reads and writes
+
+{company} created {product-proxy} to orchestrate requests between a client application and both the origin and target clusters.
+These clusters can be any CQL-compatible data store, such as {cass-reg}, {dse}, and {astra-db}.
+
+During the migration process, you designate one cluster as the _primary cluster_, which serves as the source of truth for reads.
+For the majority of the migration process, this is typically the origin cluster.
+Towards the end of the migration process, when you are ready to read from your target cluster, you set the target cluster as the primary cluster.
+
+==== Writes
+
+{product-proxy} sends every write operation (`INSERT`, `UPDATE`, `DELETE`) synchronously to both clusters at the requested consistency level:
+
+* If the write is acknowledged in both clusters at the requested consistency level, then the operation returns a successful write acknowledgement to the client that issued the request.
+* If the write fails in either cluster, then {product-proxy} passes a write failure, originating from the primary cluster, back to the client.
+The client can then retry the request, if appropriate, based on the client's retry policy.

 This design ensures that new data is always written to both clusters, and that any failure on either cluster is always made visible to the client application.
-{product-proxy} also sends all reads to the primary cluster, and then returns the result to the client application.
-The primary cluster is initially the origin cluster, and you change it to the target cluster at the end of the migration process.

-{product-proxy} is designed to be highly available. It can be scaled horizontally, so typical deployments are made up of a minimum of 3 servers.
-{product-proxy} can be restarted in a rolling fashion, for example, to change configuration for different phases of the migration.
+For information about how {product-proxy} handles lightweight transactions (LWTs), see xref:feasibility-checklists.adoc#_lightweight_transactions_and_the_applied_flag[Lightweight Transactions and the applied flag].

-=== Key features of {product-proxy}
+==== Reads

-* Allows you to lift-and-shift existing application code from your origin cluster to your target cluster by changing only the connection string, if all else is compatible.
+By default, {product-proxy} sends all reads to the primary cluster, and then returns the result to the client application.

-* Reduces risks to upgrades and migrations by decoupling the origin cluster from the target cluster.
-You can determine an explicit cut-over point once you're ready to commit to using the target cluster permanently.
+If you enable _asynchronous dual reads_, {product-proxy} sends asynchronous read requests to the secondary cluster (typically the target cluster) in addition to the synchronous read requests that are sent to the primary cluster.

-* Bifurcates writes synchronously to both clusters during the migration process.
+This feature is designed to test the target cluster's ability to handle a production workload before you permanently switch to the target cluster at the end of the migration process.

-* Read operations return the response from the primary (origin) cluster, which is its designated source of truth.
-+
-During a migration, the primary cluster is typically the origin cluster.
-Near the end of the migration, you shift the primary cluster to be the target cluster.
+With or without asynchronous dual reads, the client application only receives results from synchronous reads on the primary cluster.
+The results of asynchronous reads aren't returned to the client because asynchronous reads are for testing purposes only.

-* Option to read asynchronously from the target cluster as well as the origin cluster
-This capability is called **Asynchronous Dual Reads** or **Read Mirroring**, and it allows you to observe what read latencies and throughput the target cluster can achieve under the actual production load.
-+
-** Results from the asynchronous reads executed on the target cluster are not sent back to the client application.
-** This design implies that a failure on asynchronous reads from the target cluster does not cause an error on the client application.
-** Asynchronous dual reads can be enabled and disabled dynamically with a rolling restart of the {product-proxy} instances.
+For more information, see xref:ROOT:enable-async-dual-reads.adoc[].

-[NOTE]
-====
-When using Asynchronous Dual Reads, any additional read load on the target cluster may impact its ability to keep up with writes.
-This behavior is expected and desired.
-The idea is to mimic the full read and write load on the target cluster so there are no surprises during the last migration phase; that is, after cutting over completely to the target cluster.
-====
+=== High availability and multiple {product-proxy} instances
+
+{product-proxy} is designed to be highly available and run in a clustered fashion to avoid a single point of failure.

-=== Run multiple {product-proxy} instances
+With the exception of local test environments, {company} recommends that all {product-proxy} deployments have multiple {product-proxy} instances.
+Deployments typically consist of three or more instances.

-{product-proxy} has been designed to run in a clustered fashion so that it is never a single point of failure.
-Unless it is for a demo or local testing environment, a {product-proxy} deployment should always comprise multiple {product-proxy} instances.
+[TIP]
+====
+Throughout the {product-short} documentation, the term _{product-proxy} deployment_ refers to the entire deployment, and _{product-proxy} instance_ refers to an individual proxy process in the deployment.
+====

-Throughout the documentation, the term _{product-proxy} deployment_ refers to the entire deployment, and _{product-proxy} instance_ refers to an individual proxy process in the deployment.
+You can scale {product-proxy} instances horizontally and vertically.
+To avoid downtime when applying configuration changes, you can perform rolling restarts on your {product-proxy} instances.

-You can use the {product-utility} and {product-automation} to set up and run Ansible playbooks that deploy and manage {product-proxy} and its monitoring stack.
+For simplicity, you can use the {product-utility} and {product-automation} to set up and run Ansible playbooks that deploy and manage {product-proxy} and its monitoring stack.
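As a sketch of the rolling-restart workflow mentioned above, the following command assumes the `rolling_update_zdm_proxy.yml` playbook and `zdm_ansible_inventory` inventory file names provided by {product-automation}; verify the names against your version of the automation before running it.

[source,bash]
----
# Run from the Ansible control host (assumed playbook and inventory names).
# Applies configuration changes by restarting proxy instances one at a time,
# so the deployment as a whole stays available.
ansible-playbook rolling_update_zdm_proxy.yml -i zdm_ansible_inventory
----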