
Commit afb700f

ZDM-602, ZDM-614, ZDM-583, ZDM-576, DOC-5267, and more - Grab bag of small migration doc updates, old tickets, start detailed cleanup of some topics (#201)
* cluster compatibility updates
* add legacy app migration webinar link
* zdm proxy scaling
* system.peers and system.local troubleshooting tip
* token aware routing
* importance of timestamp preservation
* improved wording and some notes
* edit some text
* zdm-583
* ZDM-605
* zdm-613
* remove comment
* fix issues
* sme review round 1
* some async dual read edits
* async dual reads rewrite
* fix tabs
* latency statement
* cqlsh/credential rewrite
1 parent bd9d110 commit afb700f

21 files changed: +557 -310 lines

Binary file not shown (-134 KB).

modules/ROOT/pages/astra-migration-paths.adoc

Lines changed: 1 addition & 0 deletions
@@ -103,4 +103,5 @@ If you have questions about migrating from a specific source to {astra-db}, cont

 == See also

+* https://www.datastax.com/events/migrating-your-legacy-cassandra-app-to-astra-db[Migrating your legacy {cass-reg} app to {astra-db}]
 * xref:astra-db-serverless:databases:migration-path-serverless.adoc[Migrate to {astra-db}]

modules/ROOT/pages/cassandra-data-migrator.adoc

Lines changed: 31 additions & 16 deletions
@@ -6,25 +6,33 @@
 //This page was an exact duplicate of cdm-overview.adoc and the (now deleted) cdm-steps.adoc, they are just in different parts of the nav.

 // tag::body[]
-You can use {cass-migrator} ({cass-migrator-short}) to migrate and validate tables between {cass-short}-based clusters.
-It is designed to connect to your target cluster, compare it with the origin cluster, log any differences, and, optionally, automatically reconcile inconsistencies and missing data.
+You can use {cass-migrator} ({cass-migrator-short}) for data migration and validation between {cass-reg}-based databases.
+It supports important {cass} features and offers extensive configuration options:

-{cass-migrator-short} facilitates data transfer by creating multiple jobs that access the {cass-short} cluster concurrently, making it an ideal choice for migrating large datasets.
-It offers extensive configuration options, including logging, reconciliation, performance optimization, and more.
+* Logging and run tracking
+* Automatic reconciliation
+* Performance tuning
+* Record filtering
+* Support for advanced data types, including sets, lists, maps, and UDTs
+* Support for SSL, including custom cipher algorithms
+* Use `writetime` timestamps to maintain chronological write history
+* Use Time To Live (TTL) values to maintain data lifecycles

-{cass-migrator-short} features include the following:
+For more information and a complete list of features, see the {cass-migrator-repo}?tab=readme-ov-file#features[{cass-migrator-short} GitHub repository].

-* Validate migration accuracy and performance using examples that provide a smaller, randomized data set.
-* Preserve internal `writetime` timestamps and Time To Live (TTL) values.
-* Use advanced data types, including sets, lists, maps, and UDTs.
-* Filter records from the origin cluster's data, using {cass-short}'s internal `writetime` timestamp.
-* Use SSL Support, including custom cipher algorithms.
+== {cass-migrator} requirements

-For more features and information, see the {cass-migrator-repo}?tab=readme-ov-file#features[{cass-migrator-short} GitHub repository].
+To use {cass-migrator-short} successfully, your origin and target clusters must be {cass-short}-based databases with matching schemas.

-== {cass-migrator} requirements
+== {cass-migrator-short} with {product-proxy}
+
+You can use {cass-migrator-short} alone or with {product-proxy}.
+
+When using {cass-migrator-short} with {product-proxy}, {cass-short}'s last-write-wins semantics ensure that new, real-time writes take precedence over historical writes.
+
+Last-write-wins compares the `writetime` of conflicting records, and then retains the most recent write.

-To use {cass-migrator-short} successfully, your origin and target clusters must have matching schemas.
+For example, if a new write occurs in your target cluster with a `writetime` of `2023-10-01T12:05:00Z`, and then {cass-migrator-short} migrates a record against the same row with a `writetime` of `2023-10-01T12:00:00Z`, the target cluster retains the data from the new write because it has the most recent `writetime`.

 == Install {cass-migrator}
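The last-write-wins comparison described above can be checked directly in CQL, because `WRITETIME()` exposes the write timestamp that {cass-short} stores per cell. A minimal sketch, assuming a hypothetical `cycling.rank` table and column names:

[source,cql]
----
-- WRITETIME() returns the microsecond timestamp of the last write to a cell.
-- Whichever value carries the most recent writetime is the one the row keeps.
SELECT id, rank, WRITETIME(rank) AS rank_writetime
FROM cycling.rank
WHERE id = 101;
----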

@@ -124,6 +132,10 @@ For example, the 4.x series of {cass-migrator-short} isn't backwards compatible
 [#migrate]
 == Run a {cass-migrator-short} data migration job

+A data migration job copies data from a table in your origin cluster to a table with the same schema in your target cluster.
+
+To optimize large-scale migrations, {cass-migrator-short} can run multiple concurrent migration jobs on the same table.
+
 The following `spark-submit` command migrates one table from the origin to the target cluster, using the configuration in your properties file.
 The migration job is specified in the `--class` argument.
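For orientation, a migration run typically resembles the following sketch. The `com.datastax.cdm.job.Migrate` class, the `spark.cdm.schema.origin.keyspaceTable` property, and the file names are assumptions drawn from the {cass-migrator-short} README; confirm them against the properties file and JAR version you actually use.

[source,bash]
----
# Illustrative only: migrate one table using a local Spark master.
# Verify class names, property keys, and file names for your CDM release.
spark-submit \
  --properties-file cdm.properties \
  --conf spark.cdm.schema.origin.keyspaceTable="my_keyspace.my_table" \
  --master "local[*]" \
  --class com.datastax.cdm.job.Migrate \
  cassandra-data-migrator-4.x.x.jar
----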

@@ -189,7 +201,9 @@ For additional modifications to this command, see <<advanced>>.
 [#cdm-validation-steps]
 == Run a {cass-migrator-short} data validation job

-After you migrate data, you can use {cass-migrator-short}'s data validation mode to find inconsistencies between the origin and target tables.
+After migrating data, use {cass-migrator-short}'s data validation mode to identify any inconsistencies between the origin and target tables, such as missing or mismatched records.
+
+Optionally, {cass-migrator-short} can automatically correct discrepancies in the target cluster during validation.

 . Use the following `spark-submit` command to run a data validation job using the configuration in your properties file.
 The data validation job is specified in the `--class` argument.
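A validation run follows the same pattern with a different job class; `com.datastax.cdm.job.DiffData` is the class name listed in the {cass-migrator-short} README and is shown here as an assumption to check against your release.

[source,bash]
----
# Illustrative only: validate a previously migrated table.
spark-submit \
  --properties-file cdm.properties \
  --conf spark.cdm.schema.origin.keyspaceTable="my_keyspace.my_table" \
  --master "local[*]" \
  --class com.datastax.cdm.job.DiffData \
  cassandra-data-migrator-4.x.x.jar
----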
@@ -276,9 +290,10 @@ Optionally, you can run {cass-migrator-short} validation jobs in **AutoCorrect**
 +
 [IMPORTANT]
 ====
-`TIMESTAMP` has an effect on this function.
+Timestamps have an effect on this function.
+
+If the `writetime` of the origin record (determined with `.writetime.names`) is before the `writetime` of the corresponding target record, then the original write won't appear in the target cluster.

-If the `WRITETIME` of the origin record (determined with `.writetime.names`) is earlier than the `WRITETIME` of the target record, then the change doesn't appear in the target cluster.
 This comparative state can be challenging to troubleshoot if individual columns or cells were modified in the target cluster.
 ====


modules/ROOT/pages/change-read-routing.adoc

Lines changed: 5 additions & 9 deletions
@@ -1,4 +1,4 @@
-= Phase 4: Route reads to the target
+= Route reads to the target
 :page-tag: migration,zdm,zero-downtime,zdm-proxy,read-routing

 This topic explains how you can configure the {product-proxy} to route all reads to the target cluster instead of the origin cluster.
@@ -15,16 +15,12 @@ This operation is a configuration change that can be carried out as explained xr

 [TIP]
 ====
-If you performed the optional steps described in the prior topic, xref:enable-async-dual-reads.adoc[] -- to verify that your target cluster was ready and tuned appropriately to handle the production read load -- be sure to disable async dual reads when you're done testing.
-If you haven't already, revert `read_mode` in `vars/zdm_proxy_core_config.yml` to `PRIMARY_ONLY` when switching sync reads to the target cluster.
-Example:
+If you xref:enable-async-dual-reads.adoc[enabled asynchronous dual reads] to test your target cluster's performance, make sure that you disable asynchronous dual reads when you're done testing.

-[source,yml]
-----
-read_mode: PRIMARY_ONLY
-----
+To do this, edit the `vars/zdm_proxy_core_config.yml` file, and then set the `read_mode` variable to `PRIMARY_ONLY`.

-If you don't disable async dual reads, {product-proxy} instances continue to send async reads to the origin, which, although harmless, is unnecessary.
+If you don't disable asynchronous dual reads, {product-proxy} instances send asynchronous, duplicate read requests to your origin cluster.
+This is harmless but unnecessary.
 ====

 == Changing the read routing configuration
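For reference, the setting that the rewritten tip describes is the same one shown in the snippet removed above; a minimal excerpt of `vars/zdm_proxy_core_config.yml` looks like this:

[source,yml]
----
# Disable asynchronous dual reads: send synchronous reads to the primary cluster only.
read_mode: PRIMARY_ONLY
----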

modules/ROOT/pages/components.adoc

Lines changed: 42 additions & 37 deletions
@@ -17,63 +17,68 @@ The main component of the {company} {product} toolkit is {product-proxy}, which
 {product-proxy} is open-source software that is available from the {product-proxy-repo}[zdm-proxy GitHub repo].
 This project is open for public contributions.

-The {product-proxy} is an orchestrator for monitoring application activity and keeping multiple clusters in sync through dual writes.
+The {product-proxy} is an orchestrator for monitoring application activity and keeping multiple clusters (databases) in sync through dual writes.
 {product-proxy} isn't linked to the actual migration process.
 It doesn't perform data migrations and it doesn't have awareness of ongoing migrations.
 Instead, you use a data migration tool, like {sstable-sideloader}, {cass-migrator}, or {dsbulk-migrator}, to perform the data migration and validate migrated data.

-=== How {product-proxy} works
+{product-proxy} reduces risks to upgrades and migrations by decoupling the origin cluster from the target cluster and maintaining consistency between both clusters.
+You decide when you want to switch permanently to the target cluster.

-{company} created {product-proxy} to function between the application and both the origin and target databases.
-The databases can be any CQL-compatible data store, such as {cass-reg}, {dse}, and {astra-db}.
-The proxy always sends every write operation (Insert, Update, Delete) synchronously to both clusters at the desired Consistency Level:
+After migrating your data, changes to your application code are usually minimal, depending on your client's compatibility with the origin and target clusters.
+Typically, you only need to update the connection string.

-* If the write is successful in both clusters, it returns a successful acknowledgement to the client application.
-* If the write fails on either cluster, the failure is passed back to the client application so that it can retry it as appropriate, based on its own retry policy.
+[#how-zdm-proxy-handles-reads-and-writes]
+=== How {product-proxy} handles reads and writes
+
+{company} created {product-proxy} to orchestrate requests between a client application and both the origin and target clusters.
+These clusters can be any CQL-compatible data store, such as {cass-reg}, {dse}, and {astra-db}.
+
+During the migration process, you designate one cluster as the _primary cluster_, which serves as the source of truth for reads.
+For the majority of the migration process, this is typically the origin cluster.
+Towards the end of the migration process, when you are ready to read from your target cluster, you set the target cluster as the primary cluster.
+
+==== Writes
+
+{product-proxy} sends every write operation (`INSERT`, `UPDATE`, `DELETE`) synchronously to both clusters at the requested consistency level:
+
+* If the write is acknowledged in both clusters at the requested consistency level, then the operation returns a successful write acknowledgement to the client that issued the request.
+* If the write fails in either cluster, then {product-proxy} passes a write failure, originating from the primary cluster, back to the client.
+The client can then retry the request, if appropriate, based on the client's retry policy.

 This design ensures that new data is always written to both clusters, and that any failure on either cluster is always made visible to the client application.
-{product-proxy} also sends all reads to the primary cluster, and then returns the result to the client application.
-The primary cluster is initially the origin cluster, and you change it to the target cluster at the end of the migration process.

-{product-proxy} is designed to be highly available. It can be scaled horizontally, so typical deployments are made up of a minimum of 3 servers.
-{product-proxy} can be restarted in a rolling fashion, for example, to change configuration for different phases of the migration.
+For information about how {product-proxy} handles lightweight transactions (LWTs), see xref:feasibility-checklists.adoc#_lightweight_transactions_and_the_applied_flag[Lightweight Transactions and the applied flag].

-=== Key features of {product-proxy}
+==== Reads

-* Allows you to lift-and-shift existing application code from your origin cluster to your target cluster by changing only the connection string, if all else is compatible.
+By default, {product-proxy} sends all reads to the primary cluster, and then returns the result to the client application.

-* Reduces risks to upgrades and migrations by decoupling the origin cluster from the target cluster.
-You can determine an explicit cut-over point once you're ready to commit to using the target cluster permanently.
+If you enable _asynchronous dual reads_, {product-proxy} sends asynchronous read requests to the secondary cluster (typically the target cluster) in addition to the synchronous read requests that are sent to the primary cluster.

-* Bifurcates writes synchronously to both clusters during the migration process.
+This feature is designed to test the target cluster's ability to handle a production workload before you permanently switch to the target cluster at the end of the migration process.

-* Read operations return the response from the primary (origin) cluster, which is its designated source of truth.
-+
-During a migration, the primary cluster is typically the origin cluster.
-Near the end of the migration, you shift the primary cluster to be the target cluster.
+With or without asynchronous dual reads, the client application only receives results from synchronous reads on the primary cluster.
+The results of asynchronous reads aren't returned to the client because asynchronous reads are for testing purposes only.

-* Option to read asynchronously from the target cluster as well as the origin cluster
-This capability is called **Asynchronous Dual Reads** or **Read Mirroring**, and it allows you to observe what read latencies and throughput the target cluster can achieve under the actual production load.
-+
-** Results from the asynchronous reads executed on the target cluster are not sent back to the client application.
-** This design implies that a failure on asynchronous reads from the target cluster does not cause an error on the client application.
-** Asynchronous dual reads can be enabled and disabled dynamically with a rolling restart of the {product-proxy} instances.
+For more information, see xref:ROOT:enable-async-dual-reads.adoc[].

-[NOTE]
-====
-When using Asynchronous Dual Reads, any additional read load on the target cluster may impact its ability to keep up with writes.
-This behavior is expected and desired.
-The idea is to mimic the full read and write load on the target cluster so there are no surprises during the last migration phase; that is, after cutting over completely to the target cluster.
-====
+=== High availability and multiple {product-proxy} instances
+
+{product-proxy} is designed to be highly available and run in a clustered fashion to avoid a single point of failure.

-=== Run multiple {product-proxy} instances
+With the exception of local test environments, {company} recommends that all {product-proxy} deployments have multiple {product-proxy} instances.
+Deployments typically consist of three or more instances.

-{product-proxy} has been designed to run in a clustered fashion so that it is never a single point of failure.
-Unless it is for a demo or local testing environment, a {product-proxy} deployment should always comprise multiple {product-proxy} instances.
+[TIP]
+====
+Throughout the {product-short} documentation, the term _{product-proxy} deployment_ refers to the entire deployment, and _{product-proxy} instance_ refers to an individual proxy process in the deployment.
+====

-Throughout the documentation, the term _{product-proxy} deployment_ refers to the entire deployment, and _{product-proxy} instance_ refers to an individual proxy process in the deployment.
+You can scale {product-proxy} instances horizontally and vertically.
+To avoid downtime when applying configuration changes, you can perform rolling restarts on your {product-proxy} instances.

-You can use the {product-utility} and {product-automation} to set up and run Ansible playbooks that deploy and manage {product-proxy} and its monitoring stack.
+For simplicity, you can use the {product-utility} and {product-automation} to set up and run Ansible playbooks that deploy and manage {product-proxy} and its monitoring stack.

 == {product-utility} and {product-automation}

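To tie the primary cluster and read mode concepts above to the {product-automation} configuration, here is a minimal, hypothetical excerpt of `vars/zdm_proxy_core_config.yml`. The `primary_cluster` variable and the `DUAL_ASYNC_ON_SECONDARY` value are assumptions based on the {product-short} configuration reference and should be verified against your version; only `read_mode: PRIMARY_ONLY` appears elsewhere in this changeset.

[source,yml]
----
# Assumed variable names; check your zdm_proxy_core_config.yml reference.
# The primary cluster serves synchronous reads (the source of truth).
primary_cluster: ORIGIN

# PRIMARY_ONLY reads only from the primary cluster.
# DUAL_ASYNC_ON_SECONDARY (assumed value) also mirrors reads asynchronously
# to the secondary cluster for testing.
read_mode: PRIMARY_ONLY
----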
