
Commit 0eed0bc

DOC-5266 Align data migration tool summaries, articles, and attributes (#202)
* fix attributes
* attribute and redundancy cleanup
* start component definition reconciliation
* cdm component definition and intro tweaks
* dsbulk migrator component description
* remove article from ZDM proxy, etc
* Update modules/ROOT/pages/setup-ansible-playbooks.adoc
* Apply suggestions from code review

Co-authored-by: brian-f <[email protected]>
1 parent afb700f commit 0eed0bc

27 files changed: 252 additions & 219 deletions

modules/ROOT/pages/cassandra-data-migrator.adoc

Lines changed: 6 additions & 5 deletions
@@ -1,18 +1,19 @@
-= Use {cass-migrator} with {product-short}
+= Use {cass-migrator} with {product-proxy}
 :navtitle: Use {cass-migrator}
-:description: Use {cass-migrator} to migrate data with {product-short}
+:description: You can use {cass-migrator} ({cass-migrator-short}) for data migration and validation between {cass-reg}-based databases.
 :page-aliases: cdm-parameters.adoc, ROOT:cdm-steps.adoc

 //This page was an exact duplicate of cdm-overview.adoc and the (now deleted) cdm-steps.adoc, they are just in different parts of the nav.

 // tag::body[]
-You can use {cass-migrator} ({cass-migrator-short}) for data migration and validation between {cass-reg}-based databases.
-It supports important {cass} features and offers extensive configuration options:
+{description}
+It is best for large or complex migrations that benefit from advanced features and configuration options, such as the following:

 * Logging and run tracking
 * Automatic reconciliation
 * Performance tuning
 * Record filtering
+* Column renaming
 * Support for advanced data types, including sets, lists, maps, and UDTs
 * Support for SSL, including custom cipher algorithms
 * Use `writetime` timestamps to maintain chronological write history
@@ -26,7 +27,7 @@ To use {cass-migrator-short} successfully, your origin and target clusters must

 == {cass-migrator-short} with {product-proxy}

-You can use {cass-migrator-short} alone or with {product-proxy}.
+You can use {cass-migrator-short} alone, with {product-proxy}, or for data validation after using another data migration tool.

 When using {cass-migrator-short} with {product-proxy}, {cass-short}'s last-write-wins semantics ensure that new, real-time writes accurately take precedence over historical writes.
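
Note: the `writetime` bullet above refers to CQL's WRITETIME function, which reads the per-cell write timestamps that {cass-migrator-short} carries over to preserve chronological write history. A minimal illustration, reusing the test table that appears later in this changeset (the table itself is an example, not part of any tool):

    -- Returns v along with its write timestamp, in microseconds since the epoch.
    SELECT k, v, WRITETIME(v) FROM test_keyspace.test_table WHERE k = '1';
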
modules/ROOT/pages/cdm-overview.adoc

Lines changed: 1 addition & 0 deletions
@@ -1,3 +1,4 @@
 = {cass-migrator} ({cass-migrator-short}) overview
+:description: You can use {cass-migrator} ({cass-migrator-short}) for data migration and validation between {cass-reg}-based databases.

 include::ROOT:cassandra-data-migrator.adoc[tags=body]

modules/ROOT/pages/change-read-routing.adoc

Lines changed: 7 additions & 7 deletions
@@ -1,7 +1,7 @@
 = Route reads to the target
 :page-tag: migration,zdm,zero-downtime,zdm-proxy,read-routing

-This topic explains how you can configure the {product-proxy} to route all reads to the target cluster instead of the origin cluster.
+This topic explains how you can configure {product-proxy} to route all reads to the target cluster instead of the origin cluster.

 image::migration-phase4ra9.png["Phase 4 diagram shows read routing on {product-proxy} was switched to the target."]

@@ -58,7 +58,7 @@ ansible-playbook rolling_update_zdm_proxy.yml -i zdm_ansible_inventory
 Wait for the {product-proxy} instances to be restarted by Ansible, one by one.
 All instances will now send all reads to the target cluster instead of the origin cluster.

-At this point, the target cluster becomes the primary cluster, but the {product-proxy} still keeps the origin cluster up-to-date through dual writes.
+At this point, the target cluster becomes the primary cluster, but {product-proxy} still keeps the origin cluster up-to-date through dual writes.

 == Verifying the read routing change

@@ -67,11 +67,11 @@ This is not a required step, but you may wish to do it for peace of mind.

 [TIP]
 ====
-Issuing a `DESCRIBE` or a read to any system table through the {product-proxy} is *not* a valid verification.
+Issuing a `DESCRIBE` or a read to any system table through {product-proxy} isn't a valid verification.

-The {product-proxy} handles reads to system tables differently, by intercepting them and always routing them to the origin, in some cases partly populating them at proxy level.
+{product-proxy} handles reads to system tables differently, by intercepting them and always routing them to the origin, in some cases partly populating them at the proxy level.

-This means that system reads are *not representative* of how the {product-proxy} routes regular user reads.
+This means that system reads don't represent how {product-proxy} routes regular user reads.
 Even after you switched the configuration to read the target cluster as the primary cluster, all system reads still go to the origin.

 Although `DESCRIBE` requests are not system requests, they are also generally resolved in a different way to regular requests, and should not be used as a means to verify the read routing behavior.
@@ -81,7 +81,7 @@ Verifying that the correct routing is taking place is a slightly cumbersome oper

 For this reason, the only way to do a manual verification test is to force a discrepancy of some test data between the clusters.
 To do this, you could consider using the xref:connect-clients-to-proxy.adoc#_themis_client[Themis sample client application].
-This client application connects directly to the origin cluster, the target cluster, and the {product-proxy}.
+This client application connects directly to the origin cluster, the target cluster, and {product-proxy}.
 It inserts some test data in its own table, and then you can view the results of reads from each source.
 Refer to the Themis README for more information.

@@ -93,5 +93,5 @@ For example `CREATE TABLE test_keyspace.test_table(k TEXT PRIMARY KEY, v TEXT);`
 Insert a row with any key, and with a value specific to the origin cluster, for example `INSERT INTO test_keyspace.test_table(k, v) VALUES ('1', 'Hello from the origin cluster!');`.
 * Now, use `cqlsh` to connect *directly to the target cluster*.
 Insert a row with the same key as above, but with a value specific to the target cluster, for example `INSERT INTO test_keyspace.test_table(k, v) VALUES ('1', 'Hello from the target cluster!');`.
-* Now, use `cqlsh` to xref:connect-clients-to-proxy.adoc#_connecting_cqlsh_to_the_zdm_proxy[connect to the {product-proxy}], and then issue a read request for this test table: `SELECT * FROM test_keyspace.test_table WHERE k = '1';`.
+* Now, use `cqlsh` to xref:connect-clients-to-proxy.adoc#_connecting_cqlsh_to_the_zdm_proxy[connect to {product-proxy}], and then issue a read request for this test table: `SELECT * FROM test_keyspace.test_table WHERE k = '1';`.
 The result will clearly show you where the read actually comes from.
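
Note: taken together, the verification steps in this file reduce to the following CQL sequence. This is a sketch that assumes the keyspace already exists on both clusters (the docs show only the CREATE TABLE statement) and that the same table is created on both sides:

    -- Run on both clusters first:
    CREATE TABLE test_keyspace.test_table(k TEXT PRIMARY KEY, v TEXT);

    -- Connected directly to the origin cluster:
    INSERT INTO test_keyspace.test_table(k, v) VALUES ('1', 'Hello from the origin cluster!');

    -- Connected directly to the target cluster (same key, different value):
    INSERT INTO test_keyspace.test_table(k, v) VALUES ('1', 'Hello from the target cluster!');

    -- Connected through {product-proxy}; the value returned shows which cluster served the read:
    SELECT * FROM test_keyspace.test_table WHERE k = '1';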

modules/ROOT/pages/components.adoc

Lines changed: 48 additions & 15 deletions
@@ -3,24 +3,23 @@
 :description: Learn about {company} migration tools.
 :page-tag: migration,zdm,zero-downtime,zdm-proxy,components

-{company} migration tools include the {product} {product-short} toolkit and three data migration tools.
+The {company} {product} ({product-short}) toolkit includes {product-proxy}, {product-utility}, {product-automation}, and several data migration tools.

-{product-short} is comprised of {product-proxy}, {product-utility}, and {product-automation}, which orchestrate activity-in-transition on your clusters.
-To move and validate data, you use {sstable-sideloader}, {cass-migrator}, or {dsbulk-migrator}.
+For live migrations, {product-proxy} orchestrates activity-in-transition on your clusters.
+{product-utility} and {product-automation} facilitate the deployment and management of {product-proxy}.

-You can also use {sstable-sideloader}, {cass-migrator-short}, and {dsbulk-migrator} on their own, outside the context of {product-short}.
+To move and validate data, you use data migration tools.
+You can use these tools alone or with {product-proxy}.

 == {product-proxy}

-The main component of the {company} {product} toolkit is {product-proxy}, which is designed to be a lightweight proxy that handles all real-time requests generated by your client applications during the migration process.
+The main component of the {company} {product} toolkit is {product-proxy-repo}[{product-proxy}], which is designed to be a lightweight proxy that handles all real-time requests generated by your client applications during the migration process.
+This tool is open-source software that is open for xref:ROOT:contributions.adoc[public contributions].

-{product-proxy} is open-source software that is available from the {product-proxy-repo}[zdm-proxy GitHub repo].
-This project is open for public contributions.
-
-The {product-proxy} is an orchestrator for monitoring application activity and keeping multiple clusters (databases) in sync through dual writes.
+{product-proxy} is an orchestrator for monitoring application activity and keeping multiple clusters (databases) in sync through dual writes.
 {product-proxy} isn't linked to the actual migration process.
 It doesn't perform data migrations and it doesn't have awareness of ongoing migrations.
-Instead, you use a data migration tool, like {sstable-sideloader}, {cass-migrator}, or {dsbulk-migrator}, to perform the data migration and validate migrated data.
+Instead, you use a <<data-migration-tools,data migration tool>> to perform the data migration and validate migrated data.

 {product-proxy} reduces risks to upgrades and migrations by decoupling the origin cluster from the target cluster and maintaining consistency between both clusters.
 You decide when you want to switch permanently to the target cluster.
@@ -78,24 +77,58 @@ Throughout the {product-short} documentation, the term _{product-proxy} deployme
 You can scale {product-proxy} instances horizontally and vertically.
 To avoid downtime when applying configuration changes, you can perform rolling restarts on your {product-proxy} instances.

-For simplicity, you can use the {product-utility} and {product-automation} to set up and run Ansible playbooks that deploy and manage {product-proxy} and its monitoring stack.
+For simplicity, you can use {product-utility} and {product-automation} to set up and run Ansible playbooks that deploy and manage {product-proxy} and its monitoring stack.

 == {product-utility} and {product-automation}

-You can use the {product-automation-repo}[{product-utility} and {product-automation}] to set up and run Ansible playbooks that deploy and manage {product-proxy} and its monitoring stack.
+You can use {product-automation-repo}[{product-utility} and {product-automation}] to set up and run Ansible playbooks that deploy and manage {product-proxy} and the associated monitoring stack.

 https://www.ansible.com/[Ansible] is a suite of software tools that enables infrastructure as code.
 It is open source and its capabilities include software provisioning, configuration management, and application deployment functionality.
 The Ansible automation for {product-short} is organized into playbooks, each implementing a specific operation.
 The machine from which the playbooks are run is known as the Ansible Control Host.
 In {product-short}, the Ansible Control Host runs as a Docker container.

-You use the {product-utility} to set up Ansible in a Docker container, and then you use {product-automation} to run the Ansible playbooks from the Docker container created by {product-utility}.
+You use {product-utility} to set up Ansible in a Docker container, and then you use {product-automation} to run the Ansible playbooks from the Docker container created by {product-utility}.

-The {product-utility} creates the Docker container acting as the Ansible Control Host, from which {product-automation} allows you to deploy and manage the {product-proxy} instances and the associated monitoring stack, which includes Prometheus metrics and Grafana visualizations of the metrics data.
+{product-utility} creates the Docker container acting as the Ansible Control Host, from which {product-automation} allows you to deploy and manage the {product-proxy} instances and the associated monitoring stack, which includes Prometheus metrics and Grafana visualizations of the metrics data.

 To use {product-utility} and {product-automation}, you must prepare the recommended infrastructure, as explained in xref:deployment-infrastructure.adoc[].

 For more information, see xref:setup-ansible-playbooks.adoc[] and xref:deploy-proxy-monitoring.adoc[].

-include::ROOT:migrate-and-validate-data.adoc[tags=migration-tool-summaries]
+== Data migration tools
+
+You use data migration tools to move data between clusters and validate the migrated data.
+
+You can use these tools alone or with {product-proxy}.
+
+=== {sstable-sideloader}
+
+{sstable-sideloader} is a service running in {astra-db} that imports data from snapshots of your existing {cass-short}-based cluster.
+This tool is exclusively for migrations that move data to {astra-db}.
+
+For more information, see xref:sideloader:sideloader-zdm.adoc[].
+
+=== {cass-migrator}
+
+You can use {cass-migrator} ({cass-migrator-short}) for data migration and validation between {cass-reg}-based databases.
+It offers extensive functionality and configuration options to support large and complex migrations as well as post-migration data validation.
+
+You can use {cass-migrator-short} by itself, with {product-proxy}, or for data validation after using another data migration tool.
+
+For more information, see xref:ROOT:cassandra-data-migrator.adoc[].
+
+=== {dsbulk-migrator}
+
+{dsbulk-migrator} extends {dsbulk-loader} with migration-specific commands: `migrate-live`, `generate-script`, and `generate-ddl`.
+
+It is best for smaller migrations or migrations that don't require extensive data validation, aside from post-migration row counts.
+
+You can use {dsbulk-migrator} alone or with {product-proxy}.
+
+For more information, see xref:ROOT:dsbulk-migrator.adoc[].
+
+=== Custom data migration processes
+
+If you want to write your own custom data migration processes, you can use a tool like Apache Spark(TM).
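
Note: as a concrete instance of the playbook-per-operation model described in the {product-utility} and {product-automation} section, the rolling update playbook cited in the change-read-routing.adoc hunk above is run from the Ansible Control Host container like so:

    ansible-playbook rolling_update_zdm_proxy.yml -i zdm_ansible_inventory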

modules/ROOT/pages/connect-clients-to-proxy.adoc

Lines changed: 8 additions & 12 deletions
@@ -2,7 +2,7 @@
 :navtitle: Connect client applications to {product-proxy}
 :page-tag: migration,zdm,zero-downtime,zdm-proxy,connect-apps

-The {product-proxy} is designed to be similar to a conventional {cass-reg} cluster.
+{product-proxy} is designed to be similar to a conventional {cass-reg} cluster.
 You communicate with it using the CQL query language used in your existing client applications.
 It understands the same messaging protocols used by {cass-short}, {dse}, and {astra-db}.
 As a result, most of your client applications won't be able to distinguish between connecting to {product-proxy} and connecting directly to your {cass-short} cluster.
@@ -13,7 +13,7 @@ We conclude by describing two sample client applications that serve as real-worl

 You can use the provided sample client applications, in addition to your own, as a quick way to validate that the deployed {product-proxy} is reading and writing data from the expected origin and target clusters.

-Finally, we will explain how to connect the `cqlsh` command-line client to the {product-proxy}.
+This topic also explains how to connect CQL shell (`cqlsh`) to {product-proxy}.

 == {company}-compatible drivers

@@ -147,8 +147,8 @@ For information about {astra-db} credentials in your {product-proxy} configurati

 === Disable client-side compression with {product-proxy}

-Client applications must not enable client-side compression when connecting through the {product-proxy}, as this is not currently supported.
-This is disabled by default in all drivers, but if it was enabled in your client application configuration it will have to be temporarily disabled when connecting to the {product-proxy}.
+Client applications must not enable client-side compression when connecting through {product-proxy}, as this is not currently supported.
+This is disabled by default in all drivers, but if it was enabled in your client application configuration, it will have to be temporarily disabled when connecting to {product-proxy}.

 === {product-proxy} ignores token-aware routing

@@ -186,16 +186,12 @@ You can find the details of building and running {product-demo} in the https://g
 [[_themis_client]]
 === Themis client

-https://github.com/absurdfarce/themis[Themis] is a Java command-line client application that allows you to insert randomly generated data into some combination of these three sources:
+https://github.com/absurdfarce/themis[Themis] is a Java command-line client application that allows you to write randomly generated data directly to the origin cluster, directly to the target cluster, or indirectly to both clusters through {product-proxy}.

-* Directly into the origin
-* Directly into the target
-* Into the {product-proxy}, and subsequently on to the origin and target
+Then, you can use the client application to query the data and confirm that {product-proxy} is reading and writing data from the expected sources.

-The client application can then be used to query the inserted data.
-This allows you to validate that the {product-proxy} is reading and writing data from the expected sources.
-Configuration details for the clusters and/or {product-proxy} are defined in a YAML file.
-Details are in the https://github.com/absurdfarce/themis/blob/main/README.md[README].
+Configuration details for the clusters and {product-proxy} are defined in a YAML file.
+For more information, see the https://github.com/absurdfarce/themis/blob/main/README.md[Themis README].

 In addition to any utility as a validation tool, Themis also serves as an example of a larger client application which uses the Java driver to connect to a {product-proxy} -- as well as directly to {cass-short} clusters or {astra-db} -- and perform operations.
 The configuration logic as well as the cluster and session management code have been cleanly separated into distinct packages to make them easy to understand.
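
Note: for the compression restriction above, the exact switch depends on the driver. As one hedged example for the DataStax Java driver 4.x (the configuration path is the driver's own, not something defined in these docs), protocol compression stays off when left at its default:

    # application.conf: keep client-side protocol compression disabled while connecting through the proxy
    datastax-java-driver {
      advanced.protocol.compression = none
    }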

modules/ROOT/pages/connect-clients-to-target.adoc

Lines changed: 2 additions & 2 deletions
@@ -4,7 +4,7 @@

 At this point in our migration phases, we've completed:

-* Phase 1: Connected client applications to {product-proxy}, which included setting up Ansible playbooks with the {product-utility}, and deploying the {product-proxy} instances via the Docker container with {product-automation}.
+* Phase 1: Connected client applications to {product-proxy}, which included setting up Ansible playbooks with {product-utility} and using {product-automation} to deploy the {product-proxy} instances with the Docker container.

 * Phase 2: Migrated and validated our data with {cass-migrator} and/or {dsbulk-migrator}.

@@ -31,7 +31,7 @@ For more information, see xref:datastax-drivers:compatibility:driver-matrix.adoc

 To connect to {astra-db}, you need the following:

-* The xref:astra-db-serverless:administration:manage-application-tokens.adoc[application token] credentials that you used to xref:ROOT:connect-clients-to-proxy.adoc[connect your applications to the {product-proxy}].
+* The xref:astra-db-serverless:administration:manage-application-tokens.adoc[application token] credentials that you used to xref:ROOT:connect-clients-to-proxy.adoc[connect your applications to {product-proxy}].
 +
 As before, you can use either of the following sets of credentials to connect to your {astra-db} database:
 +

modules/ROOT/pages/contributions.adoc

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@

 {company} {product} ({product-short}) provides a simple and reliable way for users to migrate an existing {cass-reg} or {dse} cluster to {astra-db}, or to any {cass-short} or {dse-short} cluster, without any interruption of service to the client applications and data.

-The {product-proxy} is open source software (OSS). We welcome contributions from the developer community via Pull Requests on a fork, for evaluation by the {product-short} team.
+{product-proxy} is open source software (OSS). We welcome contributions from the developer community via Pull Requests on a fork, for evaluation by the {product-short} team.

 The code sources for additional {product} components -- including {product-utility}, {product-automation}, {cass-migrator}, and {dsbulk-migrator} -- are available in public GitHub repos, where you may submit feedback and ideas via GitHub Issues.
 Code contributions for those additional components are not open for PRs at this time.

modules/ROOT/pages/create-target.adoc

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ Assign your preferred values for the serverless database:
 * **Region**: choose your geographically preferred region - you can subsequently add more regions.

 When the {astra-db} database reaches **Active** status, create an application token in the {astra-ui} with the *Read/Write User* role.
-This role will be used by the client application, the {product-proxy}, and the {product-automation}.
+This role will be used by the client application, {product-proxy}, and {product-automation}.

 Save the generate token and credentials (Client ID, Client Secret, and Token) in a clearly named secure file.