DOC-5266 Align data migration tool summaries, articles, and attributes #202

Merged 8 commits on Jun 3, 2025
11 changes: 6 additions & 5 deletions modules/ROOT/pages/cassandra-data-migrator.adoc
@@ -1,18 +1,19 @@
= Use {cass-migrator} with {product-short}
= Use {cass-migrator} with {product-proxy}
:navtitle: Use {cass-migrator}
:description: Use {cass-migrator} to migrate data with {product-short}
:description: You can use {cass-migrator} ({cass-migrator-short}) for data migration and validation between {cass-reg}-based databases.
:page-aliases: cdm-parameters.adoc, ROOT:cdm-steps.adoc

//This page was an exact duplicate of cdm-overview.adoc and the (now deleted) cdm-steps.adoc; they are just in different parts of the nav.

// tag::body[]
You can use {cass-migrator} ({cass-migrator-short}) for data migration and validation between {cass-reg}-based databases.
It supports important {cass} features and offers extensive configuration options:
{description}
It is best for large or complex migrations that benefit from advanced features and configuration options, such as the following:

* Logging and run tracking
* Automatic reconciliation
* Performance tuning
* Record filtering
* Column renaming
* Support for advanced data types, including sets, lists, maps, and UDTs
* Support for SSL, including custom cipher algorithms
* Use `writetime` timestamps to maintain chronological write history
@@ -26,7 +27,7 @@ To use {cass-migrator-short} successfully, your origin and target clusters must

== {cass-migrator-short} with {product-proxy}

You can use {cass-migrator-short} alone or with {product-proxy}.
You can use {cass-migrator-short} alone, with {product-proxy}, or for data validation after using another data migration tool.

When using {cass-migrator-short} with {product-proxy}, {cass-short}'s last-write-wins semantics ensure that new, real-time writes accurately take precedence over historical writes.
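
For example, you can compare the write timestamp of a column on each cluster to see which value last-write-wins keeps. The following sketch is illustrative only: the keyspace, table, column, and contact point names are placeholders, and any authentication options are omitted.

[source,bash]
----
# Compare the write timestamp of column v on each cluster.
# WRITETIME() returns the microsecond timestamp that last-write-wins compares.
cqlsh ORIGIN_CONTACT_POINT -e "SELECT v, WRITETIME(v) FROM my_keyspace.my_table WHERE k = '1';"
cqlsh TARGET_CONTACT_POINT -e "SELECT v, WRITETIME(v) FROM my_keyspace.my_table WHERE k = '1';"
----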

1 change: 1 addition & 0 deletions modules/ROOT/pages/cdm-overview.adoc
@@ -1,3 +1,4 @@
= {cass-migrator} ({cass-migrator-short}) overview
:description: You can use {cass-migrator} ({cass-migrator-short}) for data migration and validation between {cass-reg}-based databases.

include::ROOT:cassandra-data-migrator.adoc[tags=body]
14 changes: 7 additions & 7 deletions modules/ROOT/pages/change-read-routing.adoc
@@ -1,7 +1,7 @@
= Route reads to the target
:page-tag: migration,zdm,zero-downtime,zdm-proxy,read-routing

This topic explains how you can configure the {product-proxy} to route all reads to the target cluster instead of the origin cluster.
This topic explains how you can configure {product-proxy} to route all reads to the target cluster instead of the origin cluster.

image::migration-phase4ra9.png["Phase 4 diagram shows read routing on {product-proxy} was switched to the target."]

@@ -58,7 +58,7 @@ ansible-playbook rolling_update_zdm_proxy.yml -i zdm_ansible_inventory
Wait for the {product-proxy} instances to be restarted by Ansible, one by one.
All instances will now send all reads to the target cluster instead of the origin cluster.

At this point, the target cluster becomes the primary cluster, but the {product-proxy} still keeps the origin cluster up-to-date through dual writes.
At this point, the target cluster becomes the primary cluster, but {product-proxy} still keeps the origin cluster up-to-date through dual writes.

== Verifying the read routing change

@@ -67,11 +67,11 @@ This is not a required step, but you may wish to do it for peace of mind.

[TIP]
====
Issuing a `DESCRIBE` or a read to any system table through the {product-proxy} is *not* a valid verification.
Issuing a `DESCRIBE` or a read to any system table through {product-proxy} isn't a valid verification.

The {product-proxy} handles reads to system tables differently, by intercepting them and always routing them to the origin, in some cases partly populating them at proxy level.
{product-proxy} handles reads to system tables differently, by intercepting them and always routing them to the origin, in some cases partly populating them at the proxy level.

This means that system reads are *not representative* of how the {product-proxy} routes regular user reads.
This means that system reads don't represent how {product-proxy} routes regular user reads.
Even after you switch the configuration to read from the target cluster as the primary cluster, all system reads still go to the origin.

Although `DESCRIBE` requests are not system requests, they are also generally resolved in a different way to regular requests, and should not be used as a means to verify the read routing behavior.
@@ -81,7 +81,7 @@ Verifying that the correct routing is taking place is a slightly cumbersome oper

For this reason, the only way to do a manual verification test is to force a discrepancy of some test data between the clusters.
To do this, you could consider using the xref:connect-clients-to-proxy.adoc#_themis_client[Themis sample client application].
This client application connects directly to the origin cluster, the target cluster, and the {product-proxy}.
This client application connects directly to the origin cluster, the target cluster, and {product-proxy}.
It inserts some test data in its own table, and then you can view the results of reads from each source.
Refer to the Themis README for more information.

@@ -93,5 +93,5 @@ For example `CREATE TABLE test_keyspace.test_table(k TEXT PRIMARY KEY, v TEXT);`
Insert a row with any key, and with a value specific to the origin cluster, for example `INSERT INTO test_keyspace.test_table(k, v) VALUES ('1', 'Hello from the origin cluster!');`.
* Now, use `cqlsh` to connect *directly to the target cluster*.
Insert a row with the same key as above, but with a value specific to the target cluster, for example `INSERT INTO test_keyspace.test_table(k, v) VALUES ('1', 'Hello from the target cluster!');`.
* Now, use `cqlsh` to xref:connect-clients-to-proxy.adoc#_connecting_cqlsh_to_the_zdm_proxy[connect to the {product-proxy}], and then issue a read request for this test table: `SELECT * FROM test_keyspace.test_table WHERE k = '1';`.
* Now, use `cqlsh` to xref:connect-clients-to-proxy.adoc#_connecting_cqlsh_to_the_zdm_proxy[connect to {product-proxy}], and then issue a read request for this test table: `SELECT * FROM test_keyspace.test_table WHERE k = '1';`.
The result will clearly show you where the read actually comes from.
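
The following sketch condenses these verification steps into `cqlsh` commands. The contact points, proxy address, and port are placeholders, and any authentication options that your clusters require are omitted.

[source,bash]
----
# Insert a row with the same key but different values directly into each cluster.
cqlsh ORIGIN_CONTACT_POINT -e "INSERT INTO test_keyspace.test_table (k, v) VALUES ('1', 'Hello from the origin cluster!');"
cqlsh TARGET_CONTACT_POINT -e "INSERT INTO test_keyspace.test_table (k, v) VALUES ('1', 'Hello from the target cluster!');"

# Read the row through ZDM Proxy. The returned value shows which cluster served the read.
cqlsh ZDM_PROXY_IP 9042 -e "SELECT * FROM test_keyspace.test_table WHERE k = '1';"
----
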
63 changes: 48 additions & 15 deletions modules/ROOT/pages/components.adoc
@@ -3,24 +3,23 @@
:description: Learn about {company} migration tools.
:page-tag: migration,zdm,zero-downtime,zdm-proxy,components

{company} migration tools include the {product} {product-short} toolkit and three data migration tools.
The {company} {product} ({product-short}) toolkit includes {product-proxy}, {product-utility}, {product-automation}, and several data migration tools.

{product-short} is comprised of {product-proxy}, {product-utility}, and {product-automation}, which orchestrate activity-in-transition on your clusters.
To move and validate data, you use {sstable-sideloader}, {cass-migrator}, or {dsbulk-migrator}.
For live migrations, {product-proxy} orchestrates activity-in-transition on your clusters.
{product-utility} and {product-automation} facilitate the deployment and management of {product-proxy}.

You can also use {sstable-sideloader}, {cass-migrator-short}, and {dsbulk-migrator} on their own, outside the context of {product-short}.
To move and validate data, you use data migration tools.
You can use these tools alone or with {product-proxy}.

== {product-proxy}

The main component of the {company} {product} toolkit is {product-proxy}, which is designed to be a lightweight proxy that handles all real-time requests generated by your client applications during the migration process.
The main component of the {company} {product} toolkit is {product-proxy-repo}[{product-proxy}], which is designed to be a lightweight proxy that handles all real-time requests generated by your client applications during the migration process.
This tool is open-source software that is open for xref:ROOT:contributions.adoc[public contributions].

{product-proxy} is open-source software that is available from the {product-proxy-repo}[zdm-proxy GitHub repo].
This project is open for public contributions.

The {product-proxy} is an orchestrator for monitoring application activity and keeping multiple clusters (databases) in sync through dual writes.
{product-proxy} is an orchestrator for monitoring application activity and keeping multiple clusters (databases) in sync through dual writes.
{product-proxy} isn't linked to the actual migration process.
It doesn't perform data migrations and it doesn't have awareness of ongoing migrations.
Instead, you use a data migration tool, like {sstable-sideloader}, {cass-migrator}, or {dsbulk-migrator}, to perform the data migration and validate migrated data.
Instead, you use a <<data-migration-tools,data migration tool>> to perform the data migration and validate migrated data.

{product-proxy} reduces risks to upgrades and migrations by decoupling the origin cluster from the target cluster and maintaining consistency between both clusters.
You decide when you want to switch permanently to the target cluster.
@@ -78,24 +77,58 @@ Throughout the {product-short} documentation, the term _{product-proxy} deployme
You can scale {product-proxy} instances horizontally and vertically.
To avoid downtime when applying configuration changes, you can perform rolling restarts on your {product-proxy} instances.

For simplicity, you can use the {product-utility} and {product-automation} to set up and run Ansible playbooks that deploy and manage {product-proxy} and its monitoring stack.
For simplicity, you can use {product-utility} and {product-automation} to set up and run Ansible playbooks that deploy and manage {product-proxy} and its monitoring stack.

== {product-utility} and {product-automation}

You can use the {product-automation-repo}[{product-utility} and {product-automation}] to set up and run Ansible playbooks that deploy and manage {product-proxy} and its monitoring stack.
You can use {product-automation-repo}[{product-utility} and {product-automation}] to set up and run Ansible playbooks that deploy and manage {product-proxy} and the associated monitoring stack.

https://www.ansible.com/[Ansible] is a suite of software tools that enables infrastructure as code.
It is open source and its capabilities include software provisioning, configuration management, and application deployment functionality.
The Ansible automation for {product-short} is organized into playbooks, each implementing a specific operation.
The machine from which the playbooks are run is known as the Ansible Control Host.
In {product-short}, the Ansible Control Host runs as a Docker container.

You use the {product-utility} to set up Ansible in a Docker container, and then you use {product-automation} to run the Ansible playbooks from the Docker container created by {product-utility}.
You use {product-utility} to set up Ansible in a Docker container, and then you use {product-automation} to run the Ansible playbooks from the Docker container created by {product-utility}.

The {product-utility} creates the Docker container acting as the Ansible Control Host, from which {product-automation} allows you to deploy and manage the {product-proxy} instances and the associated monitoring stack, which includes Prometheus metrics and Grafana visualizations of the metrics data.
{product-utility} creates the Docker container acting as the Ansible Control Host, from which {product-automation} allows you to deploy and manage the {product-proxy} instances and the associated monitoring stack, which includes Prometheus metrics and Grafana visualizations of the metrics data.

To use {product-utility} and {product-automation}, you must prepare the recommended infrastructure, as explained in xref:deployment-infrastructure.adoc[].

For more information, see xref:setup-ansible-playbooks.adoc[] and xref:deploy-proxy-monitoring.adoc[].
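
As a rough sketch of this workflow, after {product-utility} creates the Ansible Control Host container, you run the playbooks from a shell inside that container. The playbook name below is an assumption; confirm the exact playbook and inventory file names in your deployment.

[source,bash]
----
# Run from a shell inside the Ansible Control Host container created by the ZDM utility.
# The playbook name is an assumption; the inventory file name matches the one used elsewhere in this guide.
ansible-playbook deploy_zdm_proxy.yml -i zdm_ansible_inventory
----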

include::ROOT:migrate-and-validate-data.adoc[tags=migration-tool-summaries]
== Data migration tools

You use data migration tools to move data between clusters and validate the migrated data.

You can use these tools alone or with {product-proxy}.

=== {sstable-sideloader}

{sstable-sideloader} is a service running in {astra-db} that imports data from snapshots of your existing {cass-short}-based cluster.
This tool is exclusively for migrations that move data to {astra-db}.

For more information, see xref:sideloader:sideloader-zdm.adoc[].

=== {cass-migrator}

You can use {cass-migrator} ({cass-migrator-short}) for data migration and validation between {cass-reg}-based databases.
It offers extensive functionality and configuration options to support large and complex migrations as well as post-migration data validation.

You can use {cass-migrator-short} by itself, with {product-proxy}, or for data validation after using another data migration tool.

For more information, see xref:ROOT:cassandra-data-migrator.adoc[].
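
As an illustration, {cass-migrator-short} jobs typically run on Apache Spark and are launched with `spark-submit`. In the following sketch, the properties file name, job class, and jar version are assumptions; see the {cass-migrator-short} documentation for the exact values.

[source,bash]
----
# A sketch of submitting a CDM migration job with spark-submit.
# cdm.properties would hold the origin and target connection settings and table mappings.
spark-submit --properties-file cdm.properties \
  --master "local[*]" \
  --class com.datastax.cdm.job.Migrate \
  cassandra-data-migrator-5.x.x.jar
----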

=== {dsbulk-migrator}

{dsbulk-migrator} extends {dsbulk-loader} with migration-specific commands: `migrate-live`, `generate-script`, and `generate-ddl`.

It is best for smaller migrations or migrations that don't require extensive data validation, aside from post-migration row counts.

You can use {dsbulk-migrator} alone or with {product-proxy}.

For more information, see xref:ROOT:dsbulk-migrator.adoc[].
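
For illustration, a live migration invocation might look like the following sketch. The jar name and every option shown are placeholders; check the {dsbulk-migrator} README for the exact command syntax and flags.

[source,bash]
----
# A sketch of a live migration with dsbulk-migrator.
# All file names, hosts, and options are placeholders.
java -jar dsbulk-migrator.jar migrate-live \
  --export-host ORIGIN_CONTACT_POINT \
  --import-host TARGET_CONTACT_POINT \
  --keyspaces my_keyspace
----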

=== Custom data migration processes

If you want to write your own custom data migration processes, you can use a tool like Apache Spark(TM).
20 changes: 8 additions & 12 deletions modules/ROOT/pages/connect-clients-to-proxy.adoc
@@ -2,7 +2,7 @@
:navtitle: Connect client applications to {product-proxy}
:page-tag: migration,zdm,zero-downtime,zdm-proxy,connect-apps

The {product-proxy} is designed to be similar to a conventional {cass-reg} cluster.
{product-proxy} is designed to be similar to a conventional {cass-reg} cluster.
You communicate with it using the CQL query language used in your existing client applications.
It understands the same messaging protocols used by {cass-short}, {dse}, and {astra-db}.
As a result, most of your client applications won't be able to distinguish between connecting to {product-proxy} and connecting directly to your {cass-short} cluster.
@@ -13,7 +13,7 @@ We conclude by describing two sample client applications that serve as real-worl

You can use the provided sample client applications, in addition to your own, as a quick way to validate that the deployed {product-proxy} is reading and writing data from the expected origin and target clusters.

Finally, we will explain how to connect the `cqlsh` command-line client to the {product-proxy}.
This topic also explains how to connect CQL shell (`cqlsh`) to {product-proxy}.

== {company}-compatible drivers

@@ -147,8 +147,8 @@ For information about {astra-db} credentials in your {product-proxy} configurati

=== Disable client-side compression with {product-proxy}

Client applications must not enable client-side compression when connecting through the {product-proxy}, as this is not currently supported.
This is disabled by default in all drivers, but if it was enabled in your client application configuration it will have to be temporarily disabled when connecting to the {product-proxy}.
Client applications must not enable client-side compression when connecting through {product-proxy}, as this is not currently supported.
This is disabled by default in all drivers, but if it was enabled in your client application configuration, it will have to be temporarily disabled when connecting to {product-proxy}.

=== {product-proxy} ignores token-aware routing

@@ -186,16 +186,12 @@ You can find the details of building and running {product-demo} in the https://g
[[_themis_client]]
=== Themis client

https://github.com/absurdfarce/themis[Themis] is a Java command-line client application that allows you to insert randomly generated data into some combination of these three sources:
https://github.com/absurdfarce/themis[Themis] is a Java command-line client application that allows you to write randomly generated data directly to the origin cluster, directly to the target cluster, or indirectly to both clusters through {product-proxy}.

* Directly into the origin
* Directly into the target
* Into the {product-proxy}, and subsequently on to the origin and target
Then, you can use the client application to query the data and confirm that {product-proxy} is reading and writing data from the expected sources.

The client application can then be used to query the inserted data.
This allows you to validate that the {product-proxy} is reading and writing data from the expected sources.
Configuration details for the clusters and/or {product-proxy} are defined in a YAML file.
Details are in the https://github.com/absurdfarce/themis/blob/main/README.md[README].
Configuration details for the clusters and {product-proxy} are defined in a YAML file.
For more information, see the https://github.com/absurdfarce/themis/blob/main/README.md[Themis README].

In addition to its utility as a validation tool, Themis also serves as an example of a larger client application that uses the Java driver to connect to {product-proxy}, as well as directly to {cass-short} clusters or {astra-db}, and perform operations.
The configuration logic as well as the cluster and session management code have been cleanly separated into distinct packages to make them easy to understand.
4 changes: 2 additions & 2 deletions modules/ROOT/pages/connect-clients-to-target.adoc
@@ -4,7 +4,7 @@

At this point in our migration phases, we've completed:

* Phase 1: Connected client applications to {product-proxy}, which included setting up Ansible playbooks with the {product-utility}, and deploying the {product-proxy} instances via the Docker container with {product-automation}.
* Phase 1: Connected client applications to {product-proxy}, which included setting up Ansible playbooks with {product-utility} and using {product-automation} to deploy the {product-proxy} instances with the Docker container.

* Phase 2: Migrated and validated our data with {cass-migrator} and/or {dsbulk-migrator}.

@@ -31,7 +31,7 @@ For more information, see xref:datastax-drivers:compatibility:driver-matrix.adoc

To connect to {astra-db}, you need the following:

* The xref:astra-db-serverless:administration:manage-application-tokens.adoc[application token] credentials that you used to xref:ROOT:connect-clients-to-proxy.adoc[connect your applications to the {product-proxy}].
* The xref:astra-db-serverless:administration:manage-application-tokens.adoc[application token] credentials that you used to xref:ROOT:connect-clients-to-proxy.adoc[connect your applications to {product-proxy}].
+
As before, you can use either of the following sets of credentials to connect to your {astra-db} database:
+
2 changes: 1 addition & 1 deletion modules/ROOT/pages/contributions.adoc
@@ -3,7 +3,7 @@

{company} {product} ({product-short}) provides a simple and reliable way for users to migrate an existing {cass-reg} or {dse} cluster to {astra-db}, or to any {cass-short} or {dse-short} cluster, without any interruption of service to the client applications and data.

The {product-proxy} is open source software (OSS). We welcome contributions from the developer community via Pull Requests on a fork, for evaluation by the {product-short} team.
{product-proxy} is open source software (OSS). We welcome contributions from the developer community via Pull Requests on a fork, for evaluation by the {product-short} team.

The code sources for additional {product} components -- including {product-utility}, {product-automation}, {cass-migrator}, and {dsbulk-migrator} -- are available in public GitHub repos, where you may submit feedback and ideas via GitHub Issues.
Code contributions for those additional components are not open for PRs at this time.
2 changes: 1 addition & 1 deletion modules/ROOT/pages/create-target.adoc
@@ -34,7 +34,7 @@ Assign your preferred values for the serverless database:
* **Region**: choose your geographically preferred region - you can subsequently add more regions.

When the {astra-db} database reaches **Active** status, create an application token in the {astra-ui} with the *Read/Write User* role.
This role will be used by the client application, the {product-proxy}, and the {product-automation}.
This role will be used by the client application, {product-proxy}, and {product-automation}.

Save the generated token and credentials (Client ID, Client Secret, and Token) in a clearly named secure file.
