
ZDM-602, ZDM-614, ZDM-583, ZDM-576, DOC-5267, and more - Grab bag of small migration doc updates, old tickets, start detailed cleanup of some topics #201

Merged: 19 commits merged on Jun 3, 2025
1 change: 1 addition & 0 deletions modules/ROOT/pages/astra-migration-paths.adoc
@@ -103,4 +103,5 @@ If you have questions about migrating from a specific source to {astra-db}, cont

== See also

* https://www.datastax.com/events/migrating-your-legacy-cassandra-app-to-astra-db[Migrating your legacy {cass-reg} app to {astra-db}]
* xref:astra-db-serverless:databases:migration-path-serverless.adoc[Migrate to {astra-db}]
23 changes: 16 additions & 7 deletions modules/ROOT/pages/cassandra-data-migrator.adoc
@@ -14,11 +14,19 @@ It offers extensive configuration options, including logging, reconciliation, pe

{cass-migrator-short} features include the following:

* Validate migration accuracy and performance using examples that provide a smaller, randomized data set.
* Preserve internal `writetime` timestamps and Time To Live (TTL) values.
* Use advanced data types, including sets, lists, maps, and UDTs.
* Filter records from the origin cluster's data, using {cass-short}'s internal `writetime` timestamp.
* Use SSL Support, including custom cipher algorithms.
* Validate migration accuracy and performance with a smaller, randomized data set.

* Preserve internal `writetime` timestamps and Time To Live (TTL) values to maintain chronological write history.
+
When using {cass-migrator-short} with {product-proxy}, the preserved timestamps ensure that new, real-time writes accurately take precedence over historical writes.
+
You can also add custom fixed `writetime` and `ttl` values.

* Support for advanced data types, including sets, lists, maps, and UDTs.

* Option to filter records from the origin cluster based on `writetime`, partition/token ranges, or CQL conditions.

* Support for SSL, including custom cipher algorithms.

For more features and information, see the {cass-migrator-repo}?tab=readme-ov-file#features[{cass-migrator-short} GitHub repository].
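For example, as a quick spot check that is not part of {cass-migrator-short} itself, you can compare preserved `writetime` and TTL values on a few rows with `cqlsh`. This is only a sketch: the contact points and the `ks.tbl` table and `val` column are placeholders for your own environment.

[source,bash]
----
# Spot-check that writetime and TTL values survived the migration.
# ORIGIN_CONTACT_POINT, TARGET_CONTACT_POINT, and ks.tbl are placeholders.
cqlsh ORIGIN_CONTACT_POINT -e "SELECT id, WRITETIME(val), TTL(val) FROM ks.tbl LIMIT 10;"
cqlsh TARGET_CONTACT_POINT -e "SELECT id, WRITETIME(val), TTL(val) FROM ks.tbl LIMIT 10;"
----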

@@ -276,9 +284,10 @@ Optionally, you can run {cass-migrator-short} validation jobs in **AutoCorrect**
+
[IMPORTANT]
====
`TIMESTAMP` has an effect on this function.
Timestamps have an effect on this function.

If the `writetime` of the origin record (determined with `.writetime.names`) is before the `writetime` of the corresponding target record, then the original write won't appear in the target cluster.

If the `WRITETIME` of the origin record (determined with `.writetime.names`) is earlier than the `WRITETIME` of the target record, then the change doesn't appear in the target cluster.
This comparative state can be challenging to troubleshoot if individual columns or cells were modified in the target cluster.
====
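+
The following sketch illustrates this timestamp rule with `cqlsh` and a hypothetical `ks.tbl` table.
It isn't a {cass-migrator-short} command, only a demonstration of {cass-short}'s last-write-wins behavior:
+
[source,bash]
----
# The target cluster already holds a cell written at timestamp 2000.
cqlsh TARGET_CONTACT_POINT -e "INSERT INTO ks.tbl (id, val) VALUES (1, 'target-edit') USING TIMESTAMP 2000;"
# Re-writing the origin cell, which carries the older timestamp 1000, does not
# overwrite the newer target value: the cell with the highest timestamp wins.
cqlsh TARGET_CONTACT_POINT -e "INSERT INTO ks.tbl (id, val) VALUES (1, 'origin-value') USING TIMESTAMP 1000;"
# Returns 'target-edit' with writetime 2000.
cqlsh TARGET_CONTACT_POINT -e "SELECT val, WRITETIME(val) FROM ks.tbl WHERE id = 1;"
----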

85 changes: 48 additions & 37 deletions modules/ROOT/pages/components.adoc
@@ -17,63 +17,74 @@ The main component of the {company} {product} toolkit is {product-proxy}, which
{product-proxy} is open-source software that is available from the {product-proxy-repo}[zdm-proxy GitHub repo].
This project is open for public contributions.

The {product-proxy} is an orchestrator for monitoring application activity and keeping multiple clusters in sync through dual writes.
The {product-proxy} is an orchestrator for monitoring application activity and keeping multiple clusters (databases) in sync through dual writes.
{product-proxy} isn't linked to the actual migration process.
It doesn't perform data migrations and it doesn't have awareness of ongoing migrations.
Instead, you use a data migration tool, like {sstable-sideloader}, {cass-migrator}, or {dsbulk-migrator}, to perform the data migration and validate migrated data.

=== How {product-proxy} works
{product-proxy} reduces risks to upgrades and migrations by decoupling the origin cluster from the target cluster and maintaining consistency between both clusters.
You decide when you want to switch permanently to the target cluster.

{company} created {product-proxy} to function between the application and both the origin and target databases.
The databases can be any CQL-compatible data store, such as {cass-reg}, {dse}, and {astra-db}.
The proxy always sends every write operation (Insert, Update, Delete) synchronously to both clusters at the desired Consistency Level:
After migrating your data, updating your application code can be as minimal as changing the connection string, depending on your client's compatibility with your origin and target clusters.

* If the write is successful in both clusters, it returns a successful acknowledgement to the client application.
* If the write fails on either cluster, the failure is passed back to the client application so that it can retry it as appropriate, based on its own retry policy.
[#how-zdm-proxy-handles-reads-and-writes]
=== How {product-proxy} handles reads and writes

{company} created {product-proxy} to orchestrate requests between a client application and both the origin and target clusters.
These clusters can be any CQL-compatible data store, such as {cass-reg}, {dse}, and {astra-db}.

During the migration process, you designate one cluster as the _primary cluster_, which serves as the source of truth for reads.
For the majority of the migration process, this is typically the origin cluster.
Towards the end of the migration process, when you are ready to read from your target cluster, you set the target cluster as the primary cluster.
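As a rough sketch, the primary cluster is selected through {product-proxy} configuration on each instance.
The variable name and values below are assumptions, so confirm them against the configuration reference for your {product-proxy} version:

[source,bash]
----
# Assumed environment variable name and values; verify against the
# {product-proxy} configuration reference before use.
export ZDM_PRIMARY_CLUSTER=ORIGIN   # typical for most of the migration
# Later, when you are ready to read from the target cluster:
export ZDM_PRIMARY_CLUSTER=TARGET
----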

==== Writes

{product-proxy} sends every write operation (`INSERT`, `UPDATE`, `DELETE`) synchronously to both clusters at the requested consistency level:

* If the write is acknowledged in both clusters at the requested consistency level, then the operation returns a successful write acknowledgement to the client that issued the request.
* If the write fails in either cluster, then the primary cluster passes a write failure back to the client.
The client can then execute its retry policy and reissue the request, if applicable.

This design ensures that new data is always written to both clusters, and that any failure on either cluster is always made visible to the client application.
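For example, a single hypothetical write issued through a {product-proxy} instance with `cqlsh` behaves as follows.
The address, keyspace, and table are placeholders:

[source,bash]
----
# The proxy forwards the INSERT to both the origin and target clusters.
# The statement succeeds only if both clusters acknowledge it at the
# requested consistency level.
cqlsh ZDM_PROXY_IP -e "CONSISTENCY LOCAL_QUORUM; INSERT INTO ks.tbl (id, val) VALUES (1, 'example');"
----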
{product-proxy} also sends all reads to the primary cluster, and then returns the result to the client application.
The primary cluster is initially the origin cluster, and you change it to the target cluster at the end of the migration process.

{product-proxy} is designed to be highly available. It can be scaled horizontally, so typical deployments are made up of a minimum of 3 servers.
{product-proxy} can be restarted in a rolling fashion, for example, to change configuration for different phases of the migration.
For information about how {product-proxy} handles lightweight transactions (LWTs), see xref:feasibility-checklists.adoc#_lightweight_transactions_and_the_applied_flag[Lightweight Transactions and the applied flag].

=== Key features of {product-proxy}
==== Reads

* Allows you to lift-and-shift existing application code from your origin cluster to your target cluster by changing only the connection string, if all else is compatible.
By default, {product-proxy} sends all reads to the primary cluster, and then returns the result to the client application.

* Reduces risks to upgrades and migrations by decoupling the origin cluster from the target cluster.
You can determine an explicit cut-over point once you're ready to commit to using the target cluster permanently.
//TODO: Compare Async Dual Writes content with enable-async-dual-reads.adoc.

* Bifurcates writes synchronously to both clusters during the migration process.
Optionally, you can configure {product-proxy} to read asynchronously from both the origin and target clusters.
This is known as **Asynchronous Dual Reads** or **Read Mirroring**, and it allows you to test the target cluster's ability to handle a production workload.

* Read operations return the response from the primary (origin) cluster, which is its designated source of truth.
+
During a migration, the primary cluster is typically the origin cluster.
Near the end of the migration, you shift the primary cluster to be the target cluster.
* Results from the asynchronous reads executed on the target cluster aren't sent back to the client application.
* This design implies that a failure on asynchronous reads from the target cluster won't cause an error on the client application.
* With Asynchronous Dual Reads, the additional read load on the target cluster can impact its ability to execute writes.
This behavior is expected because this feature is designed to mimic the full read and write workload on the target cluster.
This allows you to judge the target cluster's performance and make any adjustments before permanently switching to the target cluster at the end of the migration process.

* Option to read asynchronously from the target cluster as well as the origin cluster.
This capability is called **Asynchronous Dual Reads** or **Read Mirroring**, and it allows you to observe what read latencies and throughput the target cluster can achieve under the actual production load.
+
** Results from the asynchronous reads executed on the target cluster are not sent back to the client application.
** This design implies that a failure on asynchronous reads from the target cluster does not cause an error on the client application.
** Asynchronous dual reads can be enabled and disabled dynamically with a rolling restart of the {product-proxy} instances.
You can dynamically enable and disable Asynchronous Dual Reads by modifying your {product-proxy} configuration and then performing a rolling restart of the {product-proxy} instances.
For more information, see xref:ROOT:enable-async-dual-reads.adoc[].

[NOTE]
====
When using Asynchronous Dual Reads, any additional read load on the target cluster may impact its ability to keep up with writes.
This behavior is expected and desired.
The idea is to mimic the full read and write load on the target cluster so there are no surprises during the last migration phase; that is, after cutting over completely to the target cluster.
====
After enabling Asynchronous Dual Reads, observe the target cluster's read latency and throughput to determine how well the target cluster performs under the expected production workload.
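As a sketch, asynchronous dual reads are toggled through {product-proxy} configuration followed by a rolling restart.
The variable name and values below are assumptions, so confirm them in the Asynchronous Dual Reads documentation before use:

[source,bash]
----
# Assumed variable name and values; confirm in the configuration reference.
export ZDM_READ_MODE=DUAL_ASYNC_ON_SECONDARY   # mirror reads asynchronously to the secondary cluster
# To turn the feature off again:
export ZDM_READ_MODE=PRIMARY_ONLY
----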

=== High availability and multiple {product-proxy} instances

{product-proxy} is designed to be highly available and to run in a clustered fashion to avoid a single point of failure.

=== Run multiple {product-proxy} instances
With the exception of local test environments, {company} recommends that all {product-proxy} deployments have multiple {product-proxy} instances.
Deployments typically consist of three or more instances.

{product-proxy} has been designed to run in a clustered fashion so that it is never a single point of failure.
Unless it is for a demo or local testing environment, a {product-proxy} deployment should always comprise multiple {product-proxy} instances.
[TIP]
====
Throughout the {product-short} documentation, the term _{product-proxy} deployment_ refers to the entire deployment, and _{product-proxy} instance_ refers to an individual proxy process in the deployment.
====

Throughout the documentation, the term _{product-proxy} deployment_ refers to the entire deployment, and _{product-proxy} instance_ refers to an individual proxy process in the deployment.
You can scale {product-proxy} instances horizontally and vertically.
To avoid downtime when applying configuration changes, you can perform rolling restarts on your {product-proxy} instances.

You can use the {product-utility} and {product-automation} to set up and run Ansible playbooks that deploy and manage {product-proxy} and its monitoring stack.
For simplicity, you can use the {product-utility} and {product-automation} to set up and run Ansible playbooks that deploy and manage {product-proxy} and its monitoring stack.

== {product-utility} and {product-automation}

71 changes: 52 additions & 19 deletions modules/ROOT/pages/connect-clients-to-proxy.adoc
Expand Up @@ -126,6 +126,16 @@ This is also the case if authentication is required by the target only, but not
.How different sets of credentials are used by the {product-proxy} when authentication is enabled on both clusters
image::zdm-proxy-credential-usage.png[{product-proxy} credentials usage, 550]

=== Token-aware routing with {product-proxy}

Token-aware routing isn't enforced when connecting through {product-proxy} because these instances don't hold actual token ranges in the same way as database nodes.
Instead, each {product-proxy} instance has a unique, non-overlapping set of synthetic tokens that simulate token ownership and enable balanced load distribution across the instances.

Upon receiving a request, a {product-proxy} instance routes the request to appropriate source and target database nodes, independent of token ownership.

If your clients have token-aware routing enabled, you don't need to disable this behavior while using {product-proxy}.
Clients can continue to operate with token-aware routing enabled without negative impacts to functionality or performance.

=== {astra-db} credentials

If your {product-proxy} is configured to use {astra-db} as the origin or target cluster, then your client application doesn't need to provide a {scb} when connecting to the proxy.
@@ -140,7 +150,11 @@ As an alternative to providing the {scb-short} directly, you can xref:astra-db-s
[IMPORTANT]
====
These sample applications are for demonstration purposes only.
They are not intended for production use.
They are not intended for production use or for production-scale performance testing.

To test your target cluster's ability to handle production workloads, you can xref:ROOT:enable-async-dual-reads.adoc[enable Asynchronous Dual Reads].

To assess the performance of {product-proxy}, {company} recommends http://docs.nosqlbench.io/getting-started/[NoSQLBench].
====

The following sample client applications demonstrate how to use the Java driver with {product-proxy} and the origin and target for that proxy.
@@ -171,28 +185,47 @@ Details are in the https://github.com/absurdfarce/themis/blob/main/README.md[REA
In addition to any utility as a validation tool, Themis also serves as an example of a larger client application which uses the Java driver to connect to a {product-proxy} -- as well as directly to {cass-short} clusters or {astra-db} -- and perform operations.
The configuration logic as well as the cluster and session management code have been cleanly separated into distinct packages to make them easy to understand.

== Connecting CQLSH to the {product-proxy}
== Connect the CQL shell to {product-proxy}

https://downloads.datastax.com/#cqlsh[CQLSH] is a simple, command-line client that is able to connect to any CQL cluster, enabling you to interactively send CQL requests to it.
CQLSH comes pre-installed on any {cass-short} or {dse-short} node, or it can be downloaded and run as a standalone client on any machine able to connect to the desired cluster.
CQL shell (`cqlsh`) is a command-line tool that you can use to send {cass-short} Query Language (CQL) statements to your {cass-short}-based clusters, including {astra-db}, {dse-short}, {hcd-short}, and {cass} databases.

Using CQLSH to connect to a {product-proxy} instance is very easy:
You can use your database's included version of CQL shell, or you can download and run the standalone CQL shell.

* Download CQLSH for free from https://downloads.datastax.com/#cqlsh[here] on a machine that has connectivity to the {product-proxy} instances:
** To connect to the {product-proxy}, any version is fine.
** The {astra}-compatible version additionally supports connecting directly to an {astra-db} cluster by passing the cluster's {scb-short} and valid credentials.
* Install it by uncompressing the archive: `tar -xvf cqlsh-<...>.tar.gz`.
* Navigate to the `cqlsh-<...>/bin` directory, for example `cd cqlsh-astra/bin`.
* Launch CQLSH:
** Specify the IP of a {product-proxy} instance.
** Specify the port on which the {product-proxy} listens for client connections, if different to `9042`.
** Use the appropriate credentials for the {product-proxy}, as explained xref:_client_application_credentials[above].
Your origin and target clusters must have a common `cql_version` between them.
If there is no CQL version that is compatible with both clusters, CQL shell won't be able to connect to {product-proxy}.

For example, if one of your {product-proxy} instances has IP Address `172.18.10.34` and listens on port `14002`, the command would look like:
[source,bash]
To connect CQL shell to a {product-proxy} instance, do the following:

. On a machine that can connect to your {product-proxy} instance, https://downloads.datastax.com/#cqlsh[download CQL shell].
+
Any version of CQL shell can connect to {product-proxy}, but some clusters require a specific CQL shell version.

. Install CQL shell by extracting the downloaded archive:
+
[source,shell,subs="+quotes"]
----
./cqlsh 172.18.10.34 14002 -u <my_creds_user> -p <my_creds_password>
tar -xvf **CQLSH_ARCHIVE**
----
+
Replace `**CQLSH_ARCHIVE**` with the file name of the downloaded CQL shell archive, such as `cqlsh-astra-20210304-bin.tar.gz`.

. Change to the `bin` directory in your CQL shell installation directory.
For example, if you installed CQL shell for {astra-db}, you would run `cd cqlsh-astra/bin`.

If the {product-proxy} listens on port `9042`, you can omit the port from the command above.
If credentials are not required, just omit the `-u` and `-p` options.
. Launch CQL shell:
+
[source,shell,subs="+quotes"]
----
./cqlsh **ZDM_PROXY_IP** **PORT** -u **CLIENT_USERNAME** -p **CLIENT_PASSWORD**
----
+
Replace the following:
+
* `**ZDM_PROXY_IP**`: The IP address of your {product-proxy} instance.
* `**PORT**`: The port on which the {product-proxy} instance listens for client connections.
If you are using the default port, 9042, you can omit this argument.
* `**CLIENT_USERNAME**` and `**CLIENT_PASSWORD**`: The required xref:_client_application_credentials[{product-proxy} credentials].
If user authentication isn't enabled for your {product-proxy} instance, you can omit these arguments.
+
If your origin or target cluster is an {astra-db} database, don't use the {scb-short} when attempting to connect CQL shell to {product-proxy}.
If you include the {scb-short}, CQL shell ignores all other connection arguments and connects exclusively to your {astra-db} database instead of {product-proxy}.
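+
For example, if one of your {product-proxy} instances has the IP address 172.18.10.34 and listens on port 14002, the command looks like this:
+
[source,bash]
----
./cqlsh 172.18.10.34 14002 -u <my_creds_user> -p <my_creds_password>
----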
6 changes: 5 additions & 1 deletion modules/ROOT/pages/connect-clients-to-target.adoc
@@ -105,4 +105,8 @@ Your client application is now able to connect directly to your {astra-db} datab
== Phase 5 of migration completed

Until this point, in case of any issues, you could have abandoned the migration and rolled back to connect directly to the origin cluster at any time.
From this point onward, the clusters will diverge, and the target cluster becomes the source of truth for your client applications and data.

== See also

* https://www.datastax.com/events/migrating-your-legacy-cassandra-app-to-astra-db[Migrating your legacy {cass-reg} app to {astra-db}]