diff --git a/modules/ROOT/images/migration-phase3ra9.png b/modules/ROOT/images/migration-phase3ra9.png deleted file mode 100644 index 597ccf3a..00000000 Binary files a/modules/ROOT/images/migration-phase3ra9.png and /dev/null differ diff --git a/modules/ROOT/pages/astra-migration-paths.adoc b/modules/ROOT/pages/astra-migration-paths.adoc index 00c42cba..329d4e75 100644 --- a/modules/ROOT/pages/astra-migration-paths.adoc +++ b/modules/ROOT/pages/astra-migration-paths.adoc @@ -103,4 +103,5 @@ If you have questions about migrating from a specific source to {astra-db}, cont == See also +* https://www.datastax.com/events/migrating-your-legacy-cassandra-app-to-astra-db[Migrating your legacy {cass-reg} app to {astra-db}] * xref:astra-db-serverless:databases:migration-path-serverless.adoc[Migrate to {astra-db}] \ No newline at end of file diff --git a/modules/ROOT/pages/cassandra-data-migrator.adoc b/modules/ROOT/pages/cassandra-data-migrator.adoc index b4109302..f9b8fadf 100644 --- a/modules/ROOT/pages/cassandra-data-migrator.adoc +++ b/modules/ROOT/pages/cassandra-data-migrator.adoc @@ -6,25 +6,33 @@ //This page was an exact duplicate of cdm-overview.adoc and the (now deleted) cdm-steps.adoc, they are just in different parts of the nav. // tag::body[] -You can use {cass-migrator} ({cass-migrator-short}) to migrate and validate tables between {cass-short}-based clusters. -It is designed to connect to your target cluster, compare it with the origin cluster, log any differences, and, optionally, automatically reconcile inconsistencies and missing data. +You can use {cass-migrator} ({cass-migrator-short}) for data migration and validation between {cass-reg}-based databases. +It supports important {cass} features and offers extensive configuration options: -{cass-migrator-short} facilitates data transfer by creating multiple jobs that access the {cass-short} cluster concurrently, making it an ideal choice for migrating large datasets. 
-It offers extensive configuration options, including logging, reconciliation, performance optimization, and more.
+* Logging and run tracking
+* Automatic reconciliation
+* Performance tuning
+* Record filtering
+* Support for advanced data types, including sets, lists, maps, and UDTs
+* Support for SSL, including custom cipher algorithms
+* Preservation of `writetime` timestamps to maintain chronological write history
+* Preservation of Time To Live (TTL) values to maintain data lifecycles
-{cass-migrator-short} features include the following:
+For more information and a complete list of features, see the {cass-migrator-repo}?tab=readme-ov-file#features[{cass-migrator-short} GitHub repository].
-* Validate migration accuracy and performance using examples that provide a smaller, randomized data set.
-* Preserve internal `writetime` timestamps and Time To Live (TTL) values.
-* Use advanced data types, including sets, lists, maps, and UDTs.
-* Filter records from the origin cluster's data, using {cass-short}'s internal `writetime` timestamp.
-* Use SSL Support, including custom cipher algorithms.
+== {cass-migrator} requirements
-For more features and information, see the {cass-migrator-repo}?tab=readme-ov-file#features[{cass-migrator-short} GitHub repository].
+To use {cass-migrator-short} successfully, your origin and target clusters must be {cass-short}-based databases with matching schemas.
-== {cass-migrator} requirements
+== {cass-migrator-short} with {product-proxy}
+
+You can use {cass-migrator-short} alone or with {product-proxy}.
+
+When using {cass-migrator-short} with {product-proxy}, {cass-short}'s last-write-wins semantics ensure that new, real-time writes take precedence over historical writes.
+
+Last-write-wins compares the `writetime` of conflicting records, and then retains the most recent write.
-To use {cass-migrator-short} successfully, your origin and target clusters must have matching schemas.
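In pseudocode terms, the last-write-wins comparison behaves like the following sketch. This is an illustration of the semantics only, not {cass-migrator-short}'s actual implementation; the record layout and function name are hypothetical:

```python
from datetime import datetime, timezone

def last_write_wins(existing, incoming):
    """Keep the record with the most recent writetime (hypothetical helper)."""
    # Compare the writetime of the conflicting records and retain the newer one.
    return incoming if incoming["writetime"] > existing["writetime"] else existing

# A real-time write lands in the target cluster at 12:05...
live_write = {"value": "new", "writetime": datetime(2023, 10, 1, 12, 5, tzinfo=timezone.utc)}
# ...then a migration job replays an older historical write (12:00) for the same row.
migrated_write = {"value": "old", "writetime": datetime(2023, 10, 1, 12, 0, tzinfo=timezone.utc)}

winner = last_write_wins(live_write, migrated_write)
print(winner["value"])  # prints "new": the newer real-time write is retained
```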
+For example, if a new write occurs in your target cluster with a `writetime` of `2023-10-01T12:05:00Z`, and then {cass-migrator-short} migrates a record against the same row with a `writetime` of `2023-10-01T12:00:00Z`, the target cluster retains the data from the new write because it has the most recent `writetime`. == Install {cass-migrator} @@ -124,6 +132,10 @@ For example, the 4.x series of {cass-migrator-short} isn't backwards compatible [#migrate] == Run a {cass-migrator-short} data migration job +A data migration job copies data from a table in your origin cluster to a table with the same schema in your target cluster. + +To optimize large-scale migrations, {cass-migrator-short} can run multiple concurrent migration jobs on the same table. + The following `spark-submit` command migrates one table from the origin to the target cluster, using the configuration in your properties file. The migration job is specified in the `--class` argument. @@ -189,7 +201,9 @@ For additional modifications to this command, see <>. [#cdm-validation-steps] == Run a {cass-migrator-short} data validation job -After you migrate data, you can use {cass-migrator-short}'s data validation mode to find inconsistencies between the origin and target tables. +After migrating data, use {cass-migrator-short}'s data validation mode to identify any inconsistencies between the origin and target tables, such as missing or mismatched records. + +Optionally, {cass-migrator-short} can automatically correct discrepancies in the target cluster during validation. . Use the following `spark-submit` command to run a data validation job using the configuration in your properties file. The data validation job is specified in the `--class` argument. @@ -276,9 +290,10 @@ Optionally, you can run {cass-migrator-short} validation jobs in **AutoCorrect** + [IMPORTANT] ==== -`TIMESTAMP` has an effect on this function. +Timestamps have an effect on this function. 
+ +If the `writetime` of the origin record (determined with `.writetime.names`) is before the `writetime` of the corresponding target record, then the original write won't appear in the target cluster. -If the `WRITETIME` of the origin record (determined with `.writetime.names`) is earlier than the `WRITETIME` of the target record, then the change doesn't appear in the target cluster. This comparative state can be challenging to troubleshoot if individual columns or cells were modified in the target cluster. ==== diff --git a/modules/ROOT/pages/change-read-routing.adoc b/modules/ROOT/pages/change-read-routing.adoc index a6309f1a..939d1fe1 100644 --- a/modules/ROOT/pages/change-read-routing.adoc +++ b/modules/ROOT/pages/change-read-routing.adoc @@ -1,4 +1,4 @@ -= Phase 4: Route reads to the target += Route reads to the target :page-tag: migration,zdm,zero-downtime,zdm-proxy,read-routing This topic explains how you can configure the {product-proxy} to route all reads to the target cluster instead of the origin cluster. @@ -15,16 +15,12 @@ This operation is a configuration change that can be carried out as explained xr [TIP] ==== -If you performed the optional steps described in the prior topic, xref:enable-async-dual-reads.adoc[] -- to verify that your target cluster was ready and tuned appropriately to handle the production read load -- be sure to disable async dual reads when you're done testing. -If you haven't already, revert `read_mode` in `vars/zdm_proxy_core_config.yml` to `PRIMARY_ONLY` when switching sync reads to the target cluster. -Example: +If you xref:enable-async-dual-reads.adoc[enabled asynchronous dual reads] to test your target cluster's performance, make sure that you disable asynchronous dual reads when you're done testing. -[source,yml] ----- -read_mode: PRIMARY_ONLY ----- +To do this, edit the `vars/zdm_proxy_core_config.yml` file, and then set the `read_mode` variable to `PRIMARY_ONLY`. 
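For example, after the change, the relevant line in `vars/zdm_proxy_core_config.yml` looks like this:

```yaml
read_mode: PRIMARY_ONLY
```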
-If you don't disable async dual reads, {product-proxy} instances continue to send async reads to the origin, which, although harmless, is unnecessary. +If you don't disable asynchronous dual reads, {product-proxy} instances send asynchronous, duplicate read requests to your origin cluster. +This is harmless but unnecessary. ==== == Changing the read routing configuration diff --git a/modules/ROOT/pages/components.adoc b/modules/ROOT/pages/components.adoc index 25916146..1e8648f0 100644 --- a/modules/ROOT/pages/components.adoc +++ b/modules/ROOT/pages/components.adoc @@ -17,63 +17,68 @@ The main component of the {company} {product} toolkit is {product-proxy}, which {product-proxy} is open-source software that is available from the {product-proxy-repo}[zdm-proxy GitHub repo]. This project is open for public contributions. -The {product-proxy} is an orchestrator for monitoring application activity and keeping multiple clusters in sync through dual writes. +The {product-proxy} is an orchestrator for monitoring application activity and keeping multiple clusters (databases) in sync through dual writes. {product-proxy} isn't linked to the actual migration process. It doesn't perform data migrations and it doesn't have awareness of ongoing migrations. Instead, you use a data migration tool, like {sstable-sideloader}, {cass-migrator}, or {dsbulk-migrator}, to perform the data migration and validate migrated data. -=== How {product-proxy} works +{product-proxy} reduces risks to upgrades and migrations by decoupling the origin cluster from the target cluster and maintaining consistency between both clusters. +You decide when you want to switch permanently to the target cluster. -{company} created {product-proxy} to function between the application and both the origin and target databases. -The databases can be any CQL-compatible data store, such as {cass-reg}, {dse}, and {astra-db}. 
-The proxy always sends every write operation (Insert, Update, Delete) synchronously to both clusters at the desired Consistency Level: +After migrating your data, changes to your application code are usually minimal, depending on your client's compatibility with the origin and target clusters. +Typically, you only need to update the connection string. -* If the write is successful in both clusters, it returns a successful acknowledgement to the client application. -* If the write fails on either cluster, the failure is passed back to the client application so that it can retry it as appropriate, based on its own retry policy. +[#how-zdm-proxy-handles-reads-and-writes] +=== How {product-proxy} handles reads and writes + +{company} created {product-proxy} to orchestrate requests between a client application and both the origin and target clusters. +These clusters can be any CQL-compatible data store, such as {cass-reg}, {dse}, and {astra-db}. + +During the migration process, you designate one cluster as the _primary cluster_, which serves as the source of truth for reads. +For the majority of the migration process, this is typically the origin cluster. +Towards the end of the migration process, when you are ready to read from your target cluster, you set the target cluster as the primary cluster. + +==== Writes + +{product-proxy} sends every write operation (`INSERT`, `UPDATE`, `DELETE`) synchronously to both clusters at the requested consistency level: + +* If the write is acknowledged in both clusters at the requested consistency level, then the operation returns a successful write acknowledgement to the client that issued the request. +* If the write fails in either cluster, then {product-proxy} passes a write failure, originating from the primary cluster, back to the client. +The client can then retry the request, if appropriate, based on the client's retry policy. 
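The write-handling rules above can be sketched as follows. This is a simplified, hypothetical illustration of the dual-write semantics, not {product-proxy}'s actual implementation:

```python
def dual_write(write_to_origin, write_to_target):
    """Apply a write to both clusters synchronously.

    write_to_origin and write_to_target are callables that raise on failure.
    The write is acknowledged only if both clusters accept it; otherwise the
    failure is surfaced to the client, which can retry per its retry policy.
    """
    errors = []
    for write in (write_to_origin, write_to_target):
        try:
            write()
        except Exception as exc:
            errors.append(exc)
    if errors:
        raise errors[0]  # make the failure visible to the client
    return "acknowledged"

# Both clusters accept the write, so success is returned to the client.
print(dual_write(lambda: None, lambda: None))  # prints "acknowledged"

# A failure in either cluster is made visible to the client.
def failing_write():
    raise RuntimeError("target cluster rejected the write")

try:
    dual_write(lambda: None, failing_write)
except RuntimeError as exc:
    print(f"write failed: {exc}")  # the client may retry per its retry policy
```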
This design ensures that new data is always written to both clusters, and that any failure on either cluster is always made visible to the client application. -{product-proxy} also sends all reads to the primary cluster, and then returns the result to the client application. -The primary cluster is initially the origin cluster, and you change it to the target cluster at the end of the migration process. -{product-proxy} is designed to be highly available. It can be scaled horizontally, so typical deployments are made up of a minimum of 3 servers. -{product-proxy} can be restarted in a rolling fashion, for example, to change configuration for different phases of the migration. +For information about how {product-proxy} handles lightweight transactions (LWTs), see xref:feasibility-checklists.adoc#_lightweight_transactions_and_the_applied_flag[Lightweight Transactions and the applied flag]. -=== Key features of {product-proxy} +==== Reads -* Allows you to lift-and-shift existing application code from your origin cluster to your target cluster by changing only the connection string, if all else is compatible. +By default, {product-proxy} sends all reads to the primary cluster, and then returns the result to the client application. -* Reduces risks to upgrades and migrations by decoupling the origin cluster from the target cluster. -You can determine an explicit cut-over point once you're ready to commit to using the target cluster permanently. +If you enable _asynchronous dual reads_, {product-proxy} sends asynchronous read requests to the secondary cluster (typically the target cluster) in addition to the synchronous read requests that are sent to the primary cluster. -* Bifurcates writes synchronously to both clusters during the migration process. +This feature is designed to test the target cluster's ability to handle a production workload before you permanently switch to the target cluster at the end of the migration process. 
-* Read operations return the response from the primary (origin) cluster, which is its designated source of truth.
-+
-During a migration, the primary cluster is typically the origin cluster.
-Near the end of the migration, you shift the primary cluster to be the target cluster.
+With or without asynchronous dual reads, the client application only receives results from synchronous reads on the primary cluster.
+The results of asynchronous reads aren't returned to the client because asynchronous reads are for testing purposes only.
-* Option to read asynchronously from the target cluster as well as the origin cluster
-This capability is called **Asynchronous Dual Reads** or **Read Mirroring**, and it allows you to observe what read latencies and throughput the target cluster can achieve under the actual production load.
-+
-** Results from the asynchronous reads executed on the target cluster are not sent back to the client application.
-** This design implies that a failure on asynchronous reads from the target cluster does not cause an error on the client application.
-** Asynchronous dual reads can be enabled and disabled dynamically with a rolling restart of the {product-proxy} instances.
+For more information, see xref:ROOT:enable-async-dual-reads.adoc[].
-[NOTE]
-====
-When using Asynchronous Dual Reads, any additional read load on the target cluster may impact its ability to keep up with writes.
-This behavior is expected and desired.
-The idea is to mimic the full read and write load on the target cluster so there are no surprises during the last migration phase; that is, after cutting over completely to the target cluster.
-====
+=== High availability and multiple {product-proxy} instances
+
+{product-proxy} is designed to be highly available and run in a clustered fashion to avoid a single point of failure.
-=== Run multiple {product-proxy} instances +With the exception of local test environments, {company} recommends that all {product-proxy} deployments have multiple {product-proxy} instances. +Deployments typically consist of three or more instances. -{product-proxy} has been designed to run in a clustered fashion so that it is never a single point of failure. -Unless it is for a demo or local testing environment, a {product-proxy} deployment should always comprise multiple {product-proxy} instances. +[TIP] +==== +Throughout the {product-short} documentation, the term _{product-proxy} deployment_ refers to the entire deployment, and _{product-proxy} instance_ refers to an individual proxy process in the deployment. +==== -Throughout the documentation, the term _{product-proxy} deployment_ refers to the entire deployment, and _{product-proxy} instance_ refers to an individual proxy process in the deployment. +You can scale {product-proxy} instances horizontally and vertically. +To avoid downtime when applying configuration changes, you can perform rolling restarts on your {product-proxy} instances. -You can use the {product-utility} and {product-automation} to set up and run Ansible playbooks that deploy and manage {product-proxy} and its monitoring stack. +For simplicity, you can use the {product-utility} and {product-automation} to set up and run Ansible playbooks that deploy and manage {product-proxy} and its monitoring stack. == {product-utility} and {product-automation} diff --git a/modules/ROOT/pages/connect-clients-to-proxy.adoc b/modules/ROOT/pages/connect-clients-to-proxy.adoc index aa718de6..5993ea18 100644 --- a/modules/ROOT/pages/connect-clients-to-proxy.adoc +++ b/modules/ROOT/pages/connect-clients-to-proxy.adoc @@ -75,72 +75,101 @@ The following links provide some good starting points for learning about the int * The https://docs.datastax.com/en/developer/cpp-driver/latest/topics/[getting started section] of the C/C++ driver documentation. 
* The https://docs.datastax.com/en/developer/nodejs-driver/latest/#basic-usage[basic usage section] of the Node.js driver documentation.
-== Connect drivers to {product-proxy}
+== Connect applications to {product-proxy}
We mentioned above that connecting to a {product-proxy} should be almost indistinguishable from connecting directly to your {cass-short} cluster.
This design decision means there isn't much to say here; everything we discussed in the section above also applies when connecting your driver to {product-proxy}.
There are a few extra considerations to keep in mind, though, when using the proxy.
-=== Client-side compression
-Client applications must not enable client-side compression when connecting through the {product-proxy}, as this is not currently supported.
-This is disabled by default in all drivers, but if it was enabled in your client application configuration it will have to be temporarily disabled when connecting to the {product-proxy}.
-
[[_client_application_credentials]]
=== Client application credentials
-The credentials provided by the client application are used when forwarding its requests.
-However, the client application has no notion that there are two clusters involved: from its point of view, it talks to just one cluster as usual.
-For this reason, the {product-proxy} will only use the client application credentials when forwarding requests to one cluster (typically the target), and it will resort to using the credentials in its own configuration to forward requests to the other cluster (typically the origin).
+Client applications connect to {product-proxy} in the same way that they connect to a cluster: by providing a set of credentials to authenticate requests.
+
+Clients have no awareness of the {product-proxy} architecture or the existence of the two separate clusters (the origin and target).
+Therefore, a client only provides a single set of credentials when connecting to {product-proxy}, the same as it would when connecting directly to a cluster.
+
+{product-proxy} uses the credentials provided by the client to forward requests to the cluster that corresponds with those credentials, which is usually the target cluster.
+If necessary, {product-proxy} uses the credentials defined in xref:ROOT:deploy-proxy-monitoring.adoc#cluster-and-core-configuration[`zdm_proxy_cluster_config.yml`] to forward requests to the other cluster, which is usually the origin cluster.
+
+.Credential usage by {product-proxy} when authentication is required for both clusters
+image::zdm-proxy-credential-usage.png[{product-proxy} credentials usage when authentication is required for both clusters, 550]
-This means that, if your {product-proxy} is configured with an origin or target cluster with **user authentication enabled**, your client application has to provide credentials when connecting to the proxy:
+==== Determine which credentials to provide
-* If both clusters require authentication, your client application must pass the credentials for the target.
-This is also the case if authentication is required by the target only, but not the origin.
-* If the origin requires authentication but the target does not, then your client application must supply credentials for the origin.
-* If neither cluster requires authentication, no credentials are needed.
+The credentials your client must provide depend on the authentication requirements of the origin and target clusters:
-[cols="1,1,1"]
-|===
-|Auth enabled on the origin
-|Auth enabled on the target
-|Client application credentials
+* *Authentication required for both clusters*: Your client application must supply credentials for the target cluster.
+* *Authentication required for target cluster only*: Your client application must supply credentials for the target cluster.
+* *Authentication required for origin cluster only*: Your client application must supply credentials for the origin cluster.
+* *No authentication required for either cluster*: Your client application doesn't need to supply any cluster credentials.
-|Yes
-|Yes
-|Target
+==== Expected authentication credentials for self-managed clusters
-|No
-|Yes
-|Target
+For self-managed clusters that require authentication, your client application must provide valid `username` and `password` values to access the cluster.
-|Yes
-|No
-|Origin
+For information about self-managed cluster credentials in your {product-proxy} configuration, see xref:ROOT:deploy-proxy-monitoring.adoc#cluster-and-core-configuration[Cluster and core configuration].
-|No
-|No
-|No credentials
+[#expected-authentication-credentials-for-astra-db]
+==== Expected authentication credentials for {astra-db}
-|===
+For {astra-db} databases, your client application can provide either application token credentials or a {scb}.
-.How different sets of credentials are used by the {product-proxy} when authentication is enabled on both clusters
-image::zdm-proxy-credential-usage.png[{product-proxy} credentials usage, 550]
+[tabs]
+======
+Application token::
++
+--
+For token-based authentication, do the following:
-=== {astra-db} credentials
+. xref:astra-db-serverless:administration:manage-application-tokens.adoc[Generate an application token] with the *Organization Administrator* role.
++
+The token has three values: `clientId`, `secret`, and `token`.
-If your {product-proxy} is configured to use {astra-db} as the origin or target cluster, then your client application doesn't need to provide a {scb} when connecting to the proxy.
+.
Specify one of the following sets of credentials in your client application: -As an alternative to providing the {scb-short} directly, you can xref:astra-db-serverless:administration:manage-application-tokens.adoc[generate an application token] with the *Organization Administrator* role, and then specify one of the following sets of credentials generated with the token: +* Recommended: Set `username` to the literal string `token`, and set `password` to the {astra-db} `token` value (`AstraCS:...`). +* Legacy applications and older drivers: Set `username` to the `clientId` value, and set `password` to the `secret` value. +-- + +{scb-short}:: ++ +-- +For information about downloading the {scb-short}, see xref:astra-db-serverless:databases:secure-connect-bundle.adoc[]. + +For information about using a {scb-short} with a driver, see your driver's documentation. +-- +====== + +For information about {astra-db} credentials in your {product-proxy} configuration, see xref:ROOT:deploy-proxy-monitoring.adoc#cluster-and-core-configuration[Cluster and core configuration]. + +=== Disable client-side compression with {product-proxy} + +Client applications must not enable client-side compression when connecting through the {product-proxy}, as this is not currently supported. +This is disabled by default in all drivers, but if it was enabled in your client application configuration it will have to be temporarily disabled when connecting to the {product-proxy}. -* Token-only authentication: Set `username` to the literal string `token`, and set `password` to your {astra-db} application token. -* Client ID and secret authentication (legacy): Set `username` to the `clientId` generated with your application token, and then set `password` to the `secret` generated with your application token. 
+=== {product-proxy} ignores token-aware routing + +Token-aware routing isn't enforced when connecting through {product-proxy} because these instances don't hold actual token ranges in the same way as database nodes. +Instead, each {product-proxy} instance has a unique, non-overlapping set of synthetic tokens that simulate token ownership and enable balanced load distribution across the instances. + +Upon receiving a request, a {product-proxy} instance routes the request to appropriate source and target database nodes, independent of token ownership. + +If your clients have token-aware routing enabled, you don't need to disable this behavior while using {product-proxy}. +Clients can continue to operate with token-aware routing enabled without negative impacts to functionality or performance. == Sample client applications [IMPORTANT] ==== These sample applications are for demonstration purposes only. -They are not intended for production use. +They are not intended for production use or for production-scale performance testing. + +To test your target cluster's ability to handle production workloads, you can xref:ROOT:enable-async-dual-reads.adoc[enable asynchronous dual reads]. + +To assess the performance of {product-proxy}, {company} recommends http://docs.nosqlbench.io/getting-started/[NoSQLBench]. ==== The following sample client applications demonstrate how to use the Java driver with {product-proxy} and the origin and target for that proxy. @@ -171,28 +200,57 @@ Details are in the https://github.com/absurdfarce/themis/blob/main/README.md[REA In addition to any utility as a validation tool, Themis also serves as an example of a larger client application which uses the Java driver to connect to a {product-proxy} -- as well as directly to {cass-short} clusters or {astra-db} -- and perform operations. The configuration logic as well as the cluster and session management code have been cleanly separated into distinct packages to make them easy to understand. 
-== Connecting CQLSH to the {product-proxy} +== Connect the CQL shell to {product-proxy} + +CQL shell (`cqlsh`) is a command-line tool that you can use to send {cass-short} Query Language (CQL) statements to your {cass-short}-based clusters, including {astra-db}, {dse-short}, {hcd-short}, and {cass} databases. + +You can use your database's included version of CQL shell, or you can download and run the standalone CQL shell. -https://downloads.datastax.com/#cqlsh[CQLSH] is a simple, command-line client that is able to connect to any CQL cluster, enabling you to interactively send CQL requests to it. -CQLSH comes pre-installed on any {cass-short} or {dse-short} node, or it can be downloaded and run as a standalone client on any machine able to connect to the desired cluster. +Your origin and target clusters must have a common `cql_version` between them. +If there is no CQL version that is compatible with both clusters, CQL shell won't be able to connect to {product-proxy}. -Using CQLSH to connect to a {product-proxy} instance is very easy: +To connect CQL shell to a {product-proxy} instance, do the following: -* Download CQLSH for free from https://downloads.datastax.com/#cqlsh[here] on a machine that has connectivity to the {product-proxy} instances: -** To connect to the {product-proxy}, any version is fine. -** The {astra}-compatible version additionally supports connecting directly to an {astra-db} cluster by passing the cluster's {scb-short} and valid credentials. -* Install it by uncompressing the archive: `tar -xvf cqlsh-<...>.tar.gz`. -* Navigate to the `cqlsh-<...>/bin` directory, for example `cd cqlsh-astra/bin`. -* Launch CQLSH: -** Specify the IP of a {product-proxy} instance. -** Specify the port on which the {product-proxy} listens for client connections, if different to `9042`. -** Use the appropriate credentials for the {product-proxy}, as explained xref:_client_application_credentials[above]. +. 
On a machine that can connect to your {product-proxy} instance, https://downloads.datastax.com/#cqlsh[download CQL shell]. ++ +Any version of CQL shell can connect to {product-proxy}, but some clusters require a specific CQL shell version. -For example, if one of your {product-proxy} instances has IP Address `172.18.10.34` and listens on port `14002`, the command would look like: -[source,bash] +. Install CQL shell by extracting the downloaded archive: ++ +[source,shell,subs="+quotes"] ---- -./cqlsh 172.18.10.34 14002 -u -p +tar -xvf **CQLSH_ARCHIVE** ---- ++ +Replace `**CQLSH_ARCHIVE**` with the file name of the downloaded CQL shell archive, such as `cqlsh-astra-20210304-bin.tar.gz`. + +. Change to the `bin` directory in your CQL shell installation directory. +For example, if you installed CQL shell for {astra-db}, you would run `cd cqlsh-astra/bin`. + +. Launch CQL shell: ++ +[source,shell,subs="+quotes"] +---- +./cqlsh **ZDM_PROXY_IP** **PORT** -u **USERNAME** -p **PASSWORD** +---- ++ +Replace the following: ++ +* `**ZDM_PROXY_IP**`: The IP address of your {product-proxy} instance. +* `**PORT**`: The port on which the {product-proxy} instance listens for client connections. +If you are using the default port, 9042, you can omit this argument. +* `**USERNAME**` and `**PASSWORD**`: Valid xref:_client_application_credentials[client connection credentials], depending on the authentication requirements for your origin and target clusters: ++ +** *Authentication required for both clusters*: Provide credentials for the target cluster. +** *Authentication required for target cluster only*: Provide credentials for the target cluster. +** *Authentication required for origin cluster only*: Provide credentials for the origin cluster. +** *No authentication required for either cluster*: Omit the `-u` and `-p` arguments. 
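For example, if a {product-proxy} instance has IP address `172.18.10.34` and listens on port `14002`, the command looks like the following, where `my_user` and `my_password` are placeholder credentials:

```shell
./cqlsh 172.18.10.34 14002 -u my_user -p my_password
```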
+ ++ +[IMPORTANT] +==== +If you need to provide credentials for an {astra-db} database, don't use the {scb-short} when attempting to connect CQL shell to {product-proxy}. +Instead, use the token-based authentication option explained in <>. -If the {product-proxy} listens on port `9042`, you can omit the port from the command above. -If credentials are not required, just omit the `-u` and `-p` options. \ No newline at end of file +If you include the {scb-short}, CQL shell ignores all other connection arguments and connects exclusively to your {astra-db} database instead of {product-proxy}. +==== \ No newline at end of file diff --git a/modules/ROOT/pages/connect-clients-to-target.adoc b/modules/ROOT/pages/connect-clients-to-target.adoc index af5af50d..c1db87be 100644 --- a/modules/ROOT/pages/connect-clients-to-target.adoc +++ b/modules/ROOT/pages/connect-clients-to-target.adoc @@ -105,4 +105,8 @@ Your client application is now able to connect directly to your {astra-db} datab == Phase 5 of migration completed Until this point, in case of any issues, you could have abandoned the migration and rolled back to connect directly to the origin cluster at any time. -From this point onward, the clusters will diverge, and the target cluster becomes the source of truth for your client applications and data. \ No newline at end of file +From this point onward, the clusters will diverge, and the target cluster becomes the source of truth for your client applications and data. + +== See also + +* https://www.datastax.com/events/migrating-your-legacy-cassandra-app-to-astra-db[Migrating your legacy {cass-reg} app to {astra-db}] \ No newline at end of file diff --git a/modules/ROOT/pages/deploy-proxy-monitoring.adoc b/modules/ROOT/pages/deploy-proxy-monitoring.adoc index 5f5067a2..748bd4f7 100644 --- a/modules/ROOT/pages/deploy-proxy-monitoring.adoc +++ b/modules/ROOT/pages/deploy-proxy-monitoring.adoc @@ -94,7 +94,7 @@ The default is 9042. 
** For an {astra-db} database, leave this unset. * `*_astra_secure_connect_bundle_path`, `*_astra_db_id`, and `*_astra_token`: ** For a self-managed cluster, leave all of these unset. -** For an {astra-db} database, provide either `*_astra_secure_connect_bundle_path` _or both_ `*_astra_db_id` and `*_astra_token`. +** For an {astra-db} database, provide either `*_astra_secure_connect_bundle_path` or both `*_astra_db_id` and `*_astra_token`. *** If you want {product-automation} to automatically download your database's {scb}, use `*_astra_db_id` and `*_astra_token`. Set `*_astra_db_id` to your xref:astra-db-serverless:databases:create-database.adoc#get-db-id[database's ID], and set `*_astra_token` to your application token, which is prefixed by `AstraCS:`. *** If you want to manually upload your database's {scb-short} to the jumphost, use `*_astra_secure_connect_bundle_path`. diff --git a/modules/ROOT/pages/enable-async-dual-reads.adoc b/modules/ROOT/pages/enable-async-dual-reads.adoc index 8c60b71c..980ee9c4 100644 --- a/modules/ROOT/pages/enable-async-dual-reads.adoc +++ b/modules/ROOT/pages/enable-async-dual-reads.adoc @@ -1,81 +1,98 @@ -= Phase 3: Enable asynchronous dual reads -:page-tag: migration,zdm,zero-downtime,zdm-proxy,async-reads += Enable asynchronous dual reads +:description: Use asynchronous dual reads to test your target database's ability to handle a simulated production workload. -In this phase, you can optionally enable asynchronous dual reads. -The idea is to test performance and verify that the target cluster can handle your application's live request load before cutting over from the origin cluster to the target cluster. +In this optional phase, you can enable the _asynchronous dual reads_ feature to test the secondary (target) cluster's ability to handle a production workload before you permanently redirect read requests in xref:ROOT:change-read-routing.adoc[phase 4]. 
-image::migration-phase3ra.png[Phase 3 diagram shows optional step enabling async dual reads to test performance of the target.] +By default, {product-proxy} sends all reads to the primary (origin) cluster, and then returns the result to the client application. -//For illustrations of all the migration phases, see the xref:introduction.adoc#_migration_phases[Introduction]. +When you enable _asynchronous dual reads_, {product-proxy} sends asynchronous read requests to the secondary cluster in addition to the synchronous read requests that are sent to the primary cluster. -[TIP] -==== -As you test the performance on the target, be sure to examine the async read metrics. -As noted in the xref:#_validating_performance_and_error_rate[section] below, you can learn more in xref:metrics.adoc#_asynchronous_read_requests_metrics[Asynchronous read requests metrics]. -==== +At this point in the migration process, the secondary cluster is typically the target cluster. +Because this feature is intended to test your target cluster's ability to handle a simulated production workload, there is no reason to run it against the origin cluster, which already handles your production workload. -== Steps +image:migration-phase3ra.png["Migration phase 3 diagram with asynchronous dual reads sent to the secondary cluster."] -The steps consist of changing the `read_mode` configuration variable in `vars/zdm_proxy_core_config.yml` from `PRIMARY_ONLY` (the default) to `DUAL_ASYNC_ON_SECONDARY`. +This allows you to assess the target cluster's performance and make any adjustments before permanently switching your workloads to the target cluster. -Example: == Response and error handling with asynchronous dual reads +With or without asynchronous dual reads, the client application only receives results from synchronous reads on the primary cluster.
+The client never receives results from asynchronous reads on the secondary cluster because these results are used only for {product-proxy}'s asynchronous dual read metrics. + +By design, if an asynchronous read fails or times out, it has no impact on client operations and the client application doesn't receive an error. +However, the increased workload from read requests can cause write requests to fail or time out on the secondary cluster. +With or without asynchronous dual reads, a failed write on either cluster returns an error to the client application and potentially triggers a retry. + +This functionality is intentional so you can simulate production-scale read traffic on the secondary cluster, in addition to the existing write traffic from {product-proxy}'s xref:components.adoc#how-zdm-proxy-handles-reads-and-writes[dual writes], with the least impact on your applications. + +To avoid unnecessary failures due to unmigrated data, enable asynchronous dual reads only after you migrate, validate, and reconcile all data from the origin cluster to the target cluster. + +[#configure-asynchronous-dual-reads] +== Configure asynchronous dual reads + +Use the `read_mode` variable to enable or disable asynchronous dual reads. +Then, perform rolling restarts of your {product-proxy} instances to apply the configuration change. + +. In `vars/zdm_proxy_core_config.yml`, edit the `read_mode` variable: ++ +[tabs] +====== +Enable asynchronous dual reads:: ++ +-- [source,yml] ---- read_mode: DUAL_ASYNC_ON_SECONDARY ---- +-- -Before making the change, you should still have the origin as the primary cluster, which is the default: - +Disable asynchronous dual reads (default):: ++ +-- [source,yml] ---- -primary_cluster: ORIGIN # or empty +read_mode: PRIMARY_ONLY ---- +-- +====== + +. Perform rolling restarts to apply the configuration change to your {product-proxy} instances.
++ +[tabs] +====== +With {product-automation}:: ++ +-- +If you use {product-automation} to manage your {product-proxy} deployment, run the following command: + +[source,bash] +---- +ansible-playbook rolling_update_zdm_proxy.yml -i zdm_ansible_inventory +---- +-- -To apply this change, run the `rolling_update_zdm_proxy.yml` playbook as explained xref:manage-proxy-instances.adoc#change-mutable-config-variable[here]. - -[NOTE] -==== -This optional phase introduces an additional check to make sure that the target can handle the load without timeouts or unacceptable latencies. -You would typically perform this step once you have migrated all the existing data from the origin cluster and completed all validation checks and reconciliation, if necessary. -==== - -== Asynchronous Dual Reads mode - -When using the {product-proxy}, all writes are synchronously sent to both the origin and target clusters. -Reads operate differently: with the default read mode, reads are only sent to the primary cluster (Origin by default). - -In Phase 4, you will change the read routing so that reads are routed to the target. -Before you do this, you might want to temporarily send the reads to both clusters to make sure that the target can handle the full workload of reads and writes. - -If you set the proxy's `read_mode` configuration variable to `DUAL_ASYNC_ON_SECONDARY`, then asynchronous dual reads will be enabled. -That change will result in reads being additionally sent to the secondary cluster. -The proxy will return the read response to the client application as soon as the primary cluster's response arrives. - -The secondary cluster's response will only be used to track metrics. -There will be no impact to the client application if the read fails on the secondary cluster, or if the read performance on the secondary cluster is degraded. 
-Therefore, you can use this feature as a safer way to test the full workload on the target before setting the target as the primary cluster in Phase 4 +Without {product-automation}:: ++ +-- +If you don't use {product-automation}, you must manually restart each instance. -[NOTE] -==== -In some cases the additional read requests can cause the write requests to fail or timeout on that cluster. -This means that, while this feature provides a way to route read requests to the target with a lower chance of having impact on the client application, it doesn't completely eliminate that chance. -==== +To avoid downtime, wait for each instance to fully restart and begin receiving traffic before restarting the next instance. +-- +====== ++ +For more information about rolling restarts and changing {product-proxy} configuration variables, see xref:manage-proxy-instances.adoc[]. -[[_validating_performance_and_error_rate]] -== Validating performance and error rate +== Monitor the target cluster's performance -Because the client application is not impacted by these asynchronous reads, the only way to measure the performance and error rate of these asynchronous reads are: +After enabling asynchronous dual reads, observe the target cluster's performance to determine how well it performs under the expected production workload. -* Check the metrics of the cluster itself -* Check the asynchronous reads section of the {product-proxy} metrics +To assess performance, you can monitor the following: -In the {product-proxy} Grafana dashboard that the {product-automation} is able to deploy, there is a section dedicated to asynchronous reads where you can see latency percentiles, error rates, and some other metrics specific to these requests. 
+* Cluster health metrics like latency, throughput, and error rate +* {product-proxy}'s xref:metrics.adoc#_asynchronous_read_requests_metrics[asynchronous read requests metrics] -For more, see xref:metrics.adoc#_asynchronous_read_requests_metrics[Asynchronous read requests metrics]. +If needed, adjust the target cluster's configuration and continue monitoring until the cluster reaches your performance targets. -== Reminder to switch off async dual reads +== Next steps -Once you are satisfied that your target cluster is ready and tuned appropriately to handle the production read load, you can switch your sync reads to the target permanently. -At this point, be sure to also disable async dual reads by reverting `read_mode` in `vars/zdm_proxy_core_config.yml` to `PRIMARY_ONLY`. -For more information and instructions, see xref:change-read-routing.adoc[]. +When you are confident that the target cluster is prepared to handle production workloads, you can <>, and then permanently xref:ROOT:change-read-routing.adoc[route read requests to the target cluster]. \ No newline at end of file diff --git a/modules/ROOT/pages/faqs.adoc b/modules/ROOT/pages/faqs.adoc index ef610962..f0ec037b 100644 --- a/modules/ROOT/pages/faqs.adoc +++ b/modules/ROOT/pages/faqs.adoc @@ -5,6 +5,7 @@ If you're new to the {company} {product} features, these FAQs are for you. //TODO: Eliminate redundancies in these FAQs and the Glossary. +//FAQs in ZDM-proxy repo: https://github.com/datastax/zdm-proxy/blob/main/faq.md#what-versions-of-apache-cassandra-or-cql-compatible-data-stores-does-the-zdm-proxy-support == What is meant by {product}? @@ -66,7 +67,7 @@ Bottom line: You want to migrate your critical database infrastructure without r == Which releases of {cass-short} or {dse-short} are supported for migrations? -include::ROOT:partial$migration-scenarios.adoc[] +See xref:ROOT:zdm-proxy-migration-paths.adoc[]. == Does {product-short} migrate clusters? 
@@ -127,12 +128,14 @@ For TLS details, see xref:tls.adoc[]. == How does {product-proxy} handle Lightweight Transactions (LWTs)? +//TODO: Compare and replace with link to LWT section on feasibility-checklists.adoc + {product-proxy} handles LWTs as write operations. The proxy sends the LWT to the origin and target clusters concurrently, and waits for a response from both. {product-proxy} will return a `success` status to the client if both the origin and target clusters send successful acknowledgements. Otherwise, it will return a `failure` status if one or both do not return an acknowledgement. -What sets LWTs apart from regular writes is that they are conditional. For important details, including the client context for a returned `applied` flag, see xref:feasibility-checklists.adoc#_lightweight_transactions_and_the_applied_flag[Lightweight Transactions and the `applied` flag]. +What sets LWTs apart from regular writes is that they are conditional. For important details, including the client context for a returned `applied` flag, see xref:feasibility-checklists.adoc#_lightweight_transactions_and_the_applied_flag[Lightweight transactions and the applied flag]. == Can {product-proxy} be deployed as a sidecar? diff --git a/modules/ROOT/pages/feasibility-checklists.adoc b/modules/ROOT/pages/feasibility-checklists.adoc index e8825d26..adb36997 100644 --- a/modules/ROOT/pages/feasibility-checklists.adoc +++ b/modules/ROOT/pages/feasibility-checklists.adoc @@ -11,7 +11,7 @@ If your database doesn't meet these requirements, you can still complete the mig {product-proxy} supports protocol versions `v3`, `v4`, `DSE_V1`, and `DSE_V2`. -//TODO: Verify v5 status +//TODO: V5 status: https://github.com/datastax/zdm-proxy/blob/main/faq.md#what-versions-of-apache-cassandra-or-cql-compatible-data-stores-does-the-zdm-proxy-support {product-proxy} technically doesn't support `v5`. 
If `v5` is requested, the proxy handles protocol negotiation so that the client application properly downgrades the protocol version to `v4`. This means that any client application using a recent driver that supports protocol version `v5` can be migrated using the {product-proxy} (as long as it does not use v5-specific functionality). @@ -38,6 +38,8 @@ TODO: Need to verify as these are in conflict with other information in this gui {dse-short} 4.6 migration support may be introduced when protocol version v2 is supported. * {astra-db}. + +See also: https://github.com/datastax/zdm-proxy/blob/main/faq.md#what-versions-of-apache-cassandra-or-cql-compatible-data-stores-does-the-zdm-proxy-support //// [TIP] @@ -126,7 +128,17 @@ Some application workloads can tolerate inconsistent data in some cases (especia ==== [[_lightweight_transactions_and_the_applied_flag]] -=== Lightweight Transactions and the `applied` flag +=== Lightweight transactions and the applied flag + +//TODO: Align with the write request language on components.adoc + +//// +The ZDM proxy can bifurcate lightweight transactions to the ORIGIN and TARGET clusters. +However, it only returns the applied flag from one cluster, whichever cluster is the source of truth. +Given that there are two separate clusters involved, the state of each cluster may be different. +For conditional writes, this may create a divergent state for a time. +It may not make a difference in many cases, but if lightweight transactions are used, we would recommend a reconciliation phase in the migration before switching reads to rely on the TARGET cluster. +//// {product-proxy} handles LWTs as write operations. The proxy sends the LWT to the origin and target clusters concurrently, and then waits for a response from both. 
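The acknowledgement rule described above can be sketched as a small shell function. This is an illustration of the rule only, not the proxy's actual implementation: the proxy reports `success` only when both clusters acknowledge the LWT.

```shell
# Illustrative sketch of the LWT status rule: success only if BOTH the
# origin and target clusters acknowledge the write.
lwt_status() {
  local origin_ack=$1 target_ack=$2
  if [ "$origin_ack" = "true" ] && [ "$target_ack" = "true" ]; then
    echo "success"
  else
    echo "failure"
  fi
}

lwt_status true true    # success: both clusters acknowledged
lwt_status true false   # failure: target did not acknowledge
```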
diff --git a/modules/ROOT/pages/glossary.adoc b/modules/ROOT/pages/glossary.adoc index 08138cae..7586b12a 100644 --- a/modules/ROOT/pages/glossary.adoc +++ b/modules/ROOT/pages/glossary.adoc @@ -17,9 +17,11 @@ For details about the playbooks available in {product-automation}, see: [[_asynchronous_dual_reads]] == Asynchronous dual reads -An optional testing phase in which reads are sent to both the origin and target clusters. -This lets you check that the target cluster/database can handle the full workload of reads and writes before you finalize the migration and moving off the {product-proxy} instances. -For details, see xref:enable-async-dual-reads.adoc[]. +An optional feature that is designed to test the target cluster's ability to handle a production workload before you permanently switch to the target cluster at the end of the migration process. + +When enabled, {product-proxy} sends asynchronous read requests to the secondary cluster (typically the target cluster) in addition to the synchronous read requests that are sent to the primary cluster by default. + +For more information, see xref:ROOT:enable-async-dual-reads.adoc[]. == CQL @@ -36,17 +38,21 @@ See the diagram in the xref:introduction.adoc#migration-workflow[workflow introd [[origin]] == Origin -Your existing {cass-short}-based cluster, whether it's {cass-reg}, {dse}, or {astra-db}. +Your existing {cass-short}-based database that you are migrating away from. +It is the opposite of the <>. [[_primary_cluster]] == Primary cluster -The cluster that is currently considered the "primary" source of truth. -While writes are always sent to both clusters, the primary cluster is the one to which all synchronous reads are always sent, and their results are returned to the client application. -During a migration, the origin cluster is typically the primary cluster. -Near the end of the migration, you shift the primary cluster to be the target cluster. 
+The database that is designated as the source of truth for read requests. +It is the opposite of the <>. + +The primary cluster is set by {product-automation} through the `primary_cluster` variable, or you can set it directly through the `ZDM_PRIMARY_CLUSTER` environment variable for {product-proxy}. -For more, see <>. +For the majority of the migration process, the <> is typically the primary cluster. +Near the end of the migration, you shift the primary cluster to the <>. + +For information about which cluster receives reads and writes during the migration process, see xref:components.adoc#how-zdm-proxy-handles-reads-and-writes[How {product-proxy} handles reads and writes]. == Playbooks @@ -61,22 +67,18 @@ In our context here, see <>. == Read mirroring -See xref:glossary.adoc#_asynchronous_dual_reads[Asynchronous dual reads]. +See <<_asynchronous_dual_reads>>. [[secondary-cluster]] == Secondary cluster -During a migration, the secondary cluster is the one that is currently **not** the source of truth. - -When using the {product-proxy}, all writes are synchronously sent to both the origin and target clusters. -Reads operate differently: with the default read mode, reads are only sent to the primary cluster (Origin by default). -In Phase 3 of a migration, you can optionally send the reads to both clusters temporarily if you want to verify that the target cluster can handle the full workload of reads and writes. +The database that isn't designated as the source of truth for read requests. +It is the opposite of the <<_primary_cluster>>. -If you set the proxy's read mode configuration variable (`read_mode`) to `DUAL_ASYNC_ON_SECONDARY`, then asynchronous dual reads are enabled. -That change results in reads being additionally sent to the secondary cluster. +For the majority of the migration process, the secondary cluster is the <>. +Near the end of the migration, the target database becomes the <<_primary_cluster>>, and then the <> becomes the secondary cluster. 
-For more, see xref:glossary.adoc#_primary_cluster[Primary cluster]. -Also see xref:enable-async-dual-reads.adoc[]. +For information about which cluster receives reads and writes during the migration process, see xref:components.adoc#how-zdm-proxy-handles-reads-and-writes[How {product-proxy} handles reads and writes]. [[_secure_connect_bundle_scb]] == {scb} @@ -87,7 +89,8 @@ For more information, see xref:astra-db-serverless:databases:secure-connect-bund [[target]] == Target -The new cluster to which you want to migrate client applications and data with zero downtime. +The database to which you are migrating your data and applications. +It is the opposite of the <>. [[zdm-automation]] == {product-automation} diff --git a/modules/ROOT/pages/introduction.adoc b/modules/ROOT/pages/introduction.adoc index 5aba240f..90d69d14 100644 --- a/modules/ROOT/pages/introduction.adoc +++ b/modules/ROOT/pages/introduction.adoc @@ -29,14 +29,12 @@ When the migration is complete, the data is present in the new database, and you can update your client applications to connect exclusively to the new database. The old database becomes obsolete and can be removed. -== {product-short} requirements +== Requirements for zero downtime -True zero downtime migration is only possible if your database meets the minimum requirements described in xref:ROOT:feasibility-checklists.adoc[]. +True zero downtime migration is only possible if your database meets the minimum requirements, including cluster compatibility, described in xref:ROOT:feasibility-checklists.adoc[]. If your database doesn't meet these requirements, you can still complete the migration, but downtime might be necessary to finish the migration.
-=== Supported migration paths - -include::ROOT:partial$migration-scenarios.adoc[] +For more information, see xref:ROOT:feasibility-checklists.adoc[]. == Migration phases @@ -54,6 +52,7 @@ At this point, your client applications are performing read/write operations wit image:pre-migration0ra9.png["Pre-migration environment."] +//The text from this note is duplicated on the feasibility checks page. [TIP] ==== For the migration to succeed, the origin and target clusters must have matching schemas. @@ -93,10 +92,13 @@ image:migration-phase2ra9a.png["Migration Phase 2."] === Phase 3: Enable asynchronous dual reads -In this phase, you can optionally enable asynchronous dual reads. -The idea is to test performance and verify that the target cluster can handle your application's live request load before cutting over from the origin to the target permanently. +In this optional phase, you can enable the _asynchronous dual reads_ feature to test the target cluster's ability to handle a production workload before you permanently switch your applications to the target cluster at the end of the migration process. + +When enabled, {product-proxy} sends asynchronous read requests to the secondary cluster in addition to the synchronous read requests that are sent to the primary cluster by default. + +For more information, see xref:ROOT:enable-async-dual-reads.adoc[] and xref:ROOT:components.adoc#how-zdm-proxy-handles-reads-and-writes[How {product-proxy} handles reads and writes].
-image:migration-phase3ra9.png["Migration Phase 3."] +image:migration-phase3ra.png["Migration Phase 3."] === Phase 4: Route reads to the target cluster diff --git a/modules/ROOT/pages/manage-proxy-instances.adoc b/modules/ROOT/pages/manage-proxy-instances.adoc index a22081e5..af0cf313 100644 --- a/modules/ROOT/pages/manage-proxy-instances.adoc +++ b/modules/ROOT/pages/manage-proxy-instances.adoc @@ -8,60 +8,62 @@ In this topic, we'll learn how to perform simple operations on your {product-pro * Change a mutable configuration variable * Upgrade the {product-proxy} version -All these operations can be easily done by running Ansible playbooks. +With {product-automation}, you can use Ansible playbooks for all of these operations. -Make sure you are connected to the Ansible Control Host Docker container. As explained before, you can do so from the jumphost machine by running: +== Perform a rolling restart of the proxies + +Rolling restarts of the {product-proxy} instances are useful to apply configuration changes or to upgrade the {product-proxy} version without impacting the availability of the deployment. +[tabs] +====== +With {product-automation}:: ++ +-- +If you use {product-automation} to manage your {product-proxy} deployment, you can use a dedicated playbook to perform rolling restarts of all {product-proxy} instances in a deployment: + +. Connect to the Ansible Control Host Docker container. +You can do this from the jumphost machine by running the following command: ++ [source,bash] ---- docker exec -it zdm-ansible-container bash ---- - -You will see a prompt like: - ++ +.Result +[%collapsible] +==== [source,bash] ---- ubuntu@52772568517c:~$ ---- +==== -== Perform a rolling restart of the proxies - -Although this operation is not required in any particular step of the migration, you may still find it convenient in some circumstances. -For this reason, there is a specific playbook that performs this operation. 
- -Connect to the Ansible Control Host Docker container as explained above and run: - +. Run the rolling restart playbook: ++ [source,bash] ---- -ansible-playbook rolling_restart_zdm_proxy.yml -i zdm_ansible_inventory +ansible-playbook rolling_update_zdm_proxy.yml -i zdm_ansible_inventory ---- ++ +While running, this playbook gracefully stops one container and waits for it to shut down before restarting the container. +Then, it calls the xref:deploy-proxy-monitoring.adoc#_indications_of_success_on_origin_and_target_clusters[readiness endpoint] to check the container's status: ++ +* If the check fails, the playbook repeats the check every five seconds for a maximum of six attempts. +If all six attempts fail, the playbook interrupts the entire rolling restart process. +* If the check succeeds, the playbook waits before proceeding to the next container. ++ +The default pause between containers is 10 seconds. +You can change the pause duration in `zdm-proxy-automation/ansible/vars/zdm_playbook_internal_config.yml`. +-- -This is all that is needed. - -[NOTE] -==== -This playbook simply restarts the existing {product-proxy} containers. -It does **not** apply any configuration change or change the version. - -If you wish to xref:change-mutable-config-variable[apply configuration changes] or xref:_upgrade_the_proxy_version[perform version upgrades] in a rolling fashion, follow the instructions in the respective sections. -==== - -This playbook restarts each proxy container one by one, without impacting the availability of the {product-proxy} deployment. It automates the following steps: - -. It stops one container gracefully, waiting for it to shut down. -. It starts the container again. -. It checks that the container has come up successfully by checking the readiness endpoint: -.. If unsuccessful, it repeats the check for six times at 5-second intervals and eventually interrupts the whole process if the check still fails. -.. 
If successful, it waits for a configurable interval and then moves on to the next container. - -The pause between the restart of each {product-proxy} instance defaults to 10 seconds. -To change this value, you can set the desired number of seconds in `zdm-proxy-automation/ansible/vars/zdm_playbook_internal_config.yml`. +Without {product-automation}:: ++ +-- +If you don't use {product-automation}, you must manually restart each instance. -[TIP] -==== -To check the state of your {product-proxy} instances, you have a couple of options. -See xref:deploy-proxy-monitoring.adoc#_indications_of_success_on_origin_and_target_clusters[Indications of success on origin and target clusters]. -==== +To avoid downtime, wait for each instance to fully restart and begin receiving traffic before restarting the next instance. +-- +====== [#access-the-proxy-logs] == Access the proxy logs @@ -84,26 +86,48 @@ To leave the logs open and continuously output the latest log messages, append t === Collect the logs -You can easily retrieve the logs of all {product-proxy} instances using a dedicated playbook (`collect_zdm_proxy_logs.yml`). -You can view the playbook's configuration values in `vars/zdm_proxy_log_collection_config.yml`, but no changes to it are required. +{product-automation} has a dedicated playbook, `collect_zdm_proxy_logs.yml`, that you can use to collect logs for all {product-proxy} instances in a deployment. -Connect to the Ansible Control Host container as explained above and run: +You can view the playbook's configuration in `vars/zdm_proxy_log_collection_config.yml`, but no changes are required to run it. +. Connect to the Ansible Control Host Docker container. 
+You can do this from the jumphost machine by running the following command: ++ [source,bash] ---- -ansible-playbook collect_zdm_proxy_logs.yml -i zdm_ansible_inventory +docker exec -it zdm-ansible-container bash ---- ++ +.Result +[%collapsible] +==== +[source,bash] +---- +ubuntu@52772568517c:~$ +---- +==== -This playbook creates a single zip file, called `zdm_proxy_logs_.zip`, containing the logs from all proxy instances, and stores it on the Ansible Control Host Docker container in the directory `/home/ubuntu/zdm_proxy_archived_logs`. - -To copy the zip file from the container to the jumphost, open a shell on the jumphost and run the following command: - +. Run the log collection playbook: ++ [source,bash] ---- -docker cp zdm-ansible-container:/home/ubuntu/zdm_proxy_archived_logs/ +ansible-playbook collect_zdm_proxy_logs.yml -i zdm_ansible_inventory ---- ++ +This playbook creates a single zip file, `zdm_proxy_logs_**TIMESTAMP**.zip`, that contains the logs from all proxy instances. +This archive is stored on the Ansible Control Host Docker container at `/home/ubuntu/zdm_proxy_archived_logs`. -The archive will be copied to the specified destination directory on the jumphost. +. 
To copy the archive from the container to the jumphost, open a shell on the jumphost, and then run the following command: ++ +[source,bash,subs="+quotes"] +---- +docker cp zdm-ansible-container:/home/ubuntu/zdm_proxy_archived_logs/zdm_proxy_logs_**TIMESTAMP**.zip **DESTINATION_DIRECTORY_ON_JUMPHOST** +---- ++ +Replace the following: ++ +* `**TIMESTAMP**`: The timestamp from the name of your log file archive +* `**DESTINATION_DIRECTORY_ON_JUMPHOST**`: The path to the directory where you want to copy the archive [[change-mutable-config-variable]] == Change a mutable configuration variable @@ -270,35 +294,86 @@ Then run the same playbook as above, with the following command: ansible-playbook rolling_update_zdm_proxy.yml -i zdm_ansible_inventory ---- -== Scaling operations +== Scale operations with {product-automation} -{product-automation} doesn't provide a way to perform scaling up/down operations in a rolling fashion out of the box. -If you need a larger {product-proxy} deployment, you have two options: +{product-automation} doesn't provide a way to scale operations up or down in a rolling fashion. +If you are using {product-automation} and you need a larger {product-proxy} deployment, you have two options: -. Creating a new deployment and moving your client applications to it. -This is the recommended approach, which can be done through the automation without any downtime. -. Adding more instances to the existing deployment. -This is slightly more manual and requires a brief downtime window. +Recommended: Create a new deployment:: +This is the recommended way to scale your {product-proxy} deployment because it requires no downtime. ++ +With this option, you create a new {product-proxy} deployment, and then move your client application to it: ++ +. xref:ROOT:setup-ansible-playbooks.adoc[Create a new {product-proxy} deployment] with the desired topology on a new set of machines. +. 
Change the contact points in the application configuration so that the application instances point to the new {product-proxy} deployment. +. Perform a rolling restart of the application instances to apply the new contact point configuration. ++ +The rolling restart ensures there is no interruption of service. +The application instances switch seamlessly from the old deployment to the new one, and they are able to serve requests immediately. +. After restarting all application instances, you can safely remove the old {product-proxy} deployment. -The first option requires that you deploy a new {product-proxy} cluster on the side, and move the client applications to this new proxy cluster. -This can be done by creating a new {product-proxy} deployment with the desired topology on a new set of machines (following the normal process), and then changing the contact points in the application configuration so that the application instances point to the new {product-proxy} deployment. +Add instances to an existing deployment:: +This option requires some manual effort and a brief amount of downtime. ++ +With this option, you change the topology of your existing {product-proxy} deployment, and then restart the entire deployment to apply the change: -This first option just requires a rolling restart of the application instances (to apply the contact point configuration update) and does not cause any interruption of service, because the application instances can just move seamlessly from the old deployment to the new one, which are able to serve requests straight away. +. Amend the inventory file so that it contains one line for each machine where you want to deploy a {product-proxy} instance. ++ +For example, if you want to add three nodes to a deployment with six nodes, then the amended inventory file must contain nine total IPs, including the six existing IPs and the three new IPs. 
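As a sketch of this six-plus-three example, with hypothetical IPs and an assumed `[proxies]` group name (check the group name against your own inventory file), the amended inventory would list all nine hosts:

```shell
# Hypothetical amended inventory: six existing proxies plus three new ones.
cat > zdm_ansible_inventory.example <<'EOF'
[proxies]
172.18.10.31
172.18.10.32
172.18.10.33
172.18.10.34
172.18.10.35
172.18.10.36
172.18.10.37
172.18.10.38
172.18.10.39
EOF

# Sanity check: nine proxy IPs in total
grep -c '^172\.' zdm_ansible_inventory.example
```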
-The second option consists of changing the topology of an existing {product-proxy} deployment. -For example, let's say that you wish to add three new nodes to an existing six-node deployment. -To do this, you need to amend the inventory file so that it contains one line for each machine where you want a proxy instance to be deployed (in this case, the amended inventory file will contain nine proxy IPs, six of which were already there plus the three new ones) and then run the `deploy_zdm_proxy.yml` playbook again. +. Run the `deploy_zdm_proxy.yml` playbook to apply the change and start the new instances. ++ +Rerunning the playbook stops the existing instances, destroys them, and then creates and starts a new deployment with new instances based on the amended inventory. +This results in a brief interruption of service for your entire {product-proxy} deployment. -This second option will stop the existing six proxies, destroy them, create a new nine-node deployment from scratch based on the amended inventory and start it up, therefore resulting in a brief interruption of availability of the whole {product-proxy} deployment. +== Scale {product-proxy} without {product-automation} -[NOTE] -==== -{product-proxy} containers can be be scaled out by any number of proxies as you see fit, not necessarily in multiples of three. -==== +If you aren't using {product-automation}, you can still add and remove {product-proxy} instances. + +[#add-an-instance] +Add an instance:: +. Prepare and configure the new {product-proxy} instances appropriately based on your other instances. ++ +Make sure the new instance's configuration references all planned {product-proxy} cluster nodes. +. On all {product-proxy} instances, add the new instance's address to the `ZDM_PROXY_TOPOLOGY_ADDRESSES` environment variable. ++ +Make sure to include all new nodes. +. 
On the new {product-proxy} instance, set the `ZDM_PROXY_TOPOLOGY_INDEX` to the next sequential integer after the highest index in your existing deployment.
+. Perform a rolling restart of all {product-proxy} instances, one at a time.
+
+Vertically scale existing instances::
+Use these steps to increase or decrease resources for existing {product-proxy} instances, such as CPU or memory.
+To avoid downtime, perform the following steps on one instance at a time:
++
+. Stop the first {product-proxy} instance that you want to modify.
+. Modify the instance's resources as required.
++
+Make sure the instance's IP address remains the same.
+If the IP address changes, you need to <<add-an-instance,add the instance as a new instance>>.
+. Restart the modified {product-proxy} instance.
+. Wait until the instance starts, and then confirm that it is receiving traffic.
+. Repeat these steps to modify each additional instance, one at a time.
+
+Remove an instance::
+. On all {product-proxy} instances, remove the unused instance's address from the `ZDM_PROXY_TOPOLOGY_ADDRESSES` environment variable.
+. Perform a rolling restart of all remaining {product-proxy} instances.
+. Clean up resources used by the removed instance, such as the container or VM.
+
+== Purpose of proxy topology addresses
+
+When you configure a {product-proxy} deployment, either through {product-automation} or manually managed {product-proxy} instances, you specify the addresses of your instances.
+These are populated in the `ZDM_PROXY_TOPOLOGY_ADDRESSES` variable, either manually or automatically, depending on how you manage your instances.
+
+{cass-short} drivers look up nodes on a cluster by querying the `system.peers` table.
+{product-proxy} uses the topology addresses to respond effectively to the driver's request for connection nodes.
+If no topology addresses are specified, {product-proxy} defaults to a single-instance configuration.
+This means that driver connections will use only that one {product-proxy} instance, rather than all instances in your {product-proxy} deployment. + +If that one instance goes down, {product-proxy} won't know that there are other instances available, and your application can experience an outage. +Additionally, if you need to restart {product-proxy} instances, and there is only one instance specified in the topology addresses, your migration will have downtime while that one instance restarts. -If you are not using the {product-automation} and want to remove or add a proxy manually, follow these steps: +== See also -. If adding a {product-proxy} instance, prepare and configure it appropriately based on the other instances. -. Update the `ZDM_PROXY_TOPOLOGY_ADDRESSES` environment variable on all {product-proxy} instances - removing or adding the {product-proxy} instance's address to the list. -. Set the `ZDM_PROXY_TOPOLOGY_INDEX` on the new {product-proxy} instance to be the next sequential integer after the highest one in your existing deployment. -. Perform a rolling restart on all {product-proxy} instances. +* xref:ROOT:troubleshooting-tips.adoc[] +* xref:ROOT:troubleshooting-scenarios.adoc[] +* xref:deploy-proxy-monitoring.adoc#_indications_of_success_on_origin_and_target_clusters[Indications of success on origin and target clusters] \ No newline at end of file diff --git a/modules/ROOT/pages/metrics.adoc b/modules/ROOT/pages/metrics.adoc index 81b4d2e3..a6613905 100644 --- a/modules/ROOT/pages/metrics.adoc +++ b/modules/ROOT/pages/metrics.adoc @@ -18,7 +18,7 @@ For this reason, we strongly encourage you to monitor the {product-proxy}, eithe The Grafana dashboards are ready to go with metrics that are being scraped from the {product-proxy} instances. 
If you already have a Grafana deployment then you can import the dashboards from the two {product-short} dashboard files from this {product-automation-repo}/tree/main/grafana-dashboards[{product-automation} GitHub location]. - + == Grafana dashboard for {product-proxy} metrics There are three groups of metrics in this dashboard: @@ -29,13 +29,16 @@ There are three groups of metrics in this dashboard: image::zdm-grafana-proxy-dashboard1.png[Grafana dashboard shows three categories of {product-short} metrics for the proxy.] +[#proxy-level-metrics] === Proxy-level metrics -* Latency: -** Read Latency: total latency measured by the {product-proxy} (including post-processing like response aggregation) for read requests. -This metric has two labels (`reads_origin` and `reads_target`): the label that has data will depend on which cluster is receiving the reads, i.e. which cluster is currently considered the xref:glossary.adoc#_primary_cluster[primary cluster]. -This is configured by the {product-automation} through the variable `primary_cluster`, or directly through the environment variable `ZDM_PRIMARY_CLUSTER` of the {product-proxy}. -** Write Latency: total latency measured by the {product-proxy} (including post-processing like response aggregation) for write requests. +* Latency ++ +** Read Latency: Total latency measured by the {product-proxy} per read request, including post-processing, such as response aggregation. +This metric has two labels: `reads_origin` and `reads_target`. +The label that has data depends on which cluster is receiving the reads, which is the current xref:glossary.adoc#_primary_cluster[primary cluster]. +** Write Latency: Total latency measured by the {product-proxy} per write request, including post-processing, such as response aggregation. +This metric is measured as the total latency across both clusters for a single xref:ROOT:components.adoc#how-zdm-proxy-handles-reads-and-writes[bifurcated write request]. 
* Throughput (same structure as the previous latency metrics):
** Read Throughput
@@ -69,9 +72,11 @@ To see error metrics by error type, see the node-level error metrics on the next
[[_node_level_metrics]]
=== Node-level metrics

-* Latency: metrics on this bucket are not split by request type like the proxy level latency metrics so writes and reads are mixed together:
-** Origin: latency measured by the {product-proxy} up to the point it received a response from the origin connection.
-** Target: latency measured by the {product-proxy} up to the point it received a response from the target connection.
+* Latency: Node-level latency metrics report combined read and write latency per cluster, not per request.
+For latency by request type, see <<proxy-level-metrics>>.
++
+** Origin: Latency, as measured by {product-proxy}, up to the point that it received a response from the origin connection.
+** Target: Latency, as measured by {product-proxy}, up to the point that it received a response from the target connection.

* Throughput: same as node level latency metrics, reads and writes are mixed together.
@@ -95,16 +100,14 @@ Possible values for the `error` type label:
[[_asynchronous_read_requests_metrics]]
=== Asynchronous read requests metrics

-These metrics are specific to asynchronous reads, so they are only populated if asynchronous dual reads are enabled.
-This is done by setting the {product-automation} variable `read_mode`, or its equivalent environment variable `ZDM_READ_MODE`, to `DUAL_ASYNC_ON_SECONDARY` as explained xref:enable-async-dual-reads.adoc[here].
+These metrics are only recorded if you xref:ROOT:enable-async-dual-reads.adoc[enable asynchronous dual reads].

-These metrics track:
+These metrics track the following information for asynchronous read requests:

-* Latency.
-* Throughput.
-* Number of dedicated connections per node for async reads: whether it's origin or target connections depends on the {product-proxy} configuration.
-That is, if the primary cluster is the origin cluster, then the asynchronous reads are sent to the target cluster.
-* Number of errors per error type per node.
+* Latency
+* Throughput
+* Number of dedicated connections per node for the cluster receiving the asynchronous read requests
+* Number of errors per node, separated by error type

=== Insights via the {product-proxy} metrics
diff --git a/modules/ROOT/pages/setup-ansible-playbooks.adoc b/modules/ROOT/pages/setup-ansible-playbooks.adoc
index e2cda141..aefbbcec 100644
--- a/modules/ROOT/pages/setup-ansible-playbooks.adoc
+++ b/modules/ROOT/pages/setup-ansible-playbooks.adoc
@@ -104,7 +104,7 @@ ssh -F jumphost
. From the jumphost, download the latest {product-utility} executable from the {product-automation-repo}/releases[{product-automation} GitHub repository] {product-automation-shield}.
+
-The package filename format is `zdm-util-**PLATFORM**-**VERSION***.tgz`.
+The package filename format is `zdm-util-**PLATFORM**-**VERSION**.tgz`.
The following example downloads {product-utility} version 2.3.0 for Linux amd64.
To download a different package, change the version and package filename accordingly.
+
diff --git a/modules/ROOT/pages/troubleshooting-scenarios.adoc b/modules/ROOT/pages/troubleshooting-scenarios.adoc
index 39e1c95d..c1bef973 100644
--- a/modules/ROOT/pages/troubleshooting-scenarios.adoc
+++ b/modules/ROOT/pages/troubleshooting-scenarios.adoc
@@ -439,7 +439,7 @@ We encourage you to upgrade to that version or greater.
By default, {product-proxy} now sends heartbeats after 30 seconds of inactivity on a cluster connection, to keep it alive.
You can tune the heartbeat interval with the Ansible configuration variable `heartbeat_interval_ms`, or by directly setting the `ZDM_HEARTBEAT_INTERVAL_MS` environment variable if you do not use the {product-automation}.
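For example, a minimal sketch of applying this setting to a manually managed instance — the container name, image tag, and the omitted cluster configuration are illustrative assumptions, not a complete deployment:

```shell
# Illustrative only: set the heartbeat interval to 30 seconds (30000 ms).
# With {product-automation}, set the equivalent Ansible variable instead,
# for example: heartbeat_interval_ms: 30000
docker run -d --name zdm-proxy \
  -e ZDM_HEARTBEAT_INTERVAL_MS=30000 \
  datastax/zdm-proxy:latest  # other required ZDM_* configuration omitted
```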
-== Performance degradation with {product-short} +== Performance degradation with {product-proxy} === Symptoms @@ -447,22 +447,28 @@ Consider a case where a user runs separate benchmarks against: * {astra-db} directly * Origin directly -* {product-short} (with {astra-db} and the origin cluster) +* {product-proxy} (with {astra-db} and the origin cluster) -The results of these tests show latency/throughput values are worse with {product-short} than when connecting to {astra-db} or origin cluster directly. +The results of these tests show latency/throughput values are worse with {product-proxy} than when connecting to {astra-db} or origin cluster directly. === Cause -{product-short} will always add additional latency which, depending on the nature of the test, will also result in a lower throughput. +{product-short} always increases latency and, depending on the nature of the test, reduces throughput. Whether this performance hit is expected or not depends on the difference between the {product-short} test results and the test results with the cluster that performed the worst. -Writes in {product-short} require an `ACK` from both clusters while reads only require the result from the origin cluster (or target if the proxy is set up to route reads to the target cluster). +Writes in {product-short} require successful acknowledgement from both clusters, while reads only require the result from the primary cluster, which is typically the origin cluster. This means that if the origin cluster has better performance than the target cluster, then {product-short} will have worse write performance. -From our testing benchmarks, a performance degradation of up to 2x latency is not unheard of even without external factors adding more latency, but it is still worth checking some things that might add additional latency like whether the proxy is deployed on the same Availability Zone (AZ) as the origin cluster or application instances. 
+It is typical for latency to increase with {product-proxy}.
+To minimize performance degradation with {product-proxy}, note the following:

-Simple statements and batch statements are things that will make the proxy add additional latency compared to normal prepared statements.
-Simple statements should be discouraged especially with the {product-proxy} because currently the proxy takes a considerable amount of time just parsing the queries and with prepared statements the proxy only has to parse them once.
+* Make sure your {product-proxy} infrastructure or configuration doesn't unnecessarily increase latency.
+For example, make sure your {product-proxy} instances are in the same availability zone (AZ) as your origin cluster or application instances.
+* Understand the impact of simple and batch statements on latency, as compared to typical prepared statements.
++
+Avoid simple statements with {product-proxy} because {product-proxy} must spend significant time parsing each query.
++
+In contrast, prepared statements are parsed once, and then reused on subsequent requests, if repreparation isn't required.

=== Solution or Workaround
diff --git a/modules/ROOT/pages/troubleshooting-tips.adoc b/modules/ROOT/pages/troubleshooting-tips.adoc
index fb539c13..80b468e1 100644
--- a/modules/ROOT/pages/troubleshooting-tips.adoc
+++ b/modules/ROOT/pages/troubleshooting-tips.adoc
@@ -145,6 +145,34 @@ Don't use `--rm` when you launch the {product-proxy} container.
This flag will prevent you from accessing the logs when {product-proxy} stops or crashes.
====

+== Query system.peers and system.local to check for {product-proxy} configuration issues
+
+Querying `system.peers` and `system.local` can help you investigate {product-proxy} configuration issues:
+
+. xref:ROOT:connect-clients-to-proxy.adoc#connect-the-cql-shell-to-zdm-proxy[Connect CQL shell to a {product-proxy} instance.]
+
+. Query `system.peers`:
++
+[source,cql]
+----
+SELECT * FROM system.peers;
+----
+
+.
Query `system.local`:
++
+[source,cql]
+----
+SELECT * FROM system.local;
+----
+
+. Repeat for each of your {product-proxy} instances.
++
+Because `system.peers` and `system.local` reflect the local {product-proxy} instance's configuration, you need to query all instances to get complete information and identify potential misconfigurations.
+
+. Inspect the results for values related to an error that you are troubleshooting, such as IP addresses or tokens.
++
+For example, you might compare `cluster_name` to ensure that all instances are connected to the same cluster, rather than mixing contact points from different clusters.
+
== Report an issue

To report an issue or get additional support, submit an issue in the {product-short} component GitHub repositories:
diff --git a/modules/ROOT/pages/zdm-proxy-migration-paths.adoc b/modules/ROOT/pages/zdm-proxy-migration-paths.adoc
index 57328f05..a42b90c9 100644
--- a/modules/ROOT/pages/zdm-proxy-migration-paths.adoc
+++ b/modules/ROOT/pages/zdm-proxy-migration-paths.adoc
@@ -1,6 +1,20 @@
= Cluster compatibility for {product}
:description: Learn which sources and targets are eligible for {product}.

+True zero downtime migration is only possible if your database meets the minimum requirements described in xref:ROOT:feasibility-checklists.adoc[], including compatibility of the source and target clusters.
+
+== Compatible source and target clusters for migrations with zero downtime
+
include::ROOT:partial$migration-scenarios.adoc[]

-For more {product} requirements, see xref:ROOT:feasibility-checklists.adoc[].
\ No newline at end of file
+== Incompatible clusters and migrations with some downtime
+
+If you don't want to use {product-proxy} or your databases don't meet the zero-downtime requirements, you can still complete the migration, but some downtime might be necessary.
+ +If your clusters are incompatible, you might be able to use data migration tools such as xref:ROOT:dsbulk-migrator-overview.adoc[{dsbulk-migrator}] or a custom data migration script. +Make sure you transform or prepare the data to comply with the target cluster's schema. + +== See also + +* xref:ROOT:components.adoc[] +* xref:ROOT:feasibility-checklists.adoc[] \ No newline at end of file diff --git a/modules/sideloader/pages/migrate-sideloader.adoc b/modules/sideloader/pages/migrate-sideloader.adoc index 1be294d4..8a402900 100644 --- a/modules/sideloader/pages/migrate-sideloader.adoc +++ b/modules/sideloader/pages/migrate-sideloader.adoc @@ -786,4 +786,5 @@ include::sideloader:partial$sideloader-partials.adoc[tags=validate] == See also * xref:sideloader:cleanup-sideloader.adoc[] -* xref:sideloader:troubleshoot-sideloader.adoc[] \ No newline at end of file +* xref:sideloader:troubleshoot-sideloader.adoc[] +* https://www.datastax.com/events/migrating-your-legacy-cassandra-app-to-astra-db[Migrating your legacy {cass-reg} app to {astra-db}] \ No newline at end of file diff --git a/modules/sideloader/pages/sideloader-overview.adoc b/modules/sideloader/pages/sideloader-overview.adoc index c29de547..8e056338 100644 --- a/modules/sideloader/pages/sideloader-overview.adoc +++ b/modules/sideloader/pages/sideloader-overview.adoc @@ -40,10 +40,14 @@ For more information, see xref:sideloader:prepare-sideloader.adoc[]. === Create snapshot backups {sstable-sideloader} uses snapshot backup files to import SSTable data from your existing origin cluster. -This is an ideal approach for database migrations because creating a snapshot has negligible performance impact on the origin cluster, and it preserves metadata like write timestamps and expiration times (TTLs). - Each snapshot for each node in the origin cluster must include all the keyspaces and individual CQL tables that you want to migrate. 
+These snapshots are ideal for database migrations because creating snapshots has a negligible performance impact on the origin cluster, and the snapshots preserve metadata like `writetime` and `ttl` values.
+
+When using {sstable-sideloader} with {product-proxy}, {cass-short}'s last-write-wins semantics ensure that new, real-time writes take precedence over historical writes.
+Last-write-wins compares the `writetime` of conflicting records, and then retains the most recent write.
+For example, if a new write occurs in your target database with a `writetime` of `2023-10-01T12:05:00Z`, and then {sstable-sideloader} migrates a record for the same row with a `writetime` of `2023-10-01T12:00:00Z`, the target database retains the data from the new write because it has the most recent `writetime`.
+
For more information, see xref:sideloader:migrate-sideloader.adoc#create-snapshots[Migrate data with {sstable-sideloader}: Create snapshots].

=== Prepare the target database
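To make the last-write-wins rule from the snapshot discussion concrete, here is a minimal, hypothetical sketch of the comparison in Python — an illustration of the rule only, not code from {sstable-sideloader} or {product-proxy}:

```python
from datetime import datetime, timezone

def last_write_wins(a: dict, b: dict) -> dict:
    """Return the record with the most recent writetime (hypothetical helper)."""
    return a if a["writetime"] >= b["writetime"] else b

# A real-time write lands in the target database at 12:05...
live_write = {"value": "new", "writetime": datetime(2023, 10, 1, 12, 5, tzinfo=timezone.utc)}

# ...then the migration imports a snapshot of the same row written at 12:00.
imported = {"value": "old", "writetime": datetime(2023, 10, 1, 12, 0, tzinfo=timezone.utc)}

# The live write is retained because its writetime is more recent.
winner = last_write_wins(live_write, imported)
assert winner["value"] == "new"
```

The same comparison applies regardless of which record arrives first: only the `writetime` values decide the outcome.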