From 221a686061c8f567f551918fa1c20e3ccdc38848 Mon Sep 17 00:00:00 2001 From: April M Date: Tue, 3 Jun 2025 08:39:36 -0700 Subject: [PATCH 1/8] fix attributes --- modules/ROOT/pages/migrate-and-validate-data.adoc | 13 +++++-------- modules/sideloader/pages/prepare-sideloader.adoc | 4 ++-- modules/sideloader/pages/sideloader-overview.adoc | 2 +- modules/sideloader/pages/sideloader-zdm.adoc | 2 +- .../sideloader/partials/sideloader-partials.adoc | 4 ++-- 5 files changed, 11 insertions(+), 14 deletions(-) diff --git a/modules/ROOT/pages/migrate-and-validate-data.adoc b/modules/ROOT/pages/migrate-and-validate-data.adoc index e32b223f..79cdb4da 100644 --- a/modules/ROOT/pages/migrate-and-validate-data.adoc +++ b/modules/ROOT/pages/migrate-and-validate-data.adoc @@ -1,12 +1,9 @@ -= Phase 2: Migrate and validate data -:page-tag: migration,zdm,zero-downtime,validate-data += Migrate and validate data In Phase 2 of {product}, you migrate data from the origin to the target, and then validate the migrated data. image::migration-phase2ra.png[In {product-short} Phase 2, you migrate data from the origin cluster to the target cluster.] -//For illustrations of all the migration phases, see the xref:introduction.adoc#_migration_phases[Introduction]. - To move and validate data, you can use a dedicated data migration tool, such as {sstable-sideloader}, {cass-migrator}, or {dsbulk-migrator}, or your can create your own custom data migration script. // tag::migration-tool-summaries[] @@ -15,18 +12,18 @@ To move and validate data, you can use a dedicated data migration tool, such as {sstable-sideloader} is a service running in {astra-db} that directly imports data from snapshots of your existing {cass-short}-based cluster. This tool is exclusively for migrations that move data to {astra-db}. -You can use {sstable-sideloader} alone or in the context of {product-short}. +You can use {sstable-sideloader} alone or with {product-proxy}. 
For more information, see xref:sideloader:sideloader-zdm.adoc[]. == {cass-migrator} You can use {cass-migrator} ({cass-migrator-short}) to migrate and validate tables between {cass-short}-based clusters. -It is best for migrating large amounts of data and for migrations that need support for detailed logging, data verification, table column renaming, and reconciliation. +It is best for large migrations and for migrations that need advanced features, such as for detailed logging, data verification, table column renaming, and reconciliation. {cass-migrator-short} offers extensive functionality and configuration options to support large and complex migrations as well as post-migration data validation. -You can use {cass-migrator-short} by itself, in the context of {product-short}, or for data validation after using another migration tool, such as {sstable-sideloader}. +You can use {cass-migrator-short} by itself, with {product-proxy}, or for data validation after using another data migration tool, such as {sstable-sideloader}. For more information, see xref:ROOT:cassandra-data-migrator.adoc[]. @@ -38,7 +35,7 @@ It is best for smaller migrations or migrations that don't require data validati In addition to loading and unloading CSV and JSON data, you can use {dsbulk-migrator} to transfer data between databases. It can read data from a table in your origin database, and then write that data to a table in your target database. -You can use {dsbulk-migrator} alone or in the context of {product-short}. +You can use {dsbulk-migrator} alone or with {product-proxy}. For more information, see xref:ROOT:dsbulk-migrator.adoc[]. 
diff --git a/modules/sideloader/pages/prepare-sideloader.adoc b/modules/sideloader/pages/prepare-sideloader.adoc index 9424e1a8..f79419d3 100644 --- a/modules/sideloader/pages/prepare-sideloader.adoc +++ b/modules/sideloader/pages/prepare-sideloader.adoc @@ -167,7 +167,7 @@ If you choose the alternative option, you must modify the commands accordingly f * *{sstable-sideloader} doesn't support encrypted data*: If your origin cluster uses xref:6.9@dse:securing:transparent-data-encryption.adoc[{dse-short} Transparent Data Encryption], be aware that {sstable-sideloader} cannot migrate these SSTables. + If you have a mix of encrypted and unencrypted data, you can use {sstable-sideloader} to migrate the unencrypted data. -After the initial migration, you can use another strategy to move the encrypted data, such as {cass-migrator-repo}[{cass-short} Data Migrator (CDM)] or a manual export and reupload. +After the initial migration, you can use another strategy to move the encrypted data, such as {cass-migrator-repo}[{cass-migrator} ({cass-migrator-short})] or a manual export and reupload. * *{sstable-sideloader} doesn't support secondary indexes*: If you don't remove or replace these in your origin cluster, {sstable-sideloader} ignores these directories when importing the data to your {astra-db} database. @@ -179,7 +179,7 @@ Your administration server must have SSH access to each node in your origin clus {company} recommends that you install the following additional software on your administration server: -* {cass-migrator-repo}[{cass-short} Data Migrator (CDM)] to validate imported data and, in the context of {product}, reconcile it with the origin cluster. +* {cass-migrator-repo}[{cass-migrator} ({cass-migrator-short})] to validate imported data and, with {product-proxy}, reconcile it with the origin cluster. * https://jqlang.github.io/jq/[jq] to format JSON responses from the {astra} {devops-api}. The {devops-api} commands in this guide use this tool. 
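The jq recommendation above can be illustrated with a short, self-contained sketch. The JSON payload and its fields below are invented stand-ins for a real {astra} {devops-api} response, not documented output:

```shell
# jq turns compact JSON, such as a DevOps API response, into readable output.
# This payload is a made-up stand-in for illustration only.
response='{"migrationID":"mig-1234","status":"ReceivingFiles"}'

# Pretty-print the whole response.
echo "$response" | jq .

# Use a filter to pull out a single field, such as the status.
status=$(echo "$response" | jq -r '.status')
echo "$status"
```

In practice, you would pipe the output of your `curl` calls against the {devops-api} through the same filters.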
diff --git a/modules/sideloader/pages/sideloader-overview.adoc b/modules/sideloader/pages/sideloader-overview.adoc index 8e056338..53a286a9 100644 --- a/modules/sideloader/pages/sideloader-overview.adoc +++ b/modules/sideloader/pages/sideloader-overview.adoc @@ -4,7 +4,7 @@ {sstable-sideloader} is a service running in {astra-db} that directly imports data from snapshot backups that you've uploaded to {astra-db} from an existing {cass-reg}, {dse}, or {hcd} cluster. -Because it imports data directly, {sstable-sideloader} can offer several advantages over CQL-based tools like xref:dsbulk:overview:dsbulk-about.adoc[{company} Bulk Loader (DSBulk)] and xref:ROOT:cassandra-data-migrator.adoc[{cass-short} Data Migrator (CDM)], including faster, more cost-effective data loading, and minimal performance impacts on your origin cluster and target database. +Because it imports data directly, {sstable-sideloader} can offer several advantages over CQL-based tools like xref:dsbulk:overview:dsbulk-about.adoc[{company} Bulk Loader (DSBulk)] and xref:ROOT:cassandra-data-migrator.adoc[{cass-migrator} ({cass-migrator-short})], including faster, more cost-effective data loading, and minimal performance impacts on your origin cluster and target database. == {sstable-sideloader} concepts diff --git a/modules/sideloader/pages/sideloader-zdm.adoc b/modules/sideloader/pages/sideloader-zdm.adoc index f3524186..8e2b5e8c 100644 --- a/modules/sideloader/pages/sideloader-zdm.adoc +++ b/modules/sideloader/pages/sideloader-zdm.adoc @@ -7,7 +7,7 @@ For compatible origin clusters, see xref:ROOT:astra-migration-paths.adoc[]. Because it imports data directly, {sstable-sideloader} can offer several advantages over CQL-based tools like {dsbulk-migrator} and {cass-migrator}, including faster, more cost-effective data loading, and minimal performance impacts on your origin cluster and target database. -{sstable-sideloader} uses the {astra} {devops-api}, your cloud provider's CLI, and `nodetool`. 
+To migrate data with {sstable-sideloader}, you use the {astra} {devops-api}, your cloud provider's CLI, and `nodetool`.
 
 include::sideloader:partial$sideloader-partials.adoc[tags=sideloader-zdm]
 
diff --git a/modules/sideloader/partials/sideloader-partials.adoc b/modules/sideloader/partials/sideloader-partials.adoc
index 46c5a350..14c5ca96 100644
--- a/modules/sideloader/partials/sideloader-partials.adoc
+++ b/modules/sideloader/partials/sideloader-partials.adoc
@@ -99,8 +99,8 @@ After this point, you must wait for the migration to finish, and then you can us
 
 // tag::no-return[]
 // tag::sideloader-zdm[]
-If you need to migrate a live database, you can use {sstable-sideloader} instead of DSBulk or {cass-short} Data Migrator during of xref:ROOT:migrate-and-validate-data.adoc[Phase 2 of {product} ({product-short})].
+If you need to migrate a live database, you can use {sstable-sideloader} instead of {dsbulk-migrator} or {cass-migrator} during xref:ROOT:migrate-and-validate-data.adoc[Phase 2 of {product}].
 
-.Use {sstable-sideloader} in the context of {product}.
+.Use {sstable-sideloader} with {product-proxy} svg::sideloader:astra-migration-toolkit.svg[] // end::sideloader-zdm[] \ No newline at end of file From 5d82e899e31fa147611e56cedb1f34ab243a4dd0 Mon Sep 17 00:00:00 2001 From: April M Date: Tue, 3 Jun 2025 09:14:51 -0700 Subject: [PATCH 2/8] attribute and redundancy cleanup --- modules/ROOT/pages/faqs.adoc | 43 +++++++++---------- .../ROOT/pages/troubleshooting-scenarios.adoc | 2 +- .../sideloader/pages/sideloader-overview.adoc | 2 +- modules/sideloader/pages/sideloader-zdm.adoc | 25 ++++++++--- 4 files changed, 41 insertions(+), 31 deletions(-) diff --git a/modules/ROOT/pages/faqs.adoc b/modules/ROOT/pages/faqs.adoc index f0ec037b..564f6f14 100644 --- a/modules/ROOT/pages/faqs.adoc +++ b/modules/ROOT/pages/faqs.adoc @@ -34,27 +34,19 @@ Yes, you can use the {product-short} interactive lab to see how the migration pr For more information, see xref:ROOT:introduction.adoc#lab[{product} interactive lab]. -== What components are provided with {product-short}? - -{company} {product} includes the following: - -* xref:glossary.adoc#zdm-proxy[**{product-proxy}**] is a service that operates between xref:glossary.adoc#origin[**Origin**], which is your existing cluster, and xref:glossary.adoc#target[**Target**], which is the cluster to which you are migrating. -* **{product-automation}** is an Ansible-based tool that allows you to deploy and manage the {product-proxy} instances and associated monitoring stack. -To simplify its setup, the suite includes the {product-utility}. -This interactive utility creates a Docker container acting as the Ansible Control Host. -The Ansible playbooks constitute the {product-automation}. -* **{cass-migrator}** is designed to: -** Connect to your clusters and compare the data between the origin and target clusters. -** Report differences in a detailed log file. -** Use AutoCorrect mode to reconcile any missing records and fix any data inconsistencies between the origin and target. 
-* **{dsbulk-migrator}** is provided to migrate smaller amounts of data from the origin to the target. -* Well-defined steps in this migration documentation, organized as a sequence of phases. +== What tools are available for {product}? + +To support live migrations, you can use {product-proxy}, {product-utility}, and {product-automation}. + +For data migration with or without downtime, you can use {sstable-sideloader}, {cass-migrator}, {dsbulk-migrator}, or custom data migration scripts. + +For more information, see xref:ROOT:components.adoc[]. == What exactly is {product-proxy}? {product-proxy} is a component designed to seamlessly handle the real-time client application activity while a migration is in progress. See xref:introduction.adoc#_role_of_zdm_proxy[here] for an overview. -== What are the benefits of {product} and its use cases? +== What are the benefits of {product-proxy} and its use cases? Migrating client applications between clusters is a need that arises in many scenarios. For example, you may want to: @@ -69,11 +61,14 @@ Bottom line: You want to migrate your critical database infrastructure without r See xref:ROOT:zdm-proxy-migration-paths.adoc[]. -== Does {product-short} migrate clusters? +== Does the {product} process migrate clusters? + +The {product} ({product-short}) process doesn't directly migrate clusters. +Instead, it migrates data and applications between clusters. -{product-short} does not migrate clusters. -With {product-short}, we are migrating data and applications *between clusters*. -At the end of the migration, your application will be running on your new cluster, which will have been populated with all the relevant data. +At the end of the migration process, your application runs exclusively on your new cluster, which was populated with data from the original cluster. 
+ +{product-proxy} handles real-time requests generated by your client applications during the migration process, and keeps both clusters in sync through dual writes. == What challenges does {product-short} solve? @@ -83,7 +78,9 @@ Before {company} {product} was available, migrating client applications between == What is the pricing model? -The suite of {product} tools from {company} is free and open-sourced. +{product-proxy}, {product-utility}, {product-automation}, {cass-migrator}, and {dsbulk-migrator} are free and open-sourced. + +{sstable-sideloader} is part of an {astra-db} *Enterprise* subscription plan, and it incurs costs based on usage. == Is there support available if I have questions or issues during our migration? @@ -98,7 +95,9 @@ Additional examples serve as templates, from which you can learn about migration == Where are the public GitHub repos? -All the {company} {product} GitHub repos are public and open source. +//TODO: Move to contribution guide. + +All the {product-proxy} GitHub repos are public and open source. You are welcome to read the code and submit feedback via GitHub Issues per repo. In addition to sending feedback, you may submit Pull Requests (PRs) for potential inclusion. diff --git a/modules/ROOT/pages/troubleshooting-scenarios.adoc b/modules/ROOT/pages/troubleshooting-scenarios.adoc index c1bef973..d6182f29 100644 --- a/modules/ROOT/pages/troubleshooting-scenarios.adoc +++ b/modules/ROOT/pages/troubleshooting-scenarios.adoc @@ -154,7 +154,7 @@ None. Credentials are incorrect or have insufficient permissions. -There are three sets of credentials in play with {product-short}: +There are three sets of credentials in play with {product-proxy}: * Target: credentials that you set in the proxy configuration through the `ZDM_TARGET_USERNAME` and `ZDM_TARGET_PASSWORD` settings. 
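As a minimal sketch of the target credentials described above, the `ZDM_TARGET_USERNAME` and `ZDM_TARGET_PASSWORD` settings can be supplied to a {product-proxy} instance as environment variables; both values here are placeholders:

```shell
# Sketch: provide target credentials to a ZDM proxy instance through the
# settings named in the troubleshooting scenario. Placeholder values only.
export ZDM_TARGET_USERNAME="my-client-id"
export ZDM_TARGET_PASSWORD="my-client-secret"

# Confirm that both settings are present before starting the proxy process.
env | grep -c '^ZDM_TARGET_'
```

The values must match credentials that the target cluster actually accepts, with sufficient permissions, or the proxy fails to connect.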
diff --git a/modules/sideloader/pages/sideloader-overview.adoc b/modules/sideloader/pages/sideloader-overview.adoc index 53a286a9..dc3cbce9 100644 --- a/modules/sideloader/pages/sideloader-overview.adoc +++ b/modules/sideloader/pages/sideloader-overview.adoc @@ -113,7 +113,7 @@ For instructions and more information, see xref:sideloader:migrate-sideloader.ad include::sideloader:partial$sideloader-partials.adoc[tags=validate] -== Use {sstable-sideloader} with {product-short} +== Use {sstable-sideloader} with {product-proxy} include::sideloader:partial$sideloader-partials.adoc[tags=sideloader-zdm] diff --git a/modules/sideloader/pages/sideloader-zdm.adoc b/modules/sideloader/pages/sideloader-zdm.adoc index 8e2b5e8c..8a9d340e 100644 --- a/modules/sideloader/pages/sideloader-zdm.adoc +++ b/modules/sideloader/pages/sideloader-zdm.adoc @@ -1,14 +1,25 @@ -= Use {sstable-sideloader} with {product-short} += Use {sstable-sideloader} with {product-proxy} :navtitle: Use {sstable-sideloader} -:description: Use {sstable-sideloader} to migrate data with {product-short}. +:description: {sstable-sideloader} is a service running in {astra-db} that imports data from snapshots of your existing {cass-short}-based cluster. -{sstable-sideloader} is a service running in {astra-db} that directly imports data from snapshot backups that you've uploaded to {astra-db} from an existing {dse-short}, {hcd-short}, or other compatible {cass-short} cluster. -For compatible origin clusters, see xref:ROOT:astra-migration-paths.adoc[]. +{description} +This tool is exclusively for migrations that move data to {astra-db}. Because it imports data directly, {sstable-sideloader} can offer several advantages over CQL-based tools like {dsbulk-migrator} and {cass-migrator}, including faster, more cost-effective data loading, and minimal performance impacts on your origin cluster and target database. 
-To migrate data with {sstable-sideloader}, you use the {astra} {devops-api}, your cloud provider's CLI, and `nodetool`.
+== Migrate data with {sstable-sideloader}
+
+To migrate data with {sstable-sideloader}, you use `nodetool`, a cloud provider's CLI, and the {astra} {devops-api}:
+
+* *`nodetool`*: Create snapshots of your existing {dse-short}, {hcd-short}, or open-source {cass-short} cluster.
+For compatible origin clusters, see xref:ROOT:astra-migration-paths.adoc[].
+* *Cloud provider CLI*: Upload snapshots to a dedicated cloud storage bucket for your migration.
+* *{astra} {devops-api}*: Run the {sstable-sideloader} commands to write the data from cloud storage to your {astra-db} database.
+
+For more information and instructions, see xref:sideloader:sideloader-overview.adoc[].
+
+== Use {sstable-sideloader} with {product-proxy}
 
-include::sideloader:partial$sideloader-partials.adoc[tags=sideloader-zdm]
+You can use {sstable-sideloader} alone or with {product-proxy}.
 
-For more information, see xref:sideloader:sideloader-overview.adoc[].
\ No newline at end of file +include::sideloader:partial$sideloader-partials.adoc[tags=sideloader-zdm] \ No newline at end of file From 28012d7c5f0696b37836ab81cdcf3772206f0c52 Mon Sep 17 00:00:00 2001 From: April M Date: Tue, 3 Jun 2025 09:43:23 -0700 Subject: [PATCH 3/8] start component definition reconciliation --- .../ROOT/pages/cassandra-data-migrator.adoc | 4 +- modules/ROOT/pages/components.adoc | 61 +++++++++++++++---- modules/ROOT/pages/dsbulk-migrator.adoc | 4 +- .../ROOT/pages/migrate-and-validate-data.adoc | 8 +-- 4 files changed, 57 insertions(+), 20 deletions(-) diff --git a/modules/ROOT/pages/cassandra-data-migrator.adoc b/modules/ROOT/pages/cassandra-data-migrator.adoc index f9b8fadf..0b3ed3da 100644 --- a/modules/ROOT/pages/cassandra-data-migrator.adoc +++ b/modules/ROOT/pages/cassandra-data-migrator.adoc @@ -1,6 +1,6 @@ -= Use {cass-migrator} with {product-short} += Use {cass-migrator} with {product-proxy} :navtitle: Use {cass-migrator} -:description: Use {cass-migrator} to migrate data with {product-short} +:description: Use {cass-migrator} to migrate data with {product-proxy} :page-aliases: cdm-parameters.adoc, ROOT:cdm-steps.adoc //This page was an exact duplicate of cdm-overview.adoc and the (now deleted) cdm-steps.adoc, they are just in different parts of the nav. diff --git a/modules/ROOT/pages/components.adoc b/modules/ROOT/pages/components.adoc index 1e8648f0..57112ae4 100644 --- a/modules/ROOT/pages/components.adoc +++ b/modules/ROOT/pages/components.adoc @@ -3,24 +3,23 @@ :description: Learn about {company} migration tools. :page-tag: migration,zdm,zero-downtime,zdm-proxy,components -{company} migration tools include the {product} {product-short} toolkit and three data migration tools. +The {company} {product} ({product-short}) toolkit includes {product-proxy}, {product-utility}, and {product-automation}, and several data migration tools. 
-{product-short} is comprised of {product-proxy}, {product-utility}, and {product-automation}, which orchestrate activity-in-transition on your clusters. -To move and validate data, you use {sstable-sideloader}, {cass-migrator}, or {dsbulk-migrator}. +For live migrations, {product-proxy} orchestrates activity-in-transition on your clusters. +{product-utility} and {product-automation} facilitate the deployment and management of {product-proxy}. -You can also use {sstable-sideloader}, {cass-migrator-short}, and {dsbulk-migrator} on their own, outside the context of {product-short}. +To move and validate data, you use data migration tools. +You can use these tools alone or with {product-proxy}. == {product-proxy} -The main component of the {company} {product} toolkit is {product-proxy}, which is designed to be a lightweight proxy that handles all real-time requests generated by your client applications during the migration process. +The main component of the {company} {product} toolkit is {product-proxy-repo}[{product-proxy}], which is designed to be a lightweight proxy that handles all real-time requests generated by your client applications during the migration process. +This tool is open-source software that is open for xref:ROOT:contributions.adoc[public contributions]. -{product-proxy} is open-source software that is available from the {product-proxy-repo}[zdm-proxy GitHub repo]. -This project is open for public contributions. - -The {product-proxy} is an orchestrator for monitoring application activity and keeping multiple clusters (databases) in sync through dual writes. +{product-proxy} is an orchestrator for monitoring application activity and keeping multiple clusters (databases) in sync through dual writes. {product-proxy} isn't linked to the actual migration process. It doesn't perform data migrations and it doesn't have awareness of ongoing migrations. 
-Instead, you use a data migration tool, like {sstable-sideloader}, {cass-migrator}, or {dsbulk-migrator}, to perform the data migration and validate migrated data. +Instead, you use a <> to perform the data migration and validate migrated data. {product-proxy} reduces risks to upgrades and migrations by decoupling the origin cluster from the target cluster and maintaining consistency between both clusters. You decide when you want to switch permanently to the target cluster. @@ -82,7 +81,7 @@ For simplicity, you can use the {product-utility} and {product-automation} to se == {product-utility} and {product-automation} -You can use the {product-automation-repo}[{product-utility} and {product-automation}] to set up and run Ansible playbooks that deploy and manage {product-proxy} and its monitoring stack. +You can use {product-automation-repo}[{product-utility} and {product-automation}] to set up and run Ansible playbooks that deploy and manage {product-proxy} and the associated monitoring stack. https://www.ansible.com/[Ansible] is a suite of software tools that enables infrastructure as code. It is open source and its capabilities include software provisioning, configuration management, and application deployment functionality. @@ -98,4 +97,42 @@ To use {product-utility} and {product-automation}, you must prepare the recommen For more information, see xref:setup-ansible-playbooks.adoc[] and xref:deploy-proxy-monitoring.adoc[]. -include::ROOT:migrate-and-validate-data.adoc[tags=migration-tool-summaries] \ No newline at end of file +== Data migration tools + +You use data migration tools to move data between clusters and validate the migrated data. + +You can use these tools alone or with {product-proxy}. + +=== {sstable-sideloader} + +{sstable-sideloader} is a service running in {astra-db} that imports data from snapshots of your existing {cass-short}-based cluster. +This tool is exclusively for migrations that move data to {astra-db}. 
+ +For more information, see xref:sideloader:sideloader-zdm.adoc[]. + +=== {cass-migrator} + +You can use {cass-migrator} ({cass-migrator-short}) to migrate and validate tables between {cass-short}-based clusters. +It is best for large migrations and for migrations that need advanced features, such as for detailed logging, data verification, table column renaming, and reconciliation. + +{cass-migrator-short} offers extensive functionality and configuration options to support large and complex migrations as well as post-migration data validation. + +You can use {cass-migrator-short} by itself, with {product-proxy}, or for data validation after using another data migration tool, such as {sstable-sideloader}. + +For more information, see xref:ROOT:cassandra-data-migrator.adoc[]. + +=== {dsbulk-migrator} + +{dsbulk-migrator} is an extension of {dsbulk-loader}. +It is best for smaller migrations or migrations that don't require data validation during the migration process. + +In addition to loading and unloading CSV and JSON data, you can use {dsbulk-migrator} to transfer data between databases. +It can read data from a table in your origin database, and then write that data to a table in your target database. + +You can use {dsbulk-migrator} alone or with {product-proxy}. + +For more information, see xref:ROOT:dsbulk-migrator.adoc[]. + +=== Custom data migration processes + +If you want to write your own custom data migration processes, you can use a tool like Apache Spark(TM). \ No newline at end of file diff --git a/modules/ROOT/pages/dsbulk-migrator.adoc b/modules/ROOT/pages/dsbulk-migrator.adoc index 0acc1cc7..00d11299 100644 --- a/modules/ROOT/pages/dsbulk-migrator.adoc +++ b/modules/ROOT/pages/dsbulk-migrator.adoc @@ -1,6 +1,6 @@ -= Use {dsbulk-migrator} with {product-short} += Use {dsbulk-migrator} with {product-proxy} :navtitle: Use {dsbulk-migrator} -:description: Use {dsbulk-migrator} to migrate data with {product-short}. 
+:description: Use {dsbulk-migrator} to migrate data with {product-proxy}. //TODO: Reorganize this page and consider breaking it up into smaller pages. diff --git a/modules/ROOT/pages/migrate-and-validate-data.adoc b/modules/ROOT/pages/migrate-and-validate-data.adoc index 79cdb4da..f014633e 100644 --- a/modules/ROOT/pages/migrate-and-validate-data.adoc +++ b/modules/ROOT/pages/migrate-and-validate-data.adoc @@ -6,10 +6,11 @@ image::migration-phase2ra.png[In {product-short} Phase 2, you migrate data from To move and validate data, you can use a dedicated data migration tool, such as {sstable-sideloader}, {cass-migrator}, or {dsbulk-migrator}, or your can create your own custom data migration script. -// tag::migration-tool-summaries[] +//Migration tool summaries are also on ROOT:components.adoc. + == {sstable-sideloader} -{sstable-sideloader} is a service running in {astra-db} that directly imports data from snapshots of your existing {cass-short}-based cluster. +{sstable-sideloader} is a service running in {astra-db} that imports data from snapshots of your existing {cass-short}-based cluster. This tool is exclusively for migrations that move data to {astra-db}. You can use {sstable-sideloader} alone or with {product-proxy}. @@ -41,5 +42,4 @@ For more information, see xref:ROOT:dsbulk-migrator.adoc[]. == Custom data migration processes -If you want to write your own custom data migration processes, you can use a tool like Apache Spark(TM). -// end::migration-tool-summaries[] \ No newline at end of file +If you want to write your own custom data migration processes, you can use a tool like Apache Spark(TM). 
\ No newline at end of file From 071d880924a8a896f4bc4cd760ef81517c2cce8e Mon Sep 17 00:00:00 2001 From: April M Date: Tue, 3 Jun 2025 09:50:41 -0700 Subject: [PATCH 4/8] cdm component definition and intro tweaks --- modules/ROOT/pages/cassandra-data-migrator.adoc | 9 +++++---- modules/ROOT/pages/cdm-overview.adoc | 1 + modules/ROOT/pages/components.adoc | 8 +++----- modules/ROOT/pages/migrate-and-validate-data.adoc | 8 +++----- 4 files changed, 12 insertions(+), 14 deletions(-) diff --git a/modules/ROOT/pages/cassandra-data-migrator.adoc b/modules/ROOT/pages/cassandra-data-migrator.adoc index 0b3ed3da..a467d4e8 100644 --- a/modules/ROOT/pages/cassandra-data-migrator.adoc +++ b/modules/ROOT/pages/cassandra-data-migrator.adoc @@ -1,18 +1,19 @@ = Use {cass-migrator} with {product-proxy} :navtitle: Use {cass-migrator} -:description: Use {cass-migrator} to migrate data with {product-proxy} +:description: You can use {cass-migrator} ({cass-migrator-short}) for data migration and validation between {cass-reg}-based databases. :page-aliases: cdm-parameters.adoc, ROOT:cdm-steps.adoc //This page was an exact duplicate of cdm-overview.adoc and the (now deleted) cdm-steps.adoc, they are just in different parts of the nav. // tag::body[] -You can use {cass-migrator} ({cass-migrator-short}) for data migration and validation between {cass-reg}-based databases. 
-It supports important {cass} features and offers extensive configuration options: +{description} +It is best for large or complex migrations that benefit from advanced features and configuration options, such as the following: * Logging and run tracking * Automatic reconciliation * Performance tuning * Record filtering +* Column renaming * Support for advanced data types, including sets, lists, maps, and UDTs * Support for SSL, including custom cipher algorithms * Use `writetime` timestamps to maintain chronological write history @@ -26,7 +27,7 @@ To use {cass-migrator-short} successfully, your origin and target clusters must == {cass-migrator-short} with {product-proxy} -You can use {cass-migrator-short} alone or with {product-proxy}. +You can use {cass-migrator-short} alone, with {product-proxy}, or for data validation after using another data migration tool. When using {cass-migrator-short} with {product-proxy}, {cass-short}'s last-write-wins semantics ensure that new, real-time writes accurately take precedence over historical writes. diff --git a/modules/ROOT/pages/cdm-overview.adoc b/modules/ROOT/pages/cdm-overview.adoc index ab33c66d..79644ff4 100644 --- a/modules/ROOT/pages/cdm-overview.adoc +++ b/modules/ROOT/pages/cdm-overview.adoc @@ -1,3 +1,4 @@ = {cass-migrator} ({cass-migrator-short}) overview +:description: You can use {cass-migrator} ({cass-migrator-short}) for data migration and validation between {cass-reg}-based databases. include::ROOT:cassandra-data-migrator.adoc[tags=body] \ No newline at end of file diff --git a/modules/ROOT/pages/components.adoc b/modules/ROOT/pages/components.adoc index 57112ae4..5f427361 100644 --- a/modules/ROOT/pages/components.adoc +++ b/modules/ROOT/pages/components.adoc @@ -112,12 +112,10 @@ For more information, see xref:sideloader:sideloader-zdm.adoc[]. === {cass-migrator} -You can use {cass-migrator} ({cass-migrator-short}) to migrate and validate tables between {cass-short}-based clusters. 
-It is best for large migrations and for migrations that need advanced features, such as for detailed logging, data verification, table column renaming, and reconciliation. +You can use {cass-migrator} ({cass-migrator-short}) for data migration and validation between {cass-reg}-based databases. +It offers extensive functionality and configuration options to support large and complex migrations as well as post-migration data validation. -{cass-migrator-short} offers extensive functionality and configuration options to support large and complex migrations as well as post-migration data validation. - -You can use {cass-migrator-short} by itself, with {product-proxy}, or for data validation after using another data migration tool, such as {sstable-sideloader}. +You can use {cass-migrator-short} by itself, with {product-proxy}, or for data validation after using another data migration tool. For more information, see xref:ROOT:cassandra-data-migrator.adoc[]. diff --git a/modules/ROOT/pages/migrate-and-validate-data.adoc b/modules/ROOT/pages/migrate-and-validate-data.adoc index f014633e..3c5b79d4 100644 --- a/modules/ROOT/pages/migrate-and-validate-data.adoc +++ b/modules/ROOT/pages/migrate-and-validate-data.adoc @@ -19,12 +19,10 @@ For more information, see xref:sideloader:sideloader-zdm.adoc[]. == {cass-migrator} -You can use {cass-migrator} ({cass-migrator-short}) to migrate and validate tables between {cass-short}-based clusters. -It is best for large migrations and for migrations that need advanced features, such as for detailed logging, data verification, table column renaming, and reconciliation. +You can use {cass-migrator} ({cass-migrator-short}) for data migration and validation between {cass-reg}-based databases. +It offers extensive functionality and configuration options to support large and complex migrations as well as post-migration data validation. 
-{cass-migrator-short} offers extensive functionality and configuration options to support large and complex migrations as well as post-migration data validation. - -You can use {cass-migrator-short} by itself, with {product-proxy}, or for data validation after using another data migration tool, such as {sstable-sideloader}. +You can use {cass-migrator-short} by itself, with {product-proxy}, or for data validation after using another data migration tool. For more information, see xref:ROOT:cassandra-data-migrator.adoc[]. From e6eae4fffe8974cc1bf20add45dab681d4002386 Mon Sep 17 00:00:00 2001 From: April M Date: Tue, 3 Jun 2025 10:00:20 -0700 Subject: [PATCH 5/8] dsbulk migrator component description --- modules/ROOT/pages/components.adoc | 6 ++---- modules/ROOT/pages/dsbulk-migrator-overview.adoc | 1 + modules/ROOT/pages/dsbulk-migrator.adoc | 7 ++++--- modules/ROOT/pages/migrate-and-validate-data.adoc | 6 ++---- 4 files changed, 9 insertions(+), 11 deletions(-) diff --git a/modules/ROOT/pages/components.adoc b/modules/ROOT/pages/components.adoc index 5f427361..f1975d2a 100644 --- a/modules/ROOT/pages/components.adoc +++ b/modules/ROOT/pages/components.adoc @@ -121,11 +121,9 @@ For more information, see xref:ROOT:cassandra-data-migrator.adoc[]. === {dsbulk-migrator} -{dsbulk-migrator} is an extension of {dsbulk-loader}. -It is best for smaller migrations or migrations that don't require data validation during the migration process. +{dsbulk-migrator} extends {dsbulk-loader} with migration-specific commands: `migrate-live`, `generate-script`, and `generate-ddl`. -In addition to loading and unloading CSV and JSON data, you can use {dsbulk-migrator} to transfer data between databases. -It can read data from a table in your origin database, and then write that data to a table in your target database. +It is best for smaller migrations or migrations that don't require extensive data validation, aside from post-migration row counts. 
You can use {dsbulk-migrator} alone or with {product-proxy}. diff --git a/modules/ROOT/pages/dsbulk-migrator-overview.adoc b/modules/ROOT/pages/dsbulk-migrator-overview.adoc index acc2540c..84769d92 100644 --- a/modules/ROOT/pages/dsbulk-migrator-overview.adoc +++ b/modules/ROOT/pages/dsbulk-migrator-overview.adoc @@ -1,3 +1,4 @@ = {dsbulk-migrator} overview +:description: {dsbulk-migrator} extends {dsbulk-loader} with migration commands. include::ROOT:dsbulk-migrator.adoc[tags=body] \ No newline at end of file diff --git a/modules/ROOT/pages/dsbulk-migrator.adoc b/modules/ROOT/pages/dsbulk-migrator.adoc index 00d11299..29a9c736 100644 --- a/modules/ROOT/pages/dsbulk-migrator.adoc +++ b/modules/ROOT/pages/dsbulk-migrator.adoc @@ -1,12 +1,13 @@ = Use {dsbulk-migrator} with {product-proxy} :navtitle: Use {dsbulk-migrator} -:description: Use {dsbulk-migrator} to migrate data with {product-proxy}. +:description: {dsbulk-migrator} extends {dsbulk-loader} with migration commands. //TODO: Reorganize this page and consider breaking it up into smaller pages. // tag::body[] -Use {dsbulk-migrator} to perform small or simple migrations that don't require data validation other than post-migration row counts. -This tool is also an option for migrations where you can shard data from large tables into more manageable quantities. +{dsbulk-migrator} is an extension of {dsbulk-loader}. +It is best for smaller migrations or migrations that don't require extensive data validation, aside from post-migration row counts. +You can also consider this tool for migrations where you can shard data from large tables into more manageable quantities. 
{dsbulk-migrator} extends {dsbulk-loader} with the following commands: diff --git a/modules/ROOT/pages/migrate-and-validate-data.adoc b/modules/ROOT/pages/migrate-and-validate-data.adoc index 3c5b79d4..bc8b603b 100644 --- a/modules/ROOT/pages/migrate-and-validate-data.adoc +++ b/modules/ROOT/pages/migrate-and-validate-data.adoc @@ -28,11 +28,9 @@ For more information, see xref:ROOT:cassandra-data-migrator.adoc[]. == {dsbulk-migrator} -{dsbulk-migrator} is an extension of {dsbulk-loader}. -It is best for smaller migrations or migrations that don't require data validation during the migration process. +{dsbulk-migrator} extends {dsbulk-loader} with migration-specific commands: `migrate-live`, `generate-script`, and `generate-ddl`. -In addition to loading and unloading CSV and JSON data, you can use {dsbulk-migrator} to transfer data between databases. -It can read data from a table in your origin database, and then write that data to a table in your target database. +It is best for smaller migrations or migrations that don't require extensive data validation, aside from post-migration row counts. You can use {dsbulk-migrator} alone or with {product-proxy}. 
From 6cc22ce498c76fa425811b41b167b5153a778ff5 Mon Sep 17 00:00:00 2001 From: April M Date: Tue, 3 Jun 2025 10:53:02 -0700 Subject: [PATCH 6/8] remove article from ZDM proxy, etc --- modules/ROOT/pages/change-read-routing.adoc | 14 ++--- modules/ROOT/pages/components.adoc | 6 +-- .../ROOT/pages/connect-clients-to-proxy.adoc | 20 +++---- .../ROOT/pages/connect-clients-to-target.adoc | 4 +- modules/ROOT/pages/contributions.adoc | 2 +- modules/ROOT/pages/create-target.adoc | 2 +- .../ROOT/pages/deploy-proxy-monitoring.adoc | 32 ++++++------ .../ROOT/pages/deployment-infrastructure.adoc | 7 +-- modules/ROOT/pages/faqs.adoc | 12 ++--- .../ROOT/pages/feasibility-checklists.adoc | 8 +-- modules/ROOT/pages/glossary.adoc | 4 +- modules/ROOT/pages/introduction.adoc | 4 +- .../ROOT/pages/manage-proxy-instances.adoc | 18 +++---- modules/ROOT/pages/metrics.adoc | 18 +++---- .../ROOT/pages/setup-ansible-playbooks.adoc | 52 +++++++++---------- modules/ROOT/pages/tls.adoc | 28 +++++----- .../ROOT/pages/troubleshooting-scenarios.adoc | 32 ++++++------ modules/ROOT/pages/troubleshooting-tips.adoc | 20 +++---- 18 files changed, 138 insertions(+), 145 deletions(-) diff --git a/modules/ROOT/pages/change-read-routing.adoc b/modules/ROOT/pages/change-read-routing.adoc index 939d1fe1..c44bed1b 100644 --- a/modules/ROOT/pages/change-read-routing.adoc +++ b/modules/ROOT/pages/change-read-routing.adoc @@ -1,7 +1,7 @@ = Route reads to the target :page-tag: migration,zdm,zero-downtime,zdm-proxy,read-routing -This topic explains how you can configure the {product-proxy} to route all reads to the target cluster instead of the origin cluster. +This topic explains how you can configure {product-proxy} to route all reads to the target cluster instead of the origin cluster. 
image::migration-phase4ra9.png["Phase 4 diagram shows read routing on {product-proxy} was switched to the target."] @@ -58,7 +58,7 @@ ansible-playbook rolling_update_zdm_proxy.yml -i zdm_ansible_inventory Wait for the {product-proxy} instances to be restarted by Ansible, one by one. All instances will now send all reads to the target cluster instead of the origin cluster. -At this point, the target cluster becomes the primary cluster, but the {product-proxy} still keeps the origin cluster up-to-date through dual writes. +At this point, the target cluster becomes the primary cluster, but {product-proxy} still keeps the origin cluster up-to-date through dual writes. == Verifying the read routing change @@ -67,11 +67,11 @@ This is not a required step, but you may wish to do it for peace of mind. [TIP] ==== -Issuing a `DESCRIBE` or a read to any system table through the {product-proxy} is *not* a valid verification. +Issuing a `DESCRIBE` or a read to any system table through {product-proxy} isn't a valid verification. -The {product-proxy} handles reads to system tables differently, by intercepting them and always routing them to the origin, in some cases partly populating them at proxy level. +{product-proxy} handles reads to system tables differently, by intercepting them and always routing them to the origin, in some cases partly populating them at proxy level. -This means that system reads are *not representative* of how the {product-proxy} routes regular user reads. +This means that system reads aren't representative of how {product-proxy} routes regular user reads. Even after you switched the configuration to read the target cluster as the primary cluster, all system reads still go to the origin. Although `DESCRIBE` requests are not system requests, they are also generally resolved in a different way to regular requests, and should not be used as a means to verify the read routing behavior. 
@@ -81,7 +81,7 @@ Verifying that the correct routing is taking place is a slightly cumbersome oper For this reason, the only way to do a manual verification test is to force a discrepancy of some test data between the clusters. To do this, you could consider using the xref:connect-clients-to-proxy.adoc#_themis_client[Themis sample client application]. -This client application connects directly to the origin cluster, the target cluster, and the {product-proxy}. +This client application connects directly to the origin cluster, the target cluster, and {product-proxy}. It inserts some test data in its own table, and then you can view the results of reads from each source. Refer to the Themis README for more information. @@ -93,5 +93,5 @@ For example `CREATE TABLE test_keyspace.test_table(k TEXT PRIMARY KEY, v TEXT);` Insert a row with any key, and with a value specific to the origin cluster, for example `INSERT INTO test_keyspace.test_table(k, v) VALUES ('1', 'Hello from the origin cluster!');`. * Now, use `cqlsh` to connect *directly to the target cluster*. Insert a row with the same key as above, but with a value specific to the target cluster, for example `INSERT INTO test_keyspace.test_table(k, v) VALUES ('1', 'Hello from the target cluster!');`. -* Now, use `cqlsh` to xref:connect-clients-to-proxy.adoc#_connecting_cqlsh_to_the_zdm_proxy[connect to the {product-proxy}], and then issue a read request for this test table: `SELECT * FROM test_keyspace.test_table WHERE k = '1';`. +* Now, use `cqlsh` to xref:connect-clients-to-proxy.adoc#_connecting_cqlsh_to_the_zdm_proxy[connect to {product-proxy}], and then issue a read request for this test table: `SELECT * FROM test_keyspace.test_table WHERE k = '1';`. The result will clearly show you where the read actually comes from. 
diff --git a/modules/ROOT/pages/components.adoc b/modules/ROOT/pages/components.adoc index f1975d2a..9678eacf 100644 --- a/modules/ROOT/pages/components.adoc +++ b/modules/ROOT/pages/components.adoc @@ -77,7 +77,7 @@ Throughout the {product-short} documentation, the term _{product-proxy} deployme You can scale {product-proxy} instances horizontally and vertically. To avoid downtime when applying configuration changes, you can perform rolling restarts on your {product-proxy} instances. -For simplicity, you can use the {product-utility} and {product-automation} to set up and run Ansible playbooks that deploy and manage {product-proxy} and its monitoring stack. +For simplicity, you can use {product-utility} and {product-automation} to set up and run Ansible playbooks that deploy and manage {product-proxy} and its monitoring stack. == {product-utility} and {product-automation} @@ -89,9 +89,9 @@ The Ansible automation for {product-short} is organized into playbooks, each imp The machine from which the playbooks are run is known as the Ansible Control Host. In {product-short}, the Ansible Control Host runs as a Docker container. -You use the {product-utility} to set up Ansible in a Docker container, and then you use {product-automation} to run the Ansible playbooks from the Docker container created by {product-utility}. +You use {product-utility} to set up Ansible in a Docker container, and then you use {product-automation} to run the Ansible playbooks from the Docker container created by {product-utility}. -The {product-utility} creates the Docker container acting as the Ansible Control Host, from which {product-automation} allows you to deploy and manage the {product-proxy} instances and the associated monitoring stack, which includes Prometheus metrics and Grafana visualizations of the metrics data. 
+{product-utility} creates the Docker container acting as the Ansible Control Host, from which {product-automation} allows you to deploy and manage the {product-proxy} instances and the associated monitoring stack, which includes Prometheus metrics and Grafana visualizations of the metrics data. To use {product-utility} and {product-automation}, you must prepare the recommended infrastructure, as explained in xref:deployment-infrastructure.adoc[]. diff --git a/modules/ROOT/pages/connect-clients-to-proxy.adoc b/modules/ROOT/pages/connect-clients-to-proxy.adoc index 5993ea18..ab0fe423 100644 --- a/modules/ROOT/pages/connect-clients-to-proxy.adoc +++ b/modules/ROOT/pages/connect-clients-to-proxy.adoc @@ -2,7 +2,7 @@ :navtitle: Connect client applications to {product-proxy} :page-tag: migration,zdm,zero-downtime,zdm-proxy,connect-apps -The {product-proxy} is designed to be similar to a conventional {cass-reg} cluster. +{product-proxy} is designed to be similar to a conventional {cass-reg} cluster. You communicate with it using the CQL query language used in your existing client applications. It understands the same messaging protocols used by {cass-short}, {dse}, and {astra-db}. As a result, most of your client applications won't be able to distinguish between connecting to {product-proxy} and connecting directly to your {cass-short} cluster. @@ -13,7 +13,7 @@ We conclude by describing two sample client applications that serve as real-worl You can use the provided sample client applications, in addition to your own, as a quick way to validate that the deployed {product-proxy} is reading and writing data from the expected origin and target clusters. -Finally, we will explain how to connect the `cqlsh` command-line client to the {product-proxy}. +This topic also explains how to connect CQL shell (`cqlsh`) to {product-proxy}. 
== {company}-compatible drivers @@ -147,8 +147,8 @@ For information about {astra-db} credentials in your {product-proxy} configurati === Disable client-side compression with {product-proxy} -Client applications must not enable client-side compression when connecting through the {product-proxy}, as this is not currently supported. -This is disabled by default in all drivers, but if it was enabled in your client application configuration it will have to be temporarily disabled when connecting to the {product-proxy}. +Client applications must not enable client-side compression when connecting through {product-proxy}, as this is not currently supported. +This is disabled by default in all drivers, but if it was enabled in your client application configuration it will have to be temporarily disabled when connecting to {product-proxy}. === {product-proxy} ignores token-aware routing @@ -186,16 +186,12 @@ You can find the details of building and running {product-demo} in the https://g [[_themis_client]] === Themis client -https://github.com/absurdfarce/themis[Themis] is a Java command-line client application that allows you to insert randomly generated data into some combination of these three sources: +https://github.com/absurdfarce/themis[Themis] is a Java command-line client application that allows you to write randomly generated data directly to the origin cluster, directly to the target cluster, or indirectly to both clusters through {product-proxy}. -* Directly into the origin -* Directly into the target -* Into the {product-proxy}, and subsequently on to the origin and target +Then, you can use the client application to query the data and confirm that {product-proxy} is reading and writing data from the expected sources. -The client application can then be used to query the inserted data. -This allows you to validate that the {product-proxy} is reading and writing data from the expected sources. 
-Configuration details for the clusters and/or {product-proxy} are defined in a YAML file. -Details are in the https://github.com/absurdfarce/themis/blob/main/README.md[README]. +Configuration details for the clusters and {product-proxy} are defined in a YAML file. +For more information, see the https://github.com/absurdfarce/themis/blob/main/README.md[Themis README]. In addition to any utility as a validation tool, Themis also serves as an example of a larger client application which uses the Java driver to connect to a {product-proxy} -- as well as directly to {cass-short} clusters or {astra-db} -- and perform operations. The configuration logic as well as the cluster and session management code have been cleanly separated into distinct packages to make them easy to understand. diff --git a/modules/ROOT/pages/connect-clients-to-target.adoc b/modules/ROOT/pages/connect-clients-to-target.adoc index c1db87be..bfee8d7f 100644 --- a/modules/ROOT/pages/connect-clients-to-target.adoc +++ b/modules/ROOT/pages/connect-clients-to-target.adoc @@ -4,7 +4,7 @@ At this point in our migration phases, we've completed: -* Phase 1: Connected client applications to {product-proxy}, which included setting up Ansible playbooks with the {product-utility}, and deploying the {product-proxy} instances via the Docker container with {product-automation}. +* Phase 1: Connected client applications to {product-proxy}, which included setting up Ansible playbooks with {product-utility} and using {product-automation} to deploy the {product-proxy} instances with the Docker container. * Phase 2: Migrated and validated our data with {cass-migrator} and/or {dsbulk-migrator}. 
@@ -31,7 +31,7 @@ For more information, see xref:datastax-drivers:compatibility:driver-matrix.adoc To connect to {astra-db}, you need the following: -* The xref:astra-db-serverless:administration:manage-application-tokens.adoc[application token] credentials that you used to xref:ROOT:connect-clients-to-proxy.adoc[connect your applications to the {product-proxy}]. +* The xref:astra-db-serverless:administration:manage-application-tokens.adoc[application token] credentials that you used to xref:ROOT:connect-clients-to-proxy.adoc[connect your applications to {product-proxy}]. + As before, you can use either of the following sets of credentials to connect to your {astra-db} database: + diff --git a/modules/ROOT/pages/contributions.adoc b/modules/ROOT/pages/contributions.adoc index b81ffe71..97f3a685 100644 --- a/modules/ROOT/pages/contributions.adoc +++ b/modules/ROOT/pages/contributions.adoc @@ -3,7 +3,7 @@ {company} {product} ({product-short}) provides a simple and reliable way for users to migrate an existing {cass-reg} or {dse} cluster to {astra-db}, or to any {cass-short} or {dse-short} cluster, without any interruption of service to the client applications and data. -The {product-proxy} is open source software (OSS). We welcome contributions from the developer community via Pull Requests on a fork, for evaluation by the {product-short} team. +{product-proxy} is open source software (OSS). We welcome contributions from the developer community via Pull Requests on a fork, for evaluation by the {product-short} team. The code sources for additional {product} components -- including {product-utility}, {product-automation}, {cass-migrator}, and {dsbulk-migrator} -- are available in public GitHub repos, where you may submit feedback and ideas via GitHub Issues. Code contributions for those additional components are not open for PRs at this time. 
diff --git a/modules/ROOT/pages/create-target.adoc b/modules/ROOT/pages/create-target.adoc index 495af820..2abad900 100644 --- a/modules/ROOT/pages/create-target.adoc +++ b/modules/ROOT/pages/create-target.adoc @@ -34,7 +34,7 @@ Assign your preferred values for the serverless database: * **Region**: choose your geographically preferred region - you can subsequently add more regions. When the {astra-db} database reaches **Active** status, create an application token in the {astra-ui} with the *Read/Write User* role. -This role will be used by the client application, the {product-proxy}, and the {product-automation}. +This role will be used by the client application, {product-proxy}, and {product-automation}. Save the generate token and credentials (Client ID, Client Secret, and Token) in a clearly named secure file. diff --git a/modules/ROOT/pages/deploy-proxy-monitoring.adoc b/modules/ROOT/pages/deploy-proxy-monitoring.adoc index 748bd4f7..d607757f 100644 --- a/modules/ROOT/pages/deploy-proxy-monitoring.adoc +++ b/modules/ROOT/pages/deploy-proxy-monitoring.adoc @@ -1,7 +1,7 @@ -= Deploy the {product-proxy} and monitoring += Deploy {product-proxy} and monitoring :page-tag: migration,zdm,zero-downtime,deploy,zdm-proxy,monitoring -This topic explains how to use the Ansible automation playbooks that you set up in the xref:setup-ansible-playbooks.adoc[prior topic] to deploy the {product-proxy} and its monitoring stack. +This topic explains how to use the Ansible automation playbooks that you set up in the xref:setup-ansible-playbooks.adoc[prior topic] to deploy {product-proxy} and its monitoring stack. Once completed, you will have a working and fully monitored {product-proxy} deployment. @@ -34,7 +34,7 @@ Now, `cd` into `zdm-proxy-automation/ansible` and `ls`. 
Example: image::zdm-ansible-container-ls3.png[Contents of the Ansible Control Host container] [[_configure_the_zdm_proxy]] -== Configure the {product-proxy} +== Configure {product-proxy} The {product-proxy} configuration is composed of five files: @@ -67,7 +67,7 @@ The `vi` and `nano` text editors are available in the container. .If you are on {product-automation} version 2.1.0 or earlier [%collapsible] ==== -Starting in version 2.2.0 of the {product-automation}, all origin and target cluster configuration variables are stored in `zdm_proxy_cluster_config.yml`. +Starting in version 2.2.0 of {product-automation}, all origin and target cluster configuration variables are stored in `zdm_proxy_cluster_config.yml`. In earlier versions, these variables are in the `zdm_proxy_core_config.yml` file. This change is backward compatible. @@ -159,7 +159,7 @@ For more information, see xref:enable-async-dual-reads.adoc[]. === Enable TLS encryption (optional) -If you want to enable TLS encryption between the client application and the {product-proxy}, or between the {product-proxy} and one (or both) self-managed clusters, you will need to specify some additional configuration. +If you want to enable TLS encryption between the client application and {product-proxy}, or between {product-proxy} and one or both self-managed clusters, you will need to specify some additional configuration. For instructions, see xref:ROOT:tls.adoc[]. [[_advanced_configuration_optional]] @@ -171,7 +171,7 @@ All advanced configuration variables not listed here are considered mutable and ==== Multi-datacenter clusters -For multi-datacenter origin clusters, you will need to specify the name of the datacenter that the {product-proxy} should consider local. To do this, set the property `origin_local_datacenter` to the datacenter name. +For multi-datacenter origin clusters, you will need to specify the name of the datacenter that {product-proxy} should consider local. 
To do this, set the property `origin_local_datacenter` to the datacenter name. Likewise, for multi-datacenter target clusters you will need to set `target_local_datacenter` appropriately. These two variables are stored in `vars/zdm_proxy_advanced_config.yml`. @@ -182,13 +182,13 @@ Note that this is not relevant for multi-region {astra-db} databases, where this Each {product-proxy} instance listens on port 9042 by default, like a regular {cass-short} cluster. This can be overridden by setting `zdm_proxy_listen_port` to a different value. -This can be useful if the origin nodes listen on a port that is not 9042 and you want to configure the {product-proxy} to listen on that same port to avoid changing the port in your client application configuration. +This can be useful if the origin nodes listen on a port that is not 9042 and you want to configure {product-proxy} to listen on that same port to avoid changing the port in your client application configuration. -The {product-proxy} exposes metrics on port 14001 by default. +{product-proxy} exposes metrics on port 14001 by default. This port is used by Prometheus to scrape the application-level proxy metrics. This can be changed by setting `metrics_port` to a different value if desired. -== Use Ansible to deploy the {product-proxy} +== Use Ansible to deploy {product-proxy} Now you can run the playbook that you've configured above. From the shell connected to the container, ensure that you are in `/home/ubuntu/zdm-proxy-automation/ansible` and run: @@ -206,14 +206,14 @@ That's it! A {product-proxy} container has been created on each proxy host. The playbook will create one {product-proxy} instance for each proxy host listed in the inventory file. It will indicate the operations that it is performing and print out any errors, or a success confirmation message at the end. 
-Confirm that the {product-short} proxies are up and running by using one of the following options: +Confirm that the {product-proxy} instances are up and running by using one of the following options: -* Call the `liveness` and `readiness` HTTP endpoints for {product-proxy} instances. -* Check {product-proxy} instances via docker logs. +* Call the `liveness` and `readiness` HTTP endpoints for the {product-proxy} instances. +* Check the {product-proxy} instances via docker logs. === Call the `liveness` and `readiness` HTTP endpoints -{product-short} metrics provide `/health/liveness` and `/health/readiness` HTTP endpoints, which you can call to determine the state of {product-proxy} instances. +{product-short} metrics provide `/health/liveness` and `/health/readiness` HTTP endpoints, which you can call to determine the state of the {product-proxy} instances. It's often fine to simply submit the `readiness` check to return the proxy's state. The format: @@ -232,7 +232,7 @@ curl -G "http://{{ hostvars[inventory_hostname]['ansible_default_ipv4']['address ---- The default port for metrics collection is `14001`. -You can override this port if you deploy the {product-proxy} with `metrics_port` set to a non-default port. +You can override this port if you deploy {product-proxy} with `metrics_port` set to a non-default port. For more information, see <>. Readiness example: @@ -328,13 +328,13 @@ Be aware that running the `deploy_zdm_proxy.yml` playbook results in a brief win [[_setting_up_the_monitoring_stack]] == Setting up the Monitoring stack -The {product-automation} enables you to easily set up a self-contained monitoring stack that is preconfigured to collect metrics from your {product-proxy} instances and display them in ready-to-use Grafana dashboards. +{product-automation} enables you to easily set up a self-contained monitoring stack that is preconfigured to collect metrics from your {product-proxy} instances and display them in ready-to-use Grafana dashboards. 
The monitoring stack is deployed entirely on Docker. It includes the following components, all deployed as Docker containers: * Prometheus node exporter, which runs on each {product-proxy} host and makes OS- and host-level metrics available to Prometheus. -* Prometheus server, to collect metrics from the {product-proxy} process, its Golang runtime and the Prometheus node exporter. +* Prometheus server, to collect metrics from {product-proxy}, its Golang runtime, and the Prometheus node exporter. * Grafana, to visualize all these metrics in three preconfigured dashboards (see xref:ROOT:metrics.adoc[]). After running the playbook described here, you will have a fully configured monitoring stack connected to your {product-proxy} deployment. diff --git a/modules/ROOT/pages/deployment-infrastructure.adoc b/modules/ROOT/pages/deployment-infrastructure.adoc index 6eabc43c..a641e5fe 100644 --- a/modules/ROOT/pages/deployment-infrastructure.adoc +++ b/modules/ROOT/pages/deployment-infrastructure.adoc @@ -11,7 +11,7 @@ A minimum of three proxy instances is recommended for any deployment apart from All {product-proxy} instances must be reachable by the client application and must be able to connect to your origin and target clusters. The {product-proxy} process is lightweight, requiring only a small amount of resources and no storage to persist state (apart from logs). -The {product-proxy} should be deployed close to your client application instances. +{product-proxy} should be deployed close to your client application instances. This can be on any cloud provider as well as on-premise, depending on your existing infrastructure. If you have a multi-DC cluster with multiple set of client application instances deployed to geographically distributed data centers, you should plan for a separate {product-proxy} deployment for each data center. 
@@ -22,7 +22,7 @@ image::zdm-during-migration3.png[Connectivity between client applications, proxy == Infrastructure requirements -To deploy the {product-proxy} and its companion monitoring stack, you will have to provision infrastructure that meets the following requirements. +To deploy {product-proxy} and its companion monitoring stack, you will have to provision infrastructure that meets the following requirements. [[_machines]] === Machines @@ -72,6 +72,7 @@ If there is one super large table (e.g. 15 TB of 20 TB is in one table), you can // TODO: investigate how to "leverage the parallelism of {cass-migrator} to run the migration process across all 4 machines." === Connectivity + The {product-proxy} machines must be reachable by: * The client application instances, on port 9042 @@ -106,7 +107,7 @@ The {product-proxy} and monitoring machines must be able to connect externally, * Various software packages (Docker, Prometheus, Grafana). * {product-proxy} image from DockerHub repo. -=== Connecting to the {product-short} infrastructure from an external machine +=== Connect to {product-proxy} infrastructure from an external machine To connect to the jumphost from an external machine, ensure that its IP address belongs to a permitted IP range. If you are connecting through a VPN that only intercepts connections to selected destinations, you may have to add a route from your VPN IP gateway to the public IP of the jumphost. diff --git a/modules/ROOT/pages/faqs.adoc b/modules/ROOT/pages/faqs.adoc index 564f6f14..7531b153 100644 --- a/modules/ROOT/pages/faqs.adoc +++ b/modules/ROOT/pages/faqs.adoc @@ -21,7 +21,7 @@ It is important to note that the {product} process requires you to be able to pe ==== In the context of migrating between clusters (client applications and data), the examples in this guide sometimes refer to the migration to our cloud-native database environment, {astra-db}. 
-However, it is important to emphasize that the {product-proxy} can be freely used to migrate without downtime between any combination of CQL clusters of any type. In addition to {astra-db}, examples include {cass-reg} or {dse}. +However, it is important to emphasize that {product-proxy} can be freely used to support migrations without downtime between any combination of CQL clusters of any type. In addition to {astra-db}, examples include {cass-reg} or {dse}. ==== == Can you illustrate the overall workflow and phases of a migration? @@ -88,7 +88,7 @@ Before {company} {product} was available, migrating client applications between Free and Pay As You Go plan users do not have support access and must raise questions in the {astra-ui} chat. https://www.datastax.com/products/luna[Luna] is a subscription to the {cass} support and expertise at {company}. -For any observed problems with the {product-proxy}, submit a {product-proxy-repo}/issues[GitHub Issue] in the {product-proxy} GitHub repo. +For any observed problems with {product-proxy}, submit a {product-proxy-repo}/issues[GitHub Issue] in the {product-proxy} GitHub repo. Additional examples serve as templates, from which you can learn about migrations. {company} does not assume responsibility for making the templates work for specific use cases. @@ -105,7 +105,7 @@ To submit PRs, you must for first agree to the https://cla.datastax.com/[{compan * {product-proxy-repo}[{product-proxy}] repo. -* {product-automation-repo}[{product-automation}] repo for the Ansible-based {product-automation}, which includes the {product-utility}. +* {product-automation-repo}[{product-automation}] repo for the Ansible-based {product-automation}, which includes {product-utility}. * {cass-migrator-repo}[cassandra-data-migrator] repo for the tool that supports migrating larger data quantities as well as detailed verifications and reconciliation options. 
@@ -116,11 +116,11 @@ To submit PRs, you must for first agree to the https://cla.datastax.com/[{compan Yes, and here's a summary: -* For application-to-proxy TLS, the application is the TLS client and the {product-proxy} is the TLS server. +* For application-to-proxy TLS, the application is the TLS client and {product-proxy} is the TLS server. One-way TLS and Mutual TLS are both supported. -* For proxy-to-cluster TLS, the {product-proxy} acts as the TLS client and the cluster as the TLS server. +* For proxy-to-cluster TLS, {product-proxy} acts as the TLS client and the cluster as the TLS server. One-way TLS and Mutual TLS are both supported. -* When the {product-proxy} connects to {astra-db} clusters, it always implicitly uses Mutual TLS. +* When {product-proxy} connects to {astra-db} clusters, it always implicitly uses Mutual TLS. This is done through the {scb} and does not require any extra configuration. For TLS details, see xref:tls.adoc[]. diff --git a/modules/ROOT/pages/feasibility-checklists.adoc b/modules/ROOT/pages/feasibility-checklists.adoc index adb36997..c8ce515d 100644 --- a/modules/ROOT/pages/feasibility-checklists.adoc +++ b/modules/ROOT/pages/feasibility-checklists.adoc @@ -14,7 +14,7 @@ If your database doesn't meet these requirements, you can still complete the mig //TODO: V5 status: https://github.com/datastax/zdm-proxy/blob/main/faq.md#what-versions-of-apache-cassandra-or-cql-compatible-data-stores-does-the-zdm-proxy-support {product-proxy} technically doesn't support `v5`. If `v5` is requested, the proxy handles protocol negotiation so that the client application properly downgrades the protocol version to `v4`. -This means that any client application using a recent driver that supports protocol version `v5` can be migrated using the {product-proxy} (as long as it does not use v5-specific functionality). 
+This means that any client application using a recent driver that supports protocol version `v5` can be migrated using {product-proxy} as long as the application doesn't use v5-specific functionality.

=== Thrift is not supported by {product-proxy}

@@ -91,7 +91,7 @@ For upgrade instructions, see xref:ROOT:manage-proxy-instances.adoc#_upgrade_the

If a client application only sends `SELECT` statements to a database connection then you may find that {product-proxy} terminates these read-only connections periodically, which may result in request errors if the driver is not configured to retry these requests in these conditions.
This happens because {astra-db} terminates idle connections after some inactivity period (usually around 10 minutes).
-If {astra-db} is your target, and a client connection is only sending read requests to the {product-proxy}, then the {astra-db} connection that is paired to that client connection will remain idle and will be eventually terminated.
+If {astra-db} is your target, and a client connection is only sending read requests to {product-proxy}, then the {astra-db} connection that is paired to that client connection will remain idle and will be eventually terminated.

A potential workaround is to not connect these read-only client applications to {product-proxy}, but you need to ensure that these client applications switch reads to the target at any point after all the data has been migrated and all validation and reconciliation has completed.

@@ -100,7 +100,7 @@ You can also implement some kind of meaningless write request that the applicati

==== Version 2.1.0 and newer

-This issue is solved in version 2.1.0 of the {product-proxy}, which introduces periodic heartbeats to keep alive idle cluster connections.
We strongly recommend using version 2.1.0 (or newer) to benefit from this improvement, especially if you have a read-only workload.

[[non-idempotent-operations]]
@@ -206,7 +206,7 @@ Storage/table compression doesn't affect the client application or {product-prox

While the authenticator has to be supported, the *authorizer* does not affect client applications or {product-proxy} so you should be able to use any kind of authorizer configuration on both of your clusters.

-The authentication configuration on each cluster can be different between the origin and target clustesr, as the {product-proxy} treats them independently.
+The authentication configuration on each cluster can be different between the origin and target clusters, as {product-proxy} treats them independently.

[[cql-function-replacement]]
== Server-side non-deterministic functions in the primary key
diff --git a/modules/ROOT/pages/glossary.adoc b/modules/ROOT/pages/glossary.adoc
index 7586b12a..52146546 100644
--- a/modules/ROOT/pages/glossary.adoc
+++ b/modules/ROOT/pages/glossary.adoc
@@ -96,9 +96,9 @@ It is the opposite of the <>.

== {product-automation}

An Ansible-based tool that allows you to deploy and manage the {product-proxy} instances and associated monitoring stack.
-To simplify its setup, the suite includes the {product-utility}.
+To simplify its setup, the suite includes {product-utility}.
This interactive utility creates a Docker container acting as the Ansible Control Host.
-The Ansible playbooks constitute the {product-automation}.
[[zdm-proxy]] == {product-proxy} diff --git a/modules/ROOT/pages/introduction.adoc b/modules/ROOT/pages/introduction.adoc index 90d69d14..1dc95ea7 100644 --- a/modules/ROOT/pages/introduction.adoc +++ b/modules/ROOT/pages/introduction.adoc @@ -102,7 +102,7 @@ image:migration-phase3ra.png["Migration Phase 3."] === Phase 4: Route reads to the target cluster -In this phase, read routing on the {product-proxy} is switched to teh target cluster so that all reads are executed on the target. +In this phase, read routing on {product-proxy} is switched to the target cluster so that all reads are executed on the target. Writes are still sent to both clusters. At this point, the target becomes the primary cluster. @@ -111,7 +111,7 @@ image:migration-phase4ra9.png["Migration Phase 4."] === Phase 5: Connect directly to the target cluster -In this phase, move your client applications off the {product-proxy} and connect them directly to the target cluster. +In this phase, you move your client applications off {product-proxy} and connect them directly to the target cluster. Once this happens, the migration is complete, and you now exclusively use the target cluster. diff --git a/modules/ROOT/pages/manage-proxy-instances.adoc b/modules/ROOT/pages/manage-proxy-instances.adoc index af0cf313..8c2ff379 100644 --- a/modules/ROOT/pages/manage-proxy-instances.adoc +++ b/modules/ROOT/pages/manage-proxy-instances.adoc @@ -74,7 +74,7 @@ You can view the logs for a single proxy instance, or you can use a playbook to === View the logs -The {product-proxy} runs as a Docker container on each proxy host. +{product-proxy} runs as a Docker container on each proxy host. Its logs can be viewed by connecting to a proxy host and running the following command. 
[source,bash] @@ -142,7 +142,7 @@ At the start of the migration, the primary cluster is the origin cluster because In Phase 4 of the migration, once all the existing data has been transferred and any validation/reconciliation step has been successfully executed, you can switch the primary cluster to be the target cluster. ** Valid values: `ORIGIN`, `TARGET`. * `read_mode`: -** This variable determines how reads are handled by the {product-proxy}. +** This variable determines how reads are handled by {product-proxy}. ** Valid values: *** `PRIMARY_ONLY`: reads are only sent synchronously to the primary cluster. This is the default behavior. *** `DUAL_ASYNC_ON_SECONDARY`: reads are sent synchronously to the primary cluster and also asynchronously to the secondary cluster. @@ -159,21 +159,21 @@ Other, rarely changed variables: * Target username/password in `vars/zdm_proxy_cluster_config.yml` * Advanced configuration variables in `vars/zdm_proxy_advanced_config.yml`: ** `zdm_proxy_max_clients_connections`: -*** Maximum number of client connections that the {product-proxy} should accept. +*** Maximum number of client connections that {product-proxy} should accept. Each client connection results in additional cluster connections and causes the allocation of several in-memory structures, so this variable can be tweaked to cap the total number on each instance. A high number of client connections per proxy instance may cause some performance degradation, especially at high throughput. *** Defaults to `1000`. ** `replace_cql_functions`: -*** Whether the {product-proxy} should replace standard CQL function calls in write requests with a value computed at proxy level. +*** Whether {product-proxy} should replace standard CQL function calls in write requests with a value computed at proxy level. *** Currently, only the replacement of `now()` is supported. *** Boolean value. Disabled by default. Enabling this will have a noticeable performance impact. 
** `zdm_proxy_request_timeout_ms`: *** Global timeout (in ms) of a request at proxy level. -*** This variable determines how long the {product-proxy} will wait for one cluster (in case of reads) or both clusters (in case of writes) to reply to a request. -If this timeout is reached, the {product-proxy} will abandon that request and no longer consider it as pending, thus freeing up the corresponding internal resources. -Note that, in this case, the {product-proxy} will not return any result or error: when the client application's own timeout is reached, the driver will time out the request on its side. +*** This variable determines how long {product-proxy} will wait for one cluster (in case of reads) or both clusters (in case of writes) to reply to a request. +If this timeout is reached, {product-proxy} will abandon that request and no longer consider it as pending, thus freeing up the corresponding internal resources. +Note that, in this case, {product-proxy} will not return any result or error: when the client application's own timeout is reached, the driver will time out the request on its side. *** Defaults to `10000` ms. If your client application has a higher client-side timeout because it is expected to generate requests that take longer to complete, you need to increase this timeout accordingly. ** `origin_connection_timeout_ms` and `target_connection_timeout_ms`: @@ -182,7 +182,7 @@ If your client application has a higher client-side timeout because it is expect ** `async_handshake_timeout_ms`: *** Timeout (in ms) when performing the initialization (handshake) of a proxy-to-secondary cluster connection that will be used solely for asynchronous dual reads. *** If this timeout occurs, the asynchronous reads will not be sent. -This has no impact on the handling of synchronous requests: the {product-proxy} will continue to handle all synchronous reads and writes normally. 
+This has no impact on the handling of synchronous requests: {product-proxy} will continue to handle all synchronous reads and writes normally. *** Defaults to `4000` ms. ** `heartbeat_interval_ms`: *** Frequency (in ms) with which heartbeats will be sent on cluster connections (i.e. all control and request connections to the origin and the target). @@ -196,7 +196,7 @@ This is not recommended. ** [[zdm_proxy_max_stream_ids]]`zdm_proxy_max_stream_ids`: *** In the CQL protocol every request has a unique id, named stream id. -This variable allows you to tune the maximum pool size of the available stream ids managed by the {product-proxy} per client connection. +This variable allows you to tune the maximum pool size of the available stream ids managed by {product-proxy} per client connection. In the application client, the stream ids are managed internally by the driver, and in most drivers the max number is 2048 (the same default value used in the proxy). If you have a custom driver configuration with a higher value, you should change this property accordingly. *** Defaults to `2048`. diff --git a/modules/ROOT/pages/metrics.adoc b/modules/ROOT/pages/metrics.adoc index a6613905..6f5bbd74 100644 --- a/modules/ROOT/pages/metrics.adoc +++ b/modules/ROOT/pages/metrics.adoc @@ -1,18 +1,18 @@ = Leverage metrics provided by {product-proxy} :page-tag: migration,zdm,zero-downtime,metrics -This topic provides detailed information about the metrics captured by the {product-proxy} and explains how to interpret the metrics. +This topic provides detailed information about the metrics captured by {product-proxy} and explains how to interpret the metrics. == Benefits -The {product-proxy} gathers a large number of metrics, which allows you to gain deep insights into how it is operating with regard to its communication with client applications and clusters, as well as its request handling. 
+{product-proxy} gathers a large number of metrics, which allows you to gain deep insights into how it is operating with regard to its communication with client applications and clusters, as well as its request handling. -Having visibility on all aspects of the {product-proxy}'s behavior is extremely important in the context of a migration of critical client applications, and is a great help in building confidence in the process and troubleshooting any issues. -For this reason, we strongly encourage you to monitor the {product-proxy}, either by deploying the self-contained monitoring stack provided by the {product-automation} or by importing the pre-built Grafana dashboards in your own monitoring infrastructure. +Having visibility on all aspects of {product-proxy}'s behavior is extremely important in the context of a migration of critical client applications, and is a great help in building confidence in the process and troubleshooting any issues. +For this reason, we strongly encourage you to monitor {product-proxy}, either by deploying the self-contained monitoring stack provided by {product-automation} or by importing the pre-built Grafana dashboards in your own monitoring infrastructure. == Retrieving the {product-proxy} metrics -{product-proxy} exposes an HTTP endpoint that returns metrics in the Prometheus format. +{product-proxy} exposes an HTTP endpoint that returns metrics in the Prometheus format. {product-automation} can deploy Prometheus and Grafana, configuring them automatically, as explained xref:deploy-proxy-monitoring.adoc#_setting_up_the_monitoring_stack[here]. The Grafana dashboards are ready to go with metrics that are being scraped from the {product-proxy} instances. @@ -34,10 +34,10 @@ image::zdm-grafana-proxy-dashboard1.png[Grafana dashboard shows three categories * Latency + -** Read Latency: Total latency measured by the {product-proxy} per read request, including post-processing, such as response aggregation. 
+** Read Latency: Total latency measured by {product-proxy} per read request, including post-processing, such as response aggregation. This metric has two labels: `reads_origin` and `reads_target`. The label that has data depends on which cluster is receiving the reads, which is the current xref:glossary.adoc#_primary_cluster[primary cluster]. -** Write Latency: Total latency measured by the {product-proxy} per write request, including post-processing, such as response aggregation. +** Write Latency: Total latency measured by {product-proxy} per write request, including post-processing, such as response aggregation. This metric is measured as the total latency across both clusters for a single xref:ROOT:components.adoc#how-zdm-proxy-handles-reads-and-writes[bifurcated write request]. * Throughput (same structure as the previous latency metrics): @@ -49,7 +49,7 @@ This metric is measured as the total latency across both clusters for a single x * Number of client connections * Prepared Statement cache: -** Cache Misses: meaning, a prepared statement was sent to the {product-proxy}, but it wasn't on its cache, so the proxy returned an `UNPREPARED` response to make the driver send the `PREPARE` request again. +** Cache Misses: meaning, a prepared statement was sent to {product-proxy}, but it wasn't on its cache, so the proxy returned an `UNPREPARED` response to make the driver send the `PREPARE` request again. ** Number of cached prepared statements. * Request Failure Rates: the number of request failures per interval. 
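As an illustration of the Prometheus exposition format that the metrics endpoint returns, the sketch below filters a sample of proxy metrics with `awk`. The metric names and values are invented for the example and are not {product-proxy}'s actual metric names:

```shell
# Hypothetical sample of Prometheus-format output from a proxy metrics
# endpoint (typically exposed on port 14001). Metric names are illustrative.
metrics='# HELP zdm_proxy_client_connections Current client connections
# TYPE zdm_proxy_client_connections gauge
zdm_proxy_client_connections 42
zdm_proxy_read_latency_seconds{cluster="reads_origin"} 0.012'

# Extract the client-connection gauge value; HELP/TYPE comment lines are
# skipped because their first field is "#", not the metric name.
echo "$metrics" | awk '$1 == "zdm_proxy_client_connections" { print $2 }'
```

Fetching the live output with `curl` from a proxy instance (assuming the conventional `/metrics` path) and filtering it this way is one quick way to spot-check an instance outside Grafana.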
@@ -109,7 +109,7 @@ These metrics track the following information for asynchronous read requests: * Number of dedicated connections per node for the cluster receiving the asynchronous read requests * Number of errors per node, separated by error type -=== Insights via the {product-proxy} metrics +=== Insights from the {product-proxy} metrics Some examples of problems manifesting on these metrics: diff --git a/modules/ROOT/pages/setup-ansible-playbooks.adoc b/modules/ROOT/pages/setup-ansible-playbooks.adoc index aefbbcec..5f1a338a 100644 --- a/modules/ROOT/pages/setup-ansible-playbooks.adoc +++ b/modules/ROOT/pages/setup-ansible-playbooks.adoc @@ -1,7 +1,7 @@ -= Set up the {product-automation} with {product-utility} += Set up {product-automation} with {product-utility} :page-tag: migration,zdm,zero-downtime,zdm-automation,zdm-proxy,ansible -This page explains how to use the {product-utility} to set up the Ansible Control Host container for the {product-automation}. +This page explains how to use {product-utility} to set up the Ansible Control Host container for {product-automation}. After completing the setup tasks in {product-utility}, see the xref:deploy-proxy-monitoring.adoc[next topic] for subsequent steps to use {product-automation}, which you will use to deploy {product-proxy} instances and the monitoring stack. @@ -9,11 +9,11 @@ Once completed, you will have a working and fully monitored {product-proxy} depl == Introduction -The {product-automation} uses **Ansible**, which deploys and configures the {product-proxy} instances and monitoring stack via playbooks. +{product-automation} uses **Ansible**, which deploys and configures the {product-proxy} instances and monitoring stack via playbooks. This step expects that the infrastructure has been already provisioned. See xref:deployment-infrastructure.adoc[Deployment and infrastructure considerations], which include the infrastructure requirements. 
-Configuring a machine to serve as the Ansible Control Host is very easy using the {product-utility}.
+To configure a machine to serve as the Ansible Control Host, you can use {product-utility}.
This is a Golang (Go) executable program that runs anywhere.
This utility prompts you for a few configuration values, with helpful embedded explanations and error handling, then automatically creates the Ansible Control Host container ready for you to use.
From this container, you will be able to easily configure and run the {product-automation} Ansible playbooks.
@@ -22,7 +22,7 @@ image::docker-container-and-zdm-utility.png[{product-proxy} connections from Doc

== Prerequisites

-. You must have already provisioned the {product-short} infrastructure, which means you must have the server machines ready, and know their IP addresses.
+. You must have already provisioned the {product-proxy} infrastructure, which means you must have the server machines ready, and know their IP addresses.
These can be in the cloud provider of your choice or on-premise.
. Docker needs to be installed on the machine that will be running the Ansible Control Host container.
For comprehensive installation instructions, see the https://docs.docker.com/engine/install/#server[Docker documentation].
@@ -67,8 +67,8 @@ The jumphost can be, for example, a Linux server machine that is able to access

The jumphost will serve three purposes:

* Accessing the {product-proxy} machines.
-* Running the Ansible Control Host container, from which the {product-automation} can be run.
[TIP] ==== @@ -80,12 +80,12 @@ Let's get started. == Proxy deployment setup on the jumphost -To run the {product-automation}, the Ansible Control Host needs to be able to connect to all other instances of the {product-proxy} deployment. +To run {product-automation}, the Ansible Control Host needs to be able to connect to all other instances of the {product-proxy} deployment. For this reason, it needs to have the SSH key required by those instances. === Add SSH keys to the jumphost -From your local machine, transfer (`scp`) the SSH private key for the {product-short} deployment to the jumphost. +From your local machine, transfer (`scp`) the SSH private key for the {product-proxy} deployment to the jumphost. Example: [source,bash] @@ -100,7 +100,7 @@ Now connect to the jumphost. ssh -F jumphost ---- -== Run the {product-utility} +== Run {product-utility} . From the jumphost, download the latest {product-utility} executable from the {product-automation-repo}/releases[{product-automation} GitHub repository] {product-automation-shield}. + @@ -120,7 +120,7 @@ wget https://github.com/datastax/zdm-proxy-automation/releases/download/v2.3.0/z tar -xvf zdm-util-linux-amd64-v2.3.0.tgz ---- -. Run the {product-utility}: +. Run {product-utility}: + [source,bash] ---- @@ -131,11 +131,11 @@ The utility prompts you for a few configuration values, then creates and initial [TIP] ==== -The {product-utility} will store the configuration that you provide into a file named `ansible_container_init_config` in the current directory. +{product-utility} will store the configuration that you provide into a file named `ansible_container_init_config` in the current directory. If you run the utility again, it will detect the file and ask you if you wish to use that configuration or discard it. If the configuration is not fully valid, you will be prompted for the missing or invalid parameters only. 
-You can also pass a custom configuration file to the {product-utility} with the optional command-line parameter `-utilConfigFile`.
+You can also pass a custom configuration file to {product-utility} with the optional command-line parameter `-utilConfigFile`.
For example:

[source,bash]
----
@@ -146,11 +146,11 @@ For example:

[NOTE]
====
-The {product-utility} will validate each variable that you enter.
+{product-utility} will validate each variable that you enter.
In case of invalid variables, it will display specific messages to help you fix the problem.
You have five attempts to enter valid variables.

-You can always run the {product-utility} again, if necessary.
+You can always run {product-utility} again, if necessary.
====

. Enter the path to, and name of, the SSH private key to access the proxy hosts:
@@ -169,16 +169,16 @@ You can always run the {product-utility} again, if necessary.

. You're asked if you have an existing Ansible inventory file.
If you do, and you transferred it to the jumphost, you can just specify it.
-If you do not, the {product-utility} will create one based on your answers to prompts and save it.
+If you do not, {product-utility} will create one based on your answers to prompts and save it.
Here we'll assume that you do not have one.
Enter `n`.
+
The created file will be named `zdm_ansible_inventory` in your working directory.

. Next, indicate if this deployment is for local testing and evaluation (such as when you're creating a demo or just experimenting with {product-proxy}).
In this example, we'll enter `n` because this scenario is for a production deployment.

. Now enter at least three proxy private IP addresses for the machines that will run the {product-proxy} instances, for a production deployment.
(If we had indicated above that we're doing local testing in dev, only one proxy would have been required.) -Example values entered at the {product-utility}'s prompt, for production: +Example values entered at the {product-utility} prompt, for production: + [source,bash] ---- @@ -190,7 +190,7 @@ Example values entered at the {product-utility}'s prompt, for production: To finish entering private IP addresses, simply press ENTER at the prompt. . Optionally, when prompted, you can enter the private IP address of your Monitoring instance, which will use Prometheus to store data and Grafana to visualize it into a preconfigured dashboard. -It is strongly recommended exposing the {product-proxy} metrics in the preconfigured dashboard that ships with the {product-automation} for easy monitoring. +It is strongly recommended to expose the {product-proxy} metrics in the preconfigured dashboard that ships with {product-automation} for easy monitoring. You can skip this step if you haven't decided which machine to use for monitoring, or if you wish to use your own monitoring stack. + [NOTE] @@ -199,16 +199,16 @@ We highly recommend that you configure a monitoring instance, unless you intend For migrations that may run for multiple days, it is essential that you use metrics to understand the performance and health of the {product-proxy} instances. You cannot rely solely on information in the logs. -They report connection or protocol errors, but do not give you enough information on how the {product-proxy} is working and how each cluster is responding. +They report connection or protocol errors, but do not give you enough information on how {product-proxy} is working and how each cluster is responding. Metrics, however, provide especially helpful data and the graphs show you how they vary over time. The monitoring stack ships with preconfigured Grafana dashboards that are automatically set up as part of the monitoring deployment. 
For details about the metrics you can observe in these preconfigured Grafana dashboards, see xref:ROOT:metrics.adoc[]. ==== + -You can choose to deploy the monitoring stack on the jumphost or on a different machine, as long as it can connect to the {product-proxy} instances over TCP on ports 9100 (to collect host-level metrics) and on the port on which the {product-proxy} exposes its own metrics, typically 14001. +You can choose to deploy the monitoring stack on the jumphost or on a different machine, as long as it can connect to the {product-proxy} instances over TCP on ports 9100 (to collect host-level metrics) and on the port on which {product-proxy} exposes its own metrics, typically 14001. + -In this example, we'll enter the same IP of the Ansible control host (the jumphost machine on which we're running the {product-utility}). +In this example, we'll enter the same IP of the Ansible control host (the jumphost machine on which we're running {product-utility}). Example: + [source,bash] @@ -216,7 +216,7 @@ Example: 172.18.100.128 ---- -At this point, the {product-utility}: +At this point, {product-utility}: * Has created the Ansible Inventory to the default file, `zdm_ansible_inventory`. * Has written the {product-utility} configuration to the default file, `ansible_container_init_config`. @@ -227,7 +227,7 @@ image::zdm-go-utility-results3.png[A summary of the configuration provided is di If you agree, enter `Y` to proceed. -The {product-utility} now: +{product-utility} now: * Creates and downloads the image of the Ansible Docker container for you. * Creates, configures and starts the Ansible Control Host container. @@ -237,8 +237,8 @@ image::zdm-go-utility-success3.png[Ansible Docker container success messages] [NOTE] ==== -Depending on your circumstances, you can make different choices in the {product-utility}, which will result in a path that is slightly different to the one explained here. 
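To make the port requirements above concrete, this sketch derives the endpoints the monitoring machine must be able to reach, given the proxy private IPs entered at the prompts. The IPs below are hypothetical examples, not values from this walkthrough:

```shell
# Hypothetical proxy private IPs; substitute the ones you entered at the
# {product-utility} prompts.
proxies="172.18.100.45 172.18.100.46 172.18.100.47"

for ip in $proxies; do
  echo "${ip}:14001"  # port where {product-proxy} typically exposes its metrics
  echo "${ip}:9100"   # port used to collect host-level metrics
done
```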
+Depending on your circumstances, you can make different choices in the {product-utility} configuration, which will result in a path that is slightly different to the one explained here. The utility will guide you through the process with meaningful, self-explanatory messages and help you rectify any issue that you may encounter. -The successful outcome will always be a configured Ansible Control Host container ready to run the {product-automation}. +The successful outcome will always be a configured Ansible Control Host container ready to run {product-automation}. ==== \ No newline at end of file diff --git a/modules/ROOT/pages/tls.adoc b/modules/ROOT/pages/tls.adoc index 3a137d87..62c2be28 100644 --- a/modules/ROOT/pages/tls.adoc +++ b/modules/ROOT/pages/tls.adoc @@ -4,24 +4,20 @@ {product-proxy} supports proxy-to-cluster and application-to-proxy TLS encryption. -The TLS configuration is an optional part of the initial {product-proxy} configuration. -See the information here in this topic, and then refer to the {product-automation} topics that cover: - -* xref:setup-ansible-playbooks.adoc[] -* xref:deploy-proxy-monitoring.adoc[] +The TLS configuration is an optional part of the initial {product-proxy} configuration, which includes xref:setup-ansible-playbooks.adoc[] and xref:deploy-proxy-monitoring.adoc[]. == Introduction -* All TLS configuration is optional. Enable TLS between the {product-proxy} and any cluster that requires it, and/or between your client application and the {product-proxy} if required. +* All TLS configuration is optional. Enable TLS between {product-proxy} and any cluster that requires it, and/or between your client application and {product-proxy}, if required. -* Proxy-to-cluster TLS can be configured between the {product-proxy} and either or both the origin and target clusters, as desired. -Each set of configurations is independent of the other. 
When using proxy-to-cluster TLS, the {product-proxy} acts as the TLS client and the cluster as the TLS server. +* Proxy-to-cluster TLS can be configured between {product-proxy} and either or both the origin and target clusters, as desired. +Each set of configurations is independent of the other. When using proxy-to-cluster TLS, {product-proxy} acts as the TLS client and the cluster as the TLS server. One-way TLS and Mutual TLS are both supported and can be enabled depending on each cluster's requirements. -* When using application-to-proxy TLS, your client application is the TLS client and the {product-proxy} is the TLS server. +* When using application-to-proxy TLS, your client application is the TLS client and {product-proxy} is the TLS server. One-way TLS and Mutual TLS are both supported. -* When the {product-proxy} connects to {astra-db}, it always implicitly uses Mutual TLS. +* When {product-proxy} connects to {astra-db}, it always implicitly uses Mutual TLS. This is done through the {scb} and does not require any extra configuration. [[_retrieving_files_from_a_jks_keystore]] @@ -30,7 +26,7 @@ This is done through the {scb} and does not require any extra configuration. If you are already using TLS between your client application and the origin cluster, then the files needed to configure TLS will already be used in the client application's configuration (TLS client files) and the origin's configuration (TLS Server files). In some cases, these files may be contained in a JKS keystore. -The {product-proxy} does not accept a JKS keystore, requiring the raw files instead. +{product-proxy} does not accept a JKS keystore, requiring the raw files instead. To view the files contained in a JKS keystore and their aliases: @@ -51,8 +47,8 @@ For more details, see the https://docs.oracle.com/javase/8/docs/technotes/tools/ == Proxy to self-managed cluster TLS -Here's how to configure TLS between the {product-proxy} and a self-managed cluster ({cass} or {dse-short}). 
-In this case the {product-proxy} acts as the TLS client and the cluster acts as the TLS server. +Here's how to configure TLS between {product-proxy} and a self-managed cluster ({cass} or {dse-short}). +In this case {product-proxy} acts as the TLS client and the cluster acts as the TLS server. The files required to configure proxy-to-cluster TLS are: @@ -118,8 +114,8 @@ For Mutual TLS only, leave unset otherwise. == Application-to-proxy TLS -Here are the steps to enable TLS between your client application and the {product-proxy} if required. -In this case, your client application is the TLS client and the {product-proxy} is the TLS server. +Here are the steps to enable TLS between your client application and {product-proxy}, if required. +In this case, your client application is the TLS client and {product-proxy} is the TLS server. The files required by the proxy to configure application-to-proxy TLS are: @@ -155,7 +151,7 @@ Optional: defaults to `false` ( = one-way TLS ), can be set to `true` to enable [TIP] ==== -Remember that in this case, the {product-proxy} is the TLS server; thus the word `server` in these variable names. +Remember that in this case, {product-proxy} is the TLS server; thus the word `server` in these variable names. ==== == Apply the configuration diff --git a/modules/ROOT/pages/troubleshooting-scenarios.adoc b/modules/ROOT/pages/troubleshooting-scenarios.adoc index d6182f29..2c4e214e 100644 --- a/modules/ROOT/pages/troubleshooting-scenarios.adoc +++ b/modules/ROOT/pages/troubleshooting-scenarios.adoc @@ -80,9 +80,9 @@ msg=Invalid or unsupported protocol version (5)).\"\n","stream":"stderr","time": === Cause Protocol errors like these are a normal part of the handshake process where the protocol version is being negotiated. -These protocol version downgrades happen when either the {product-proxy} or at least one of the clusters doesn't support the version requested by the client. 
+These protocol version downgrades happen when either {product-proxy} or at least one of the clusters doesn't support the version requested by the client. -V5 downgrades are enforced by the {product-proxy} but any other downgrade is requested by one of the clusters when they don't support the version that the client requested. +V5 downgrades are enforced by {product-proxy} but any other downgrade is requested by one of the clusters when they don't support the version that the client requested. The proxy supports V3, V4, DSE_V1 and DSE_V2. //// @@ -133,7 +133,7 @@ time="2022-10-01T19:58:15+01:00" level=error msg="Couldn't start proxy, retrying === Cause -The control connections of the {product-proxy} don't perform protocol version negotiation, they only attempt to use protocol version 3. +The control connections of {product-proxy} don't perform protocol version negotiation, they only attempt to use protocol version 3. If one of the origin clusters doesn't support at least V3 (e.g. {cass-short} 2.0, {dse-short} 4.6), then {product-short} cannot be used for that migration at the moment. We plan to introduce support for {cass-short} 2.0 and {dse-short} 4.6 very soon. @@ -176,15 +176,15 @@ If the proxy is able to start up -- that is, this message can be seen in the log then the authentication error is happening when a client application tries to open a connection to the proxy. In this case, the issue is with the Client credentials so the application itself is using invalid credentials (incorrect username/password or insufficient permissions). -Note that the proxy startup message has log level `INFO` so if the configured log level on the proxy is `warning` or `error`, you will have to rely on other ways to know whether the {product-proxy} started correctly. +Note that the proxy startup message has log level `INFO` so if the configured log level on the proxy is `warning` or `error`, you will have to rely on other ways to know whether {product-proxy} started correctly. 
You can check if the docker container is running (or process if docker isn't being used) or if there is a log message similar to `Error launching proxy`. -== The {product-proxy} listens on a custom port, and all applications are able to connect to one proxy instance only +== {product-proxy} listens on a custom port, and all applications are able to connect to one proxy instance only === Symptoms -The {product-proxy} is listening on a custom port (not 9042) and: +{product-proxy} is listening on a custom port (not 9042) and: * The Grafana dashboard shows only one proxy instance receiving all the connections from the application. * Only one proxy instance has log messages such as `level=info msg="Accepted connection from 10.4.77.210:39458"`. @@ -248,7 +248,7 @@ Consider a case where you deploy the metrics component of our {product-automatio === Cause -The {product-automation} specifies a custom set of credentials instead of relying on the `admin/admin` ones that are typically the default for Grafana deployments. +{product-automation} specifies a custom set of credentials instead of relying on the `admin/admin` ones that are typically the default for Grafana deployments. === Solution or Workaround @@ -298,7 +298,7 @@ For client connections, each proxy instance cycles through its "assigned nodes" _(The "assigned nodes" are a different subset of the cluster nodes for each proxy instance, generally non-overlapping between proxy instances so as to avoid any interference with the load balancing already in place at client-side driver level. The assigned nodes are not necessarily contact points: even discovered nodes undergo assignment to proxy instances.)_ -In the example above, the {product-proxy} doesn't have connectivity to 10.0.63.20, which was chosen as the origin node for the incoming client connection, but it was able to connect to 10.0.63.163 during startup. 
+In the example above, {product-proxy} doesn't have connectivity to 10.0.63.20, which was chosen as the origin node for the incoming client connection, but it was able to connect to 10.0.63.163 during startup. === Solution or Workaround @@ -308,13 +308,13 @@ Ensure that network connectivity exists and is stable between the {product-proxy === Symptoms -After a {product-proxy} has been unavailable for some time and it gets back up, the client application takes too long to reconnect. +After a {product-proxy} instance has been unavailable for some time and it gets back up, the client application takes too long to reconnect. -There should never be a reason to stop a {product-proxy} instance other than a configuration change but maybe the proxy crashed or the user tried to do a configuration change and took a long time to get the {product-proxy} back up. +There should never be a reason to stop a {product-proxy} instance other than a configuration change but maybe the proxy crashed or the user tried to do a configuration change and took a long time to get the {product-proxy} instance back up. === Cause -The {product-proxy} does not send topology events to the client applications so the time it takes for the driver to reconnect to a {product-proxy} instance is determined by the reconnection policy. +{product-proxy} does not send topology events to the client applications so the time it takes for the driver to reconnect to a {product-proxy} instance is determined by the reconnection policy. === Solution or Workaround @@ -322,7 +322,7 @@ Restart the client application to force an immediate reconnect. If you expect {product-proxy} instances to go down frequently, change the reconnection policy on the driver so that the interval between reconnection attempts has a shorter limit. 
-== Error with {astra} DevOps API when using the {product-automation}
+== Error with {astra} DevOps API when using {product-automation}

=== Symptoms

@@ -347,7 +347,7 @@ xref:astra-db-serverless:databases:secure-connect-bundle.adoc[Download the {astr

=== Symptoms

-The {product-proxy} doesn't start and the following appears on the proxy logs:
+{product-proxy} doesn't start and the following appears on the proxy logs:

[source,log]
----
@@ -359,7 +359,7 @@ metadata service (Astra) returned not successful status code

There are two possible causes for this:

-* The credentials that the {product-proxy} is using for {astra-db} don't have sufficient permissions.
+* The credentials that {product-proxy} is using for {astra-db} don't have sufficient permissions.
* The {astra-db} database is hibernated or otherwise unavailable.

=== Solution or Workaround

@@ -437,7 +437,7 @@ If a client application only sends reads through a connection then the target cl
This issue has been fixed in {product-proxy} 2.1.0.
We encourage you to upgrade to that version or greater.
By default, {product-proxy} now sends heartbeats after 30 seconds of inactivity on a cluster connection, to keep it alive.
-You can tune the heartbeat interval with the Ansible configuration variable `heartbeat_insterval_ms`, or by directly setting the `ZDM_HEARTBEAT_INTERVAL_MS` environment variable if you do not use the {product-automation}.
+You can tune the heartbeat interval with the Ansible configuration variable `heartbeat_interval_ms`, or by directly setting the `ZDM_HEARTBEAT_INTERVAL_MS` environment variable if you do not use {product-automation}.

== Performance degradation with {product-proxy}

@@ -475,7 +475,7 @@ In contrast, prepared statements are parsed once, and then reused on subsequent
If you are using simple statements, consider using prepared statements as the best first step.
Increasing the number of proxies might help, but only if the VMs resources (CPU, RAM or network IO) are near capacity.
-The {product-proxy} doesn't use a lot of RAM, but it uses a lot of CPU and network IO. +{product-proxy} doesn't use a lot of RAM but it uses a lot of CPU and network IO. Deploying the proxy instances on VMs with faster CPUs and faster network IO might help, but only your own tests will reveal whether it helps, because it depends on the workload type and details about your environment such as network/VPC configurations, hardware, and so on. diff --git a/modules/ROOT/pages/troubleshooting-tips.adoc b/modules/ROOT/pages/troubleshooting-tips.adoc index 80b468e1..91418761 100644 --- a/modules/ROOT/pages/troubleshooting-tips.adoc +++ b/modules/ROOT/pages/troubleshooting-tips.adoc @@ -12,7 +12,7 @@ You can also contact your {company} account representative or {support-url}[{com [#proxy-logs] == {product-proxy} logs -The {product-proxy} logs can help you troubleshoot issues with {product}. +{product-proxy} logs can help you troubleshoot issues with {product}. === Set the {product-proxy} log level @@ -23,23 +23,23 @@ The default log level is `INFO`, which is adequate for most logging. If you need more detail for temporary troubleshooting, you can set the log level to `DEBUG`. However, this can slightly degrade performance, and {company} recommends that you revert to `INFO` logging as soon as possible. -How you set the log level depends on how you deployed the {product-proxy}: +How you set the log level depends on how you deployed {product-proxy}: -* If you used {product-automation} to deploy the {product-proxy}, set `log_level` in `vars/zdm_proxy_core_config.yml`. +* If you used {product-automation} to deploy {product-proxy}, set `log_level` in `vars/zdm_proxy_core_config.yml`. + You can change this value in a rolling fashion by editing the variable and running the `rolling_update_zdm_proxy.yml` playbook. For more information, see xref:manage-proxy-instances.adoc#change-mutable-config-variable[Change a mutable configuration variable]. 
-* If you didn't use {product-automation} to deploy the {product-proxy}, set the `ZDM_LOG_LEVEL` environment variable on each proxy instance, and then restart each instance. +* If you didn't use {product-automation} to deploy {product-proxy}, set the `ZDM_LOG_LEVEL` environment variable on each proxy instance, and then restart each instance. === Retrieve the {product-proxy} log files //TODO: Reconcile with manage-proxy-instance.adoc content. -If you used the {product-automation} to deploy {product-proxy}, then you can get logs for a single proxy instance, and you can use a playbook to retrieve logs for all instances. +If you used {product-automation} to deploy {product-proxy}, then you can get logs for a single proxy instance, and you can use a playbook to retrieve logs for all instances. For instructions and more information, see xref:ROOT:manage-proxy-instances.adoc#access-the-proxy-logs[Access the proxy logs]. -If you did not use the {product-automation} to deploy {product-proxy}, you might have to access the logs another way. +If you did not use {product-automation} to deploy {product-proxy}, you might have to access the logs another way. For example, if you used Docker, you can use the following command to export a container's logs to a `log.txt` file: [source,bash] @@ -59,7 +59,7 @@ However, if you enable `DEBUG` logging, `debug` messages can help you find the s * `level=warn`: Reports an event that wasn't fatal to the overall process, but could indicate an issue with an individual request or connection. -* `level=error`: Indicates an issue with the {product-proxy}, client application, or clusters. +* `level=error`: Indicates an issue with {product-proxy}, the client application, or the clusters. These messages require further examination. If the meaning of a `warn` or `error` message isn't clear, you can submit an issue in the {product-proxy-repo}/issues[{product-proxy} GitHub repository]. 
@@ -70,7 +70,7 @@ Here are the most common messages in the {product-proxy} logs. ==== {product-proxy} startup message -If the log level doesn't filter out `info` entries, you can look for a `Proxy started` log message to verify that the {product-proxy} started correctly. +If the log level doesn't filter out `info` entries, you can look for a `Proxy started` log message to verify that {product-proxy} started correctly. For example: [source,json] @@ -130,7 +130,7 @@ time="2023-01-13T13:37:28+01:00" level=info msg="Parsed configuration: ..." This message is logged immediately before the long `Parsed configuration` string. -You can also pass the `-version` flag to the {product-proxy} to print the version. +You can also pass the `-version` flag to {product-proxy} to print the version. For example, you can use the following Docker command: [source,bash] @@ -178,7 +178,7 @@ For example, you might compare `cluster_name` to ensure that all instances are c To report an issue or get additional support, submit an issue in the {product-short} component GitHub repositories: * {product-proxy-repo}/issues[{product-proxy} repository] -* {product-automation-repo}/issues[{product-automation} repository] (includes {product-automation} and the {product-utility}) +* {product-automation-repo}/issues[{product-automation} repository] (includes {product-automation} and {product-utility}) * {cass-migrator-repo}/issues[{cass-migrator} repository] * {dsbulk-migrator-repo}/issues[{dsbulk-migrator} repository] From 6d3d5523820ce3e8d000cdc0a9f83d57c883733a Mon Sep 17 00:00:00 2001 From: "April I. 
Murphy" <36110273+aimurphy@users.noreply.github.com> Date: Tue, 3 Jun 2025 11:04:23 -0700 Subject: [PATCH 7/8] Update modules/ROOT/pages/setup-ansible-playbooks.adoc --- modules/ROOT/pages/setup-ansible-playbooks.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/modules/ROOT/pages/setup-ansible-playbooks.adoc b/modules/ROOT/pages/setup-ansible-playbooks.adoc index 5f1a338a..f86b7dbc 100644 --- a/modules/ROOT/pages/setup-ansible-playbooks.adoc +++ b/modules/ROOT/pages/setup-ansible-playbooks.adoc @@ -13,7 +13,7 @@ Once completed, you will have a working and fully monitored {product-proxy} depl This step expects that the infrastructure has been already provisioned. See xref:deployment-infrastructure.adoc[Deployment and infrastructure considerations], which include the infrastructure requirements. -To Configure a machine to serve as the Ansible Control Host, you can use {product-utility}. +To configure a machine to serve as the Ansible Control Host, you can use {product-utility}. This is a Golang (Go) executable program that runs anywhere. This utility prompts you for a few configuration values, with helpful embedded explanations and error handling, then automatically creates the Ansible Control Host container ready for you to use. From this container, you will be able to easily configure and run the {product-automation} Ansible playbooks. From ab97253bff8c38ead48702175c2bbd660a1d2e94 Mon Sep 17 00:00:00 2001 From: "April I. 
Murphy" <36110273+aimurphy@users.noreply.github.com> Date: Tue, 3 Jun 2025 16:05:21 -0700 Subject: [PATCH 8/8] Apply suggestions from code review Co-authored-by: brian-f --- modules/ROOT/pages/change-read-routing.adoc | 4 ++-- modules/ROOT/pages/components.adoc | 2 +- modules/ROOT/pages/connect-clients-to-proxy.adoc | 2 +- modules/ROOT/pages/deployment-infrastructure.adoc | 2 +- modules/ROOT/pages/migrate-and-validate-data.adoc | 4 ++-- modules/ROOT/pages/setup-ansible-playbooks.adoc | 2 +- modules/ROOT/pages/troubleshooting-scenarios.adoc | 10 +++++----- modules/ROOT/pages/troubleshooting-tips.adoc | 2 +- 8 files changed, 14 insertions(+), 14 deletions(-) diff --git a/modules/ROOT/pages/change-read-routing.adoc b/modules/ROOT/pages/change-read-routing.adoc index c44bed1b..e16a7889 100644 --- a/modules/ROOT/pages/change-read-routing.adoc +++ b/modules/ROOT/pages/change-read-routing.adoc @@ -69,9 +69,9 @@ This is not a required step, but you may wish to do it for peace of mind. ==== Issuing a `DESCRIBE` or a read to any system table through {product-proxy} isn't a valid verification. -{product-proxy} handles reads to system tables differently, by intercepting them and always routing them to the origin, in some cases partly populating them at proxy level. +{product-proxy} handles reads to system tables differently, by intercepting them and always routing them to the origin, in some cases partly populating them at the proxy level. -This means that system reads aren't representative of how {product-proxy} routes regular user reads. +This means that system reads don't represent how {product-proxy} routes regular user reads. Even after you switched the configuration to read the target cluster as the primary cluster, all system reads still go to the origin. Although `DESCRIBE` requests are not system requests, they are also generally resolved in a different way to regular requests, and should not be used as a means to verify the read routing behavior. 
diff --git a/modules/ROOT/pages/components.adoc b/modules/ROOT/pages/components.adoc index 9678eacf..945408fc 100644 --- a/modules/ROOT/pages/components.adoc +++ b/modules/ROOT/pages/components.adoc @@ -3,7 +3,7 @@ :description: Learn about {company} migration tools. :page-tag: migration,zdm,zero-downtime,zdm-proxy,components -The {company} {product} ({product-short}) toolkit includes {product-proxy}, {product-utility}, and {product-automation}, and several data migration tools. +The {company} {product} ({product-short}) toolkit includes {product-proxy}, {product-utility}, {product-automation}, and several data migration tools. For live migrations, {product-proxy} orchestrates activity-in-transition on your clusters. {product-utility} and {product-automation} facilitate the deployment and management of {product-proxy}. diff --git a/modules/ROOT/pages/connect-clients-to-proxy.adoc b/modules/ROOT/pages/connect-clients-to-proxy.adoc index ab0fe423..809d9166 100644 --- a/modules/ROOT/pages/connect-clients-to-proxy.adoc +++ b/modules/ROOT/pages/connect-clients-to-proxy.adoc @@ -148,7 +148,7 @@ For information about {astra-db} credentials in your {product-proxy} configurati === Disable client-side compression with {product-proxy} Client applications must not enable client-side compression when connecting through {product-proxy}, as this is not currently supported. -This is disabled by default in all drivers, but if it was enabled in your client application configuration it will have to be temporarily disabled when connecting to {product-proxy}. +This is disabled by default in all drivers, but if it was enabled in your client application configuration, it will have to be temporarily disabled when connecting to {product-proxy}. 
=== {product-proxy} ignores token-aware routing diff --git a/modules/ROOT/pages/deployment-infrastructure.adoc b/modules/ROOT/pages/deployment-infrastructure.adoc index a641e5fe..87c91cae 100644 --- a/modules/ROOT/pages/deployment-infrastructure.adoc +++ b/modules/ROOT/pages/deployment-infrastructure.adoc @@ -22,7 +22,7 @@ image::zdm-during-migration3.png[Connectivity between client applications, proxy == Infrastructure requirements -To deploy {product-proxy} and its companion monitoring stack, you will have to provision infrastructure that meets the following requirements. +To deploy {product-proxy} and its companion monitoring stack, you must provision infrastructure that meets the following requirements. [[_machines]] === Machines diff --git a/modules/ROOT/pages/migrate-and-validate-data.adoc b/modules/ROOT/pages/migrate-and-validate-data.adoc index bc8b603b..e46eb0f0 100644 --- a/modules/ROOT/pages/migrate-and-validate-data.adoc +++ b/modules/ROOT/pages/migrate-and-validate-data.adoc @@ -10,7 +10,7 @@ To move and validate data, you can use a dedicated data migration tool, such as == {sstable-sideloader} -{sstable-sideloader} is a service running in {astra-db} that imports data from snapshots of your existing {cass-short}-based cluster. +{sstable-sideloader} is a service running in {astra-db} that imports data from snapshots of your existing {cass-reg}-based cluster. This tool is exclusively for migrations that move data to {astra-db}. You can use {sstable-sideloader} alone or with {product-proxy}. @@ -19,7 +19,7 @@ For more information, see xref:sideloader:sideloader-zdm.adoc[]. == {cass-migrator} -You can use {cass-migrator} ({cass-migrator-short}) for data migration and validation between {cass-reg}-based databases. +You can use {cass-migrator} ({cass-migrator-short}) for data migration and validation between {cass-short}-based databases. 
It offers extensive functionality and configuration options to support large and complex migrations as well as post-migration data validation. You can use {cass-migrator-short} by itself, with {product-proxy}, or for data validation after using another data migration tool. diff --git a/modules/ROOT/pages/setup-ansible-playbooks.adoc b/modules/ROOT/pages/setup-ansible-playbooks.adoc index f86b7dbc..e4ae3741 100644 --- a/modules/ROOT/pages/setup-ansible-playbooks.adoc +++ b/modules/ROOT/pages/setup-ansible-playbooks.adoc @@ -9,7 +9,7 @@ Once completed, you will have a working and fully monitored {product-proxy} depl == Introduction -{product-automation} uses **Ansible**, which deploys and configures the {product-proxy} instances and monitoring stack via playbooks. +{product-automation} uses **Ansible**, which deploys and configures the {product-proxy} instances and monitoring stack using playbooks. This step expects that the infrastructure has been already provisioned. See xref:deployment-infrastructure.adoc[Deployment and infrastructure considerations], which include the infrastructure requirements. diff --git a/modules/ROOT/pages/troubleshooting-scenarios.adoc b/modules/ROOT/pages/troubleshooting-scenarios.adoc index 2c4e214e..f5ae48d8 100644 --- a/modules/ROOT/pages/troubleshooting-scenarios.adoc +++ b/modules/ROOT/pages/troubleshooting-scenarios.adoc @@ -176,7 +176,7 @@ If the proxy is able to start up -- that is, this message can be seen in the log then the authentication error is happening when a client application tries to open a connection to the proxy. In this case, the issue is with the Client credentials so the application itself is using invalid credentials (incorrect username/password or insufficient permissions). -Note that the proxy startup message has log level `INFO` so if the configured log level on the proxy is `warning` or `error`, you will have to rely on other ways to know whether {product-proxy} started correctly. 
+Note that the proxy startup message has log level `INFO`, so if the configured log level on the proxy is `warning` or `error`, you must rely on other ways to know whether {product-proxy} started correctly.

You can check if the docker container is running (or process if docker isn't being used) or if there is a log message similar to `Error launching proxy`.

@@ -298,7 +298,7 @@ For client connections, each proxy instance cycles through its "assigned nodes"
_(The "assigned nodes" are a different subset of the cluster nodes for each proxy instance, generally non-overlapping between proxy instances so as to avoid any interference with the load balancing already in place at client-side driver level.
The assigned nodes are not necessarily contact points: even discovered nodes undergo assignment to proxy instances.)_

-In the example above, {product-proxy} doesn't have connectivity to 10.0.63.20, which was chosen as the origin node for the incoming client connection, but it was able to connect to 10.0.63.163 during startup.
+In the example above, {product-proxy} doesn't have connectivity to 10.0.63.20, which was chosen as the origin node for the incoming client connection, but it connected to 10.0.63.163 during startup.

=== Solution or Workaround

@@ -310,11 +310,11 @@ Ensure that network connectivity exists and is stable between the {product-proxy

=== Symptoms

After a {product-proxy} instance has been unavailable for some time and it gets back up, the client application takes too long to reconnect.

-There should never be a reason to stop a {product-proxy} instance other than a configuration change but maybe the proxy crashed or the user tried to do a configuration change and took a long time to get the {product-proxy} instance back up.
+There should never be a reason to stop a {product-proxy} instance other than a configuration change, but the proxy might have crashed, or a configuration change might have taken a long time to bring the {product-proxy} instance back up.
=== Cause -{product-proxy} does not send topology events to the client applications so the time it takes for the driver to reconnect to a {product-proxy} instance is determined by the reconnection policy. +{product-proxy} does not send topology events to the client applications, so the reconnection policy determines the time required for the driver to reconnect to a {product-proxy} instance. === Solution or Workaround @@ -475,7 +475,7 @@ In contrast, prepared statements are parsed once, and then reused on subsequent If you are using simple statements, consider using prepared statements as the best first step. Increasing the number of proxies might help, but only if the VMs resources (CPU, RAM or network IO) are near capacity. -{product-proxy} doesn't use a lot of RAM but it uses a lot of CPU and network IO. +{product-proxy} doesn't use a lot of RAM, but it uses a lot of CPU and network IO. Deploying the proxy instances on VMs with faster CPUs and faster network IO might help, but only your own tests will reveal whether it helps, because it depends on the workload type and details about your environment such as network/VPC configurations, hardware, and so on. diff --git a/modules/ROOT/pages/troubleshooting-tips.adoc b/modules/ROOT/pages/troubleshooting-tips.adoc index 91418761..eabd0665 100644 --- a/modules/ROOT/pages/troubleshooting-tips.adoc +++ b/modules/ROOT/pages/troubleshooting-tips.adoc @@ -30,7 +30,7 @@ How you set the log level depends on how you deployed {product-proxy}: You can change this value in a rolling fashion by editing the variable and running the `rolling_update_zdm_proxy.yml` playbook. For more information, see xref:manage-proxy-instances.adoc#change-mutable-config-variable[Change a mutable configuration variable]. -* If you didn't use {product-automation} to deploy {product-proxy}, set the `ZDM_LOG_LEVEL` environment variable on each proxy instance, and then restart each instance. 
+* If you didn't use {product-automation} to deploy {product-proxy}, set the `ZDM_LOG_LEVEL` environment variable on each proxy instance and then restart each instance. === Retrieve the {product-proxy} log files