From 3d4d5e9fc2a7f61476739f636a93987580b45215 Mon Sep 17 00:00:00 2001
From: mich-elle-luna
Date: Tue, 27 May 2025 13:40:56 -0700
Subject: [PATCH 1/9] Add syncer error recovery troubleshooting documentation

- Add troubleshooting section for unrecoverable syncer errors
- Document exit code 4 behavior and DMC response
- Provide recovery procedures for regular and Active-Active databases
- Include REST API and crdb-cli recovery methods
- Add clear examples with placeholder values

Resolves DOC-1554
---
 .../rs/databases/active-active/syncer.md | 52 +++++++++++++++++--
 1 file changed, 48 insertions(+), 4 deletions(-)

diff --git a/content/operate/rs/databases/active-active/syncer.md b/content/operate/rs/databases/active-active/syncer.md
index 5394e3d5ea..2fdcd9c5b9 100644
--- a/content/operate/rs/databases/active-active/syncer.md
+++ b/content/operate/rs/databases/active-active/syncer.md
@@ -28,7 +28,7 @@ When a new primary is appointed, the replication ID changes, but a partial sync
 
 In a partial sync, the backlog of operations since the offset are transferred as raw operations.
 
-In a full sync, the data from the primary is transferred to the replica as an RDB file which is followed by a partial sync.
+In a full sync, the data from the primary is transferred to the replica as an RDB file which is followed by a partial sync.
 
 Partial synchronization requires a backlog large enough to store the data operations until connection is restored. See [replication backlog]({{< relref "/operate/rs/databases/active-active/manage#replication-backlog" >}}) for more info on changing the replication backlog size.
 
@@ -36,11 +36,11 @@ Partial synchronization requires a backlog large enough to store the data operat
 
 In the case of an Active-Active database:
 
-- Multiple past replication IDs and offsets are stored to allow for multiple syncs
-- The [Active-Active replication backlog]({{< relref "/operate/rs/databases/active-active/manage#replication-backlog" >}}) is also sent to the replica during a full sync.
+- Multiple past replication IDs and offsets are stored to allow for multiple syncs
+- The [Active-Active replication backlog]({{< relref "/operate/rs/databases/active-active/manage#replication-backlog" >}}) is also sent to the replica during a full sync.
 
 {{< warning >}}
-Full sync triggers heavy data transfers between geo-replicated instances of an Active-Active database.
+Full sync triggers heavy data transfers between geo-replicated instances of an Active-Active database.
 {{< /warning >}}
 
 An Active-Active database uses partial synchronization in the following situations:
@@ -53,4 +53,48 @@ An Active-Active database uses partial synchronization in the following situatio
 
 {{< note >}}
 Synchronization of data from the primary shard to the replica shard is always a full synchronization.
+{{< /note >}}
+
+## Troubleshooting syncer errors
+
+### Unrecoverable syncer errors
+
+Some syncer errors are unrecoverable and cause the syncer to exit with exit code 4. When this occurs, the Database Management Component (DMC) automatically sets the `crdt_sync` or `replica_sync` value to `stopped`.
+
+### Recovery procedures
+
+To re-enable the syncer after an unrecoverable error:
+
+#### For regular databases
+
+Use the cluster REST API to enable sync:
+
+```sh
+curl -v -k -u : -X PUT \
+  -H "Content-Type: application/json" \
+  -d '{"sync":"enabled"}' \
+  http://:8080/v1/bdbs/
+```
+
+#### For Active-Active databases (CRDB)
+
+For Active-Active databases, you have two options:
+
+1. **Call the API on all participating clusters:**
+
+   ```sh
+   curl -v -k -u : -X PUT \
+     -H "Content-Type: application/json" \
+     -d '{"sync":"enabled"}' \
+     http://:8080/v1/bdbs/
+   ```
+
+2. **Use crdb-cli (recommended):**
+
+   ```sh
+   crdb-cli crdb update --crdb-guid --force
+   ```
+
+{{< note >}}
+Replace ``, ``, ``, ``, and `` with your actual values.
 {{< /note >}}
\ No newline at end of file

From f611dcead26bfa9bc29498ac994d8fb7cd83d6ff Mon Sep 17 00:00:00 2001
From: mich-elle-luna <153109578+mich-elle-luna@users.noreply.github.com>
Date: Thu, 29 May 2025 09:27:42 -0700
Subject: [PATCH 2/9] Update content/operate/rs/databases/active-active/syncer.md

Co-authored-by: Rachel Elledge <86307637+rrelledge@users.noreply.github.com>
---
 content/operate/rs/databases/active-active/syncer.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/operate/rs/databases/active-active/syncer.md b/content/operate/rs/databases/active-active/syncer.md
index 2fdcd9c5b9..acd19e3d5b 100644
--- a/content/operate/rs/databases/active-active/syncer.md
+++ b/content/operate/rs/databases/active-active/syncer.md
@@ -59,7 +59,7 @@ Synchronization of data from the primary shard to the replica shard is always a
 
 ### Unrecoverable syncer errors
 
-Some syncer errors are unrecoverable and cause the syncer to exit with exit code 4. When this occurs, the Database Management Component (DMC) automatically sets the `crdt_sync` or `replica_sync` value to `stopped`.
+Some syncer errors are unrecoverable and cause the syncer to exit with exit code 4. When this occurs, the Data Management Controller (DMC) automatically sets the `crdt_sync` or `replica_sync` value to `stopped`.
 
 ### Recovery procedures
 

From 6b41247aba93608f52d7a4212b702c1d2332ae28 Mon Sep 17 00:00:00 2001
From: mich-elle-luna <153109578+mich-elle-luna@users.noreply.github.com>
Date: Thu, 29 May 2025 09:28:12 -0700
Subject: [PATCH 3/9] Update content/operate/rs/databases/active-active/syncer.md

Co-authored-by: Rachel Elledge <86307637+rrelledge@users.noreply.github.com>
---
 content/operate/rs/databases/active-active/syncer.md | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/content/operate/rs/databases/active-active/syncer.md b/content/operate/rs/databases/active-active/syncer.md
index acd19e3d5b..a901061805 100644
--- a/content/operate/rs/databases/active-active/syncer.md
+++ b/content/operate/rs/databases/active-active/syncer.md
@@ -61,13 +61,10 @@ Synchronization of data from the primary shard to the replica shard is always a
 
 Some syncer errors are unrecoverable and cause the syncer to exit with exit code 4. When this occurs, the Data Management Controller (DMC) automatically sets the `crdt_sync` or `replica_sync` value to `stopped`.
 
-### Recovery procedures
+#### Restart syncer for regular databases
 
-To re-enable the syncer after an unrecoverable error:
+To restart a regular database's syncer after an unrecoverable error, [update the database configuration]({{}}) with the REST API to enable `sync`:
 
-#### For regular databases
-
-Use the cluster REST API to enable sync:
 
 ```sh
 curl -v -k -u : -X PUT \

From 36188926a8d782816163442c1a9c2c8304274cc2 Mon Sep 17 00:00:00 2001
From: mich-elle-luna <153109578+mich-elle-luna@users.noreply.github.com>
Date: Thu, 29 May 2025 09:28:23 -0700
Subject: [PATCH 4/9] Update content/operate/rs/databases/active-active/syncer.md

Co-authored-by: Rachel Elledge <86307637+rrelledge@users.noreply.github.com>
---
 content/operate/rs/databases/active-active/syncer.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/operate/rs/databases/active-active/syncer.md b/content/operate/rs/databases/active-active/syncer.md
index a901061805..46fee6bc6d 100644
--- a/content/operate/rs/databases/active-active/syncer.md
+++ b/content/operate/rs/databases/active-active/syncer.md
@@ -73,7 +73,7 @@ curl -v -k -u : -X PUT \
   http://:8080/v1/bdbs/
 ```
 
-#### For Active-Active databases (CRDB)
+#### Restart syncer for Active-Active databases
 
 For Active-Active databases, you have two options:
 

From e216da36f51f317b7fcad2ad8653e384bf956425 Mon Sep 17 00:00:00 2001
From: mich-elle-luna <153109578+mich-elle-luna@users.noreply.github.com>
Date: Thu, 29 May 2025 09:29:01 -0700
Subject: [PATCH 5/9] Update content/operate/rs/databases/active-active/syncer.md

Co-authored-by: Rachel Elledge <86307637+rrelledge@users.noreply.github.com>
---
 content/operate/rs/databases/active-active/syncer.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/content/operate/rs/databases/active-active/syncer.md b/content/operate/rs/databases/active-active/syncer.md
index 46fee6bc6d..853669749a 100644
--- a/content/operate/rs/databases/active-active/syncer.md
+++ b/content/operate/rs/databases/active-active/syncer.md
@@ -75,9 +75,9 @@ curl -v -k -u : -X PUT \
 
 #### Restart syncer for Active-Active databases
 
-For Active-Active databases, you have two options:
+To restart an Active-Active database's syncer after an unrecoverable error, use one of the following methods.
 
-1. **Call the API on all participating clusters:**
+- For each participating cluster, [update the database configuration]({{}}) with the REST API to enable `sync`:
 
    ```sh
    curl -v -k -u : -X PUT \

From a2a5c6d99f8d7d6078a61779ba7cdd14be26fe21 Mon Sep 17 00:00:00 2001
From: mich-elle-luna <153109578+mich-elle-luna@users.noreply.github.com>
Date: Thu, 29 May 2025 09:29:08 -0700
Subject: [PATCH 6/9] Update content/operate/rs/databases/active-active/syncer.md

Co-authored-by: Rachel Elledge <86307637+rrelledge@users.noreply.github.com>
---
 content/operate/rs/databases/active-active/syncer.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/operate/rs/databases/active-active/syncer.md b/content/operate/rs/databases/active-active/syncer.md
index 853669749a..46124d3a95 100644
--- a/content/operate/rs/databases/active-active/syncer.md
+++ b/content/operate/rs/databases/active-active/syncer.md
@@ -86,7 +86,7 @@ To restart an Active-Active database's syncer after an unrecoverable error, use
      http://:8080/v1/bdbs/
    ```
 
-2. **Use crdb-cli (recommended):**
+- Run [`crdb-cli crdb update`]({{}}):
 
    ```sh
    crdb-cli crdb update --crdb-guid --force

From d35bacb4e59ea349739a9660887755902a17c67a Mon Sep 17 00:00:00 2001
From: mich-elle-luna <153109578+mich-elle-luna@users.noreply.github.com>
Date: Thu, 29 May 2025 09:29:25 -0700
Subject: [PATCH 7/9] Update content/operate/rs/databases/active-active/syncer.md

Co-authored-by: Rachel Elledge <86307637+rrelledge@users.noreply.github.com>
---
 content/operate/rs/databases/active-active/syncer.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/content/operate/rs/databases/active-active/syncer.md b/content/operate/rs/databases/active-active/syncer.md
index 46124d3a95..f267c9b705 100644
--- a/content/operate/rs/databases/active-active/syncer.md
+++ b/content/operate/rs/databases/active-active/syncer.md
@@ -69,8 +69,8 @@ To restart a regular database's syncer after an unrecoverable error, [update the
 ```sh
 curl -v -k -u : -X PUT \
   -H "Content-Type: application/json" \
-  -d '{"sync":"enabled"}' \
-  http://:8080/v1/bdbs/
+  -d '{"sync": "enabled"}' \
+  https://:/v1/bdbs/
 ```
 
 #### Restart syncer for Active-Active databases

From 407c2ca92326b05fbe53cf91dd3720b5d0ddfcdd Mon Sep 17 00:00:00 2001
From: mich-elle-luna <153109578+mich-elle-luna@users.noreply.github.com>
Date: Thu, 29 May 2025 09:29:35 -0700
Subject: [PATCH 8/9] Update content/operate/rs/databases/active-active/syncer.md

Co-authored-by: Rachel Elledge <86307637+rrelledge@users.noreply.github.com>
---
 content/operate/rs/databases/active-active/syncer.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/content/operate/rs/databases/active-active/syncer.md b/content/operate/rs/databases/active-active/syncer.md
index f267c9b705..64c72e7296 100644
--- a/content/operate/rs/databases/active-active/syncer.md
+++ b/content/operate/rs/databases/active-active/syncer.md
@@ -82,8 +82,8 @@ To restart an Active-Active database's syncer after an unrecoverable error, use
    ```sh
    curl -v -k -u : -X PUT \
      -H "Content-Type: application/json" \
-     -d '{"sync":"enabled"}' \
-     http://:8080/v1/bdbs/
+     -d '{"sync": "enabled"}' \
+     https://:/v1/bdbs/
    ```
 
 - Run [`crdb-cli crdb update`]({{}}):

From f622a01c3f2dcb038b7804265ef573a804033a53 Mon Sep 17 00:00:00 2001
From: mich-elle-luna <153109578+mich-elle-luna@users.noreply.github.com>
Date: Thu, 29 May 2025 09:29:48 -0700
Subject: [PATCH 9/9] Update content/operate/rs/databases/active-active/syncer.md

Co-authored-by: Rachel Elledge <86307637+rrelledge@users.noreply.github.com>
---
 content/operate/rs/databases/active-active/syncer.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/operate/rs/databases/active-active/syncer.md b/content/operate/rs/databases/active-active/syncer.md
index 64c72e7296..9b0ef68d6d 100644
--- a/content/operate/rs/databases/active-active/syncer.md
+++ b/content/operate/rs/databases/active-active/syncer.md
@@ -93,5 +93,5 @@ To restart an Active-Active database's syncer after an unrecoverable error, use
    ```
 
 {{< note >}}
-Replace ``, ``, ``, ``, and `` with your actual values.
+Replace ``, ``, ``, ``, ``, and `` with your actual values.
 {{< /note >}}
\ No newline at end of file
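
Reviewer sketch (not part of the patch series): the placeholders in the `curl` examples above did not survive in this copy of the diff, so the following is only a rough illustration of how the final request from patches 7 and 8 might be wrapped for repeated use. The function names `build_sync_url` and `enable_sync`, the positional arguments (host, port, database ID), and the `API_USER`, `API_PASSWORD`, and `DRY_RUN` variables are all hypothetical and do not come from the patch or from the product documentation. Only the `PUT` to `/v1/bdbs/` with the body `{"sync": "enabled"}` is taken from the diff.

```sh
#!/bin/sh
# Sketch only: all names below are illustrative assumptions, not from the patch.

# Build the database endpoint URL in the shape used by the final revision.
# $1 = cluster host, $2 = REST API port, $3 = database ID (all assumed inputs)
build_sync_url() {
  printf 'https://%s:%s/v1/bdbs/%s\n' "$1" "$2" "$3"
}

# Re-enable the syncer for one database on one cluster.
# With DRY_RUN=1 the request is printed instead of sent.
enable_sync() {
  url=$(build_sync_url "$1" "$2" "$3")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf 'PUT %s %s\n' "$url" '{"sync": "enabled"}'
  else
    # -k skips TLS verification, as in the examples in the patch.
    curl -k -u "${API_USER}:${API_PASSWORD}" -X PUT \
      -H "Content-Type: application/json" \
      -d '{"sync": "enabled"}' \
      "$url"
  fi
}
```

For an Active-Active database, the REST API method in the patch calls for one request per participating cluster, so a caller would presumably invoke `enable_sync` once for each cluster host. Running with `DRY_RUN=1` first shows the URLs that would be hit without changing any database configuration.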