
Commit a0ab5c0

Merge pull request #242024 from MikeRayMSFT/20230619-troubleshoot-arc-sql-mi

Audit failover group troubleshooting for Arc SQL MI

2 parents de488dc + 21896e7

File tree

1 file changed: +10 -9 lines


articles/azure-arc/data/troubleshoot-managed-instance.md

Lines changed: 10 additions & 9 deletions
```diff
@@ -109,7 +109,7 @@ kubectl -n $nameSpace get sqlmi $sqlmiName -o jsonpath-as-json='{.status}'
 
 **Results**
 
-The state should be `Ready`. If the value isn't `Ready`, you need to wait. If state is error, get the message field, collect logs, and contact support. See [Collecting the logs](#collecting-the-logs).
+The state should be `Ready`. If the value isn't `Ready`, wait and check again. If the state is error, get the message field, collect the logs, and contact support. See [Collect the logs](#collect-the-logs).
 
 ### Check the routing label for stateful set
 The routing label for stateful set is used to route external endpoint to a matched pod. The name of the label is `role.ag.mssql.microsoft.com`.
```
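
The `Ready`/error decision described in this hunk can be scripted. A minimal sketch, assuming the `$nameSpace` and `$sqlmiName` variables used earlier in the article; the `next_action` helper is hypothetical, and the kubectl call is shown as a comment because it needs a live cluster:

```console
# On a live cluster you would read the state with something like:
#   state=$(kubectl -n "$nameSpace" get sqlmi "$sqlmiName" -o jsonpath='{.status.state}')

# Hypothetical helper: map the reported state to the next troubleshooting step
next_action() {
  case "$1" in
    Ready) echo "proceed" ;;
    Error) echo "collect logs and contact support" ;;
    *)     echo "wait and re-check" ;;
  esac
}

next_action "Ready"    # proceed
```

The catch-all branch covers transient states (for example, while the instance is still provisioning), which the article says simply need more time.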
```diff
@@ -122,7 +122,7 @@ kubectl -n $nameSpace get pods $sqlmiName-2 -o jsonpath-as-json='{.metadata.labe
 
 **Results**
 
-If you didn't find primary, kill the pod that doesn't have any `role.ag.mssql.microsoft.com` label. If this doesn't resolve the issue, collect logs and contact support. See [Collecting the logs](#collecting-the-logs).
+If you don't find a primary, kill the pod that doesn't have any `role.ag.mssql.microsoft.com` label. If this doesn't resolve the issue, collect the logs and contact support. See [Collect the logs](#collect-the-logs).
 
 ### Get Replica state from local container connection
 
```
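
The routing-label check above can be sketched as follows. The label key comes from the article; the `missing_routing_label` helper and its sample input are hypothetical:

```console
# On a live cluster, list one pod's labels, e.g.:
#   kubectl -n "$nameSpace" get pods "$sqlmiName-2" -o jsonpath-as-json='{.metadata.labels}'

# Hypothetical helper: given "key=value" label lines for one pod, report
# whether the routing label is missing (such a pod is a candidate to kill)
missing_routing_label() {
  ! printf '%s\n' "$1" | grep -q '^role\.ag\.mssql\.microsoft\.com='
}

missing_routing_label "app=sqlmi" && echo "no routing label"   # no routing label
```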

```diff
@@ -138,7 +138,7 @@ kubectl exec -ti -n $nameSpace $sqlmiName-2 -c arc-sqlmi -- /opt/mssql-tools/bin
 
 All replicas should be connected & healthy. Here is the detailed description of the query results [sys.dm_hadr_availability_replica_states](/sql/relational-databases/system-dynamic-management-views/sys-dm-hadr-availability-replica-states-transact-sql).
 
-If you find it isn't synchronized or not connected unexpectedly, try to kill the pod which has the problem. If problem persists, collect logs and contact support. See [Collecting the logs](#collecting-the-logs).
+If a replica is unexpectedly not synchronized or not connected, try killing the pod that has the problem. If the problem persists, collect the logs and contact support. See [Collect the logs](#collect-the-logs).
 
 > [!NOTE]
 > If there are some large database in the instance, the seeding process to secondary could take a while. If this happens, wait for seeding to complete.
```
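
The DMV linked in this hunk, `sys.dm_hadr_availability_replica_states`, reports a healthy replica as `connected_state_desc = CONNECTED` and `synchronization_health_desc = HEALTHY`. A sketch of evaluating one result row; the `replica_healthy` helper is hypothetical:

```console
# Hypothetical helper: evaluate one row returned by the sqlcmd query above
# (arguments: connected_state_desc, synchronization_health_desc)
replica_healthy() {
  [ "$1" = "CONNECTED" ] && [ "$2" = "HEALTHY" ]
}

replica_healthy "CONNECTED" "HEALTHY" && echo "ok"   # ok
```

Any other combination is the "not synchronized or not connected" case the hunk tells you to act on.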
```diff
@@ -155,7 +155,7 @@ kubectl exec -ti -n $nameSpace $sqlmiName-2 -c arc-sqlmi -- /opt/mssql-tools/bin
 
 **Results**
 
-You should get `ServerName` from `Listener` of each replica. If you can't get `ServerName`, kill the pods which have the problem. If the problem persists after recovery, collect logs and contact support. See [Collecting the logs](#collecting-the-logs).
+You should get `ServerName` from the `Listener` of each replica. If you can't get `ServerName`, kill the pods that have the problem. If the problem persists after recovery, collect the logs and contact support. See [Collect the logs](#collect-the-logs).
 
 ### Check Kubernetes network connection
 
```
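
To run the listener query against every replica, you can iterate the stateful-set pod names. A sketch, assuming the `$sqlmiName-<ordinal>` naming that the article's `$sqlmiName-2` examples follow; the `replica_pods` helper is hypothetical:

```console
# Hypothetical helper: emit the stateful-set pod names for N replicas
replica_pods() {
  name="$1"; count="$2"; i=0
  while [ "$i" -lt "$count" ]; do
    echo "${name}-${i}"
    i=$((i + 1))
  done
}

# Against each pod you would then run the sqlcmd listener query, e.g.:
#   for pod in $(replica_pods "$sqlmiName" 3); do
#     kubectl exec -ti -n "$nameSpace" "$pod" -c arc-sqlmi -- ...
#   done

replica_pods sqlmi1 3
```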

```diff
@@ -186,12 +186,9 @@ You should be able to connect to exposed external port (which has been confirmed
 
 You can use any client like `SqlCmd`, SQL Server Management Studio (SSMS), or Azure Data Studio (ADS) to test this out.
 
-## Collecting the logs
+## Connection between failover groups is lost
 
-If the previous steps all succeeded without any problem and you still can't log in, collect the logs and contact support
-
-### Connection between Failover groups is lost
-If the Failover groups between primary and geo-secondary Arc SQL Managed instances is configured to be in `sync` mode and the connection is lost for whatever reason for an extended period of time, then the logs on the primary Arc SQL managed instance cannot be truncated until the transactions are sent to the geo-secondary. This could lead to the logs filling up and potentially running out of space on the primary site. To break out of this situation, remove the failover groups and re-configure when the connection between the sites is re-established.
+If the failover group between the primary and geo-secondary Arc SQL managed instances is configured in `sync` mode and the connection is lost for an extended period of time, the logs on the primary Arc SQL managed instance can't be truncated until the transactions are sent to the geo-secondary. The logs could fill up, and the primary site could run out of space. To break out of this situation, remove the failover groups and reconfigure them when the connection between the sites is re-established.
 
 The failover groups can be removed on both primary as well as secondary site as follows:
 
```
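
The removal step above can be sketched as a command builder. This is a minimal sketch, not the article's documented procedure: the `fog` resource shortname, `$fogName`, and the `delete_fog_cmd` helper are all assumptions to verify against your cluster (for example with `kubectl api-resources | grep -i failover`):

```console
# Hypothetical helper: print (rather than run) the delete command,
# since the real delete needs a live cluster and is irreversible
delete_fog_cmd() {
  ns="$1"; fog="$2"
  echo "kubectl -n ${ns} delete fog ${fog}"
}

delete_fog_cmd arc primarycr   # kubectl -n arc delete fog primarycr
```

Run the printed command on both the primary and the secondary site, as the paragraph above describes.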

````diff
@@ -204,6 +201,10 @@ and if the data controller is deployed in `direct` mode, provide the `sharedname
 
 Once the failover group on the primary site is deleted, logs can be truncated to free up space.
 
+## Collect the logs
+
+If the previous steps all succeeded without any problem and you still can't log in, collect the logs and contact support.
+
 ### Collection controller logs
 
 ```console
````
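
For the log-collection section this hunk introduces, one commonly used path is the `az arcdata dc debug copy-logs` command. A sketch that only prints the command; the `--k8s-namespace` and `--use-k8s` flags are assumptions to verify against `az arcdata dc debug copy-logs --help`:

```console
# Hypothetical helper: build the log-collection command without running it
# (the real command needs the Azure CLI with the arcdata extension and a
# live cluster)
copy_logs_cmd() {
  ns="$1"
  echo "az arcdata dc debug copy-logs --k8s-namespace ${ns} --use-k8s"
}

copy_logs_cmd arc
```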
