
Commit a0ab5c0

Merge pull request #242024 from MikeRayMSFT/20230619-troubleshoot-arc-sql-mi

Audit failover group troubleshooting for Arc SQL MI

2 parents de488dc + 21896e7

File tree

1 file changed: +10 -9 lines


articles/azure-arc/data/troubleshoot-managed-instance.md

Lines changed: 10 additions & 9 deletions
```diff
@@ -109,7 +109,7 @@ kubectl -n $nameSpace get sqlmi $sqlmiName -o jsonpath-as-json='{.status}'
 
 **Results**
 
-The state should be `Ready`. If the value isn't `Ready`, you need to wait. If state is error, get the message field, collect logs, and contact support. See [Collecting the logs](#collecting-the-logs).
+The state should be `Ready`. If the value isn't `Ready`, wait and check again. If the state is error, get the message field, collect the logs, and contact support. See [Collect the logs](#collect-the-logs).
 
 ### Check the routing label for stateful set
 The routing label for stateful set is used to route external endpoint to a matched pod. The name of the label is `role.ag.mssql.microsoft.com`.
```
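
The `Ready`/error decision described in this hunk can be scripted. A minimal sketch, assuming the `$nameSpace` and `$sqlmiName` variables used earlier in the article; the `next_action` helper is hypothetical, and the kubectl call is shown as a comment because it needs a live cluster:

```console
# On a live cluster you would read the state with something like:
#   state=$(kubectl -n "$nameSpace" get sqlmi "$sqlmiName" -o jsonpath='{.status.state}')

# Hypothetical helper: map the reported state to the next troubleshooting step
next_action() {
  case "$1" in
    Ready) echo "proceed" ;;
    Error) echo "collect logs and contact support" ;;
    *)     echo "wait and re-check" ;;
  esac
}

next_action "Ready"    # proceed
```

The catch-all branch covers transient states (for example, while the instance is still provisioning), which the article says simply need more time.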
```diff
@@ -122,7 +122,7 @@ kubectl -n $nameSpace get pods $sqlmiName-2 -o jsonpath-as-json='{.metadata.labe
 
 **Results**
 
-If you didn't find primary, kill the pod that doesn't have any `role.ag.mssql.microsoft.com` label. If this doesn't resolve the issue, collect logs and contact support. See [Collecting the logs](#collecting-the-logs).
+If you don't find a primary, kill the pod that doesn't have any `role.ag.mssql.microsoft.com` label. If this doesn't resolve the issue, collect the logs and contact support. See [Collect the logs](#collect-the-logs).
 
 ### Get Replica state from local container connection
 
```
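
The routing-label check above can be sketched as follows. The label key comes from the article; the `missing_routing_label` helper and its sample input are hypothetical:

```console
# On a live cluster, list one pod's labels, e.g.:
#   kubectl -n "$nameSpace" get pods "$sqlmiName-2" -o jsonpath-as-json='{.metadata.labels}'

# Hypothetical helper: given "key=value" label lines for one pod, report
# whether the routing label is missing (such a pod is a candidate to kill)
missing_routing_label() {
  ! printf '%s\n' "$1" | grep -q '^role\.ag\.mssql\.microsoft\.com='
}

missing_routing_label "app=sqlmi" && echo "no routing label"   # no routing label
```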

```diff
@@ -138,7 +138,7 @@ kubectl exec -ti -n $nameSpace $sqlmiName-2 -c arc-sqlmi -- /opt/mssql-tools/bin
 
 All replicas should be connected & healthy. Here is the detailed description of the query results [sys.dm_hadr_availability_replica_states](/sql/relational-databases/system-dynamic-management-views/sys-dm-hadr-availability-replica-states-transact-sql).
 
-If you find it isn't synchronized or not connected unexpectedly, try to kill the pod which has the problem. If problem persists, collect logs and contact support. See [Collecting the logs](#collecting-the-logs).
+If a replica is unexpectedly not synchronized or not connected, try killing the pod that has the problem. If the problem persists, collect the logs and contact support. See [Collect the logs](#collect-the-logs).
 
 > [!NOTE]
 > If there are some large database in the instance, the seeding process to secondary could take a while. If this happens, wait for seeding to complete.
```
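
The DMV linked in this hunk, `sys.dm_hadr_availability_replica_states`, reports a healthy replica as `connected_state_desc = CONNECTED` and `synchronization_health_desc = HEALTHY`. A sketch of evaluating one result row; the `replica_healthy` helper is hypothetical:

```console
# Hypothetical helper: evaluate one row returned by the sqlcmd query above
# (arguments: connected_state_desc, synchronization_health_desc)
replica_healthy() {
  [ "$1" = "CONNECTED" ] && [ "$2" = "HEALTHY" ]
}

replica_healthy "CONNECTED" "HEALTHY" && echo "ok"   # ok
```

Any other combination is the "not synchronized or not connected" case the hunk tells you to act on.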
```diff
@@ -155,7 +155,7 @@ kubectl exec -ti -n $nameSpace $sqlmiName-2 -c arc-sqlmi -- /opt/mssql-tools/bin
 
 **Results**
 
-You should get `ServerName` from `Listener` of each replica. If you can't get `ServerName`, kill the pods which have the problem. If the problem persists after recovery, collect logs and contact support. See [Collecting the logs](#collecting-the-logs).
+You should get `ServerName` from the `Listener` of each replica. If you can't get `ServerName`, kill the pods that have the problem. If the problem persists after recovery, collect the logs and contact support. See [Collect the logs](#collect-the-logs).
 
 ### Check Kubernetes network connection
 
```
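
To run the listener query against every replica, you can iterate the stateful-set pod names. A sketch, assuming the `$sqlmiName-<ordinal>` naming that the article's `$sqlmiName-2` examples follow; the `replica_pods` helper is hypothetical:

```console
# Hypothetical helper: emit the stateful-set pod names for N replicas
replica_pods() {
  name="$1"; count="$2"; i=0
  while [ "$i" -lt "$count" ]; do
    echo "${name}-${i}"
    i=$((i + 1))
  done
}

# Against each pod you would then run the sqlcmd listener query, e.g.:
#   for pod in $(replica_pods "$sqlmiName" 3); do
#     kubectl exec -ti -n "$nameSpace" "$pod" -c arc-sqlmi -- ...
#   done

replica_pods sqlmi1 3
```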

```diff
@@ -186,12 +186,9 @@ You should be able to connect to exposed external port (which has been confirmed
 
 You can use any client like `SqlCmd`, SQL Server Management Studio (SSMS), or Azure Data Studio (ADS) to test this out.
 
-## Collecting the logs
+## Connection between failover groups is lost
 
-If the previous steps all succeeded without any problem and you still can't log in, collect the logs and contact support
-
-### Connection between Failover groups is lost
-If the Failover groups between primary and geo-secondary Arc SQL Managed instances is configured to be in `sync` mode and the connection is lost for whatever reason for an extended period of time, then the logs on the primary Arc SQL managed instance cannot be truncated until the transactions are sent to the geo-secondary. This could lead to the logs filling up and potentially running out of space on the primary site. To break out of this situation, remove the failover groups and re-configure when the connection between the sites is re-established.
+If the failover group between the primary and geo-secondary Arc SQL managed instances is configured in `sync` mode and the connection is lost for an extended period of time, the logs on the primary Arc SQL managed instance can't be truncated until the transactions are sent to the geo-secondary. The logs could fill up, and the primary site could run out of space. To break out of this situation, remove the failover groups and reconfigure them when the connection between the sites is re-established.
 
 The failover groups can be removed on both primary as well as secondary site as follows:
 
```
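
The removal step above can be sketched as a command builder. This is a minimal sketch, not the article's documented procedure: the `fog` resource shortname, `$fogName`, and the `delete_fog_cmd` helper are all assumptions to verify against your cluster (for example with `kubectl api-resources | grep -i failover`):

```console
# Hypothetical helper: print (rather than run) the delete command,
# since the real delete needs a live cluster and is irreversible
delete_fog_cmd() {
  ns="$1"; fog="$2"
  echo "kubectl -n ${ns} delete fog ${fog}"
}

delete_fog_cmd arc primarycr   # kubectl -n arc delete fog primarycr
```

Run the printed command on both the primary and the secondary site, as the paragraph above describes.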

````diff
@@ -204,6 +201,10 @@ and if the data controller is deployed in `direct` mode, provide the `sharedname
 
 Once the failover group on the primary site is deleted, logs can be truncated to free up space.
 
+## Collect the logs
+
+If the previous steps all succeeded without any problem and you still can't log in, collect the logs and contact support.
+
 ### Collection controller logs
 
 ```console
````
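
For the log-collection section this hunk introduces, one commonly used path is the `az arcdata dc debug copy-logs` command. A sketch that only prints the command; the `--k8s-namespace` and `--use-k8s` flags are assumptions to verify against `az arcdata dc debug copy-logs --help`:

```console
# Hypothetical helper: build the log-collection command without running it
# (the real command needs the Azure CLI with the arcdata extension and a
# live cluster)
copy_logs_cmd() {
  ns="$1"
  echo "az arcdata dc debug copy-logs --k8s-namespace ${ns} --use-k8s"
}

copy_logs_cmd arc
```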
