
Commit da64b59

Merge pull request #230309 from MikeRayMSFT/20230310-arc-data-troubleshoot
20230310 arc data troubleshoot
2 parents d8b66d6 + 327ff6c commit da64b59


2 files changed (+220 -0 lines)


articles/azure-arc/data/toc.yml

Lines changed: 2 additions & 0 deletions
@@ -126,6 +126,8 @@ items:
href: troubleshoot-guide.md
- name: Get logs
href: troubleshooting-get-logs.md
+ - name: Troubleshoot deployments
+ href: troubleshoot-managed-instance.md
- name: Azure Arc-enabled SQL Managed Instance
items:
- name: Overview
Lines changed: 218 additions & 0 deletions
@@ -0,0 +1,218 @@
---
title: Troubleshoot connection to failover group - Azure Arc-enabled SQL Managed Instance
description: Describes how to troubleshoot issues with connections to failover group resources in Azure Arc-enabled data services
author: MikeRayMSFT
ms.author: mikeray
ms.topic: troubleshooting-general
ms.date: 03/15/2023
---

# Troubleshoot Azure Arc-enabled SQL Managed Instance deployments

This article identifies potential issues with deployments of Azure Arc-enabled data services and describes how to diagnose their root causes.

## Connection to Azure Arc-enabled SQL Managed Instance failover group

This section describes how to troubleshoot issues connecting to a failover group.

### Check failover group connections and synchronization state

```console
kubectl -n $nameSpace get fog $fogName -o jsonpath-as-json='{.status}'
```

**Results**:

On each side, the failover group status lists two replicas. Check the values of `connectedState` and `synchronizationState` for each replica.

If any `connectedState` value isn't `CONNECTED`, see the instructions under [Check parameters](#check-parameters).

If any `synchronizationState` value isn't `HEALTHY`, focus on the instance whose `synchronizationState` isn't `HEALTHY`. To debug it, see [Can't connect to Arc-enabled SQL Managed Instance](#cant-connect-to-arc-enabled-sql-managed-instance).
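
If you check several failover groups, you can script this check. The following is a minimal sketch, not an official tool: the function name `check_fog_states` is hypothetical, and it assumes the `{.status}` output is pretty-printed JSON with one `"key": "value"` pair per line. It prints every `connectedState` that isn't `CONNECTED` and every `synchronizationState` that isn't `HEALTHY`, and returns a nonzero exit code if it finds any.

```bash
# Hypothetical helper (not part of any Azure tooling).
# Usage: check_fog_states <namespace> <failover-group-name>
# Assumes the status JSON has one "key": "value" pair per line.
check_fog_states() {
  local fog_status value rc=0
  fog_status=$(kubectl -n "$1" get fog "$2" -o jsonpath-as-json='{.status}') || return 2

  # Report every replica whose connectedState isn't CONNECTED.
  for value in $(printf '%s\n' "$fog_status" | sed -n 's/.*"connectedState": *"\([^"]*\)".*/\1/p'); do
    if [ "$value" != "CONNECTED" ]; then
      echo "connectedState=$value"
      rc=1
    fi
  done

  # Report every replica whose synchronizationState isn't HEALTHY.
  for value in $(printf '%s\n' "$fog_status" | sed -n 's/.*"synchronizationState": *"\([^"]*\)".*/\1/p'); do
    if [ "$value" != "HEALTHY" ]; then
      echo "synchronizationState=$value"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, run `check_fog_states "$nameSpace" "$fogName"` on each side. An empty result with exit code 0 means every replica reported `CONNECTED` and `HEALTHY`.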

### Check parameters

On both the geo-primary and the geo-secondary, check the failover group spec against the `$sqlmiName` instance on the other side.

#### Command on local

Run the following command against the local instance to get the failover group spec:

```console
kubectl -n $nameSpace get fog $fogName -o jsonpath-as-json='{.spec}'
```

#### Command on remote

Run the following commands against the remote instance:

```console
kubectl -n $nameSpace get sqlmi $sqlmiName -o jsonpath-as-json='{.status.highAvailability.mirroringCertificate}'
kubectl -n $nameSpace get sqlmi $sqlmiName -o jsonpath-as-json='{.status.endpoints.mirroring}'
```

**Results**:

Compare the results from the remote instance with the results from the local instance.

* `partnerMirroringURL` and `partnerMirroringCert` from the local instance must match the remote instance values from:
  * `kubectl -n $nameSpace get sqlmi $sqlmiName -o jsonpath-as-json='{.status.endpoints.mirroring}'`
  * `kubectl -n $nameSpace get sqlmi $sqlmiName -o jsonpath-as-json='{.status.highAvailability.mirroringCertificate}'`

* `partnerMI` from `kubectl -n $nameSpace get fog $fogName -o jsonpath-as-json='{.spec}'` must match `$sqlmiName` on the remote instance.

* `sharedName` from `kubectl -n $nameSpace get fog $fogName -o jsonpath-as-json='{.spec}'` is optional. If it isn't specified, it's the same as `sourceMI`. If `sharedName` is specified, it must be the same on both sites.

* The role from `kubectl -n $nameSpace get fog $fogName -o jsonpath-as-json='{.spec}'` must differ between the two sites: one side should be primary and the other secondary.

If any of these values don't match, delete the failover group on both sites and re-create it.

If all values match, follow the instructions under [Check mirroring endpoints for both sides](#check-mirroring-endpoints-for-both-sides).
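
When both clusters are reachable from one workstation, you can automate part of this comparison. The following sketch is hypothetical: it assumes two kubeconfig contexts (selected with kubectl's `--context` flag), the same namespace and failover group name on both sides, and spec keys named `partnerMI`, `role`, and `sharedName` that each appear on their own line in the `{.spec}` output. It checks only `partnerMI`, the roles, and `sharedName` (when both sides set it); compare `partnerMirroringURL` and `partnerMirroringCert` manually as described above.

```bash
# Hypothetical helpers (not part of any Azure tooling).

# Print the first string value for a JSON key read from stdin.
# Assumes pretty-printed JSON with one "key": "value" pair per line.
json_value() {
  sed -n "s/.*\"$1\": *\"\([^\"]*\)\".*/\1/p" | head -n 1
}

# Usage: compare_fog_specs <local-context> <remote-context> <namespace> <fog-name> <remote-sqlmi-name>
compare_fog_specs() {
  local lctx="$1" rctx="$2" ns="$3" fog="$4" remote_mi="$5"
  local lspec rspec lshared rshared rc=0
  lspec=$(kubectl --context "$lctx" -n "$ns" get fog "$fog" -o jsonpath-as-json='{.spec}') || return 2
  rspec=$(kubectl --context "$rctx" -n "$ns" get fog "$fog" -o jsonpath-as-json='{.spec}') || return 2

  # partnerMI on the local side must name the remote instance.
  if [ "$(printf '%s\n' "$lspec" | json_value partnerMI)" != "$remote_mi" ]; then
    echo "partnerMI doesn't match $remote_mi"
    rc=1
  fi

  # One side must be primary and the other secondary, so the roles must differ.
  if [ "$(printf '%s\n' "$lspec" | json_value role)" = "$(printf '%s\n' "$rspec" | json_value role)" ]; then
    echo "role is the same on both sides"
    rc=1
  fi

  # sharedName is optional; when both sides set it, the values must match.
  lshared=$(printf '%s\n' "$lspec" | json_value sharedName)
  rshared=$(printf '%s\n' "$rspec" | json_value sharedName)
  if [ -n "$lshared" ] && [ -n "$rshared" ] && [ "$lshared" != "$rshared" ]; then
    echo "sharedName differs: $lshared vs $rshared"
    rc=1
  fi
  return "$rc"
}
```

For example, `compare_fog_specs primary-ctx secondary-ctx "$nameSpace" "$fogName" "$remoteSqlmiName"`, where the context names and `$remoteSqlmiName` are placeholders for your own values.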

### Check mirroring endpoints for both sides

On both the geo-primary and the geo-secondary, run the following command to check that the external mirroring endpoint is exposed:

```console
kubectl -n $nameSpace get services $sqlmiName-external-svc -o jsonpath-as-json='{.spec.ports}'
```

**Results**

* `port-mssql-mirroring` should be present in the list. The failover group on the other side should use the same value in its `partnerMirroringURL`. If the values don't match, correct the mistake and retry from the beginning.
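
To print only the mirroring port number, for comparison with the port in `partnerMirroringURL` on the other side, you can use a kubectl JSONPath filter expression. This is a sketch: the function name is hypothetical, and it assumes the service and port names shown above.

```bash
# Hypothetical helper: print the port number of the port named
# port-mssql-mirroring on the instance's external service.
# Usage: mirroring_port <namespace> <sqlmi-name>
mirroring_port() {
  local port
  port=$(kubectl -n "$1" get services "$2-external-svc" \
    -o jsonpath='{.spec.ports[?(@.name=="port-mssql-mirroring")].port}') || return 2
  if [ -z "$port" ]; then
    echo "port-mssql-mirroring isn't exposed on $2-external-svc" >&2
    return 1
  fi
  echo "$port"
}
```

For example, `mirroring_port "$nameSpace" "$sqlmiName"` prints the port, or returns exit code 1 if the port isn't in the service spec.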

### Verify SQL Server can reach the external endpoint of the other site

You can't ping the mirroring endpoint of the other site directly. Instead, use the following command to connect to the other site's external endpoint on the SQL Server tabular data stream (TDS) port.

```console
kubectl exec -ti -n $nameSpace $sqlmiName-0 -c arc-sqlmi -- /opt/mssql-tools/bin/sqlcmd -S $remotePrimaryEndpoint -U $remoteUser -P $remotePassword -Q "SELECT @@ServerName"
```

**Results**

If SQL Server can reach the external TDS endpoint, it can likely also reach the external mirroring endpoint, because both endpoints are defined and activated in the same service, `$sqlmiName-external-svc`.

## Can't connect to Arc-enabled SQL Managed Instance

This section identifies specific steps you can take to troubleshoot connections to Azure Arc-enabled SQL Managed Instance.

> [!NOTE]
> You can't connect to an Azure Arc-enabled SQL Managed Instance if the instance license type is `DisasterRecovery`.

### Check the managed instance status

The SQL Managed Instance (SQLMI) status indicates whether the instance is ready.

```console
kubectl -n $nameSpace get sqlmi $sqlmiName -o jsonpath-as-json='{.status}'
```

**Results**

The state should be `Ready`. If it isn't `Ready` yet, wait. If the state is error, get the value of the message field, collect logs, and contact support. See [Collecting the logs](#collecting-the-logs).
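
If the instance is still starting, you can poll instead of rerunning the command by hand. The following sketch assumes the state is stored at `.status.state` (confirm the field name in the `{.status}` output above for your version); the function name, retry count, and interval are illustrative.

```bash
# Hypothetical helper: poll the instance state until it's Ready, reports
# an error, or the retries run out.
# Usage: wait_for_sqlmi_ready <namespace> <sqlmi-name> [tries] [seconds-between-tries]
# Assumes the state is stored at .status.state.
wait_for_sqlmi_ready() {
  local ns="$1" mi="$2" tries="${3:-30}" delay="${4:-10}" state="" i=0
  while [ "$i" -lt "$tries" ]; do
    state=$(kubectl -n "$ns" get sqlmi "$mi" -o jsonpath='{.status.state}')
    # Stop on Ready; also stop on an error state instead of waiting further.
    case "$state" in
      Ready) echo "Ready"; return 0 ;;
      [Ee]rror) echo "Error"; return 1 ;;
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  echo "Timed out; last state: ${state:-unknown}"
  return 2
}
```

An exit code of 1 means the instance reported an error; in that case, collect the logs as described in [Collecting the logs](#collecting-the-logs).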

### Check the routing label for stateful set

The stateful set uses a routing label to route the external endpoint to the matching pod. The name of the label is `role.ag.mssql.microsoft.com`.

```console
kubectl -n $nameSpace get pods $sqlmiName-0 -o jsonpath-as-json='{.metadata.labels}'
kubectl -n $nameSpace get pods $sqlmiName-1 -o jsonpath-as-json='{.metadata.labels}'
kubectl -n $nameSpace get pods $sqlmiName-2 -o jsonpath-as-json='{.metadata.labels}'
```

**Results**

If none of the pods is labeled as the primary, delete the pod that doesn't have any `role.ag.mssql.microsoft.com` label. If this doesn't resolve the issue, collect logs and contact support. See [Collecting the logs](#collecting-the-logs).
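
The three commands above can be folded into a loop that prints only the routing label value for each pod. This sketch assumes the replicas are named `$sqlmiName-0` through `$sqlmiName-2`, as in the commands above; the function name is hypothetical. The dots in the label key are escaped with a backslash so that kubectl's JSONPath reads the key as a single name.

```bash
# Hypothetical helper: print each replica pod with its routing label value.
# Usage: show_role_labels <namespace> <sqlmi-name> [replica-count]
show_role_labels() {
  local ns="$1" mi="$2" count="${3:-3}" i=0 role
  while [ "$i" -lt "$count" ]; do
    # Escape the dots so JSONPath treats role.ag.mssql.microsoft.com as one key.
    role=$(kubectl -n "$ns" get pods "$mi-$i" \
      -o jsonpath='{.metadata.labels.role\.ag\.mssql\.microsoft\.com}')
    echo "$mi-$i ${role:-<none>}"
    i=$((i + 1))
  done
}
```

A pod printed with `<none>` has no routing label. Alternatively, `kubectl -n $nameSpace get pods -L role.ag.mssql.microsoft.com` adds the label as a column to the pod list.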

### Get replica state from local container connection

Use `localhost,1533` to connect to SQL Server in each replica of the stateful set. This connection should always succeed. Use it to query the SQL high availability (HA) replica state.

```console
kubectl exec -ti -n $nameSpace $sqlmiName-0 -c arc-sqlmi -- /opt/mssql-tools/bin/sqlcmd -S localhost,1533 -U $User -P $Password -Q "SELECT * FROM sys.dm_hadr_availability_replica_states"
kubectl exec -ti -n $nameSpace $sqlmiName-1 -c arc-sqlmi -- /opt/mssql-tools/bin/sqlcmd -S localhost,1533 -U $User -P $Password -Q "SELECT * FROM sys.dm_hadr_availability_replica_states"
kubectl exec -ti -n $nameSpace $sqlmiName-2 -c arc-sqlmi -- /opt/mssql-tools/bin/sqlcmd -S localhost,1533 -U $User -P $Password -Q "SELECT * FROM sys.dm_hadr_availability_replica_states"
```

**Results**

All replicas should be connected and healthy. For a detailed description of the query results, see [sys.dm_hadr_availability_replica_states](/sql/relational-databases/system-dynamic-management-views/sys-dm-hadr-availability-replica-states-transact-sql).

If a replica is unexpectedly not synchronized or not connected, delete the pod that has the problem. If the problem persists, collect logs and contact support. See [Collecting the logs](#collecting-the-logs).

> [!NOTE]
> If the instance contains large databases, seeding to the secondary replicas can take a while. If this happens, wait for seeding to complete.
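
Instead of running one command per replica, you can loop over the replicas. The following sketch is hypothetical: it assumes three replicas and the `$User` and `$Password` variables used above. It drops `-ti` because a `-Q` query doesn't need an interactive terminal.

```bash
# Hypothetical helper: run one query through sqlcmd in every replica.
# Usage: query_all_replicas <namespace> <sqlmi-name> <port> <query> [replica-count]
# Reads the SQL login from the $User and $Password variables.
query_all_replicas() {
  local ns="$1" mi="$2" port="$3" query="$4" count="${5:-3}" i=0
  while [ "$i" -lt "$count" ]; do
    echo "== $mi-$i =="
    kubectl exec -n "$ns" "$mi-$i" -c arc-sqlmi -- /opt/mssql-tools/bin/sqlcmd \
      -S "localhost,$port" -U "$User" -P "$Password" -Q "$query"
    i=$((i + 1))
  done
}
```

For example, `query_all_replicas "$nameSpace" "$sqlmiName" 1533 "SELECT role_desc, connected_state_desc, synchronization_health_desc FROM sys.dm_hadr_availability_replica_states"` returns only the role, connection, and health columns of the DMV. The same loop with port `1433` and `SELECT @@ServerName` covers the SQL engine listener check.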

### Check the SQLMI SQL engine listener

The SQL engine listener is the component that routes connections to the failover group.

```console
kubectl exec -ti -n $nameSpace $sqlmiName-0 -c arc-sqlmi -- /opt/mssql-tools/bin/sqlcmd -S localhost,1433 -U $User -P $Password -Q "SELECT @@ServerName"
kubectl exec -ti -n $nameSpace $sqlmiName-1 -c arc-sqlmi -- /opt/mssql-tools/bin/sqlcmd -S localhost,1433 -U $User -P $Password -Q "SELECT @@ServerName"
kubectl exec -ti -n $nameSpace $sqlmiName-2 -c arc-sqlmi -- /opt/mssql-tools/bin/sqlcmd -S localhost,1433 -U $User -P $Password -Q "SELECT @@ServerName"
```

**Results**

Each replica should return its server name through the listener. If a replica doesn't return the server name, delete the pod that has the problem. If the problem persists after the pod recovers, collect logs and contact support. See [Collecting the logs](#collecting-the-logs).

### Check Kubernetes network connection

Inside the Kubernetes cluster, the Kubernetes network lets pods communicate with each other and routes traffic between them. Check whether the SQLMI pods can communicate with each other through the cluster IP address. Run this command for all the replicas.

```console
kubectl exec -ti -n $nameSpace $sqlmiName-0 -c arc-sqlmi -- /opt/mssql-tools/bin/sqlcmd -S $(kubectl -n $nameSpace get service $sqlmiName-p-svc -o jsonpath='{.spec.clusterIP}'),1533 -U $User -P $Password -Q "SELECT @@ServerName"
```

**Results**

You should be able to reach any cluster IP address of the stateful set pods from another pod. If you can't, see [Kubernetes documentation - Cluster networking](https://kubernetes.io/docs/concepts/cluster-administration/networking/) for detailed information, or contact your service provider to resolve the issue.
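
To run the check from every replica in one pass, you can look up the cluster IP once and loop over the pods. This sketch makes the same assumptions as the command above (service `$sqlmiName-p-svc`, port 1533, `$User` and `$Password` set) plus three replicas; the function name is hypothetical. It adds the sqlcmd `-b` option so that an SQL error produces a nonzero exit code, which `kubectl exec` passes back.

```bash
# Hypothetical helper: from each replica, connect to the primary service's
# cluster IP on port 1533 and report replicas that can't connect.
# Usage: check_cluster_ip_from_all <namespace> <sqlmi-name> [replica-count]
check_cluster_ip_from_all() {
  local ns="$1" mi="$2" count="${3:-3}" ip i=0 rc=0
  ip=$(kubectl -n "$ns" get service "$mi-p-svc" -o jsonpath='{.spec.clusterIP}') || return 2
  [ -n "$ip" ] || return 2
  while [ "$i" -lt "$count" ]; do
    if kubectl exec -n "$ns" "$mi-$i" -c arc-sqlmi -- /opt/mssql-tools/bin/sqlcmd -b \
        -S "$ip,1533" -U "$User" -P "$Password" -Q "SELECT @@ServerName" >/dev/null; then
      echo "$mi-$i: reached $ip,1533"
    else
      echo "$mi-$i: can't reach $ip,1533"
      rc=1
    fi
    i=$((i + 1))
  done
  return "$rc"
}
```

For example, `check_cluster_ip_from_all "$nameSpace" "$sqlmiName"` lists each replica with the result of its connection attempt.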

### Check the Kubernetes load balancer or `nodeport` services

Load balancer or `nodeport` services expose a service port to the external network. To test external connectivity, expose each replica pod through a load balancer service:

```console
kubectl -n $nameSpace expose pod $sqlmiName-0 --port=1533 --name=ha-$sqlmiName-0 --type=LoadBalancer
kubectl -n $nameSpace expose pod $sqlmiName-1 --port=1533 --name=ha-$sqlmiName-1 --type=LoadBalancer
kubectl -n $nameSpace expose pod $sqlmiName-2 --port=1533 --name=ha-$sqlmiName-2 --type=LoadBalancer
```

**Results**

You should be able to connect to the exposed external port, which you already confirmed works internally in [Get replica state from local container connection](#get-replica-state-from-local-container-connection). If you can't connect to the external port, see [Kubernetes documentation - Create an external load balancer](https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/) and contact your service provider for help.

To test the connection, you can use any client, such as `sqlcmd`, SQL Server Management Studio (SSMS), or Azure Data Studio (ADS).
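
After the services are created, the load balancer can take a moment to assign an external address. The following sketch (the function name is hypothetical) polls a service's `.status.loadBalancer.ingress[0].ip` field and prints the address once it's assigned. Some providers publish a host name in `ingress[0].hostname` instead of an IP address; this sketch doesn't handle that case.

```bash
# Hypothetical helper: wait for a LoadBalancer service to get an external IP.
# Usage: wait_for_lb_ip <namespace> <service-name> [tries] [seconds-between-tries]
wait_for_lb_ip() {
  local ns="$1" svc="$2" tries="${3:-30}" delay="${4:-10}" ip i=0
  while [ "$i" -lt "$tries" ]; do
    ip=$(kubectl -n "$ns" get service "$svc" -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
    if [ -n "$ip" ]; then
      echo "$ip"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "No external IP assigned to $svc" >&2
  return 1
}
```

For example, from outside the cluster: `sqlcmd -S "$(wait_for_lb_ip "$nameSpace" "ha-$sqlmiName-0"),1533" -U "$User" -P "$Password" -Q "SELECT @@ServerName"`. When you finish testing, you can remove the temporary services with `kubectl -n $nameSpace delete service ha-$sqlmiName-0 ha-$sqlmiName-1 ha-$sqlmiName-2`.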

## Collecting the logs

If all the previous steps succeeded and you still can't log in, collect the logs and contact support.

### Collect controller logs

```console
MyController=$(kubectl -n $nameSpace get pods --selector=app=controller -o jsonpath='{.items[0].metadata.name}')
kubectl -n $nameSpace cp $MyController:/var/log/controller $localFolder/controller -c controller
```

### Get SQL Server and supervisor logs for each replica

Run the following commands for each replica to get the SQL Server and supervisor logs. This example uses the first replica, `$sqlmiName-0`.

```console
kubectl -n $nameSpace cp $sqlmiName-0:/var/opt/mssql/log $localFolder/$sqlmiName-0/log -c arc-sqlmi
kubectl -n $nameSpace cp $sqlmiName-0:/var/log/arc-ha-supervisor $localFolder/$sqlmiName-0/arc-ha-supervisor -c arc-ha-supervisor
```
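
The following sketch repeats these two commands for every replica. It assumes three replicas named `$sqlmiName-0` through `$sqlmiName-2`; the function name is hypothetical.

```bash
# Hypothetical helper: copy SQL Server and supervisor logs from every replica.
# Usage: collect_replica_logs <namespace> <sqlmi-name> <local-folder> [replica-count]
collect_replica_logs() {
  local ns="$1" mi="$2" dest="$3" count="${4:-3}" i=0 pod
  while [ "$i" -lt "$count" ]; do
    pod="$mi-$i"
    mkdir -p "$dest/$pod"
    # SQL Server logs from the arc-sqlmi container.
    kubectl -n "$ns" cp "$pod:/var/opt/mssql/log" "$dest/$pod/log" -c arc-sqlmi
    # HA supervisor logs from the arc-ha-supervisor container.
    kubectl -n "$ns" cp "$pod:/var/log/arc-ha-supervisor" "$dest/$pod/arc-ha-supervisor" -c arc-ha-supervisor
    i=$((i + 1))
  done
}
```

For example, `collect_replica_logs "$nameSpace" "$sqlmiName" "$localFolder"` creates one subfolder per replica under `$localFolder`.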

### Get orchestrator logs

```console
kubectl -n $nameSpace cp $sqlmiName-ha-0:/var/log $localFolder/$sqlmiName-ha-0/log -c arc-ha-orchestrator
```

## Next steps

[Get logs to troubleshoot Azure Arc-enabled data services](troubleshooting-get-logs.md)
