|
| 1 | +--- |
| 2 | +title: Troubleshoot connection to failover group - Azure Arc-enabled SQL Managed Instance |
| 3 | +description: Describes how to troubleshoot issues with connections to failover group resources in Azure Arc-enabled data services |
| 4 | +author: MikeRayMSFT |
| 5 | +ms.author: mikeray |
| 6 | +ms.topic: troubleshooting-general |
| 7 | +ms.date: 03/15/2023 |
| 8 | +--- |
| 9 | + |
| 10 | +# Troubleshoot Azure Arc-enabled SQL Managed Instance deployments |
| 11 | + |
| 12 | +This article identifies potential issues and describes how to diagnose their root causes in deployments of Azure Arc-enabled data services. |
| 13 | + |
| 14 | +## Connection to Azure Arc-enabled SQL Managed Instance failover group |
| 15 | + |
| 16 | +This section describes how to troubleshoot issues connecting to a failover group. |
| 17 | + |
| 18 | +### Check failover group connections & synchronization state |
| 19 | + |
| 20 | +```console |
| 21 | +kubectl -n $nameSpace get fog $fogName -o jsonpath-as-json='{.status}' |
| 22 | +``` |
| 23 | + |
| 24 | +**Results**: |
| 25 | + |
| 26 | +On each side, there are two replicas for one failover group. Check the values of `connectedState` and `synchronizationState` for each replica. |
| 27 | + |
| 28 | +If any `connectedState` value isn't `CONNECTED`, see the instructions under [Check parameters](#check-parameters). |
| 29 | + |
| 30 | +If any `synchronizationState` value isn't `HEALTHY`, focus on the instance whose `synchronizationState` isn't `HEALTHY`. To debug, see [Can't connect to Arc-enabled SQL Managed Instance](#cant-connect-to-arc-enabled-sql-managed-instance). |
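
The check above can be narrowed to just the two state fields. The following is a minimal sketch, not part of the original procedure. It assumes `$nameSpace` and `$fogName` are set as in the command above, and that `jsonpath-as-json` prints each JSON field on its own line. The function names `fog_states` and `fog_unhealthy` are hypothetical.

```bash
# Sketch: show only the state fields from the failover group status.
# Assumes each JSON field is printed on its own line.
fog_states() {
  kubectl -n "$nameSpace" get fog "$fogName" -o jsonpath-as-json='{.status}' \
    | grep -E '"(connectedState|synchronizationState)"'
}

# Sketch: keep only state lines whose value is neither CONNECTED nor HEALTHY.
# Prints nothing (and returns a non-zero status) when every state looks good.
fog_unhealthy() {
  fog_states | grep -Ev '"(CONNECTED|HEALTHY)"'
}
```

If `fog_unhealthy` prints any line, continue with the matching instruction above for that field.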
| 31 | + |
| 32 | +### Check parameters |
| 33 | + |
| 34 | +On both the geo-primary and the geo-secondary, check the failover group spec against the `$sqlmiName` instance on the other side. |
| 35 | + |
| 36 | +### Command on local |
| 37 | + |
| 38 | +Run the following command against the local instance to get its failover group spec. |
| 39 | + |
| 40 | +```console |
| 41 | +kubectl -n $nameSpace get fog $fogName -o jsonpath-as-json='{.spec}' |
| 42 | +``` |
| 43 | + |
| 44 | +### Command on remote |
| 45 | + |
| 46 | +Run the following command against the remote instance: |
| 47 | + |
| 48 | +```console |
| 49 | +kubectl -n $nameSpace get sqlmi $sqlmiName -o jsonpath-as-json='{.status.highAvailability.mirroringCertificate}' |
| 50 | +kubectl -n $nameSpace get sqlmi $sqlmiName -o jsonpath-as-json='{.status.endpoints.mirroring}' |
| 51 | +``` |
| 52 | + |
| 53 | +**Results**: |
| 54 | + |
| 55 | +Compare the results from the remote instance with the results from the local instance. |
| 56 | + |
| 57 | +* `partnerMirroringURL` and `partnerMirroringCert` from the local instance have to match the remote instance values from: |
| 58 | + * `kubectl -n $nameSpace get sqlmi $sqlmiName -o jsonpath-as-json='{.status.endpoints.mirroring}'` |
| 59 | + * `kubectl -n $nameSpace get sqlmi $sqlmiName -o jsonpath-as-json='{.status.highAvailability.mirroringCertificate}'` |
| 60 | + |
| 61 | +* `partnerMI` from `kubectl -n $nameSpace get fog $fogName -o jsonpath-as-json='{.spec}'` has to match `$sqlmiName` on the remote instance. |
| 62 | + |
| 63 | +* `sharedName` from `kubectl -n $nameSpace get fog $fogName -o jsonpath-as-json='{.spec}'` is optional. If it isn't present, it's the same as `sourceMI`. If it's present, `sharedName` should be the same on both sites. |
| 64 | + |
| 65 | +* Role from `kubectl -n $nameSpace get fog $fogName -o jsonpath-as-json='{.spec}'` should differ between the two sites. One side should be primary and the other should be secondary. |
| 66 | + |
| 67 | +If any of these values doesn't match, delete the failover group on both sites and re-create it. |
| 68 | + |
| 69 | +If nothing is wrong, follow the instructions under [Check mirroring endpoints for both sides](#check-mirroring-endpoints-for-both-sides). |
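
The URL comparison above can be scripted when both clusters are reachable from one workstation. This is a sketch under several assumptions: `$localContext` and `$remoteContext` are hypothetical kubectl context names for the two sites, the same `$nameSpace` is used on both sides, and the spec value may carry a `tcp://` prefix that the status value lacks (the sketch strips it before comparing). The function name `check_partner_url` is also hypothetical.

```bash
# Sketch: compare the local partnerMirroringURL with the remote mirroring endpoint.
# $localContext and $remoteContext are assumed kubectl context names, not from the article.
check_partner_url() {
  local_url=$(kubectl --context "$localContext" -n "$nameSpace" get fog "$fogName" \
    -o jsonpath='{.spec.partnerMirroringURL}')
  remote_url=$(kubectl --context "$remoteContext" -n "$nameSpace" get sqlmi "$sqlmiName" \
    -o jsonpath='{.status.endpoints.mirroring}')
  # Assumption: drop an optional tcp:// scheme so both values use the same format.
  local_url=${local_url#tcp://}
  remote_url=${remote_url#tcp://}
  if [ "$local_url" = "$remote_url" ]; then
    echo "match"
  else
    echo "mismatch: local=$local_url remote=$remote_url"
  fi
}
```

Run it once from each side, swapping the two contexts. The certificate, `partnerMI`, `sharedName`, and role still need the manual comparison described above.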
| 70 | + |
| 71 | +### Check mirroring endpoints for both sides |
| 72 | + |
| 73 | +On both the geo-primary and the geo-secondary, check that the external mirroring endpoint is exposed by running the following command. |
| 74 | + |
| 75 | +```console |
| 76 | +kubectl -n $nameSpace get services $sqlmiName-external-svc -o jsonpath-as-json='{.spec.ports}' |
| 77 | +``` |
| 78 | + |
| 79 | +**Results** |
| 80 | + |
| 81 | +* `port-mssql-mirroring` should be present in the list. The failover group on the other side should use the same value for `partnerMirroringURL`. If the values don't match, correct the mistake and retry from the beginning. |
| 82 | + |
| 83 | +### Verify SQL Server can reach external endpoint of another site |
| 84 | + |
| 85 | +Although you can't ping the mirroring endpoint of the other site directly, you can use the following command to reach the other site's external endpoint on the SQL Server tabular data stream (TDS) port. |
| 86 | + |
| 87 | +```console |
| 88 | +kubectl exec -ti -n $nameSpace $sqlmiName-0 -c arc-sqlmi -- /opt/mssql-tools/bin/sqlcmd -S $remotePrimaryEndpoint -U $remoteUser -P $remotePassword -Q "SELECT @@ServerName" |
| 89 | +``` |
| 90 | + |
| 91 | +**Results** |
| 92 | + |
| 93 | +If SQL Server can reach the external TDS endpoint, there's a good chance it can also reach the external mirroring endpoint, because both are defined and activated in the same service, specifically `$sqlmiName-external-svc`. |
| 94 | + |
| 95 | +## Can't connect to Arc-enabled SQL Managed Instance |
| 96 | + |
| 97 | +This section identifies specific steps you can take to troubleshoot connections to Azure Arc-enabled SQL managed instances. |
| 98 | + |
| 99 | +> [!NOTE] |
| 100 | +> You can't connect to an Azure Arc-enabled SQL Managed Instance if the instance license type is `DisasterRecovery`. |
| 101 | +
|
| 102 | +### Check the managed instance status |
| 103 | + |
| 104 | +The SQL Managed Instance (SQLMI) status indicates whether the instance is ready. |
| 105 | + |
| 106 | +```console |
| 107 | +kubectl -n $nameSpace get sqlmi $sqlmiName -o jsonpath-as-json='{.status}' |
| 108 | +``` |
| 109 | + |
| 110 | +**Results** |
| 111 | + |
| 112 | +The state should be `Ready`. If the value isn't `Ready`, wait for the instance to become ready. If the state is error, get the message field, collect logs, and contact support. See [Collecting the logs](#collecting-the-logs). |
| 113 | + |
| 114 | +### Check the routing label for stateful set |
| 115 | +The routing label for the stateful set routes the external endpoint to the matching pod. The name of the label is `role.ag.mssql.microsoft.com`. |
| 116 | + |
| 117 | +```console |
| 118 | +kubectl -n $nameSpace get pods $sqlmiName-0 -o jsonpath-as-json='{.metadata.labels}' |
| 119 | +kubectl -n $nameSpace get pods $sqlmiName-1 -o jsonpath-as-json='{.metadata.labels}' |
| 120 | +kubectl -n $nameSpace get pods $sqlmiName-2 -o jsonpath-as-json='{.metadata.labels}' |
| 121 | +``` |
| 122 | + |
| 123 | +**Results** |
| 124 | + |
| 125 | +If no pod is labeled as primary, kill the pod that doesn't have a `role.ag.mssql.microsoft.com` label. If this doesn't resolve the issue, collect logs and contact support. See [Collecting the logs](#collecting-the-logs). |
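
To see only the routing label instead of every label, the three commands above can be wrapped in a loop. This is a minimal sketch that assumes three replicas named `$sqlmiName-0` through `$sqlmiName-2`, as in the commands above. The function name `replica_roles` is hypothetical. The backslashes escape the dots in the label key, which kubectl JSONPath requires for keys that contain dots.

```bash
# Sketch: print the role.ag.mssql.microsoft.com label of each replica pod.
# Assumes three replicas; adjust the list for a different replica count.
replica_roles() {
  for i in 0 1 2; do
    role=$(kubectl -n "$nameSpace" get pod "$sqlmiName-$i" \
      -o jsonpath='{.metadata.labels.role\.ag\.mssql\.microsoft\.com}')
    # An empty value means the pod has no routing label.
    echo "$sqlmiName-$i: ${role:-<no role label>}"
  done
}
```

A pod reported as `<no role label>` is the candidate to kill, as described above.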
| 126 | + |
| 127 | +### Get Replica state from local container connection |
| 128 | + |
| 129 | +Use `localhost,1533` to connect to SQL Server in each replica of the stateful set. This connection should always succeed. Use this connection to query the SQL HA replica state. |
| 130 | + |
| 131 | +```console |
| 132 | +kubectl exec -ti -n $nameSpace $sqlmiName-0 -c arc-sqlmi -- /opt/mssql-tools/bin/sqlcmd -S localhost,1533 -U $User -P $Password -Q "SELECT * FROM sys.dm_hadr_availability_replica_states" |
| 133 | +kubectl exec -ti -n $nameSpace $sqlmiName-1 -c arc-sqlmi -- /opt/mssql-tools/bin/sqlcmd -S localhost,1533 -U $User -P $Password -Q "SELECT * FROM sys.dm_hadr_availability_replica_states" |
| 134 | +kubectl exec -ti -n $nameSpace $sqlmiName-2 -c arc-sqlmi -- /opt/mssql-tools/bin/sqlcmd -S localhost,1533 -U $User -P $Password -Q "SELECT * FROM sys.dm_hadr_availability_replica_states" |
| 135 | +``` |
| 136 | + |
| 137 | +**Results** |
| 138 | + |
| 139 | +All replicas should be connected and healthy. For a detailed description of the query results, see [sys.dm_hadr_availability_replica_states](/sql/relational-databases/system-dynamic-management-views/sys-dm-hadr-availability-replica-states-transact-sql). |
| 140 | + |
| 141 | +If a replica is unexpectedly not synchronized or not connected, try killing the pod that has the problem. If the problem persists, collect logs and contact support. See [Collecting the logs](#collecting-the-logs). |
| 142 | + |
| 143 | +> [!NOTE] |
| 144 | +> If the instance contains large databases, seeding to the secondary can take a while. If this happens, wait for seeding to complete. |
| 145 | +
|
| 146 | +### Check SQLMI SQL engine listener |
| 147 | + |
| 148 | +The SQL engine listener is the component that routes connections to the failover group. |
| 149 | + |
| 150 | +```console |
| 151 | +kubectl exec -ti -n $nameSpace $sqlmiName-0 -c arc-sqlmi -- /opt/mssql-tools/bin/sqlcmd -S localhost,1433 -U $User -P $Password -Q "SELECT @@ServerName" |
| 152 | +kubectl exec -ti -n $nameSpace $sqlmiName-1 -c arc-sqlmi -- /opt/mssql-tools/bin/sqlcmd -S localhost,1433 -U $User -P $Password -Q "SELECT @@ServerName" |
| 153 | +kubectl exec -ti -n $nameSpace $sqlmiName-2 -c arc-sqlmi -- /opt/mssql-tools/bin/sqlcmd -S localhost,1433 -U $User -P $Password -Q "SELECT @@ServerName" |
| 154 | +``` |
| 155 | + |
| 156 | +**Results** |
| 157 | + |
| 158 | +You should get `ServerName` from the listener of each replica. If you can't get `ServerName`, kill the pods that have the problem. If the problem persists after recovery, collect logs and contact support. See [Collecting the logs](#collecting-the-logs). |
| 159 | + |
| 160 | +### Check Kubernetes network connection |
| 161 | + |
| 162 | +Inside the Kubernetes cluster, a Kubernetes network layer allows communication and routing between pods. Check whether the SQLMI pods can communicate with each other through the cluster IP. Run this command for all replicas. |
| 163 | + |
| 164 | + |
| 165 | +```console |
| 166 | +kubectl exec -ti -n $nameSpace $sqlmiName-0 -c arc-sqlmi -- /opt/mssql-tools/bin/sqlcmd -S $(kubectl -n $nameSpace get service $sqlmiName-p-svc -o jsonpath='{.spec.clusterIP}'),1533 -U $User -P $Password -Q "SELECT @@ServerName" |
| 167 | +``` |
| 168 | + |
| 169 | +**Results** |
| 170 | + |
| 171 | +You should be able to reach any cluster IP address for the pods of the stateful set from another pod. If this isn't the case, refer to [Kubernetes documentation - Cluster networking](https://kubernetes.io/docs/concepts/cluster-administration/networking/) for detailed information, or ask your service provider to resolve the issue. |
| 172 | + |
| 173 | +### Check the Kubernetes load balancer or `nodeport` services |
| 174 | + |
| 175 | +Load balancer or `nodeport` services expose a service port to the external network. |
| 176 | + |
| 177 | +```console |
| 178 | +kubectl -n $nameSpace expose pod $sqlmiName-0 --port=1533 --name=ha-$sqlmiName-0 --type=LoadBalancer |
| 179 | +kubectl -n $nameSpace expose pod $sqlmiName-1 --port=1533 --name=ha-$sqlmiName-1 --type=LoadBalancer |
| 180 | +kubectl -n $nameSpace expose pod $sqlmiName-2 --port=1533 --name=ha-$sqlmiName-2 --type=LoadBalancer |
| 181 | +``` |
| 182 | + |
| 183 | +**Results** |
| 184 | + |
| 185 | +You should be able to connect to the exposed external port. The internal connection to the same port was confirmed in [Get Replica state from local container connection](#get-replica-state-from-local-container-connection). If you can't connect to the external port, refer to [Kubernetes documentation - Create an external load balancer](https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/) and ask your service provider for help with the issue. |
| 186 | + |
| 187 | +You can use any client like `SqlCmd`, SQL Server Management Studio (SSMS), or Azure Data Studio (ADS) to test this out. |
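
After exposing a pod, you need the external address of the temporary service to test the connection. The following sketch looks it up, assuming the load balancer reports an IP address (some providers report a hostname in `.status.loadBalancer.ingress[0].hostname` instead). The function name `external_endpoint` is hypothetical. The service names match the `--name=ha-$sqlmiName-N` values used above.

```bash
# Sketch: print "<external IP>,1533" for the temporary service of replica $1.
# Assumes the load balancer publishes an IP address rather than a hostname.
external_endpoint() {
  ip=$(kubectl -n "$nameSpace" get service "ha-$sqlmiName-$1" \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  if [ -z "$ip" ]; then
    echo "no external IP assigned yet for ha-$sqlmiName-$1" >&2
    return 1
  fi
  echo "$ip,1533"
}

# Example use from a workstation that has sqlcmd installed:
#   sqlcmd -S "$(external_endpoint 0)" -U $User -P $Password -Q "SELECT @@ServerName"
# Remove the temporary service when the test is done:
#   kubectl -n $nameSpace delete service ha-$sqlmiName-0
```

An external IP can take some time to appear after `kubectl expose`, so an empty result doesn't necessarily indicate a failure.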
| 188 | + |
| 189 | +## Collecting the logs |
| 190 | + |
| 191 | +If the previous steps all succeeded without any problem and you still can't log in, collect the logs and contact support. |
| 192 | + |
| 193 | +### Collect controller logs |
| 194 | + |
| 195 | +```console |
| 196 | +MyController=$(kubectl -n $nameSpace get pods --selector=app=controller -o jsonpath='{.items[*].metadata.name}') |
| 197 | +kubectl -n $nameSpace cp $MyController:/var/log/controller $localFolder/controller -c controller |
| 198 | +``` |
| 199 | + |
| 200 | +### Get SQL Server and supervisor logs for each replica |
| 201 | + |
| 202 | +Run the following commands for each replica to get SQL Server and supervisor logs: |
| 203 | + |
| 204 | +```console |
| 205 | +kubectl -n $nameSpace cp $sqlmiName-0:/var/opt/mssql/log $localFolder/$sqlmiName-0/log -c arc-sqlmi |
| 206 | +kubectl -n $nameSpace cp $sqlmiName-0:/var/log/arc-ha-supervisor $localFolder/$sqlmiName-0/arc-ha-supervisor -c arc-ha-supervisor |
| 207 | +``` |
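
To repeat the two copy commands for every replica, you can wrap them in a loop. This is a minimal sketch that assumes three replicas, as in the earlier commands, and that `$nameSpace`, `$sqlmiName`, and `$localFolder` are already set. The function name `collect_replica_logs` is hypothetical.

```bash
# Sketch: copy SQL Server and supervisor logs from each replica pod.
# Assumes three replicas named $sqlmiName-0 .. $sqlmiName-2.
collect_replica_logs() {
  for i in 0 1 2; do
    pod="$sqlmiName-$i"
    # Create the per-replica destination folder up front.
    mkdir -p "$localFolder/$pod"
    kubectl -n "$nameSpace" cp "$pod:/var/opt/mssql/log" \
      "$localFolder/$pod/log" -c arc-sqlmi
    kubectl -n "$nameSpace" cp "$pod:/var/log/arc-ha-supervisor" \
      "$localFolder/$pod/arc-ha-supervisor" -c arc-ha-supervisor
  done
}
```

Call `collect_replica_logs` once. The logs for each replica land in their own subfolder of `$localFolder`.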
| 208 | + |
| 209 | +### Get orchestrator logs |
| 210 | + |
| 211 | +```console |
| 212 | +kubectl -n $nameSpace cp $sqlmiName-ha-0:/var/log $localFolder/$sqlmiName-ha-0/log -c arc-ha-orchestrator |
| 213 | +``` |
| 214 | + |
| 215 | + |
| 216 | +## Next steps |
| 217 | + |
| 218 | +[Get logs to troubleshoot Azure Arc-enabled data services](troubleshooting-get-logs.md) |