| 1 | +--- |
| 2 | +title: Troubleshoot Azure Operator Nexus Cluster Heartbeat Connection Status shows Disconnected |
| 3 | +description: Provide steps to investigate and possibly resolve circumstances that are preventing the Cluster from sending heartbeats to the Cluster Manager. |
| 4 | +ms.service: azure-operator-nexus |
| 5 | +ms.custom: troubleshooting |
| 6 | +ms.topic: troubleshooting |
| 7 | +ms.date: 10/09/2024 |
| 8 | +ms.author: omarrivera |
| 9 | +author: omarrivera |
| 10 | +--- |
| 11 | +# Troubleshoot Azure Operator Nexus Cluster Heartbeat Connection Status shows Disconnected |
| 12 | + |
| 13 | +This guide provides steps to troubleshoot a Cluster whose `clusterConnectionStatus` has a value of `Disconnected`.
| 14 | + |
| 15 | +> [!CAUTION] |
| 16 | +> A `ClusterConnectionStatus` of `Disconnected` is likely a symptom or signal rather than the root cause, and this guide can't provide answers for all scenarios.
| 17 | +> This guide focuses on common issues and signals that can be inspected to narrow down where the problem might be.
| 18 | +## Understanding the Issue |
| 19 | + |
| 20 | +Cluster Managers monitor Cluster network connectivity through a heartbeat agent running within the target Cluster.
| 21 | +The cluster-heartbeat agent sends periodic HTTP messages to the Cluster Manager and expects an acknowledgment in response.
| 22 | +A Cluster has the property `ClusterConnectionStatus`, which is set to `Connected` as long as heartbeats are continuously received and acknowledged.
| 23 | + |
| 24 | +The `ClusterConnectionStatus` becomes `Connected` once the Cluster is in a healthy state and network connectivity issues are resolved.
| 25 | +If the Cluster is expected to be healthy but the `ClusterConnectionStatus` remains `Disconnected` after following the steps in this guide, [contact support].
| 26 | + |
| 27 | +> [!IMPORTANT] |
| 28 | +> `ClusterConnectionStatus` is **not** the same as Arc Connected Kubernetes Clusters. |
| 29 | +The following command shows the value of `ClusterConnectionStatus`. The value is also visible in the Azure portal in the Cluster resource's JSON view.
| 30 | + |
| 31 | +```azurecli |
| 32 | +az networkcloud cluster show --subscription "$SUBSCRIPTION_ID" -g "$CLUSTER_RG" -n "$CLUSTER_NAME" --output table --query "{ClusterConnectionStatus:clusterConnectionStatus}" |
| 33 | +ClusterConnectionStatus |
| 34 | +------------------------- |
| 35 | +Connected |
| 36 | +``` |
| 37 | + |
| 38 | +The following table shows which status is displayed depending on the state of the undercloud cluster: |
| 39 | + |
| 40 | +| Status         | Definition                                                                                                                  |
| 41 | +|----------------|-----------------------------------------------------------------------------------------------------------------------------|
| 42 | +| `Connected`    | Heartbeats received; indicates healthy connectivity between the Cluster and the Cluster Manager                             |
| 43 | +| `Disconnected` | Heartbeats missed for __over 5 minutes__; indicates a likely connectivity issue between the Cluster Manager and the Cluster |
| 44 | +| `Timeout`      | Heartbeats missed for __over 2 minutes but less than 5 minutes__; Cluster connectivity is uncertain and possibly degraded   |
| 45 | +| `Undefined`    | Cluster not yet deployed or running a version without the heartbeats feature                                                |
| 46 | + |
| 47 | +## Basic Investigation Steps |
| 48 | + |
| 49 | +### 1. Ensure Network Connectivity for the Cluster |
| 50 | + |
| 51 | +TODO - what steps could be done here? |
| 52 | + |
| 53 | +### Other possible causes to evaluate |
| 54 | + |
| 55 | +- Are there recent changes to the Managed Identity permissions for the Cluster Manager or Cluster? |
| 56 | +  - The Managed Identities (MI) and their permissions are used for service-to-service authentication. A change in the permissions can result in authentication failures for the heartbeat messages. Cluster Managers must both receive and acknowledge heartbeats; failure to do either also results in a `ClusterConnectionStatus` of `Disconnected`.
| 57 | + |
| 58 | +[!include[stillHavingIssues](./includes/contact-support.md)] |
| 59 | + |
| 60 | +[contact support]: https://portal.azure.com/?#blade/Microsoft_Azure_Support/HelpAndSupportBlade |
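
The time thresholds in the status table of this article can be illustrated with a minimal sketch that maps the number of seconds since the last acknowledged heartbeat to a status value. The function name `classify_heartbeat_status` and its seconds-based input are hypothetical and chosen only for illustration; this isn't how the Cluster Manager computes the status. The table doesn't say how exactly 2 or exactly 5 minutes is treated, so the boundary handling here is an assumption. `Undefined` isn't time-based and is left out.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not part of any Azure tooling): classify the time since
# the last acknowledged heartbeat using the thresholds from the status table.
# Assumption: "over" is a strict comparison, so exactly 300 seconds is still
# reported as Timeout and exactly 120 seconds is still reported as Connected.
classify_heartbeat_status() {
  local elapsed_seconds="$1"

  if (( elapsed_seconds > 300 )); then
    # Heartbeats missed for over 5 minutes
    echo "Disconnected"
  elif (( elapsed_seconds > 120 )); then
    # Heartbeats missed for over 2 minutes but not over 5 minutes
    echo "Timeout"
  else
    # Heartbeats received within the last 2 minutes
    echo "Connected"
  fi
}
```

For example, `classify_heartbeat_status 200` prints `Timeout`, because 200 seconds is over 2 minutes but under 5 minutes.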