
Commit 03118e0

committed
adding updates to the cluster heartbeat connection status
1 parent ebca591 commit 03118e0


4 files changed

+59
-29
lines changed


articles/operator-nexus/TOC.yml

Lines changed: 12 additions & 8 deletions
Original file line number | Diff line number | Diff line change
@@ -310,7 +310,6 @@
310310
href: howto-kubernetes-cluster-install-microsoft-defender.md
311311
- name: Kubernetes cluster features
312312
href: howto-kubernetes-cluster-features.md
313-
314313
- name: Nexus Virtual Machine
315314
expanded: false
316315
items:
@@ -378,7 +377,18 @@
378377
href: troubleshoot-dns-issues.md
379378
- name: Troubleshoot TWAMP (UDP) not working
380379
href: troubleshoot-twamp-udp-not-working.md
381-
- name: Cluster or BMM
380+
- name: Cluster
381+
expanded: false
382+
items:
383+
- name: Troubleshoot Accepted Cluster Resource
384+
href: troubleshoot-accepted-cluster-hydration.md
385+
- name: Troubleshoot Control Plane Quorum
386+
href: troubleshoot-control-plane-quorum.md
387+
- name: Troubleshoot ETCD Cluster quorum loss and recovery
388+
href: troubleshoot-etcd-cluster-possible-quorum-lost.md
389+
- name: Troubleshoot Cluster heartbeat connection status disconnected
390+
href: troubleshoot-cluster-heartbeat-connection-status-disconnected.md
391+
- name: Bare Metal Machine
382392
expanded: false
383393
items:
384394
- name: Troubleshoot Bare Metal Server Problems
@@ -391,14 +401,8 @@
391401
href: troubleshoot-bare-metal-machine-degraded.md
392402
- name: Troubleshoot Warning status
393403
href: troubleshoot-bare-metal-machine-warning.md
394-
- name: Troubleshoot Control Plane Quorum
395-
href: troubleshoot-control-plane-quorum.md
396-
- name: Troubleshoot Accepted Cluster Resource
397-
href: troubleshoot-accepted-cluster-hydration.md
398404
- name: Troubleshoot Out of Memory Pods
399405
href: troubleshoot-memory-limits.md
400-
- name: Troubleshoot Cluster heartbeat connection status disconnected
401-
href: troubleshoot-cluster-heartbeat-connection-status-disconnected.md
402406
- name: Troubleshoot Bare Metal Machine in not ready state
403407
href: troubleshoot-bare-metal-machine-not-ready-state.md
404408
- name: Tenant Workload

articles/operator-nexus/includes/contact-support.md

Lines changed: 2 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -1,10 +1,11 @@
11
---
22
author: omarrivera
33
ms.author: omarrivera
4-
ms.date: 10/09/2024
4+
ms.date: 04/28/2025
55
ms.topic: include
66
ms.service: azure-operator-nexus
77
---
8+
89
## Still Having Issues?
910

1011
If the steps outlined didn't provide a path to resolve the issue, or if you still have questions, [contact support].

articles/operator-nexus/troubleshoot-cluster-heartbeat-connection-status-disconnected.md

Lines changed: 45 additions & 20 deletions
Original file line number | Diff line number | Diff line change
@@ -4,36 +4,38 @@ description: Provide steps to investigate and possibly resolve circumstances tha
44
ms.service: azure-operator-nexus
55
ms.custom: troubleshooting
66
ms.topic: troubleshooting
7-
ms.date: 10/09/2024
7+
ms.date: 04/28/2025
88
ms.author: omarrivera
99
author: omarrivera
1010
---
1111
# Troubleshoot Azure Operator Nexus Cluster Heartbeat Connection Status shows Disconnected
1212

13-
This guide attempts to provide steps to troubleshoot a Cluster is shown to have `clusterConnectionStatus` with a value of `Disconnected`.
13+
This guide attempts to provide steps to troubleshoot a Cluster with a `clusterConnectionStatus` in `Disconnected` state.
14+
For a Cluster, the `ClusterConnectionStatus` represents the stability of the connection between the on-premises Cluster and the Cluster Manager.
15+
16+
> [!IMPORTANT]
17+
> The `ClusterConnectionStatus` **doesn't** represent, and isn't related to, the health or connectivity of the Arc Connected Kubernetes Cluster.
18+
> The `ClusterConnectionStatus` indicates whether the Cluster is successfully sending heartbeats and receiving acknowledgments from the Cluster Manager.
1419
1520
> [!CAUTION]
16-
> The `ClusterConnectionStatus` is likely a symptom or signal and not the root cause and this guide will not be able to provide answers for all scenarios.
17-
> The focus and purpose of this guide is to provide common issues and signals that can be inspected to determine where the issue might be.
18-
## Understanding the Issue
21+
> The information the `ClusterConnectionStatus` provides is an indication of a symptom of instability, not the root cause.
22+
> This guide focuses on identifying basic signals and components that might help locate the problem but might not cover all scenarios.
1923
20-
Cluster Managers ensure continuous Cluster network connectivity through a heartbeat agent running within the target Cluster.
21-
The cluster-heartbeat agent sends periodic HTTP messages to the Cluster Manager and expects an acknowledgment response as well.
22-
A Cluster has the property `ClusterConnectionStatus` which is set to the value `Connected` as the heartbeats are continuously received and acknowledged.
24+
[!include[prereq-az-cli](./includes/baremetal-machines/prerequisites-azure-cli-bare-metal-machine-actions.md)]
2325

24-
The `ClusterConnectionStatus` becomes `Connected` once the cluster is in a healthy state and network connectivity issues are resolved.
25-
If the Cluster is expected to be healthy but the `ClusterConnectionStatus` remains in `Disconnected` state [contact support] after following the steps in this guide.
26+
## Understanding the ClusterConnectionStatus signal
2627

27-
> [!IMPORTANT]
28-
> `ClusterConnectionStatus` is **not** the same as Arc Connected Kubernetes Clusters.
29-
The command can be used to see the value of `ClsuterConnectionStatus` and it is visible in Azure Portal in the Cluster resource's JSON view.
28+
The `ClusterConnectionStatus` represents the ability for the on-premises Cluster to successfully send heartbeats and receive acknowledgments from the Cluster Manager.
29+
The continuous heartbeat messages are meant to detect the network connection health between the on-premises Cluster and the corresponding Cluster Manager.
30+
The `ClusterConnectionStatus` **isn't** the same as the connectivity of the Arc Connected Kubernetes Cluster.
31+
If there are network-related issues, the Arc Connected Kubernetes Cluster might also be affected.
3032

31-
```azurecli
32-
az networkcloud cluster show --subscription "$SUBSCRIPTION_ID" -g "$CLUSTER_RG" -n "$CLUSTER_NAME" --output table --query "{ClusterConnectionStatus:clusterConnectionStatus}"
33-
ClusterConnectionStatus
34-
-------------------------
35-
Connected
36-
```
33+
A Cluster resource has the property `ClusterConnectionStatus`, which is set to `Connected` while heartbeats are continuously received and acknowledged.
34+
The `ClusterConnectionStatus` becomes `Connected` once the Cluster is in a healthy state and network connectivity issues are resolved.
35+
The Cluster shows `Timeout` only as a transitional state between `Connected` and `Disconnected`.
36+
The `ClusterConnectionStatus` value becomes `Disconnected` when the Cluster Manager detects continuously missed heartbeats.
37+
38+
During the Cluster deployment process, the `ClusterConnectionStatus` is `Undefined` until the Cluster is fully deployed and operational.
3739

3840
The following table shows which status is displayed depending on the state of the undercloud cluster:
3941

@@ -42,7 +44,28 @@ The following table shows which status is displayed depending on the state of th
4244
| `Connected` | Heartbeats received, indicates healthy cluster and cluster manager connectivity |
4345
| `Disconnected` | Heartbeats missed for __over 5 minutes__, indicates likely connectivity issue between Cluster Manager and Cluster |
4446
| `Timeout` | Heartbeats missed for __over 2 minutes but less than 5 minutes__, cluster connectivity is uncertain, possibly degraded |
45-
| `Undefined` | Cluster not yet deployed or running a version without the heartbeats feature |
47+
| `Undefined` | Cluster not yet deployed or running a version without the heartbeats feature |
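
The time thresholds in the table above can be sketched as a small shell function. This is an illustration only, not part of the committed article and not the Cluster Manager's implementation: the function name `classify_heartbeat_age` is hypothetical, and treating exactly 2 and 5 minutes as belonging to the lower band is an assumption, because the table only says "over". `Undefined` is left out because it depends on deployment state rather than on heartbeat age.

```shell
#!/usr/bin/env bash

# Hypothetical helper: maps the number of seconds since the last
# acknowledged heartbeat to a status name from the table.
# Boundary handling at exactly 120 s and 300 s is an assumption.
classify_heartbeat_age() {
  local age_seconds="$1"
  if [ "$age_seconds" -gt 300 ]; then
    echo "Disconnected"   # heartbeats missed for over 5 minutes
  elif [ "$age_seconds" -gt 120 ]; then
    echo "Timeout"        # missed for over 2 minutes, not yet over 5
  else
    echo "Connected"      # heartbeats are arriving
  fi
}

# Example calls with sample ages in seconds.
classify_heartbeat_age 30
classify_heartbeat_age 180
classify_heartbeat_age 600
```

The ordering of the checks matters: the larger threshold is tested first so that an age above 5 minutes is never reported as `Timeout`.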
48+
49+
## Check the ClusterConnectionStatus
50+
51+
The value of `ClusterConnectionStatus` is visible in the Azure portal in the Cluster resource view.
52+
53+
[!include[clusterConnectionStatus](./includes/cluster-connection-status.md)]
54+
55+
Or, you can use the Azure CLI to see the value of `ClusterConnectionStatus`:
56+
57+
```azurecli
58+
az networkcloud cluster show \
59+
-g "$CLUSTER_RG" \
60+
-n "$CLUSTER_NAME" \
61+
--subscription "$SUBSCRIPTION_ID" \
62+
--query "{ClusterConnectionStatus:clusterConnectionStatus}" \
63+
--output table
64+
65+
ClusterConnectionStatus
66+
-------------------------
67+
Connected
68+
```
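
For scripted checks, the same command can be wrapped so that a script can branch on the result. The following is a minimal sketch and not part of the committed article: the function names `get_cluster_connection_status` and `check_cluster_connection` are hypothetical, and the sketch assumes the `az` CLI with the `networkcloud` extension is installed and that `CLUSTER_RG`, `CLUSTER_NAME`, and `SUBSCRIPTION_ID` are already set. It uses `--output tsv` so that only the bare status value is returned.

```shell
#!/usr/bin/env bash

# Hypothetical wrapper around the command shown above.
# Prints only the status value, for example "Connected".
get_cluster_connection_status() {
  az networkcloud cluster show \
    -g "$CLUSTER_RG" \
    -n "$CLUSTER_NAME" \
    --subscription "$SUBSCRIPTION_ID" \
    --query "clusterConnectionStatus" \
    --output tsv
}

# Prints the status and returns 0 only when it is "Connected".
# Any other value (Disconnected, Timeout, Undefined, or an empty
# result from a failed call) returns a non-zero exit code.
check_cluster_connection() {
  local status
  status="$(get_cluster_connection_status)"
  echo "ClusterConnectionStatus: ${status}"
  [ "$status" = "Connected" ]
}
```

A caller could then write `check_cluster_connection || echo "investigate connectivity"` to flag any status other than `Connected`.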
4669

4770
## Basic Investigation Steps
4871

@@ -55,6 +78,8 @@ TODO - what steps could be done here?
5578
- Are there recent changes to the Managed Identity permissions for the Cluster Manager or Cluster?
5679
- The Managed Identities (MI) and their permissions are used for service-to-service authentication. A change in the permissions results in authentication failures for the heartbeat messages. Cluster Managers must both receive and acknowledge heartbeats; failure to do so also results in a `ClusterConnectionStatus` of `Disconnected`.
5780

81+
If the Cluster is expected to be healthy but the `ClusterConnectionStatus` remains in the `Disconnected` state after following the steps in this guide, [contact support].
82+
5883
[!include[stillHavingIssues](./includes/contact-support.md)]
5984

6085
[contact support]: https://portal.azure.com/?#blade/Microsoft_Azure_Support/HelpAndSupportBlade
