articles/operator-nexus/troubleshoot-cluster-heartbeat-connection-status-disconnected.md
Lines changed: 13 additions & 11 deletions
@@ -23,13 +23,13 @@ For a Cluster, the `ClusterConnectionStatus` represents the stability in the con
23
23
## Understanding the Cluster connection status signal
24
24
25
25
The `ClusterConnectionStatus` represents the ability of the on-premises Cluster to send heartbeats and receive acknowledgments from the Cluster Manager, indicating the health of the network connection between them.
26
-
`ClusterConnectionStatus` distinct from the connectivity of the Arc Connected Kubernetes Cluster, though network issues may affect both.
26
+
`ClusterConnectionStatus` is distinct from the connectivity of the Arc Connected Kubernetes Cluster, though network issues can affect both.
27
27
28
28
A Cluster resource has the property `ClusterConnectionStatus`, which is set to `Connected` while heartbeats are continuously received and acknowledged.
29
29
The `ClusterConnectionStatus` becomes `Connected` once the Cluster is in a healthy state and network connectivity issues are resolved.
30
30
The Cluster shows `Timeout` only as a transitional state between `Connected` and `Disconnected`.
31
31
The Cluster `ClusterConnectionStatus` value becomes `Disconnected` as Cluster Manager detects continuously missed heartbeats.
32
-
Once the cluster is a healthy state and there no network connectivity issues, the `ClusterConnectionStatus`will automatically move to `Connected`
32
+
Once the Cluster is in a healthy state and there are no network connectivity issues, the `ClusterConnectionStatus` automatically moves to `Connected`.
33
33
34
34
During the Cluster deployment process, the Cluster is in `Undefined` state until the Cluster is fully deployed and operational.
35
35
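The status values described in this section can be summarized as a small state model. The following Python sketch is illustrative only: the function name, the `deployed` flag, and both miss-count thresholds are hypothetical, because the article doesn't state how many missed heartbeats the Cluster Manager tolerates before it changes the status.

```python
from enum import Enum


class ClusterConnectionStatus(Enum):
    """Status values named in this article."""
    UNDEFINED = "Undefined"
    CONNECTED = "Connected"
    TIMEOUT = "Timeout"
    DISCONNECTED = "Disconnected"


# Hypothetical thresholds, chosen only for illustration.
# The real values used by the Cluster Manager aren't documented here.
TIMEOUT_AFTER_MISSES = 1
DISCONNECTED_AFTER_MISSES = 3


def connection_status(deployed: bool, consecutive_missed: int) -> ClusterConnectionStatus:
    """Map deployment state and consecutive missed heartbeats to a status."""
    if not deployed:
        # The Cluster stays Undefined until it's fully deployed and operational.
        return ClusterConnectionStatus.UNDEFINED
    if consecutive_missed >= DISCONNECTED_AFTER_MISSES:
        # Continuously missed heartbeats lead to Disconnected.
        return ClusterConnectionStatus.DISCONNECTED
    if consecutive_missed >= TIMEOUT_AFTER_MISSES:
        # Timeout is only a transitional state between Connected and Disconnected.
        return ClusterConnectionStatus.TIMEOUT
    # Heartbeats are being received and acknowledged.
    return ClusterConnectionStatus.CONNECTED
```

In this model, a reset of `consecutive_missed` to zero returns the status to `Connected`, which mirrors the automatic recovery described in this section.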
@@ -65,7 +65,7 @@ Connected
65
65
66
66
## Common investigation steps
67
67
68
-
The Cluster resource might be affected by infrastructure networking issues (such as DNS, BGP, InfraProxy, etct.), permission changes in the Managed Identity, or other issues that might not be obvious at first.
68
+
Infrastructure networking issues, permission changes in the Managed Identity, or other issues that might not be obvious at first can affect the connection status of the Cluster resource.
69
69
The following sections provide some common investigation steps and references to help troubleshoot.
70
70
71
71
> [!IMPORTANT]
@@ -74,31 +74,33 @@ The following sections provide some common investigation steps and references to
74
74
75
75
### Cluster Network Fabric health and connectivity
76
76
77
-
It is useful to start with the Network Fabric [controller][Network Fabric Controller] and [services][Network Fabric Services] resources.
78
-
Verify the [network configuration][How to Configure Network Fabric], firewall rules, and any other network-related settings that might be affecting the connectivity.
79
-
Ensure there have not been any recent cabling or network configuration changes that could affect the network connectivity.
77
+
It's useful to start with the Network Fabric [controller][Network Fabric Controller] and [services][Network Fabric Services] resources.
78
+
Verify the [network configuration][How to Configure Network Fabric], including rack cabling, IP addresses, DNS settings, routing rules, firewall rules, and any other network-related settings that might be affecting the connectivity.
80
79
81
80
[How to Configure Network Fabric]: https://learn.microsoft.com/en-us/azure/operator-nexus/howto-configure-network-fabric
-[How to configure diagnostic settings and monitor configuration differences in Nexus Network Fabric](https://learn.microsoft.com/en-us/azure/operator-nexus/howto-configure-diagnostic-settings-monitor-configuration-differences)
-[How to monitor interface In and Out packet rate for network fabric devices](https://learn.microsoft.com/en-us/azure/operator-nexus/howto-monitor-interface-packet-rate)
91
91
92
92
### Recent changes to the Managed Identity permissions
93
93
94
-
- Are there recent changes to the Managed Identity permissions for the Cluster Manager or Cluster?
95
-
- The Managed Identities (MI) and their permissions are used for service-to-service authentication. A change in the permissions results in authentication failures for the heartbeat messages. Cluster Managers must both receive and acknowledge heartbeats failure to do so will also result in a `ClusterConnectionStatus` of `Disconnected`.
94
+
Changes to the Managed Identity permissions for the Cluster Manager or Cluster can affect the Cluster's ability to authenticate against the Cluster Manager.
95
+
The Managed Identities (MI) and their permissions are used for service-to-service authentication.
96
+
A change in the permissions results in authentication failures for the heartbeat messages.
97
+
Even when network connectivity is healthy, the Cluster's `ClusterConnectionStatus` shows `Disconnected` when heartbeats aren't successfully received and acknowledged.
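The point that a healthy network isn't sufficient on its own can be expressed as a small triage helper. This is a minimal sketch under stated assumptions: the `HeartbeatCheck` type, its two fields, and the returned labels are hypothetical names introduced for illustration, and the check order simply follows the order of the investigation steps in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeartbeatCheck:
    """Hypothetical summary of what an operator has verified so far."""
    network_healthy: bool       # fabric, DNS, routing, and firewall checks passed
    identity_authorized: bool   # Managed Identity permissions are intact


def heartbeat_acknowledged(check: HeartbeatCheck) -> bool:
    """A heartbeat succeeds only if it's both delivered and authenticated."""
    return check.network_healthy and check.identity_authorized


def first_area_to_investigate(check: HeartbeatCheck) -> str:
    """Return the first failing area, or 'none' if both checks pass."""
    if not check.network_healthy:
        return "network"
    if not check.identity_authorized:
        return "managed-identity-permissions"
    return "none"
```

For example, `HeartbeatCheck(network_healthy=True, identity_authorized=False)` yields a failed heartbeat and points to the Managed Identity permissions, which matches the `Disconnected` case described in this section.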
96
98
97
99
### Check control-plane BareMetal Machines health
98
100
99
101
The control-plane BareMetal Machines host the component that emits the heartbeats to the Cluster Manager.
100
-
In most cases, the pods running on the control-plane will reschedule automatically to a differnent BareMetal Machine within the control-plane node pool.
101
-
However, if the BareMetal Machines are not healthy, the pods will not be able to reschedule and the Cluster will be unable to send heartbeats.
102
+
In most cases, the pods running on the control-plane reschedule automatically to a different BareMetal Machine within the control-plane node pool.
103
+
However, if the BareMetal Machines aren't healthy, the pods can't reschedule and the Cluster is unable to send heartbeats.
102
104
103
105
To check the BareMetal Machines, use the following command: