Commit 19ad681

updates to article cluster connection status
1 parent 352d828 commit 19ad681

1 file changed: +13 -11 lines changed

articles/operator-nexus/troubleshoot-cluster-heartbeat-connection-status-disconnected.md

Lines changed: 13 additions & 11 deletions
@@ -23,13 +23,13 @@ For a Cluster, the `ClusterConnectionStatus` represents the stability in the con
 ## Understanding the Cluster connection status signal
 
 The `ClusterConnectionStatus` represents the ability of the on-premises Cluster to send heartbeats and receive acknowledgments from the Cluster Manager, indicating the health of the network connection between them.
-`ClusterConnectionStatus` distinct from the connectivity of the Arc Connected Kubernetes Cluster, though network issues may affect both.
+`ClusterConnectionStatus` distinct from the connectivity of the Arc Connected Kubernetes Cluster, though network issues affect both.
 
 A Cluster resource has the property `ClusterConnectionStatus` which is set to the value `Connected` as the heartbeats are continuously received and acknowledged.
 The `ClusterConnectionStatus` becomes `Connected` once the Cluster is in a healthy state and network connectivity issues are resolved.
 The Cluster shows `Timeout` only as a transitional state between `Connected` and `Disconnected`.
 The Cluster `ClusterConnectionStatus` value becomes `Disconnected` as Cluster Manager detects continuously missed heartbeats.
-Once the cluster is a healthy state and there no network connectivity issues, the `ClusterConnectionStatus` will automatically move to `Connected`
+Once the cluster is a healthy state and there no network connectivity issues, the `ClusterConnectionStatus` automatically moves to `Connected`
 
 During the Cluster deployment process, the Cluster is in `Undefined` state until the Cluster is fully deployed and operational.
 
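As a reading aid for the four status values described in the hunk above, here is a minimal Python sketch of how they relate to each other. It is not Operator Nexus code: the function name `classify_status`, its parameters, and both thresholds are hypothetical, because the article does not say how many missed heartbeats trigger each transition.

```python
from enum import Enum


class ClusterConnectionStatus(Enum):
    """The four status values named in the article."""
    UNDEFINED = "Undefined"
    CONNECTED = "Connected"
    TIMEOUT = "Timeout"
    DISCONNECTED = "Disconnected"


def classify_status(deployed: bool, missed_heartbeats: int,
                    timeout_after: int = 1,
                    disconnect_after: int = 3) -> ClusterConnectionStatus:
    """Map a count of consecutive missed heartbeats to a status value.

    The thresholds are illustrative assumptions, not documented values.
    """
    if not deployed:
        # The Cluster stays Undefined until it is fully deployed and operational.
        return ClusterConnectionStatus.UNDEFINED
    if missed_heartbeats >= disconnect_after:
        # Continuously missed heartbeats: the Cluster Manager reports Disconnected.
        return ClusterConnectionStatus.DISCONNECTED
    if missed_heartbeats >= timeout_after:
        # Transitional state between Connected and Disconnected.
        return ClusterConnectionStatus.TIMEOUT
    # Heartbeats are being received and acknowledged.
    return ClusterConnectionStatus.CONNECTED
```

Under these assumed thresholds, a deployed Cluster with no missed heartbeats maps to `Connected`, and the value returns to `Connected` on its own as soon as the missed-heartbeat count resets to zero, which mirrors the automatic recovery the article describes.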
@@ -65,7 +65,7 @@ Connected
 
 ## Common investigation steps
 
-The Cluster resource might be affected by infrastructure networking issues (such as DNS, BGP, InfraProxy, etct.), permission changes in the Managed Identity, or other issues that might not be obvious at first.
+Infrastructure networking issues, permission changes in the Managed Identity, or other issues that might not be obvious at first, affect the Cluster resource connection status.
 The following sections provide some common investigation steps and references to help troubleshoot.
 
 > [!IMPORTANT]
@@ -74,31 +74,33 @@ The following sections provide some common investigation steps and references to
 
 ### Cluster Network Fabric health and connectivity
 
-It is useful to start with the Network Fabric [controller][Network Fabric Controller] and [services][Network Fabric Services] resources.
-Verify the [network configuration][How to Configure Network Fabric], firewall rules, and any other network-related settings that might be affecting the connectivity.
-Ensure there have not been any recent cabling or network configuration changes that could affect the network connectivity.
+It's useful to start with the Network Fabric [controller][Network Fabric Controller] and [services][Network Fabric Services] resources.
+Verify the [network configuration][How to Configure Network Fabric], including rack cabling, IP addresses, DNS settings, routing rules, firewall rules, and any other network-related settings that might be affecting the connectivity.
 
 [How to Configure Network Fabric]: https://learn.microsoft.com/en-us/azure/operator-nexus/howto-configure-network-fabric
 [Network Fabric Controller]: https://learn.microsoft.com/en-us/azure/operator-nexus/concepts-network-fabric-controller
 [Network Fabric Services]: https://learn.microsoft.com/en-us/azure/operator-nexus/concepts-network-fabric-services
 
 Evaluate any configured monitoring or metrics for the Network Fabric resources.
-See the following links for more information:
+For more information, see the following links:
+
 - [Nexus Network Fabric configuration monitoring overview](https://learn.microsoft.com/en-us/azure/operator-nexus/concepts-network-fabric-configuration-monitoring)
 - [How to configure diagnostic settings and monitor configuration differences in Nexus Network Fabric](https://learn.microsoft.com/en-us/azure/operator-nexus/howto-configure-diagnostic-settings-monitor-configuration-differences)
 - [Azure Operator Nexus Network Fabric internal network BGP metrics](https://learn.microsoft.com/en-us/azure/operator-nexus/concepts-internal-network-bgp-metrics)
 - [How to monitor interface In and Out packet rate for network fabric devices](https://learn.microsoft.com/en-us/azure/operator-nexus/howto-monitor-interface-packet-rate)
 
 ### Recent changes to the Managed Identity permissions
 
-- Are there recent changes to the Managed Identity permissions for the Cluster Manager or Cluster?
-- The Managed Identities (MI) and their permissions are used for service-to-service authentication. A change in the permissions results in authentication failures for the heartbeat messages. Cluster Managers must both receive and acknowledge heartbeats failure to do so will also result in a `ClusterConnectionStatus` of `Disconnected`.
+Changes to the Managed Identity permissions for the Cluster Manager or Cluster can affect the Cluster's ability to authenticate against the Cluster Manager.
+The Managed Identities (MI) and their permissions are used for service-to-service authentication.
+A change in the permissions results in authentication failures for the heartbeat messages.
+Even when network connectivity is healthy the Cluster's `ClusterConnectionStatus` shows `Disconnected` when heartbeats aren't successfully received and acknowledged.
 
 ### Check control-plane BareMetal Machines health
 
 The control-plane BareMetal Machines host the component that emits the heartbeats to the Cluster Manager.
-In most cases, the pods running on the control-plane will reschedule automatically to a differnent BareMetal Machine within the control-plane node pool.
-However, if the BareMetal Machines are not healthy, the pods will not be able to reschedule and the Cluster will be unable to send heartbeats.
+In most cases, the pods running on the control-plane reschedule automatically to a different BareMetal Machine within the control-plane node pool.
+However, if the BareMetal Machines aren't healthy, the pods can't reschedule and the Cluster is unable to send heartbeats.
 
 To check the BareMetal Machines, use the following command:
 