Commit 2cadc51

Merge pull request #290899 from v-saambe/main
Create Troubleshoot LACP Bonding
2 parents bd6d5ef + 1a2c0bd commit 2cadc51

2 files changed

+116
-60
lines changed

articles/operator-nexus/TOC.yml

Lines changed: 64 additions & 60 deletions
@@ -39,19 +39,19 @@
3939
- name: Route Policy
4040
expanded: false
4141
items:
42-
- name: Route Policy Overview
43-
href: concepts-nexus-route-policies-overview.md
44-
- name: IP Prefixes Overview
45-
href: concepts-nexus-ip-prefix.md
42+
- name: Route Policy Overview
43+
href: concepts-nexus-route-policies-overview.md
44+
- name: IP Prefixes Overview
45+
href: concepts-nexus-ip-prefix.md
4646
- name: Isolation Domains
4747
expanded: false
4848
items:
49-
- name: Isolation Domains overview
50-
href: concepts-isolation-domain.md
51-
- name: Isolation Domain configuration
52-
href: reference-isolation-domain-configuration.md
53-
- name: Technical requirements for Isolation Domains
54-
href: reference-isolation-domain-technical-requirements.md
49+
- name: Isolation Domains overview
50+
href: concepts-isolation-domain.md
51+
- name: Isolation Domain configuration
52+
href: reference-isolation-domain-configuration.md
53+
- name: Technical requirements for Isolation Domains
54+
href: reference-isolation-domain-technical-requirements.md
5555
- name: Access Control Lists
5656
href: concepts-access-control-lists.md
5757
- name: Nexus Kubernetes Cluster
@@ -157,60 +157,61 @@
157157
- name: Network Fabric
158158
expanded: false
159159
items:
160-
- name: Isolation Domain
161-
href: howto-configure-isolation-domain.md
162-
- name: Isolation Domain Configuration Examples
163-
href: reference-isolation-domain-configuration-examples.md
164-
- name: Network Fabric Route Policy
165-
href: how-to-route-policy.md
166-
- name: IP Prefixes
167-
href: how-to-ip-prefixes.md
168-
- name: Network Packet Broker
169-
href: howto-configure-network-packet-broker.md
170-
- name: Validate cables for Nexus Network Fabric
171-
href: how-to-validate-cables.md
172-
- name: Creating Access Control Lists (ACLs)
173-
href: howto-create-access-control-list-for-network-to-network-interconnects.md
174-
- name: Apply ACLs to Network-to-Network Interconnects (NNI)
175-
href: howto-apply-access-control-list-to-network-to-network-interconnects.md
176-
- name: Updating ACL on Network-for-Network Interconnects (NNI)
177-
href: howto-update-access-control-list-for-network-to-network-interconnects.md
178-
- name: Delete ACLs associated with Network-to-Network Interconnects (NNI)
179-
href: howto-delete-access-control-list-network-to-network-interconnect.md
180-
- name: How to Configure Diagnostic Settings and Monitor Configuration Differences in Nexus Network Fabric
181-
href: howto-configure-diagnostic-settings-monitor-configuration-differences.md
182-
- name: How to Delete L3 Isolation Domains in Azure Nexus Network Fabric
183-
href: howto-delete-layer-3-isolation-domains.md
184-
- name: How to monitor interface In and Out packet rate for network fabric devices
185-
href: howto-monitor-interface-packet-rate.md
186-
- name: How to Delete L3 Isolation Domains in Azure Nexus Network Fabric
187-
href: howto-delete-layer-3-isolation-domains.md
188-
- name: Cross-subscription deployments and required RBAC for Network Fabric
189-
href: concepts-cross-subscription-deployments-required-rbac-for-network-fabric.md
190-
- name: How to replace network devices in Azure Operator Nexus Network Fabric
191-
href: howto-replace-network-devices.md
192-
- name: How to put a device into maintenance mode
193-
href: howto-put-device-in-maintenance-mode.md
194-
- name: How to upgrade Network Fabric
195-
href: howto-upgrade-nexus-fabric.md
160+
- name: Isolation Domain
161+
href: howto-configure-isolation-domain.md
162+
- name: Isolation Domain Configuration Examples
163+
href: reference-isolation-domain-configuration-examples.md
164+
- name: Network Fabric Route Policy
165+
href: how-to-route-policy.md
166+
- name: IP Prefixes
167+
href: how-to-ip-prefixes.md
168+
- name: Network Packet Broker
169+
href: howto-configure-network-packet-broker.md
170+
- name: Validate cables for Nexus Network Fabric
171+
href: how-to-validate-cables.md
172+
- name: Creating Access Control Lists (ACLs)
173+
href: howto-create-access-control-list-for-network-to-network-interconnects.md
174+
- name: Apply ACLs to Network-to-Network Interconnects (NNI)
175+
href: howto-apply-access-control-list-to-network-to-network-interconnects.md
176+
- name: Updating ACL on Network-for-Network Interconnects (NNI)
177+
href: howto-update-access-control-list-for-network-to-network-interconnects.md
178+
- name: Delete ACLs associated with Network-to-Network Interconnects (NNI)
179+
href: howto-delete-access-control-list-network-to-network-interconnect.md
180+
- name: How to Configure Diagnostic Settings and Monitor Configuration Differences
181+
in Nexus Network Fabric
182+
href: howto-configure-diagnostic-settings-monitor-configuration-differences.md
183+
- name: How to Delete L3 Isolation Domains in Azure Nexus Network Fabric
184+
href: howto-delete-layer-3-isolation-domains.md
185+
- name: How to monitor interface In and Out packet rate for network fabric devices
186+
href: howto-monitor-interface-packet-rate.md
187+
- name: How to Delete L3 Isolation Domains in Azure Nexus Network Fabric
188+
href: howto-delete-layer-3-isolation-domains.md
189+
- name: Cross-subscription deployments and required RBAC for Network Fabric
190+
href: concepts-cross-subscription-deployments-required-rbac-for-network-fabric.md
191+
- name: How to replace network devices in Azure Operator Nexus Network Fabric
192+
href: howto-replace-network-devices.md
193+
- name: How to put a device into maintenance mode
194+
href: howto-put-device-in-maintenance-mode.md
195+
- name: How to upgrade Network Fabric
196+
href: howto-upgrade-nexus-fabric.md
196197
- name: Cluster
197198
expanded: false
198199
items:
199200
- name: BareMetal Actions
200201
expanded: false
201202
items:
202-
- name: BareMetal BMM Access Setup
203-
href: howto-baremetal-bmm-ssh.md
204-
- name: BareMetal BMC Access Setup
205-
href: howto-baremetal-bmc-ssh.md
206-
- name: BareMetal Functions
207-
href: howto-baremetal-functions.md
208-
- name: BareMetal Run-Read Execution
209-
href: howto-baremetal-run-read.md
210-
- name: BareMetal Run-Data-Extract Execution
211-
href: howto-baremetal-run-data-extract.md
212-
- name: Running BareMetal actions directly with nexusctl
213-
href: howto-baremetal-nexusctl.md
203+
- name: BareMetal BMM Access Setup
204+
href: howto-baremetal-bmm-ssh.md
205+
- name: BareMetal BMC Access Setup
206+
href: howto-baremetal-bmc-ssh.md
207+
- name: BareMetal Functions
208+
href: howto-baremetal-functions.md
209+
- name: BareMetal Run-Read Execution
210+
href: howto-baremetal-run-read.md
211+
- name: BareMetal Run-Data-Extract Execution
212+
href: howto-baremetal-run-data-extract.md
213+
- name: Running BareMetal actions directly with nexusctl
214+
href: howto-baremetal-nexusctl.md
214215
- name: Nexus Kubernetes cluster
215216
expanded: false
216217
items:
@@ -308,9 +309,11 @@
308309
href: troubleshoot-accepted-cluster-hydration.md
309310
- name: Troubleshoot Out of Memory Pods
310311
href: troubleshoot-memory-limits.md
312+
- name: Troubleshoot LACP Bonding
313+
href: troubleshoot-lacp-bonding.md
311314
- name: Storage Array
312315
expanded: false
313-
items:
316+
items: null
314317
- name: Tenant Workload
315318
expanded: false
316319
items:
@@ -327,7 +330,8 @@
327330
href: troubleshoot-internet-host-virtual-machine.md
328331
- name: Troubleshoot VM errors after BMM restart
329332
href: troubleshoot-vm-error-after-reboot.md
330-
- name: Troubleshooting dual-stack configuration issues for Nexus Kubernetes cluster
333+
- name: Troubleshooting dual-stack configuration issues for Nexus Kubernetes
334+
cluster
331335
href: troubleshoot-kubernetes-cluster-dual-stack-configuration.md
332336
- name: FAQ
333337
href: azure-operator-nexus-faq.md
Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
1+
---
2+
title: "Azure Operator Nexus: Networking"
3+
description: Checking LACP Bonding on Physical Hosts.
4+
author: keithritchie73
5+
ms.author: keithritchie
6+
ms.service: azure-operator-nexus
7+
ms.custom: azure-operator-nexus
8+
ms.topic: troubleshooting
9+
ms.date: 11/15/2024
10+
---
11+
12+
# Checking LACP Bonding on Physical Hosts
13+
14+
When a physical host starts, the two Mellanox cards are bonded with LACP to a pair of Arista switches. If LACP isn't negotiated properly between the host's cards and the switches, the result can be intermittent packet loss or uneven load balancing. Because LACP hashes flows across the bonded links, these errors might not be noticeable until a tenant workload attempts to pass traffic.
15+
16+
## Diagnosis
17+
18+
If LACP isn't negotiated correctly, traffic loss can occur, although some flows might still pass. This behavior can manifest as a VM that can't reach the network, or even as OAM or storage outages.
19+
20+
## Checking LACP Bonding
21+
22+
To check the LACP bonding status on a physical host, run the following command. For control plane hosts, use the file `8a_pf_bond`, because those hosts have only one Mellanox card. For worker hosts, which have two cards, use `4b_pf_bond` or `98_pf_bond`.
23+
24+
```bash
25+
cat /proc/net/bonding/8a_pf_bond
26+
```
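
To review every bond on a host in one pass, a small loop can print each status file in turn. The following is a minimal sketch: the helper name `show_pf_bonds` is hypothetical, and the `*_pf_bond` pattern is an assumption inferred from the three file names in this article, not part of any Nexus tooling.

```bash
# Sketch only: show_pf_bonds is a hypothetical helper, and the *_pf_bond
# pattern is an assumption inferred from the file names in this article.
# The function body runs in a subshell so its variables don't leak.
show_pf_bonds() (
  dir="${1:-/proc/net/bonding}"
  for bond in "$dir"/*_pf_bond; do
    # Skip the literal pattern when no file matches.
    [ -e "$bond" ] || continue
    echo "== $bond"
    cat "$bond"
  done
)
```

Calling `show_pf_bonds` with no argument reads from `/proc/net/bonding`. Passing a directory is useful for reviewing status files that were copied off a host.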
27+
28+
### Interpreting the results
29+
30+
Check the following key fields in the `/proc/net/bonding/` output.
31+
32+
At the bond level (the top part of the output):
33+
34+
1. `MII Status: up` - The entire bond is up.
35+
2. `LACP active: on` - LACP is active.
36+
3. `Aggregator ID: 1` - The top-level aggregator ID should match the aggregator ID of both ports. Each port section lists its own aggregator ID.
37+
4. `System MAC address: 42:56:86:9c:81:89` - A system MAC address is defined. If the bond isn't negotiated, this value is undefined or all zeros, for example `00:00:00:00:00:00`.
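
The bond-level checks above can be sketched as a small script. This is an illustration only, assuming the field layout that the Linux bonding driver writes to `/proc/net/bonding/`; the helper name `check_bond_level` is hypothetical.

```bash
# Sketch only: check_bond_level is a hypothetical helper name.
# It prints the bond-level fields and returns non-zero when the bond
# is down, LACP is off, or the system MAC is missing or all zeros.
check_bond_level() {
  awk '
    # Stop at the first per-port section so only bond-level lines are read.
    /^Slave Interface:/      { exit }
    /^MII Status:/           { mii = $3 }
    /^LACP active:/          { lacp = $3 }
    /^System MAC address:/   { mac = $4 }
    /^[ \t]*Aggregator ID:/  { agg = $NF }
    END {
      printf "mii=%s lacp=%s aggregator=%s mac=%s\n", mii, lacp, agg, mac
      if (mii != "up" || lacp != "on" || mac == "" || mac == "00:00:00:00:00:00")
        exit 1
    }
  ' "$1"
}
```

For example, `check_bond_level /proc/net/bonding/8a_pf_bond || echo "bond needs attention"` prints a one-line summary and flags a bond that fails any of the checks.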
38+
39+
For each port:
40+
41+
1. `MII Status: up` - The interface is up.
42+
2. `Aggregator ID: 1` - Both ports should have the same aggregator ID.
43+
3. `details partner lacp pdu: port state 61` - The value is a bit mask that represents the LACP negotiation state on that port. Values of 61 and 63 generally indicate a healthy negotiation. For details, see [decoding LACP port state](https://movingpackets.net/2017/10/17/decoding-lacp-port-state).
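
To see which flags a given port state value carries, the bit mask can be decoded with shell arithmetic. The following is a minimal sketch; `decode_lacp_state` is a hypothetical helper name, and the flag names follow the LACP state bits defined in IEEE 802.1AX, listed from bit 0 upward.

```bash
# Sketch only: decode_lacp_state is a hypothetical helper name.
# Flag names are the LACP state bits from IEEE 802.1AX, bit 0 first.
# The function body runs in a subshell so its variables don't leak.
decode_lacp_state() (
  state=$1
  out=""
  bit=1
  for name in Activity Timeout Aggregation Synchronization \
              Collecting Distributing Defaulted Expired; do
    # Append the flag name when its bit is set in the state value.
    if [ $(( state & bit )) -ne 0 ]; then
      out="${out:+$out }$name"
    fi
    bit=$(( bit * 2 ))
  done
  printf '%s\n' "$out"
)

decode_lacp_state 61
decode_lacp_state 63
```

A value of 61 decodes to Activity, Aggregation, Synchronization, Collecting, and Distributing. A value of 63 sets the same flags plus Timeout, which indicates the short LACP timeout. A value with Defaulted or Expired set, or without Synchronization, suggests that the port hasn't negotiated with its partner.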
44+
45+
### Fixing the issue
46+
47+
The most common causes of these LACP issues are host-to-switch miswiring or a mismatched LACP/MLAG configuration on the Arista switches. Start by tracing the cabling and repairing any wiring issues. If the wiring is correct, check whether the LACP/MLAG configuration on the switches is incorrect.
48+
49+
## Further information
50+
51+
If you still have questions, [contact support](https://portal.azure.com/?#blade/Microsoft_Azure_Support/HelpAndSupportBlade).
52+
For more information about Support plans, see [Azure Support plans](https://azure.microsoft.com/support/plans/response/).
