Commit 230918f

Merge pull request #18231 from sethmanheim/akstsg-sk
Add AKS Arc connectivity TSG
2 parents f797db7 + 1d4ad4c commit 230918f

3 files changed: +87 -1 lines changed

AKS-Arc/TOC.yml

Lines changed: 2 additions & 0 deletions
@@ -187,6 +187,8 @@
 href: network-validation-errors.md
 - name: Network validation error due to .local domain
 href: network-validation-error-local.md
+- name: BGP with FRR not working
+href: connectivity-troubleshoot.md
 - name: Reference
 items:
 - name: Azure CLI

AKS-Arc/aks-troubleshoot.md

Lines changed: 3 additions & 1 deletion
@@ -3,7 +3,7 @@ title: Troubleshoot common issues in AKS enabled by Azure Arc
 description: Learn about common issues and workarounds in AKS enabled by Arc.
 ms.topic: how-to
 author: sethmanheim
-ms.date: 04/30/2025
+ms.date: 06/18/2025
 ms.author: sethm
 ms.lastreviewed: 04/01/2025
 ms.reviewer: abha
@@ -36,13 +36,15 @@ The following sections describe known issues for AKS enabled by Azure Arc:
 
 | AKS Arc operation | Issue |
 |------------------------|-------|
+| General network validation errors | [Troubleshoot network validation errors](network-validation-errors.md) |
 | Create validation | [Control plane configuration validation errors](control-plane-validation-errors.md) |
 | Create validation | [K8sVersionValidation error](cluster-k8s-version.md) |
 | Create validation | [KubeAPIServer unreachable error](kube-api-server-unreachable.md) |
 | Network configuration issues | [Use diagnostic checker](aks-arc-diagnostic-checker.md) |
 | Kubernetes steady state | [Resolve issues due to out-of-band deletion of storage volumes](delete-storage-volume.md) |
 | Release validation | [Azure Advisor upgrade recommendation message](azure-advisor-upgrade.md) |
 | Network validation | [Network validation error due to .local domain](network-validation-error-local.md) |
+| BGP with FRR not working | [Troubleshoot BGP with FRR in AKS Arc environments](connectivity-troubleshoot.md) |
 
 ## Next steps

AKS-Arc/connectivity-troubleshoot.md

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
---
title: Troubleshoot BGP with FRR in AKS Arc environments
description: Learn how to troubleshoot BGP connectivity issues when using MetalLB with FRR in AKS Arc deployments.
author: sethmanheim
ms.date: 06/19/2025
ms.author: sethm
ms.topic: troubleshooting
ms.reviewer: srikantsarwa
ms.lastreviewed: 06/19/2025

---

# BGP with FRR not working in AKS Arc environment

This article helps you identify and resolve Border Gateway Protocol (BGP) connectivity issues when using MetalLB with Free Range Routing (FRR) in Azure Kubernetes Service (AKS) Arc environments.

Use this guidance when BGP sessions fail to establish, external IP routing doesn't work correctly, or network connectivity to exposed services becomes unreliable in your AKS Arc deployment.

## Symptoms

In environments using MetalLB with FRR for BGP peering, you might experience the following issues:

- BGP sessions aren't established, or they keep flapping (the session repeatedly goes up and down), which causes route instability. Flapping can result from network issues, misconfigurations, or hardware problems, and it can degrade performance or cause loss of service availability.
- Services of type `LoadBalancer` don't receive properly routed external IPs.
- Advertised routes are missing or aren't propagated to upstream routers.
- Network connectivity to exposed services is inconsistent or unavailable.

These symptoms are often observed in specific hardware environments such as Hyper-Converged Infrastructure (HCI), or in environments that enforce strict network or security policies.

## Mitigation

If you encounter these issues with FRR, you can temporarily disable it using Azure CLI:

```azurecli
# Assumes clusterName and rgName are already set to your cluster and resource group names

# Retrieve the object ID of the service principal for the specified app ID
objID=$(az ad sp list --filter "appId eq '087fca6e-4606-4d41-b3f6-5ebdf75b8b4c'" --query "[].id" --output tsv)

# Update the arcnetworking extension to disable FRR
az k8s-extension update \
  --cluster-name "$clusterName" \
  -g "$rgName" \
  --cluster-type connectedClusters \
  --extension-type microsoft.arcnetworking \
  --config "k8sRuntimeFpaObjectId=$objID" \
  --config "metallb.speaker.frr.enabled=false" \
  -n arcnetworking
```

## Troubleshooting steps

Use the following steps to diagnose and resolve BGP issues with MetalLB and FRR in your AKS Arc environment.

### Check BGP configuration

Review the MetalLB IP address pools, BGP peers, and BGP advertisements defined in the cluster:

```azurecli
kubectl get ipaddresspools -A -o yaml
kubectl get bgppeers.metallb.io -A -o yaml
kubectl get bgpadvertisements -A -o yaml
```
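
As a quick first pass before reading the full YAML, the following sketch counts how many objects of each kind exist. It's only an illustration: the function name `check_metallb_bgp_resources` is hypothetical, and the check assumes that a working BGP setup needs at least one object of each of the three kinds queried above.

```bash
# Hypothetical helper: report how many MetalLB BGP-related objects exist.
# Returns a nonzero status if any of the three resource kinds has no objects.
check_metallb_bgp_resources() {
  local kind count missing=0
  for kind in ipaddresspools bgppeers.metallb.io bgpadvertisements; do
    # -o name prints one line per object, so the line count is the object count
    count=$(kubectl get "$kind" -A -o name 2>/dev/null | wc -l | tr -d ' ')
    echo "$kind: $count"
    if [ "$count" -eq 0 ]; then
      missing=1
    fi
  done
  return "$missing"
}
```

Run `check_metallb_bgp_resources` against the affected cluster. A count of zero for any kind suggests starting with that resource when you review the configuration.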

### Collect logs from MetalLB speaker (FRR)

```azurecli
# List the pods in kube-system and note the names of the MetalLB speaker pods
kubectl get pods -n kube-system

# Speaker container logs (replace xxxxx with the suffix of your speaker pod)
kubectl logs -n kube-system arcnetworking-metallb-speaker-xxxxx -c speaker

# FRR container logs
kubectl logs -n kube-system arcnetworking-metallb-speaker-xxxxx -c frr
```
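
When there are several speaker pods, collecting the logs one pod at a time gets tedious. The following sketch saves the `speaker` and `frr` container logs of every speaker pod to files. The function name `collect_speaker_logs` and the output directory are hypothetical, and the sketch assumes that speaker pod names contain `metallb-speaker`, as in the example pod name above.

```bash
# Hypothetical helper: save speaker and FRR container logs for each MetalLB speaker pod.
# Arguments: namespace (default kube-system), output directory (default ./metallb-logs).
collect_speaker_logs() {
  local ns="${1:-kube-system}"
  local out_dir="${2:-./metallb-logs}"
  local pod name
  mkdir -p "$out_dir"
  # -o name prints entries such as pod/arcnetworking-metallb-speaker-xxxxx
  for pod in $(kubectl get pods -n "$ns" -o name | grep 'metallb-speaker'); do
    name="${pod#pod/}"
    kubectl logs -n "$ns" "$pod" -c speaker > "$out_dir/$name-speaker.log"
    kubectl logs -n "$ns" "$pod" -c frr > "$out_dir/$name-frr.log"
  done
}
```

For example, `collect_speaker_logs kube-system ./metallb-logs` writes two log files per speaker pod into `./metallb-logs`.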

### Review TOR switch configuration

- You might also need the configuration and logs from the top-of-rack (TOR) switch or upstream router.
- These logs are hardware- and vendor-specific, and aren't covered in this guide.

## Next steps

[Official MetalLB troubleshooting guide](https://metallb.universe.tf/troubleshooting/#with-frr)
