| 1 | +--- |
| 2 | +title: Troubleshoot BGP with FRR in AKS Arc environments |
| 3 | +description: Learn how to troubleshoot BGP connectivity issues when using MetalLB with FRR in AKS Arc deployments. |
| 4 | +author: sethmanheim |
| 5 | +ms.date: 06/19/2025 |
| 6 | +ms.author: sethm |
| 7 | +ms.topic: troubleshooting |
| 8 | +ms.reviewer: srikantsarwa |
| 9 | +ms.lastreviewed: 06/19/2025 |
| 10 | + |
| 11 | +--- |
| 12 | + |
| 13 | +# BGP with FRR not working in AKS Arc environment |
| 14 | + |
| 15 | +This article helps you identify and resolve Border Gateway Protocol (BGP) connectivity issues when using MetalLB with Free Range Routing (FRR) in Azure Kubernetes Service (AKS) Arc environments. |
| 16 | + |
| 17 | +Use this guidance when BGP sessions fail to establish, external IP routing doesn't work correctly, or network connectivity to exposed services becomes unreliable in your AKS Arc deployment. |
| 18 | + |
| 19 | +## Symptoms |
| 20 | + |
| 21 | +In environments using MetalLB with FRR for BGP peering, you might experience the following issues: |
| 22 | + |
| 23 | +- BGP sessions aren't established, or they keep flapping (repeatedly going up and down), which causes route instability. Flapping can be caused by network issues, misconfigurations, or hardware problems, and it can degrade performance or make services unavailable. |
| 24 | +- Services of type `LoadBalancer` don't receive properly routed external IPs. |
| 25 | +- Advertised routes are missing or not propagated to upstream routers. |
| 26 | +- Network connectivity to exposed services is inconsistent or unavailable. |
| 27 | + |
| 28 | +These symptoms most often appear in specific hardware environments, such as Hyper-Converged Infrastructure (HCI), or in environments that enforce strict network or security policies. |
| 29 | + |
| 30 | +## Mitigation |
| 31 | + |
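| 32 | +If you encounter these issues with FRR, you can temporarily disable FRR by using the Azure CLI. Set `$clusterName` and `$rgName` to your cluster name and resource group name before you run the commands: |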
| 33 | + |
| 34 | +```azurecli |
| 35 | +# Retrieve the object ID for the managed identity |
| 36 | +objID=$(az ad sp list --filter "appId eq '087fca6e-4606-4d41-b3f6-5ebdf75b8b4c'" --query "[].id" --output tsv) |
| 37 | +
|
| 38 | +# Update the arcnetworking extension to disable FRR |
| 39 | +az k8s-extension update \ |
| 40 | + --cluster-name $clusterName \ |
| 41 | + -g $rgName \ |
| 42 | + --cluster-type connectedClusters \ |
| 43 | + --extension-type microsoft.arcnetworking \ |
| 44 | + --config "k8sRuntimeFpaObjectId=$objID" \ |
| 45 | + --config "metallb.speaker.frr.enabled=false" \ |
| 46 | + -n arcnetworking |
| 47 | +``` |
| 48 | + |
| 49 | +## Troubleshooting steps |
| 50 | + |
| 51 | +Use the following steps to diagnose and resolve BGP issues with MetalLB and FRR in your AKS Arc environment. |
| 52 | + |
| 53 | +### Check BGP configuration |
| 54 | + |
| 55 | +```azurecli |
| 56 | +kubectl get ipaddresspools -A -o yaml |
| 57 | +kubectl get bgppeers.metallb.io -A -o yaml |
| 58 | +kubectl get bgpadvertisements -A -o yaml |
| 59 | +``` |
| 60 | + |
| 61 | +### Collect logs from MetalLB speaker (FRR) |
| 62 | + |
| 63 | +```azurecli |
| 64 | +# List the pods in kube-system and note the names of the arcnetworking-metallb-speaker pods |
| 65 | +kubectl get pods -n kube-system |
| 66 | +
|
| 67 | +# Speaker container logs |
| 68 | +kubectl logs -n kube-system arcnetworking-metallb-speaker-xxxxx -c speaker |
| 69 | +
|
| 70 | +# FRR container logs |
| 71 | +kubectl logs -n kube-system arcnetworking-metallb-speaker-xxxxx -c frr |
| 72 | +``` |
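
When a cluster has several nodes, running the two log commands above once per speaker pod gets repetitive. The following is a minimal sketch of how the collection could be looped. The function names `filter_speaker_pods` and `collect_speaker_logs` are hypothetical helpers, not part of MetalLB or AKS Arc, and the sketch assumes that speaker pod names contain `metallb-speaker` and run in `kube-system`, as in the commands above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads `kubectl get pods -o name` output on stdin
# and prints only the MetalLB speaker pod names, without the "pod/" prefix.
filter_speaker_pods() {
  grep 'metallb-speaker' | sed 's|^pod/||'
}

# Hypothetical helper: saves the speaker and frr container logs of every
# speaker pod to files named <pod>-<container>.log in the current directory.
# The namespace defaults to kube-system (assumption taken from the steps above).
collect_speaker_logs() {
  ns="${1:-kube-system}"
  kubectl get pods -n "$ns" -o name | filter_speaker_pods | while read -r pod; do
    for container in speaker frr; do
      kubectl logs -n "$ns" "$pod" -c "$container" > "${pod}-${container}.log"
    done
  done
}
```

After sourcing the file, `collect_speaker_logs` (or `collect_speaker_logs <namespace>`) writes one log file per container, which can make it easier to compare FRR output across nodes.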
| 73 | + |
| 74 | +### Review TOR switch configuration |
| 75 | + |
| 76 | +- You might also need the BGP configuration and logs from the top-of-rack (TOR) switch or upstream router that peers with the cluster. |
| 77 | +- The steps to collect this information are hardware and vendor specific, and aren't covered in this guide. |
| 78 | + |
| 79 | +## Next steps |
| 80 | + |
| 81 | +[Official MetalLB troubleshooting guide](https://metallb.universe.tf/troubleshooting/#with-frr) |
| 82 | + |