Commit c28b38c

Add troubleshooting of guest cluster LB IP is not reachable
Signed-off-by: Jian Wang <[email protected]>
1 parent 7f83cd9 commit c28b38c

3 files changed

+66
-1
lines changed

docs/rancher/cloud-provider.md

Lines changed: 3 additions & 1 deletion
@@ -393,7 +393,9 @@ Harvester's built-in load balancer offers both **DHCP** and **Pool** modes, and
393393
394394
:::note
395395
396-
Modifying the `IPAM` mode isn't allowed. You must create a new service if you intend to change the `IPAM` mode.
396+
- Modifying the `IPAM` mode isn't allowed. You must create a new service if you intend to change the `IPAM` mode.
397+
398+
- Refer to [Guest Cluster Loadbalancer IP is not reachable](../troubleshooting/rancher.md#guest-cluster-loadbalancer-ip-is-not-reachable).
397399
398400
:::
399401

docs/troubleshooting/rancher.md

Lines changed: 63 additions & 0 deletions
@@ -83,3 +83,66 @@ Related issues:
8383
8484
- Harvester: [#7105](https://github.com/harvester/harvester/issues/7105) and [#7284](https://github.com/harvester/harvester/issues/7284)
8585
- Rancher: [#45628](https://github.com/rancher/rancher/issues/45628)
86+
87+
## Guest Cluster Loadbalancer IP is not reachable
88+
89+
### Issue Description
90+
91+
1. Create a new [guest cluster](../rancher/node/rke2-cluster.md#create-rke2-kubernetes-cluster) with the default `Container Network: Calico` and the default `Cloud Provider: Harvester`.
92+
93+
1. Deploy `nginx` on the new guest cluster by running `kubectl apply -f https://k8s.io/examples/application/deployment.yaml`.
94+
95+
1. Create a [Load Balancer](../rancher/cloud-provider.md#load-balancer-support) that selects the `nginx` pods as its backend.
96+
97+
1. The service becomes ready with an IP allocated from the DHCP server or IP pool, but the page might fail to load when you click the link.
98+
99+
![](/img/v1.5/troubleshooting/gc-lb-is-not-reachable.png)
100+
101+
### Root Cause
102+
103+
In the example below, the IP of the guest cluster node (a Harvester VM) is `10.115.1.46`. Later, a new Loadbalancer IP `10.115.6.200` is added to a new interface such as `vip-fd8c28ce (@enp1s0)`. However, the `calico` controller takes over the Loadbalancer IP and uses it as the local address of the `vxlan.calico` device, which makes the Loadbalancer IP unreachable.
104+
105+
```sh
$ ip -d link show dev vxlan.calico
44: vxlan.calico: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 66:a7:41:00:1d:ba brd ff:ff:ff:ff:ff:ff promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535
info: Using default fan map value (33)
    vxlan id 4096 local 10.115.6.200 dev vip-8a928fa0 srcport 0 0 dstport 4789 nolearning ttl auto ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536

# The IP 10.115.6.200 is from the vip-* interface.
```
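To check quickly which address the VXLAN device is bound to, the `local` field can be pulled out of the `ip -d link` output. The following is a minimal sketch; the helper name `vxlan_local_addr` is hypothetical and not part of Harvester or Calico.

```sh
# Hypothetical helper: read `ip -d link show` output on stdin and print
# the address that follows "vxlan id <n> local".
vxlan_local_addr() {
  sed -n 's/.*vxlan id [0-9]* local \([0-9.]*\) .*/\1/p'
}

# Usage on a guest cluster node:
#   ip -d link show dev vxlan.calico | vxlan_local_addr
```

If the printed address is the Loadbalancer IP rather than the node IP, the node is affected by this issue.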
115+
116+
### Workaround
117+
118+
For existing clusters, run `kubectl edit installation`, go to `.spec.calicoNetwork.nodeAddressAutodetectionV4`, remove any existing line such as `firstFound: true`, add the line `skipInterface: vip.*`, and save.
119+
120+
After a while, the `calico-system/calico-node` daemonset completes a rolling update, and the related pods then use the node IP for VXLAN.
121+
122+
```sh
$ ip -d link show dev vxlan.calico
45: vxlan.calico: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 66:a7:41:00:1d:ba brd ff:ff:ff:ff:ff:ff promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535
info: Using default fan map value (33)
    vxlan id 4096 local 10.115.1.46 dev enp1s0 srcport 0 0 dstport 4789 nolearning ttl auto ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536

# The IP 10.115.1.46 is from the node's main NIC, enp1s0.
```
131+
132+
The Loadbalancer IP is reachable again.
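The workaround for existing clusters can also be applied without an interactive editor. The following is a sketch only: it assumes the `Installation` resource is named `default`, which the steps above do not state, so check the name with `kubectl get installation` first. The function name `apply_skip_interface` is hypothetical. In a JSON merge patch, a `null` value removes the key, which drops `firstFound` if it is present.

```sh
# JSON merge patch: remove firstFound (null deletes the key) and add skipInterface.
PATCH='{"spec":{"calicoNetwork":{"nodeAddressAutodetectionV4":{"firstFound":null,"skipInterface":"vip.*"}}}}'

# Hypothetical wrapper. Assumption: the Installation resource is named "default".
apply_skip_interface() {
  kubectl patch installation default --type=merge -p "$PATCH" &&
    # Wait for the rolling update of the calico-node daemonset to finish.
    kubectl -n calico-system rollout status daemonset/calico-node
}
```

Call `apply_skip_interface` with `kubectl` pointed at the guest cluster, then run `ip -d link show dev vxlan.calico` on a node again to confirm that the node IP is in use.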
133+
134+
135+
When creating new clusters in `Rancher Manager`, click **Add-on: Calico** and add the following two lines to `.installation.calicoNetwork`. This prevents the `calico` controller from accidentally taking over the Loadbalancer IP.
136+
137+
```yaml
installation:
  backend: VXLAN
  calicoNetwork:
    bgp: Disabled
    nodeAddressAutodetectionV4: # add this line
      skipInterface: vip.* # add this line
```
145+
146+
### Related Issue
147+
148+
https://github.com/harvester/harvester/issues/8072