# Add troubleshooting of guest cluster LB IP is not reachable #909
Related issues:

- Harvester: [#7105](https://github.com/harvester/harvester/issues/7105) and [#7284](https://github.com/harvester/harvester/issues/7284)
- Rancher: [#45628](https://github.com/rancher/rancher/issues/45628)
## Guest Cluster Loadbalancer IP is not reachable

### Issue Description
1. Create a new [guest cluster](../rancher/node/rke2-cluster.md#create-rke2-kubernetes-cluster) with the default `Container Network: Calico` and the default `Cloud Provider: Harvester`.

1. Deploy `nginx` on this new guest cluster via the command `kubectl apply -f https://k8s.io/examples/application/deployment.yaml`.

1. Create a [Load Balancer](../rancher/cloud-provider.md#load-balancer-support) which selects the backend nginx pods (see the sketch after this list).

1. The service becomes ready with an IP allocated from the DHCP server or an IPPool, but when you click the link, the page may fail to load.
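For reference, a minimal `LoadBalancer` Service selecting the example nginx pods might look like the sketch below. The `app: nginx` label comes from the upstream example deployment; the Service name and the Harvester IPAM annotation value (`dhcp` or `pool`) are illustrative assumptions, so adjust them to your environment.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-lb                              # hypothetical name for illustration
  annotations:
    cloudprovider.harvesterhci.io/ipam: dhcp  # assumption: "dhcp" or "pool", per your setup
spec:
  type: LoadBalancer
  selector:
    app: nginx                                # label used by the upstream example deployment
  ports:
    - port: 80
      targetPort: 80
```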
### Root Cause

In the example below, the guest cluster node (Harvester VM) has the IP `10.115.1.46`, and a new Loadbalancer IP `10.115.6.200` is later added to a new interface such as `vip-fd8c28ce (@enp1s0)`. However, the `calico` controller takes over the Loadbalancer IP as its node address, which makes the Loadbalancer IP unreachable. The following command is run on the guest cluster node itself (for example, over SSH or the VM console):
```sh
$ ip -d link show dev vxlan.calico
44: vxlan.calico: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 66:a7:41:00:1d:ba brd ff:ff:ff:ff:ff:ff promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535
    info: Using default fan map value (33)
    vxlan id 4096 local 10.115.6.200 dev vip-8a928fa0 srcport 0 0 dstport 4789 nolearning ttl auto ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536

# The IP 10.115.6.200 is from the vip-* interface.
```
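To confirm which interface actually holds the Loadbalancer IP on the node, the standard `ip` queries below can help (a sketch; the address and interface name are from this example):

```sh
# List all addresses briefly and locate the LB IP
ip -br addr show | grep 10.115.6.200

# Inspect the vip-* interface that carries the LB IP
ip addr show dev vip-8a928fa0
```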
### Workaround

For existing clusters, run `kubectl edit installation`, go to `.spec.calicoNetwork.nodeAddressAutodetectionV4`, remove any existing line such as `firstFound: true`, add the new line `skipInterface: vip.*`, and save.
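The relevant part of the `Installation` spec then looks like the excerpt below. As a non-interactive alternative, a JSON merge patch is sketched as well; it assumes the operator's `Installation` resource is named `default` (the usual name), and the `null` removes any existing `firstFound` key.

```yaml
spec:
  calicoNetwork:
    nodeAddressAutodetectionV4:
      skipInterface: vip.*
```

```sh
kubectl patch installation default --type=merge \
  -p '{"spec":{"calicoNetwork":{"nodeAddressAutodetectionV4":{"firstFound":null,"skipInterface":"vip.*"}}}}'
```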
Wait a while; the daemonset `calico-system/calico-node` undergoes a rolling update, and the related pods then use the node IP for VXLAN.
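You can watch the rollout with standard `kubectl` commands (namespace and daemonset name from the text above):

```sh
kubectl -n calico-system rollout status daemonset/calico-node
```

Once the rollout finishes, run the same `ip` command on the guest cluster node to verify: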
```sh
$ ip -d link show dev vxlan.calico
45: vxlan.calico: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 66:a7:41:00:1d:ba brd ff:ff:ff:ff:ff:ff promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535
    info: Using default fan map value (33)
    vxlan id 4096 local 10.115.1.46 dev enp1s0 srcport 0 0 dstport 4789 nolearning ttl auto ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536

# The IP 10.115.1.46 is from the node's main NIC, enp1s0.
```
The Loadbalancer IP is reachable again.
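A quick reachability check (the address is from this example; any HTTP client will do):

```sh
curl -I http://10.115.6.200
```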
When creating new clusters in `Rancher Manager`, click **Add-on: Calico** and add the following two lines to `.installation.calicoNetwork` in the add-on configuration YAML. The `calico` controller then won't accidentally take over the Loadbalancer IP.
```yaml
installation:
  backend: VXLAN
  calicoNetwork:
    bgp: Disabled
    nodeAddressAutodetectionV4:   # add this line
      skipInterface: vip.*        # add this line
```
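After the cluster is up, you can confirm the setting was applied (standard `kubectl`; the `Installation` resource is assumed to be named `default`):

```sh
kubectl get installation default -o jsonpath='{.spec.calicoNetwork.nodeAddressAutodetectionV4}'
```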
### Related Issue

https://github.com/harvester/harvester/issues/8072
---

Review comments on this change:

> Can we state which version is affected by this? AIUI, it's cloud provider 107.0.1+up0.2.10 on Rancher 2.12 where kube-vip < v0.9.1, right?

> As far as I know this has been seen definitely on: … It may... "may"... have also briefly presented itself on an older (maybe 0.2.900) version, but I can't find breadcrumbs at the moment that would lead me to a definitive "yes" on that.

> The bug from calico is causing the issue. I will check the calico code to mention the affected guest cluster versions, like `RKE2 v1.33.5+rke2r1`.