Conversation

**@w13915984028** (Member) commented Oct 28, 2025

Problem:

The LB on a guest cluster sometimes does not work as expected.

Solution:

Document the root cause and the workaround.

Related Issue(s):

harvester/harvester#8072

Test plan:

Additional documentation or context

After the review on main is done, I will copy the changes to the v1.5 and v1.6 branches to save review and update time, thanks.

**@github-actions** (bot) commented

🔨 Latest commit: c28b38c
😎 Deploy Preview: https://6900edf345c8fa1134eb1777--harvester-preview.netlify.app

**@ihcsim** (Contributor) left a comment

Content LGTM.

Suggested change
Modifying the `IPAM` mode isn't allowed. You must create a new service if you intend to change the `IPAM` mode.
- Modifying the `IPAM` mode isn't allowed. You must create a new service if you intend to change the `IPAM` mode.
- Refer to [Guest Cluster Loadbalancer IP is not reachable](../troubleshooting/rancher.md#guest-cluster-loadbalancer-ip-is-not-reachable).
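To make the note concrete, here is a minimal sketch of a replacement `LoadBalancer` service with a different IPAM mode. The `cloudprovider.harvesterhci.io/ipam` annotation follows the Harvester cloud provider docs; the service name, selector, and ports are placeholders:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app-lb-dhcp            # hypothetical replacement service
  annotations:
    # The IPAM mode is fixed at creation time; to switch modes (e.g. pool -> dhcp),
    # create a new service like this rather than editing the existing one.
    cloudprovider.harvesterhci.io/ipam: dhcp
spec:
  type: LoadBalancer
  selector:
    app: my-app                   # placeholder selector
  ports:
    - port: 80
      targetPort: 8080
```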
A contributor commented:
Can we state which version is affected by this? AIUI, it's cloud provider 107.0.1+up0.2.10 on Rancher 2.12 where kube-vip < v0.9.1, right?

**@irishgordo** (Contributor) commented Oct 28, 2025

As far as I know, this has definitely been seen on:

- harvester-cloud-provider:0.2.1000
- harvester-cloud-provider:0.2.1100

x-ref:

It may... "may"... have also briefly presented itself on an older version, maybe 0.2.900, but I can't find breadcrumbs at the moment that would lead me to a definitive "yes" on that.

**@w13915984028** (Member, Author) commented:

The bug in Calico is causing this issue. I will check the Calico code to identify the affected guest cluster versions, like RKE2 v1.33.5+rke2r1.

**@martindekov** (Contributor) left a comment

I don't have much context, so no intuition about what is what. I think reviewing with that lens is helpful: as if I don't know what is going on and try to fix it. I added comments mainly around the gray areas where it wasn't clear to me and I was asking myself: where should we run the command? Where would this pop up so I can edit it?

Overall LGTM though, thanks for the work, Jian! Will do a follow-up once you respond.

### Root Cause
In the below example, the guest cluster node's (Harvester VM's) IP is `10.115.1.46`, and later a new Loadbalancer IP `10.115.6.200` is added to a new interface like `vip-fd8c28ce (@enp1s0)`. However, the Loadbalancer IP is taken over by the `calico` controller, which causes the Loadbalancer IP to be unreachable.
A contributor commented:

At the end of it, where we run the command `ip -d link show dev vxlan.calico`: can we explain in the last sentence from what context the command should be executed? The Harvester VM IP is 10.115.1.46; do you get a shell session inside it before running `ip`? The Loadbalancer IP is unreachable, so I'd suggest elaborating:

Suggested change
In the below example, the guest cluster node's (Harvester VM's) IP is `10.115.1.46`, and later a new Loadbalancer IP `10.115.6.200` is added to a new interface like `vip-fd8c28ce (@enp1s0)`. However, the Loadbalancer IP is taken over by the `calico` controller, which causes the Loadbalancer IP to be unreachable.
In the below example, the guest cluster node's (Harvester VM's) IP is `10.115.1.46`, and later a new Loadbalancer IP `10.115.6.200` is added to a new interface like `vip-fd8c28ce (@enp1s0)`. However, the Loadbalancer IP is taken over by the `calico` controller, which causes the Loadbalancer IP to be unreachable. Through a shell session using the original IP, run the following.

This might not be right, but stating the context from which `ip` would be valid would make things clear.
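As an illustration of the missing context, a sketch of how the inspection might be run; it assumes a shell on the guest cluster node itself (e.g. SSH to the VM's IP `10.115.1.46`), and the device names other than `vxlan.calico` come from the example above:

```shell
# On the guest cluster node (Harvester VM), e.g. after `ssh user@10.115.1.46`:
ip addr show dev enp1s0           # the node IP, 10.115.1.46 in the example
ip addr show                      # look for the vip-* interface holding 10.115.6.200
ip -d link show dev vxlan.calico  # shows which local IP calico picked for VXLAN
```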


For existing clusters, run the command `$ kubectl edit installation`, go to `.spec.calicoNetwork.nodeAddressAutodetectionV4`, remove any existing line like `firstFound: true`, add the new line `skipInterface: vip.*`, and save (a sketch of the result follows below).

Wait a while; the daemonset `calico-system/calico-node` is rolling-updated, and the related pods then use the node IP for VXLAN.
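For reference, a minimal sketch of what the `Installation` object might look like after the edit described above; field names follow the Calico operator (`operator.tigera.io/v1`) API, and only the relevant fields are shown:

```yaml
# After `kubectl edit installation` on the guest cluster:
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    nodeAddressAutodetectionV4:
      # was: firstFound: true
      skipInterface: vip.*   # ignore kube-vip's vip-* interfaces
```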
A contributor commented:

Similar to the above: from what context would I run the `ip` command below? A small sentence at the end would be helpful.

The Loadbalancer IP is reachable again.


When creating new clusters on `Rancher Manager`, click **Add-on: Calico** and add the following two lines to `.installation.calicoNetwork`. The `calico` controller then won't take over the Loadbalancer IP accidentally.
A contributor commented:

Would a YAML window pop up when clicking **Add-on: Calico**, in which YAML we edit the below suggestion?

Suggested change
When creating new clusters on `Rancher Manager`, click **Add-on: Calico** and add the following two lines to `.installation.calicoNetwork`. The `calico` controller then won't take over the Loadbalancer IP accidentally.
When creating new clusters on `Rancher Manager`, click **Add-on: Calico**; a YAML configuration window will appear. Add the following two lines to `.installation.calicoNetwork`. The `calico` controller then won't take over the Loadbalancer IP accidentally.

Not sure whether a YAML config window will appear or we edit the object through the kubectl client against the k8s API, so adding a short sentence would be helpful IMHO.
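To make the gray area concrete, a hedged sketch of the two lines as they might appear in the Add-on: Calico values (assuming the editor exposes the chart values with `installation` at the top level, as recent RKE2 Calico charts do):

```yaml
installation:
  calicoNetwork:
    # the two lines the doc refers to:
    nodeAddressAutodetectionV4:
      skipInterface: vip.*
```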

**@derhornspieler** commented Nov 13, 2025

Having a similar issue here, to add another use case for study: harvester/harvester#9479. I can't get the LB on Harvester to assign a different CIDR, Virtual Machine Network, or VLAN; it always defaults to the Cluster Management network.
