4 changes: 3 additions & 1 deletion docs/rancher/cloud-provider.md
@@ -393,7 +393,9 @@ Harvester's built-in load balancer offers both **DHCP** and **Pool** modes, and

:::note

Modifying the `IPAM` mode isn't allowed. You must create a new service if you intend to change the `IPAM` mode.
- Modifying the `IPAM` mode isn't allowed. You must create a new service if you intend to change the `IPAM` mode.

- Refer to [Guest Cluster Loadbalancer IP is not reachable](../troubleshooting/rancher.md#guest-cluster-loadbalancer-ip-is-not-reachable).
Contributor

Can we state which version is affected by this? AIUI, it's cloud provider 107.0.1+up0.2.10 on Rancher 2.12 where kube-vip < v0.9.1, right?

Contributor

@irishgordo Oct 28, 2025

As far as I know this has been seen definitely on:

  • harvester-cloud-provider:0.2.1000
  • harvester-cloud-provider:0.2.1100

x-ref:

It may... "may"... have also briefly presented itself on an older maybe 0.2.900 version... but I can't find breadcrumbs at the moment that would lead me to a definitive "yes" on that...

Member Author

A Calico bug is causing the issue. I will check the Calico code so that we can mention the affected guest cluster versions, such as RKE2 v1.33.5+rke2r1.


:::

63 changes: 63 additions & 0 deletions docs/troubleshooting/rancher.md
@@ -83,3 +83,66 @@ Related issues:

- Harvester: [#7105](https://github.com/harvester/harvester/issues/7105) and [#7284](https://github.com/harvester/harvester/issues/7284)
- Rancher: [#45628](https://github.com/rancher/rancher/issues/45628)

## Guest Cluster Loadbalancer IP is not reachable

### Issue Description

1. Create a new [guest cluster](../rancher/node/rke2-cluster.md#create-rke2-kubernetes-cluster) with the default `Container Network: Calico` and the default `Cloud Provider: Harvester`.

1. Deploy `nginx` on the new guest cluster by running `kubectl apply -f https://k8s.io/examples/application/deployment.yaml`.

1. Create a [Load Balancer](../rancher/cloud-provider.md#load-balancer-support) that selects the `nginx` backend.

1. The service becomes ready with an IP allocated from the DHCP server or IP pool, but the page might fail to load when you click the link.

![](/img/v1.5/troubleshooting/gc-lb-is-not-reachable.png)

### Root Cause

In the example below, the guest cluster node (a Harvester VM) has the IP `10.115.1.46`, and a new Loadbalancer IP `10.115.6.200` is later added to a new interface such as `vip-fd8c28ce (@enp1s0)`. However, the `calico` controller takes over the Loadbalancer IP, which makes the Loadbalancer IP unreachable.
Contributor

For the command below, `ip -d link show dev vxlan.calico`, can we explain in a final sentence the context from which it should be executed? The Harvester VM IP is `10.115.1.46`; do you open a shell session inside the VM before running `ip`? The load balancer IP is unreachable, so I'd suggest elaborating:

Suggested change
In the example below, the guest cluster node (a Harvester VM) has the IP `10.115.1.46`, and a new Loadbalancer IP `10.115.6.200` is later added to a new interface such as `vip-fd8c28ce (@enp1s0)`. However, the `calico` controller takes over the Loadbalancer IP, which makes the Loadbalancer IP unreachable.
In the example below, the guest cluster node (a Harvester VM) has the IP `10.115.1.46`, and a new Loadbalancer IP `10.115.6.200` is later added to a new interface such as `vip-fd8c28ce (@enp1s0)`. However, the `calico` controller takes over the Loadbalancer IP, which makes the Loadbalancer IP unreachable. From a shell session opened through the node's original IP, run the following command.

This might not be right, but stating the context in which the `ip` command is valid would make things clear.


```sh
$ ip -d link show dev vxlan.calico
44: vxlan.calico: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/ether 66:a7:41:00:1d:ba brd ff:ff:ff:ff:ff:ff promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535
info: Using default fan map value (33)
vxlan id 4096 local 10.115.6.200 dev vip-8a928fa0 srcport 0 0 dstport 4789 nolearning ttl auto ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536
```

The IP `10.115.6.200` is from the `vip-*` interface.
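
The check above can be scripted. The following is a minimal sketch, assuming the `ip` command is run in a shell on the guest cluster node; the helper names `vxlan_binding` and `check_vxlan_binding` are hypothetical and not part of Harvester or Calico. It reads `ip -d link show` output on stdin and reports whether the VXLAN device is bound to a `vip-*` interface.

```sh
# Hypothetical helper: print "<local-ip> <device>" from `ip -d link show` output on stdin.
vxlan_binding() {
  awk '/vxlan id/ {
    for (i = 1; i < NF; i++) {
      if ($i == "local") ip = $(i + 1)
      if ($i == "dev")   dev = $(i + 1)
    }
  }
  END { if (ip != "") print ip, dev }'
}

# Hypothetical helper: flag the binding as affected when the device is a vip-* interface.
check_vxlan_binding() {
  binding=$(vxlan_binding)
  case "${binding#* }" in
    "")    echo "no vxlan binding found"; return 1 ;;
    vip-*) echo "affected: $binding" ;;
    *)     echo "ok: $binding" ;;
  esac
}
```

A possible invocation on the node is `ip -d link show dev vxlan.calico | check_vxlan_binding`.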

### Workaround

For existing clusters, run `kubectl edit installation`, go to `.spec.calicoNetwork.nodeAddressAutodetectionV4`, remove any existing line such as `firstFound: true`, add the new line `skipInterface: vip.*`, and save.

Wait for the `calico-system/calico-node` daemonset to finish its rolling update. The related pods then use the node IP for VXLAN.
Contributor

Similar to above: from what context would I run the `ip` command below? A short sentence at the end would be helpful.


```sh
$ ip -d link show dev vxlan.calico
45: vxlan.calico: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/ether 66:a7:41:00:1d:ba brd ff:ff:ff:ff:ff:ff promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535
info: Using default fan map value (33)
vxlan id 4096 local 10.115.1.46 dev enp1s0 srcport 0 0 dstport 4789 nolearning ttl auto ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536
```

The IP `10.115.1.46` is from the node's main NIC interface, `enp1s0`.

The Loadbalancer IP is reachable again.
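
As a non-interactive alternative to `kubectl edit`, the same change can be sketched as a JSON merge patch. This is an illustration under assumptions, not a verified procedure: the function names are hypothetical, and the `Installation` resource is assumed to be named `default`; check the actual name with `kubectl get installation` first. In a merge patch, setting `firstFound` to `null` removes that key.

```sh
# Hypothetical helper: build the merge patch that mirrors the manual edit.
# "firstFound": null removes the key; "skipInterface" adds the new one.
calico_skip_vip_patch() {
  printf '%s\n' '{"spec":{"calicoNetwork":{"nodeAddressAutodetectionV4":{"firstFound":null,"skipInterface":"vip.*"}}}}'
}

# Hypothetical helper: apply the patch, then wait for calico-node to roll out.
# Assumption: the Installation resource is named "default" unless $1 is given.
apply_calico_skip_vip() {
  kubectl patch installation "${1:-default}" --type=merge -p "$(calico_skip_vip_patch)" &&
    kubectl -n calico-system rollout status daemonset/calico-node
}
```

Calling `apply_calico_skip_vip` against the guest cluster applies the patch and blocks until the daemonset rollout completes; the `ip -d link show dev vxlan.calico` check can then be repeated on the node.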


When creating new clusters in `Rancher Manager`, click **Add-on: Calico** and add the following two lines to `.installation.calicoNetwork`. This prevents the `calico` controller from accidentally taking over the Loadbalancer IP.
Contributor

Does a YAML window pop up when clicking **Add-on: Calico**, in which we edit the YAML as suggested below?

Suggested change
When creating new clusters in `Rancher Manager`, click **Add-on: Calico** and add the following two lines to `.installation.calicoNetwork`. This prevents the `calico` controller from accidentally taking over the Loadbalancer IP.
When creating new clusters in `Rancher Manager`, click **Add-on: Calico**; a YAML configuration window appears. Add the following two lines to `.installation.calicoNetwork`. This prevents the `calico` controller from accidentally taking over the Loadbalancer IP.

I'm not sure whether a YAML config window appears or whether we edit the object through the `kubectl` client against the Kubernetes cluster, so adding a short sentence would be helpful IMHO.


```yaml
installation:
  backend: VXLAN
  calicoNetwork:
    bgp: Disabled
    nodeAddressAutodetectionV4: # add this line
      skipInterface: vip.* # add this line
```

### Related Issue

https://github.com/harvester/harvester/issues/8072