Description
Hi, I've been using this module without issue, but I recently tried to disable kube-proxy replacement on my staging cluster in order to run Istio, bricking it in the process.
After completely tearing down the cluster and trying to bootstrap again, I ran into two major issues. The first was 403 errors on several registries, which I seem to have fixed by self-hosting mirrors.
After that, when destroying and bootstrapping once more, the 2nd node that gets created gets stuck as an etcd learner, even after re-enabling kube-proxy replacement:
k8s-staging-control-2 etcd Running Fail 10m16s ago Health check failed: etcdserver: rpc not supported for learner
talosctl etcd members -n k8s-staging-control-1
NODE ID HOSTNAME PEER URLS CLIENT URLS LEARNER
k8s-staging-control-1 8a94def43f7181ef k8s-staging-control-2 https://10.0.64.2:2380 https://10.0.64.2:2379 true
k8s-staging-control-1 c555db3b0993326f k8s-staging-control-1 https://10.0.64.1:2380 https://10.0.64.1:2379 false
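For anyone hitting the same state: a learner that has caught up with the leader can normally be promoted by hand. A sketch, assuming etcdctl >= 3.4 with client certificates extracted from the node (the cert paths are placeholders; the member ID is taken from the table above):

```shell
# Hypothetical recovery sketch: promote the stuck learner manually.
# ca.crt/client.crt/client.key are assumed to have been extracted from
# the Talos node; 8a94def43f7181ef is the learner's ID from the table above.
etcdctl --endpoints=https://10.0.64.1:2379 \
  --cacert=ca.crt --cert=client.crt --key=client.key \
  member promote 8a94def43f7181ef
```

Note that etcd refuses the promotion unless the learner's raft log is in sync with the leader, so if the learner never catches up this fails too.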
Meanwhile, the 3rd of the 3 control plane nodes gets stuck in Preparing:
k8s-staging-control-3 etcd Preparing ? 14m31s ago Running pre state
And after some time fails with this error:
k8s-staging-control-3 etcd Failed ? 28s ago Failed to run pre stage: failed to build initial etcd cluster: failed to build cluster arguments: 2 error(s) occurred:
error adding member: etcdserver: too many learner members in cluster
timeout
Something I did notice is that both nodes log "bootstrap request received" and try to spin up etcd at almost exactly the same time. I'm not experienced enough to know whether that's expected.
I've tried manually removing the learner, resetting/rebooting nodes, stripping down my config, etc., to no effect.
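Concretely, the removal/reset attempts looked roughly like this (a sketch of the steps, not a verified fix; hostnames match the cluster above):

```shell
# Remove the stuck learner from the etcd cluster, running against a
# healthy member:
talosctl -n k8s-staging-control-1 etcd remove-member k8s-staging-control-2

# Wipe the affected node's state partitions so it rejoins from scratch:
talosctl -n k8s-staging-control-2 reset \
  --system-labels-to-wipe STATE,EPHEMERAL --reboot
```

After the reset, the node comes back, rejoins as a learner, and gets stuck in the same state again.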
Expected Behavior
The 2nd etcd node should be promoted to a voting member, with the 3rd node following.
Actual Behavior
The 2nd node gets stuck as a learner, and the 3rd node's etcd service fails with: error adding member: etcdserver: too many learner members in cluster
Minimal Module Configuration
module "kubernetes" {
source = "hcloud-k8s/kubernetes/hcloud"
version = "3.22.0"
# cluster_delete_protection = false
cluster_name = "k8s-staging"
hcloud_token = var.hcloud_token
kube_api_hostname = "k8s-staging-control-1"
control_plane_nodepools = [
{ name = "control", type = "cx33", location = "fsn1", count = 3 }
]
cert_manager_enabled = true
longhorn_enabled = true
longhorn_default_storage_class = true
talos_image_extensions = ["siderolabs/tailscale"]
control_plane_config_patches = [
{
apiVersion = "v1alpha1"
kind = "ExtensionServiceConfig"
name = "tailscale"
environment = [
"TS_AUTHKEY=${var.tailscale_authkey}"
]
}
]
talos_registries = {
mirrors = {
// mirrors pointing to a tailscale IP
}
}
// required for Istio
# cilium_kube_proxy_replacement_enabled = false
cilium_socket_lb_host_namespace_only_enabled = true
cilium_helm_values = {
cni = {
exclusive = false
# chainingMode = "none" // prevents istio/cilium from infinitely overwriting the cni config file
}
# devices: "eth+" // set this manually so talos doesn't try to run eBPF datapath on the tailscale interface
# // recommended by cilium
# bpfClockProbe = true
# bpf = {
# distributedLRU = {
# enabled = true
# }
# mapDynamicSizeRatio = 0.08
# }
}
}
Relevant Output
Apologies for the spam; I'm not sure which information is most relevant.
etcd logs for node 2:
k8s-staging-control-2: {"level":"warn","ts":"2026-02-19T22:50:50.612583Z","caller":"embed/config_logging.go:188","msg":"rejected connection on client endpoint","remote-addr":"[::1]:55098","server-name":"localhost","error":"EOF"}
k8s-staging-control-2: {"level":"error","ts":"2026-02-19T22:50:51.504366Z","caller":"etcdserver/server.go:2090","msg":"Validation on configuration change failed","shouldApplyV3":true,"error":"membership: too many learner members in cluster","stacktrace":"go.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyConfChange\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:2090\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).apply\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:1918\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyEntries\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:1210\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyAll\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:985\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).run.func6\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:855\ngo.etcd.io/etcd/pkg/v3/schedule.job.Do\n\tgo.etcd.io/etcd/pkg/v3@v3.6.5/schedule/schedule.go:41\ngo.etcd.io/etcd/pkg/v3/schedule.(*fifo).executeJob\n\tgo.etcd.io/etcd/pkg/v3@v3.6.5/schedule/schedule.go:206\ngo.etcd.io/etcd/pkg/v3/schedule.(*fifo).run\n\tgo.etcd.io/etcd/pkg/v3@v3.6.5/schedule/schedule.go:187"}
k8s-staging-control-2: {"level":"info","ts":"2026-02-19T22:50:51.504468Z","logger":"raft","caller":"v3@v3.6.0/raft.go:1981","msg":"8a94def43f7181ef switched to configuration voters=(14219512445102404207) learners=(9985851414405022191)"}
etcd logs for node 1:
k8s-staging-control-1: {"level":"info","ts":"2026-02-19T22:52:34.208319Z","caller":"etcdserver/server.go:1768","msg":"applied a configuration change through raft","local-member-id":"c555db3b0993326f","raft-conf-change":"ConfChangeAddLearnerNode","raft-conf-change-node-id":"1ff6a2882d6adc2b"}
k8s-staging-control-1: {"level":"info","ts":"2026-02-19T22:52:34.673636Z","caller":"etcdserver/corrupt.go:278","msg":"starting compact hash check","local-member-id":"c555db3b0993326f","timeout":"7s"}
k8s-staging-control-1: {"level":"info","ts":"2026-02-19T22:52:34.673724Z","caller":"etcdserver/corrupt.go:294","msg":"finished compaction hash check","number-of-hashes-checked":0}
k8s-staging-control-1: {"level":"error","ts":"2026-02-19T22:52:37.575944Z","caller":"etcdserver/server.go:2090","msg":"Validation on configuration change failed","shouldApplyV3":true,"error":"membership: too many learner members in cluster","stacktrace":"go.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyConfChange\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:2090\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).apply\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:1918\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyEntries\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:1210\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyAll\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:985\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).run.func6\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:855\ngo.etcd.io/etcd/pkg/v3/schedule.job.Do\n\tgo.etcd.io/etcd/pkg/v3@v3.6.5/schedule/schedule.go:41\ngo.etcd.io/etcd/pkg/v3/schedule.(*fifo).executeJob\n\tgo.etcd.io/etcd/pkg/v3@v3.6.5/schedule/schedule.go:206\ngo.etcd.io/etcd/pkg/v3/schedule.(*fifo).run\n\tgo.etcd.io/etcd/pkg/v3@v3.6.5/schedule/schedule.go:187"}
k8s-staging-control-1: {"level":"info","ts":"2026-02-19T22:52:37.576237Z","logger":"raft","caller":"v3@v3.6.0/raft.go:1981","msg":"c555db3b0993326f switched to configuration voters=(14219512445102404207) learners=(9985851414405022191)"}
k8s-staging-control-1: {"level":"info","ts":"2026-02-19T22:52:37.576329Z","caller":"etcdserver/server.go:1768","msg":"applied a configuration change through raft","local-member-id":"c555db3b0993326f","raft-conf-change":"ConfChangeAddLearnerNode","raft-conf-change-node-id":"cc8d0464717ad9b0"}
k8s-staging-control-1: {"level":"error","ts":"2026-02-19T22:52:41.523087Z","caller":"etcdserver/server.go:2090","msg":"Validation on configuration change failed","shouldApplyV3":true,"error":"membership: too many learner members in cluster","stacktrace":"go.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyConfChange\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:2090\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).apply\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:1918\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyEntries\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:1210\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyAll\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:985\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).run.func6\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:855\ngo.etcd.io/etcd/pkg/v3/schedule.job.Do\n\tgo.etcd.io/etcd/pkg/v3@v3.6.5/schedule/schedule.go:41\ngo.etcd.io/etcd/pkg/v3/schedule.(*fifo).executeJob\n\tgo.etcd.io/etcd/pkg/v3@v3.6.5/schedule/schedule.go:206\ngo.etcd.io/etcd/pkg/v3/schedule.(*fifo).run\n\tgo.etcd.io/etcd/pkg/v3@v3.6.5/schedule/schedule.go:187"}
k8s-staging-control-1: {"level":"info","ts":"2026-02-19T22:52:41.523184Z","logger":"raft","caller":"v3@v3.6.0/raft.go:1981","msg":"c555db3b0993326f switched to configuration voters=(14219512445102404207) learners=(9985851414405022191)"}
k8s-staging-control-1: {"level":"info","ts":"2026-02-19T22:52:41.523232Z","caller":"etcdserver/server.go:1768","msg":"applied a configuration change through raft","local-member-id":"c555db3b0993326f","raft-conf-change":"ConfChangeAddLearnerNode","raft-conf-change-node-id":"2b1f3ea31b9f1e59"}
node 1 dmesg
k8s-staging-control-1: user: warning: [2026-02-19T22:28:19.556557486Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.ManifestApplyController", "error": "2 errors occurred:\n\t* error creating mapping for object talos.dev/v1alpha1/ServiceAccount/talos-cloud-controller-manager-talos-secrets: no matches for kind \"ServiceAccount\" in version \"talos.dev/v1alpha1\"\n\t* error creating mapping for object talos.dev/v1alpha1/ServiceAccount/talos-backup-secrets: no matches for kind \"ServiceAccount\" in version \"talos.dev/v1alpha1\"\n\n"}
k8s-staging-control-1: user: warning: [2026-02-19T22:28:22.845324486Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: Get \"https://127.0.0.1:10250/pods/?timeout=30s\": remote error: tls: internal error"}
k8s-staging-control-1: user: warning: [2026-02-19T22:28:38.397148486Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: Get \"https://127.0.0.1:10250/pods/?timeout=30s\": remote error: tls: internal error"}
k8s-staging-control-1: user: warning: [2026-02-19T22:28:50.394325486Z]: [talos] found private network for private vip (alias IP) {"component": "controller-runtime", "controller": "network.OperatorVIPConfigController", "vip": "10.0.64.126", "network_id": 11955308}
k8s-staging-control-1: user: warning: [2026-02-19T22:28:50.421065486Z]: [talos] found private network for private vip (alias IP) {"component": "controller-runtime", "controller": "network.OperatorVIPConfigController", "vip": "10.0.64.126", "network_id": 11955308}
k8s-staging-control-1: user: warning: [2026-02-19T22:28:50.578930486Z]: [talos] found private network for private vip (alias IP) {"component": "controller-runtime", "controller": "network.OperatorVIPConfigController", "vip": "10.0.64.126", "network_id": 11955308}
k8s-staging-control-1: user: warning: [2026-02-19T22:28:50.616463486Z]: [talos] found private network for private vip (alias IP) {"component": "controller-runtime", "controller": "network.OperatorVIPConfigController", "vip": "10.0.64.126", "network_id": 11955308}
k8s-staging-control-1: user: warning: [2026-02-19T22:28:52.022936486Z]: [talos] new diagnostic {"component": "controller-runtime", "controller": "runtime.DiagnosticsLoggerController", "id": "kubelet-csr", "message": "kubelet server certificate rotation is enabled, but CSR is not approved", "details": ["kubelet API error: remote error: tls: internal error", "pending CSRs: csr-x9z5c"], "url": "https://talos.dev/diagnostic/kubelet-csr"}
k8s-staging-control-1: user: warning: [2026-02-19T22:28:52.164848486Z]: [talos] found private network for private vip (alias IP) {"component": "controller-runtime", "controller": "network.OperatorVIPConfigController", "vip": "10.0.64.126", "network_id": 11955308}
k8s-staging-control-1: user: warning: [2026-02-19T22:28:52.199201486Z]: [talos] found private network for private vip (alias IP) {"component": "controller-runtime", "controller": "network.OperatorVIPConfigController", "vip": "10.0.64.126", "network_id": 11955308}
k8s-staging-control-1: user: warning: [2026-02-19T22:28:53.064244486Z]: [talos] created talos.dev/v1alpha1/ServiceAccount/talos-cloud-controller-manager-talos-secrets {"component": "controller-runtime", "controller": "k8s.ManifestApplyController"}
k8s-staging-control-1: user: warning: [2026-02-19T22:28:53.661009486Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: Get \"https://127.0.0.1:10250/pods/?timeout=30s\": remote error: tls: internal error"}
k8s-staging-control-1: kern: warning: [2026-02-19T22:28:53.779351486Z]: virtio_net virtio1 eth0: XDP request 5 queues but max is 1. XDP_TX and XDP_REDIRECT will operate in a slower locked tx mode.
SUBSYSTEM=virtio
DEVICE=+virtio:virtio1
k8s-staging-control-1: kern: info: [2026-02-19T22:28:55.160668486Z]: eth0: renamed from tmp926ca
k8s-staging-control-1: user: warning: [2026-02-19T22:28:55.209634486Z]: [talos] found private network for private vip (alias IP) {"component": "controller-runtime", "controller": "network.OperatorVIPConfigController", "vip": "10.0.64.126", "network_id": 11955308}
k8s-staging-control-1: user: warning: [2026-02-19T22:28:55.246877486Z]: [talos] found private network for private vip (alias IP) {"component": "controller-runtime", "controller": "network.OperatorVIPConfigController", "vip": "10.0.64.126", "network_id": 11955308}
k8s-staging-control-1: user: warning: [2026-02-19T22:28:56.777506486Z]: [talos] machine is running and ready {"component": "controller-runtime", "controller": "runtime.MachineStatusController"}
k8s-staging-control-1: kern: info: [2026-02-19T22:28:57.277683486Z]: eth0: renamed from tmpe8ca1
k8s-staging-control-1: user: warning: [2026-02-19T22:28:57.278080486Z]: [talos] found private network for private vip (alias IP) {"component": "controller-runtime", "controller": "network.OperatorVIPConfigController", "vip": "10.0.64.126", "network_id": 11955308}
k8s-staging-control-1: user: warning: [2026-02-19T22:28:57.321235486Z]: [talos] found private network for private vip (alias IP) {"component": "controller-runtime", "controller": "network.OperatorVIPConfigController", "vip": "10.0.64.126", "network_id": 11955308}
k8s-staging-control-1: kern: info: [2026-02-19T22:28:57.334619486Z]: eth0: renamed from tmpaf114
Confirmation
- I checked existing issues, discussions, and the web for similar problems