[BUG] etcd cluster scale-out pod crash #9941

@JashBook

Describe the bug
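After scaling out a 3-replica etcd cluster by 2 replicas, the new pod etcd-amqvfn-etcd-3 goes into CrashLoopBackOff: its etcd container fails to bootstrap with "member count is unequal". Meanwhile etcd-amqvfn-etcd-4 repeatedly logs "membership: peerURL exists", and the cluster and component stay in Updating status.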

kbcli version
Kubernetes: v1.30.4-vke.4
KubeBlocks: 1.0.2-beta.20
kbcli: 1.0.2-beta.0
➜  ~ 
➜  ~ helm get notes -n kb-system kb-addon-etcd 
NOTES:
Release Information:
  Commit ID: "f292a950ec2cab2263b487025ac516518484dcee"
  Commit Time: "2025-10-21 17:06:17 +0800"
  Release Branch: "v1.0.2-beta.20"
  Release Time:  "2025-12-16 17:53:21 +0800"
  Enterprise: "false"

To Reproduce
Steps to reproduce the behavior:

  1. Create the cluster:
apiVersion: apps.kubeblocks.io/v1
kind: Cluster
metadata:
  name: etcd-amqvfn
  namespace: default
spec:
  terminationPolicy: WipeOut
  componentSpecs:
    - name: etcd
      componentDef: etcd-3-1.0.2
      tls: false
      replicas: 3
      resources:
        requests:
          cpu: 100m
          memory: 0.5Gi
        limits:
          cpu: 100m
          memory: 0.5Gi
      volumeClaimTemplates:
        - name: data
          spec:
            storageClassName: 
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi
  services:
    - name: client
      serviceName: client
      spec:
        type: NodePort
        ports:
          - port: 2379
            targetPort: 2379
      componentSelector: etcd
      roleSelector: leader
  2. Scale out the etcd component by 2 replicas:
kbcli cluster scale-out etcd-amqvfn --auto-approve --force=true --components etcd --replicas 2
  3. See the error:
kubectl get cluster etcd-amqvfn 
NAME          CLUSTER-DEFINITION   TERMINATION-POLICY   STATUS     AGE
etcd-amqvfn                        WipeOut              Updating   19m
➜  ~ 
➜  ~ kubectl get cmp -l app.kubernetes.io/instance=etcd-amqvfn        
NAME               DEFINITION     SERVICE-VERSION   STATUS     AGE
etcd-amqvfn-etcd   etcd-3-1.0.2   3.6.1             Updating   19m
➜  ~ 
➜  ~ kubectl get pod -l app.kubernetes.io/instance=etcd-amqvfn   
NAME                 READY   STATUS             RESTARTS        AGE
etcd-amqvfn-etcd-0   2/2     Running            0               11m
etcd-amqvfn-etcd-1   2/2     Running            0               11m
etcd-amqvfn-etcd-2   2/2     Running            0               10m
etcd-amqvfn-etcd-3   1/2     CrashLoopBackOff   6 (3m25s ago)   9m36s
etcd-amqvfn-etcd-4   2/2     Running            1 (3m8s ago)    3m24s
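
To pick out the failing pods from a listing like the one above, a small filter over the `kubectl get pod` table may help. This is only a sketch: `not_ready_pods` is a hypothetical helper name, and it assumes the default table layout with READY in the second column and STATUS in the third.

```shell
# Print the name and status of every pod whose READY column (e.g. "1/2")
# shows fewer ready containers than total containers.
# Reads `kubectl get pod` table output on stdin and skips the header row.
not_ready_pods() {
  awk 'NR > 1 {
    split($2, ready, "/")
    if (ready[1] != ready[2]) print $1, $3
  }'
}

# Usage (assumes kubectl access to the affected cluster):
#   kubectl get pod -l app.kubernetes.io/instance=etcd-amqvfn | not_ready_pods
```

Run against the listing above, this would report only etcd-amqvfn-etcd-3 with its CrashLoopBackOff status.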

Pod logs

kubectl logs etcd-amqvfn-etcd-3 --tail 10
Defaulted container "etcd" out of: etcd, kbagent, inject-bash (init), init-kbagent (init), kbagent-worker (init)
{"level":"info","ts":"2025-12-17T10:04:21.494387Z","logger":"bbolt","caller":"[email protected]/db.go:321","msg":"Opening bbolt db (/var/run/etcd/default.etcd/member/snap/db) successfully"}
{"level":"info","ts":"2025-12-17T10:04:21.494411Z","caller":"storage/backend.go:80","msg":"opened backend db","path":"/var/run/etcd/default.etcd/member/snap/db","took":"1.042172ms"}
{"level":"info","ts":"2025-12-17T10:04:21.494444Z","caller":"etcdserver/bootstrap.go:220","msg":"restore consistentIndex","index":0}
{"level":"info","ts":"2025-12-17T10:04:21.494462Z","caller":"etcdserver/bootstrap.go:232","msg":"recovered v3 backend","backend-size-bytes":20480,"backend-size":"20 kB","backend-size-in-use-bytes":12288,"backend-size-in-use":"12 kB"}
{"level":"warn","ts":"2025-12-17T10:04:21.494474Z","caller":"schema/schema.go:39","msg":"Failed to detect storage schema version. Please wait till wal snapshot before upgrading cluster."}
{"level":"info","ts":"2025-12-17T10:04:21.494501Z","caller":"etcdserver/bootstrap.go:94","msg":"bootstrapping cluster"}
{"level":"error","ts":"2025-12-17T10:04:21.496439Z","caller":"etcdserver/server.go:309","msg":"bootstrap failed","error":"error validating peerURLs {ClusterID:2d930058afce3800 Members:[&{ID:313ec3b8e990ea02 RaftAttributes:{PeerURLs:[http://etcd-amqvfn-etcd-2.etcd-amqvfn-etcd-headless.default.svc.cluster.local:2380] IsLearner:false} Attributes:{Name:etcd-amqvfn-etcd-2 ClientURLs:[http://etcd-amqvfn-etcd-2.etcd-amqvfn-etcd-headless.default.svc.cluster.local:2379]}} &{ID:3b11a408da3940c4 RaftAttributes:{PeerURLs:[http://etcd-amqvfn-etcd-3.etcd-amqvfn-etcd-headless.default.svc.cluster.local:2380] IsLearner:false} Attributes:{Name: ClientURLs:[]}} &{ID:b4761fbba21b965e RaftAttributes:{PeerURLs:[http://etcd-amqvfn-etcd-0.etcd-amqvfn-etcd-headless.default.svc.cluster.local:2380] IsLearner:false} Attributes:{Name:etcd-amqvfn-etcd-0 ClientURLs:[http://etcd-amqvfn-etcd-0.etcd-amqvfn-etcd-headless.default.svc.cluster.local:2379]}} &{ID:bda0fa292b260862 RaftAttributes:{PeerURLs:[http://etcd-amqvfn-etcd-1.etcd-amqvfn-etcd-headless.default.svc.cluster.local:2380] IsLearner:false} Attributes:{Name:etcd-amqvfn-etcd-1 ClientURLs:[http://etcd-amqvfn-etcd-1.etcd-amqvfn-etcd-headless.default.svc.cluster.local:2379]}}] RemovedMemberIDs:[]}: member count is unequal","stacktrace":"go.etcd.io/etcd/server/v3/etcdserver.NewServer\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:309\ngo.etcd.io/etcd/server/v3/embed.StartEtcd\n\tgo.etcd.io/etcd/server/v3/embed/etcd.go:262\ngo.etcd.io/etcd/server/v3/etcdmain.startEtcd\n\tgo.etcd.io/etcd/server/v3/etcdmain/etcd.go:207\ngo.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\tgo.etcd.io/etcd/server/v3/etcdmain/etcd.go:114\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\tgo.etcd.io/etcd/server/v3/etcdmain/main.go:40\nmain.main\n\tgo.etcd.io/etcd/server/v3/main.go:31\nruntime.main\n\truntime/proc.go:272"}
{"level":"info","ts":"2025-12-17T10:04:21.496476Z","caller":"embed/etcd.go:426","msg":"closing etcd server","name":"etcd-amqvfn-etcd-3","data-dir":"/var/run/etcd/default.etcd","advertise-peer-urls":["http://etcd-amqvfn-etcd-3.etcd-amqvfn-etcd-headless.default.svc.cluster.local:2380"],"advertise-client-urls":["http://etcd-amqvfn-etcd-3.etcd-amqvfn-etcd-headless.default.svc.cluster.local:2379"]}
{"level":"info","ts":"2025-12-17T10:04:21.496507Z","caller":"embed/etcd.go:428","msg":"closed etcd server","name":"etcd-amqvfn-etcd-3","data-dir":"/var/run/etcd/default.etcd","advertise-peer-urls":["http://etcd-amqvfn-etcd-3.etcd-amqvfn-etcd-headless.default.svc.cluster.local:2380"],"advertise-client-urls":["http://etcd-amqvfn-etcd-3.etcd-amqvfn-etcd-headless.default.svc.cluster.local:2379"]}
{"level":"fatal","ts":"2025-12-17T10:04:21.496522Z","caller":"etcdmain/etcd.go:183","msg":"discovery failed","error":"error validating peerURLs {ClusterID:2d930058afce3800 Members:[&{ID:313ec3b8e990ea02 RaftAttributes:{PeerURLs:[http://etcd-amqvfn-etcd-2.etcd-amqvfn-etcd-headless.default.svc.cluster.local:2380] IsLearner:false} Attributes:{Name:etcd-amqvfn-etcd-2 ClientURLs:[http://etcd-amqvfn-etcd-2.etcd-amqvfn-etcd-headless.default.svc.cluster.local:2379]}} &{ID:3b11a408da3940c4 RaftAttributes:{PeerURLs:[http://etcd-amqvfn-etcd-3.etcd-amqvfn-etcd-headless.default.svc.cluster.local:2380] IsLearner:false} Attributes:{Name: ClientURLs:[]}} &{ID:b4761fbba21b965e RaftAttributes:{PeerURLs:[http://etcd-amqvfn-etcd-0.etcd-amqvfn-etcd-headless.default.svc.cluster.local:2380] IsLearner:false} Attributes:{Name:etcd-amqvfn-etcd-0 ClientURLs:[http://etcd-amqvfn-etcd-0.etcd-amqvfn-etcd-headless.default.svc.cluster.local:2379]}} &{ID:bda0fa292b260862 RaftAttributes:{PeerURLs:[http://etcd-amqvfn-etcd-1.etcd-amqvfn-etcd-headless.default.svc.cluster.local:2380] IsLearner:false} Attributes:{Name:etcd-amqvfn-etcd-1 ClientURLs:[http://etcd-amqvfn-etcd-1.etcd-amqvfn-etcd-headless.default.svc.cluster.local:2379]}}] RemovedMemberIDs:[]}: member count is unequal","stacktrace":"go.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\tgo.etcd.io/etcd/server/v3/etcdmain/etcd.go:183\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\tgo.etcd.io/etcd/server/v3/etcdmain/main.go:40\nmain.main\n\tgo.etcd.io/etcd/server/v3/main.go:31\nruntime.main\n\truntime/proc.go:272"}
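
The bootstrap error above embeds the member list reported by the existing cluster: four entries (etcd-0, etcd-1, etcd-2, and an etcd-3 entry with an empty Name and no ClientURLs). etcd raises "member count is unequal" when the size of that list differs from the number of members in the starting member's own initial-cluster configuration. To compare the two, it can help to pull the peer URLs out of the log line. The helper below is a sketch; `peer_urls_in_log` is a hypothetical name, and it relies only on the `PeerURLs:[...]` fragments visible in the log.

```shell
# Print the distinct peer URLs found in "PeerURLs:[...]" fragments of etcd
# log text read from stdin, one per line, sorted.
peer_urls_in_log() {
  grep -o 'PeerURLs:\[[^]]*]' \
    | sed -e 's/^PeerURLs:\[//' -e 's/]$//' \
    | sort -u
}

# Usage (assumes kubectl access to the affected cluster):
#   kubectl logs etcd-amqvfn-etcd-3 --tail 10 | peer_urls_in_log
```

The resulting list can then be checked by hand against the `--initial-cluster` value (or `ETCD_INITIAL_CLUSTER` environment variable) that the etcd-3 container was started with.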
➜  ~ 
➜  ~ kubectl logs etcd-amqvfn-etcd-4 --tail 10
Defaulted container "etcd" out of: etcd, kbagent, inject-bash (init), init-kbagent (init), kbagent-worker (init)
{"level":"warn","ts":"2025-12-17T10:08:09.673422Z","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"3b11a408da3940c4","rtt":"0s","error":"dial tcp 192.168.0.144:2380: connect: connection refused"}
{"level":"warn","ts":"2025-12-17T10:08:09.673430Z","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"3b11a408da3940c4","rtt":"0s","error":"dial tcp 192.168.0.144:2380: connect: connection refused"}
{"level":"error","ts":"2025-12-17T10:08:10.355584Z","caller":"etcdserver/server.go:2074","msg":"Validation on configuration change failed","shouldApplyV3":true,"error":"membership: peerURL exists","stacktrace":"go.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyConfChange\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:2074\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).apply\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:1902\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyEntries\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:1194\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyAll\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:979\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).run.func6\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:855\ngo.etcd.io/etcd/pkg/v3/schedule.job.Do\n\tgo.etcd.io/etcd/pkg/[email protected]/schedule/schedule.go:41\ngo.etcd.io/etcd/pkg/v3/schedule.(*fifo).executeJob\n\tgo.etcd.io/etcd/pkg/[email protected]/schedule/schedule.go:206\ngo.etcd.io/etcd/pkg/v3/schedule.(*fifo).run\n\tgo.etcd.io/etcd/pkg/[email protected]/schedule/schedule.go:187"}
{"level":"info","ts":"2025-12-17T10:08:10.355640Z","logger":"raft","caller":"[email protected]/raft.go:1981","msg":"4abaf732648edde0 switched to configuration voters=(3548488755374516738 4256363480769708228 5384888100282359264 13003615864817948254 13664196324166600802)"}
{"level":"error","ts":"2025-12-17T10:08:10.369042Z","caller":"etcdserver/server.go:2074","msg":"Validation on configuration change failed","shouldApplyV3":true,"error":"membership: peerURL exists","stacktrace":"go.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyConfChange\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:2074\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).apply\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:1902\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyEntries\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:1194\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyAll\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:979\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).run.func6\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:855\ngo.etcd.io/etcd/pkg/v3/schedule.job.Do\n\tgo.etcd.io/etcd/pkg/[email protected]/schedule/schedule.go:41\ngo.etcd.io/etcd/pkg/v3/schedule.(*fifo).executeJob\n\tgo.etcd.io/etcd/pkg/[email protected]/schedule/schedule.go:206\ngo.etcd.io/etcd/pkg/v3/schedule.(*fifo).run\n\tgo.etcd.io/etcd/pkg/[email protected]/schedule/schedule.go:187"}
{"level":"info","ts":"2025-12-17T10:08:10.369082Z","logger":"raft","caller":"[email protected]/raft.go:1981","msg":"4abaf732648edde0 switched to configuration voters=(3548488755374516738 4256363480769708228 5384888100282359264 13003615864817948254 13664196324166600802)"}
{"level":"error","ts":"2025-12-17T10:08:11.421602Z","caller":"etcdserver/server.go:2074","msg":"Validation on configuration change failed","shouldApplyV3":true,"error":"membership: peerURL exists","stacktrace":"go.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyConfChange\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:2074\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).apply\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:1902\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyEntries\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:1194\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyAll\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:979\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).run.func6\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:855\ngo.etcd.io/etcd/pkg/v3/schedule.job.Do\n\tgo.etcd.io/etcd/pkg/[email protected]/schedule/schedule.go:41\ngo.etcd.io/etcd/pkg/v3/schedule.(*fifo).executeJob\n\tgo.etcd.io/etcd/pkg/[email protected]/schedule/schedule.go:206\ngo.etcd.io/etcd/pkg/v3/schedule.(*fifo).run\n\tgo.etcd.io/etcd/pkg/[email protected]/schedule/schedule.go:187"}
{"level":"info","ts":"2025-12-17T10:08:11.421655Z","logger":"raft","caller":"[email protected]/raft.go:1981","msg":"4abaf732648edde0 switched to configuration voters=(3548488755374516738 4256363480769708228 5384888100282359264 13003615864817948254 13664196324166600802)"}
{"level":"error","ts":"2025-12-17T10:08:11.434415Z","caller":"etcdserver/server.go:2074","msg":"Validation on configuration change failed","shouldApplyV3":true,"error":"membership: peerURL exists","stacktrace":"go.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyConfChange\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:2074\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).apply\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:1902\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyEntries\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:1194\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyAll\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:979\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).run.func6\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:855\ngo.etcd.io/etcd/pkg/v3/schedule.job.Do\n\tgo.etcd.io/etcd/pkg/[email protected]/schedule/schedule.go:41\ngo.etcd.io/etcd/pkg/v3/schedule.(*fifo).executeJob\n\tgo.etcd.io/etcd/pkg/[email protected]/schedule/schedule.go:206\ngo.etcd.io/etcd/pkg/v3/schedule.(*fifo).run\n\tgo.etcd.io/etcd/pkg/[email protected]/schedule/schedule.go:187"}
{"level":"info","ts":"2025-12-17T10:08:11.434470Z","logger":"raft","caller":"[email protected]/raft.go:1981","msg":"4abaf732648edde0 switched to configuration voters=(3548488755374516738 4256363480769708228 5384888100282359264 13003615864817948254 13664196324166600802)"}
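
The `voters=(...)` lines above print raft IDs in decimal, while the rest of the logs print member IDs in hexadecimal, which makes them hard to correlate. A small conversion sketch follows; `raft_ids_to_hex` is a hypothetical helper name. The two IDs in the example are below 2^63, so any POSIX `printf` should handle them; for the larger IDs in the list, a shell whose `printf` accepts unsigned 64-bit values (bash is assumed here) is needed.

```shell
# Convert decimal raft IDs (as printed in "voters=(...)") to the hexadecimal
# form used for member IDs elsewhere in the etcd logs.
raft_ids_to_hex() {
  for id in "$@"; do
    printf '%x\n' "$id"
  done
}

# First two voters from the etcd-4 log:
raft_ids_to_hex 3548488755374516738 4256363480769708228
# prints:
#   313ec3b8e990ea02
#   3b11a408da3940c4
```

These match the IDs that the etcd-3 log shows for etcd-amqvfn-etcd-2 (313ec3b8e990ea02) and for the etcd-3 peer URL (3b11a408da3940c4), so the crashing etcd-3 member is already counted as a voter in the five-voter configuration that etcd-4 reports.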

Expected behavior
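The scale-out completes: etcd-amqvfn-etcd-3 and etcd-amqvfn-etcd-4 both join the existing etcd cluster, all five pods reach 2/2 Running, and the cluster and component return to Running status.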
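
The repeated "membership: peerURL exists" errors in the etcd-4 log are what etcd returns when a member add is proposed for a peer URL that is already registered. One way a join step could tolerate retries is to check the member list before adding. The sketch below is an illustration only, not the KubeBlocks addon's actual join logic; `peer_url_registered`, `ENDPOINTS`, `NAME`, and `PEER_URL` are hypothetical names, and it assumes the default comma-separated output of `etcdctl member list` with a single peer URL per member.

```shell
# Return 0 if the given peer URL already appears in the peerURLs column
# (4th field) of `etcdctl member list` output read from stdin, 1 otherwise.
# Assumes the default "ID, status, name, peerURLs, clientURLs, isLearner"
# layout and one peer URL per member.
peer_url_registered() {
  awk -F', ' -v url="$1" '$4 == url { found = 1 } END { exit !found }'
}

# Usage (assumes etcdctl can reach a healthy member via $ENDPOINTS):
#   if etcdctl --endpoints="$ENDPOINTS" member list | peer_url_registered "$PEER_URL"; then
#     echo "peer URL already registered, skipping member add"
#   else
#     etcdctl --endpoints="$ENDPOINTS" member add "$NAME" --peer-urls="$PEER_URL"
#   fi
```

A member that was added but never started appears in the list with status "unstarted" and an empty name, so matching on the peer URL rather than the name covers that case.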

Labels: kind/bug (Something isn't working)
