[BUG] rabbitmq 4.2.1 start error: kubelet PreStopHook failed #9954

@JashBook

Describe the bug
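After `kbcli cluster stop` followed by `kbcli cluster start`, the RabbitMQ 4.2.1 cluster stays in Updating. Only pod rabbitmq-snkjbr-rabbitmq-0 exists, its rabbitmq container never becomes ready and keeps restarting, and kubelet reports `FailedPreStopHook`.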

kbcli version
Kubernetes: v1.30.4-vke.4
KubeBlocks: 0.9.6-beta.8
kbcli: 0.9.6-beta.0

helm get notes -n kb-system kb-addon-rabbitmq 
NOTES:
Release Information:
  Commit ID: "96e859b21a040bfcc0f6305b2c7c241202586523"
  Commit Time: "2025-12-11 09:57:29 +0800"
  Release Branch: "release-0.9"
  Release Time:  "2025-12-11 09:59:07 +0800"
  Enterprise: "false"

To Reproduce
Steps to reproduce the behavior:

  1. create cluster
apiVersion: apps.kubeblocks.io/v1alpha1
kind: Cluster
metadata:
  name: rabbitmq-snkjbr
  namespace: default
spec:
  terminationPolicy: DoNotTerminate
  componentSpecs:
    - name: rabbitmq
      componentDef: rabbitmq
      serviceVersion: 4.2.1
      replicas: 3
      resources:
        requests:
          cpu: 500m
          memory: 0.5Gi
        limits:
          cpu: 500m
          memory: 0.5Gi
      serviceAccountName: kb-rabbitmq-snkjbr
      volumeClaimTemplates:
        - name: data
          spec:
            storageClassName: 
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi
  2. stop -> start
kbcli cluster stop rabbitmq-snkjbr --auto-approve --force=true

kbcli cluster start rabbitmq-snkjbr --force=true
  3. See error
kubectl get cluster rabbitmq-snkjbr 
NAME              CLUSTER-DEFINITION   VERSION   TERMINATION-POLICY   STATUS     AGE
rabbitmq-snkjbr                                  DoNotTerminate       Updating   55m
➜  ~ 
➜  ~ kubectl get cmp rabbitmq-snkjbr-rabbitmq 
NAME                       DEFINITION   SERVICE-VERSION   STATUS     AGE
rabbitmq-snkjbr-rabbitmq   rabbitmq     4.2.1             Updating   55m
➜  ~ 
➜  ~ kubectl get ops
NAME                          TYPE    CLUSTER           STATUS    PROGRESS   AGE
rabbitmq-snkjbr-start-fbv86   Start   rabbitmq-snkjbr   Running   0/3        45m
➜  ~ 
➜  ~ kubectl get pod
NAME                         READY   STATUS    RESTARTS     AGE
rabbitmq-snkjbr-rabbitmq-0   1/2     Running   9 (4s ago)   45m

describe pod

kdp rabbitmq-snkjbr-rabbitmq-0
Name:             rabbitmq-snkjbr-rabbitmq-0
Namespace:        default
Priority:         0
Service Account:  kb-rabbitmq-snkjbr
Node:             192.168.0.124/192.168.0.124
Start Time:       Wed, 24 Dec 2025 12:15:06 +0800
Labels:           app.kubernetes.io/component=rabbitmq
                  app.kubernetes.io/instance=rabbitmq-snkjbr
                  app.kubernetes.io/managed-by=kubeblocks
                  app.kubernetes.io/name=rabbitmq
                  app.kubernetes.io/version=rabbitmq
                  apps.kubeblocks.io/cluster-uid=ff5368d7-cde3-4adf-bef4-12a006cc89c3
                  apps.kubeblocks.io/component-name=rabbitmq
                  apps.kubeblocks.io/pod-name=rabbitmq-snkjbr-rabbitmq-0
                  componentdefinition.kubeblocks.io/name=rabbitmq
                  controller-revision-hash=74cf7cb7fb
                  workloads.kubeblocks.io/instance=rabbitmq-snkjbr-rabbitmq
                  workloads.kubeblocks.io/managed-by=InstanceSet
Annotations:      apps.kubeblocks.io/component-replicas: 3
                  vke.volcengine.com/cello-pod-evict-policy: allow
Status:           Running
IP:               192.168.0.134
IPs:
  IP:           192.168.0.134
Controlled By:  InstanceSet/rabbitmq-snkjbr-rabbitmq
Init Containers:
  init-lorry:
    Container ID:  containerd://c139007f16fddf9b287946316cd70038bec4669cddf8234ab3b5ee61dcdc1e22
    Image:         apecloud-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/kubeblocks-tools:0.9.6-beta.8
    Image ID:      apecloud-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/kubeblocks-tools@sha256:8441217e75c043d8def3f1519ff3374cea39f91e488b09d08074c1e0ac90415f
    Port:          <none>
    Host Port:     <none>
    Command:
      cp
      -r
      /bin/lorry
      /config
      /bin/curl
      /kubeblocks/
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 24 Dec 2025 12:15:10 +0800
      Finished:     Wed, 24 Dec 2025 12:15:10 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     0
      memory:  0
    Requests:
      cpu:     0
      memory:  0
    Environment Variables from:
      rabbitmq-snkjbr-rabbitmq-env  ConfigMap  Optional: false
    Environment:
      RABBITMQ_DEFAULT_USER:  <set to the key 'username' in secret 'rabbitmq-snkjbr-rabbitmq-account-root'>  Optional: false
      RABBITMQ_DEFAULT_PASS:  <set to the key 'password' in secret 'rabbitmq-snkjbr-rabbitmq-account-root'>  Optional: false
      KB_POD_NAME:            rabbitmq-snkjbr-rabbitmq-0 (v1:metadata.name)
      KB_POD_UID:              (v1:metadata.uid)
      KB_NAMESPACE:           default (v1:metadata.namespace)
      KB_SA_NAME:              (v1:spec.serviceAccountName)
      KB_NODENAME:             (v1:spec.nodeName)
      KB_HOST_IP:              (v1:status.hostIP)
      KB_POD_IP:               (v1:status.podIP)
      KB_POD_IPS:              (v1:status.podIPs)
      KB_HOSTIP:               (v1:status.hostIP)
      KB_PODIP:                (v1:status.podIP)
      KB_PODIPS:               (v1:status.podIPs)
      KB_POD_FQDN:            $(KB_POD_NAME).rabbitmq-snkjbr-rabbitmq-headless.$(KB_NAMESPACE).svc
    Mounts:
      /kubeblocks from kubeblocks (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zwrw8 (ro)
Containers:
  rabbitmq:
    Container ID:  containerd://b4abfeae42e5101d7993ea7dc0e7b21824acf5373f5d6c776f7365cb973d1838
    Image:         apecloud-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/rabbitmq:4.2.1-management
    Image ID:      apecloud-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/rabbitmq@sha256:86fa2b761fc3a71a2b73090d7e45ad820f611fc829c1cb8cf087e09258fb65c1
    Ports:         4369/TCP, 5672/TCP, 15672/TCP, 25672/TCP, 15692/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
    Command:
      /bin/sh
      -c
      if [ ! -f /var/lib/rabbitmq/enabled_plugins ]; then
        cp /etc/rabbitmq/enabled_plugins /var/lib/rabbitmq/enabled_plugins
      fi
      cp /root/erlang.cookie /var/lib/rabbitmq/.erlang.cookie
      chown rabbitmq:rabbitmq /var/lib/rabbitmq/.erlang.cookie
      chmod 400 /var/lib/rabbitmq/.erlang.cookie
      exec /opt/rabbitmq/sbin/rabbitmq-server
      
    State:          Running
      Started:      Wed, 24 Dec 2025 13:30:12 +0800
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 24 Dec 2025 13:25:12 +0800
      Finished:     Wed, 24 Dec 2025 13:30:12 +0800
    Ready:          False
    Restart Count:  15
    Limits:
      cpu:                        500m
      memory:                     512Mi
      vke.volcengine.com/eni-ip:  1
    Requests:
      cpu:                        500m
      memory:                     512Mi
      vke.volcengine.com/eni-ip:  1
    Startup:                      tcp-socket :5672 delay=0s timeout=1s period=10s #success=1 #failure=30
    Environment Variables from:
      rabbitmq-snkjbr-rabbitmq-env      ConfigMap  Optional: false
      rabbitmq-snkjbr-rabbitmq-rsm-env  ConfigMap  Optional: false
    Environment:
      RABBITMQ_DEFAULT_USER:          <set to the key 'username' in secret 'rabbitmq-snkjbr-rabbitmq-account-root'>  Optional: false
      RABBITMQ_DEFAULT_PASS:          <set to the key 'password' in secret 'rabbitmq-snkjbr-rabbitmq-account-root'>  Optional: false
      KB_POD_NAME:                    rabbitmq-snkjbr-rabbitmq-0 (v1:metadata.name)
      KB_POD_UID:                      (v1:metadata.uid)
      KB_NAMESPACE:                   default (v1:metadata.namespace)
      KB_SA_NAME:                      (v1:spec.serviceAccountName)
      KB_NODENAME:                     (v1:spec.nodeName)
      KB_HOST_IP:                      (v1:status.hostIP)
      KB_POD_IP:                       (v1:status.podIP)
      KB_POD_IPS:                      (v1:status.podIPs)
      KB_HOSTIP:                       (v1:status.hostIP)
      KB_PODIP:                        (v1:status.podIP)
      KB_PODIPS:                       (v1:status.podIPs)
      KB_POD_FQDN:                    $(KB_POD_NAME).rabbitmq-snkjbr-rabbitmq-headless.$(KB_NAMESPACE).svc
      MY_POD_NAME:                    rabbitmq-snkjbr-rabbitmq-0 (v1:metadata.name)
      MY_POD_NAMESPACE:               default (v1:metadata.namespace)
      SERVICE_PORT:                   15692
      RABBITMQ_MNESIA_BASE:           /var/lib/rabbitmq/mnesia
      RABBITMQ_LOG_BASE:              /var/lib/rabbitmq/logs
      K8S_SERVICE_NAME:               $(KB_CLUSTER_COMP_NAME)-headless
      RABBITMQ_ENABLED_PLUGINS_FILE:  /var/lib/rabbitmq/enabled_plugins
      RABBITMQ_USE_LONGNAME:          true
      RABBITMQ_NODENAME:              rabbit@$(KB_POD_NAME).$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE)
      K8S_HOSTNAME_SUFFIX:            .$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE)
    Mounts:
      /etc/localtime from timezone (ro)
      /etc/rabbitmq/conf.d/12-kubeblocks.conf from rabbitmq-config (rw,path="rabbitmq.conf")
      /etc/rabbitmq/enabled_plugins from rabbitmq-config (rw,path="enabled_plugins")
      /root/erlang.cookie from rabbitmq-config (rw,path=".erlang.cookie")
      /usr/share/zoneinfo from zoneinfo (ro)
      /var/lib/rabbitmq from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zwrw8 (ro)
  lorry:
    Container ID:  containerd://91faa72d261554fd5b209a8cde5e36acbeeb7727d982245ed59b7d48ce452210
    Image:         apecloud-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/rabbitmq:4.2.1-management
    Image ID:      apecloud-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/rabbitmq@sha256:86fa2b761fc3a71a2b73090d7e45ad820f611fc829c1cb8cf087e09258fb65c1
    Ports:         3501/TCP, 50001/TCP
    Host Ports:    0/TCP, 0/TCP
    Command:
      /kubeblocks/lorry
      --port
      3501
      --grpcport
      50001
      --config-path
      /kubeblocks/config/lorry/components/
    State:          Running
      Started:      Wed, 24 Dec 2025 12:15:11 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     0
      memory:  0
    Requests:
      cpu:     0
      memory:  0
    Startup:   tcp-socket :3501 delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment Variables from:
      rabbitmq-snkjbr-rabbitmq-env      ConfigMap  Optional: false
      rabbitmq-snkjbr-rabbitmq-rsm-env  ConfigMap  Optional: false
    Environment:
      RABBITMQ_DEFAULT_USER:          <set to the key 'username' in secret 'rabbitmq-snkjbr-rabbitmq-account-root'>  Optional: false
      RABBITMQ_DEFAULT_PASS:          <set to the key 'password' in secret 'rabbitmq-snkjbr-rabbitmq-account-root'>  Optional: false
      KB_POD_NAME:                    rabbitmq-snkjbr-rabbitmq-0 (v1:metadata.name)
      KB_POD_UID:                      (v1:metadata.uid)
      KB_NAMESPACE:                   default (v1:metadata.namespace)
      KB_SA_NAME:                      (v1:spec.serviceAccountName)
      KB_NODENAME:                     (v1:spec.nodeName)
      KB_HOST_IP:                      (v1:status.hostIP)
      KB_POD_IP:                       (v1:status.podIP)
      KB_POD_IPS:                      (v1:status.podIPs)
      KB_HOSTIP:                       (v1:status.hostIP)
      KB_PODIP:                        (v1:status.podIP)
      KB_PODIPS:                       (v1:status.podIPs)
      KB_POD_FQDN:                    $(KB_POD_NAME).rabbitmq-snkjbr-rabbitmq-headless.$(KB_NAMESPACE).svc
      KB_BUILTIN_HANDLER:             custom
      KB_SERVICE_USER:                <set to the key 'username' in secret 'rabbitmq-snkjbr-rabbitmq-account-root'>  Optional: false
      KB_SERVICE_PASSWORD:            <set to the key 'password' in secret 'rabbitmq-snkjbr-rabbitmq-account-root'>  Optional: false
      KB_SERVICE_PORT:                4369
      KB_DATA_PATH:                   /var/lib/rabbitmq
      KB_ACTION_COMMANDS:             {"memberLeave":["/bin/bash","-c","#!/bin/bash\n\n\nis_node_deleted() {\n    local disk_nodes_str=$(echo \"$1\" | awk '/Disk Nodes/{flag=1;next} /^$/{flag++} {if(NF\u003e0 \u0026\u0026 flag==2){print}}')\n    while read -r line; do\n        if $(echo \"$line\" | grep -q \"$KB_LEAVE_MEMBER_POD_NAME\"); then\n            return 1\n        fi\n    done \u003c\u003c\u003c \"$disk_nodes_str\"\n    return 0\n}\n\ncleanup() {\n    echo \"Cleaning up...\"\n    rm -f /tmp/member_leave.lock\n}\n\nget_target_node() {\n    # get the list of running nodes\n    RUNNING_NODES=$(echo \"$1\" | grep -A 3 \"Running Nodes\" | tail -n +3 | grep 'rabbit@')\n\n    while read -r line; do\n        if [ ! -z \"$line\" ]; then\n            NODES+=(\"$line\")\n        fi\n    done \u003c\u003c\u003c \"$RUNNING_NODES\"\n\n    # found the target node to execute forget_cluster_node\n    TARGET_NODE=\"\"\n    for NODE in \"${NODES[@]}\"; do\n        if [[ \"$NODE\" != \"$LEAVE_NODE\" ]]; then\n            TARGET_NODE=$NODE\n            break\n        fi\n    done\n\n    if [[ -z \"$TARGET_NODE\" ]]; then\n        echo \"no target node found to execute forget_cluster_node.\"\n        return 1\n    fi\n    echo \"$TARGET_NODE\"\n}\n\n# if test by shellspec include, just return 0\nif [ \"${__SOURCED__:+x}\" ]; then\n  return 0\nfi\n\nset -ex\nif [[ -z \"$KB_LEAVE_MEMBER_POD_NAME\" ]]; then\n    echo \"no leave member name provided\"\n    exit 1\nfi\n\nif [[ -f /tmp/member_leave.lock ]]; then\n    echo \"member_leave.sh is already running\"\n    exit 1\nfi\n\nCURRENT_POD_NAME=$(echo \"${RABBITMQ_NODENAME}\"|grep -oP '(?\u003c=rabbit@).*?(?=\\.)')\nif [[ -f /tmp/${KB_LEAVE_MEMBER_POD_NAME}_leave.success ]]; then\n    echo \"member_leave.sh is already leave success\"\n    # if the current pod is the leave member pod, exit directly without delete the success file, because the leave member can't execute cluster_status anymore after leave the cluster.\n    if [[ \"$CURRENT_POD_NAME\" == \"$KB_LEAVE_MEMBER_POD_NAME\" ]]; then\n        exit 0\n    fi\n    rm -f /tmp/${KB_LEAVE_MEMBER_POD_NAME}_leave.success\n    exit 0\nfi\n\n\ntouch /tmp/member_leave.lock\n# Define the cleanup function\n\n# Set the trap to call the cleanup function on script exit\ntrap cleanup EXIT\n\n# the node to leave the cluster\nLEAVE_NODE=\"${RABBITMQ_NODENAME/$CURRENT_POD_NAME/$KB_LEAVE_MEMBER_POD_NAME}\"\n\n# the output of rabbitmqctl cluster_status\nCLUSTER_STATUS=$(rabbitmqctl cluster_status --formatter table)\n\nif is_node_deleted \"$CLUSTER_STATUS\"; then\n    echo \"Node $KB_LEAVE_MEMBER_POD_NAME has been deleted.\"\n    touch /tmp/${KB_LEAVE_MEMBER_POD_NAME}_leave.success\n    exit 0\nfi\n\n\nTARGET_NODE=$(get_target_node \"$CLUSTER_STATUS\")\nif [[ $? -ne 0 ]]; then\n    echo \"no target node found to execute forget_cluster_node.\"\n    exit 1\nfi\n\n# execute forget_cluster_node on the target node\nrabbitmqctl -n $LEAVE_NODE stop_app\nrabbitmqctl -n $TARGET_NODE forget_cluster_node $LEAVE_NODE\n\ntouch /tmp/${KB_LEAVE_MEMBER_POD_NAME}_leave.success\n\nif [[ $? -eq 0 ]]; then\n    echo \"Leave member success: $LEAVE_NODE.\"\nelse\n    echo \"leave member failed: $LEAVE_NODE.\"\n    exit 1\nfi\n"]}
      MY_POD_NAME:                    rabbitmq-snkjbr-rabbitmq-0 (v1:metadata.name)
      MY_POD_NAMESPACE:               default (v1:metadata.namespace)
      SERVICE_PORT:                   15692
      RABBITMQ_MNESIA_BASE:           /var/lib/rabbitmq/mnesia
      RABBITMQ_LOG_BASE:              /var/lib/rabbitmq/logs
      K8S_SERVICE_NAME:               $(KB_CLUSTER_COMP_NAME)-headless
      RABBITMQ_ENABLED_PLUGINS_FILE:  /var/lib/rabbitmq/enabled_plugins
      RABBITMQ_USE_LONGNAME:          true
      RABBITMQ_NODENAME:              rabbit@$(KB_POD_NAME).$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE)
      K8S_HOSTNAME_SUFFIX:            .$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE)
    Mounts:
      /etc/localtime from timezone (ro)
      /etc/rabbitmq/conf.d/12-kubeblocks.conf from rabbitmq-config (rw,path="rabbitmq.conf")
      /etc/rabbitmq/enabled_plugins from rabbitmq-config (rw,path="enabled_plugins")
      /kubeblocks from kubeblocks (rw)
      /root/erlang.cookie from rabbitmq-config (rw,path=".erlang.cookie")
      /usr/share/zoneinfo from zoneinfo (ro)
      /var/lib/rabbitmq from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zwrw8 (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True 
  Initialized                 True 
  Ready                       False 
  ContainersReady             False 
  PodScheduled                True 
Volumes:
  timezone:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/localtime
    HostPathType:  File
  zoneinfo:
    Type:          HostPath (bare host directory volume)
    Path:          /usr/share/zoneinfo
    HostPathType:  Directory
  rabbitmq-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rabbitmq-snkjbr-rabbitmq-config
    Optional:  false
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-rabbitmq-snkjbr-rabbitmq-0
    ReadOnly:   false
  kubeblocks:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  kube-api-access-zwrw8:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 kb-data=true:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason             Age                  From     Message
  ----     ------             ----                 ----     -------
  Warning  FailedPreStopHook  100s (x15 over 71m)  kubelet  PreStopHook failed
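
A possible reading of the describe output above (an assumption, not confirmed against the addon source): the rabbitmq container may be getting killed by its startup probe rather than crashing on its own. The probe is `tcp-socket :5672` with `period=10s` and `#failure=30`, and the last container run lasted from 13:25:12 to 13:30:12. The sketch below only checks that arithmetic, using values copied from the output. The variable and function names are illustrative.

```shell
#!/bin/sh
# Values copied from the `describe pod` output above.
period=10            # startup probe period, in seconds
failure=30           # startup probe #failure threshold
started="13:25:12"   # Last State: Started
finished="13:30:12"  # Last State: Finished

# Convert HH:MM:SS to seconds since midnight.
to_seconds() {
  echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Time the startup probe allows before kubelet restarts the container.
probe_budget=$((period * failure))

# How long the previous rabbitmq container actually ran.
lifetime=$(( $(to_seconds "$finished") - $(to_seconds "$started") ))

echo "startup probe budget: ${probe_budget}s"
echo "container lifetime:   ${lifetime}s"
```

Both values come out to 300 seconds. If this reading is right, port 5672 never opens within the probe window, kubelet stops the container, and the `FailedPreStopHook` event is emitted during that stop, so it may be a symptom rather than the root cause.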

logs of the failing pod (previous container)

kubectl logs rabbitmq-snkjbr-rabbitmq-0 --previous 
Defaulted container "rabbitmq" out of: rabbitmq, lorry, init-lorry (init)
2025-12-24 04:50:15.261207+00:00 [notice] <0.45.0> Application syslog exited with reason: stopped
2025-12-24 04:50:15.263349+00:00 [notice] <0.209.0> Logging: switching to configured handler(s); following messages may not be visible in this log output
{exit,{shutdown,{gen_server,call,[application_controller,{start_application,rabbit,transient},infinity]}},[{gen_server,call,3,[{file,"gen_server.erl"},{line,1222}]},{application_controller,call,2,[{file,"application_controller.erl"},{line,509}]},{application,enqueue_or_start_app,6,[{file,"application.erl"},{line,419}]},{application,enqueue_or_start,6,[{file,"application.erl"},{line,384}]},{application,ensure_all_started,3,[{file,"application.erl"},{line,359}]},{rabbit,'-start_it/1-fun-0-',1,[{file,"rabbit.erl"},{line,440}]},{timer,tc,2,[{file,"timer.erl"},{line,595}]},{rabbit,start_it,1,[{file,"rabbit.erl"},{line,436}]}]}

Labels: kind/bug