RabbitMQ cluster boot failed #1356

@hockkg

Description

I tried to create a RabbitMQ cluster with the cluster-operator by following the Quickstart, but it failed:

ERROR: epmd error for host hello-world-server-0.hello-world-nodes.default: timeout (timed out)

To Reproduce

Steps to reproduce the behavior:

  1. kubectl rabbitmq install-cluster-operator
  2. kubectl apply -f https://raw.githubusercontent.com/rabbitmq/cluster-operator/main/docs/examples/hello-world/rabbitmq.yaml
  3. I created a PV, and then the pod changed to Running.
  4. I described the pod, and it shows some warnings:
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  20m                   default-scheduler  Successfully assigned default/hello-world-server-0 to k8s-master
  Normal   Pulled     20m                   kubelet            Container image "rabbitmq:3.11.10-management" already present on machine
  Normal   Created    20m                   kubelet            Created container setup-container
  Normal   Started    20m                   kubelet            Started container setup-container
  Normal   Created    18m (x2 over 20m)     kubelet            Created container rabbitmq
  Normal   Started    18m (x2 over 20m)     kubelet            Started container rabbitmq
  Normal   Pulled     17m (x3 over 20m)     kubelet            Container image "rabbitmq:3.11.10-management" already present on machine
  Warning  BackOff    5m28s (x29 over 17m)  kubelet            Back-off restarting failed container rabbitmq in pod hello-world-server-0_default(9c700d4c-4edb-4ba6-9760-87f2b090ba3f)
  Warning  Unhealthy  28s (x50 over 19m)    kubelet            Readiness probe failed: dial tcp 10.244.235.208:5672: connect: connection refused
  5. kubectl logs hello-world-server-0 --previous
Defaulted container "rabbitmq" out of: rabbitmq, setup-container (init)

BOOT FAILED
===========
2023-05-18 08:45:16.825044+00:00 [error] <0.132.0>
2023-05-18 08:45:16.825044+00:00 [error] <0.132.0> BOOT FAILED
2023-05-18 08:45:16.825044+00:00 [error] <0.132.0> ===========
2023-05-18 08:45:16.825044+00:00 [error] <0.132.0> ERROR: epmd error for host hello-world-server-0.hello-world-nodes.default: timeout (timed out)
2023-05-18 08:45:16.825044+00:00 [error] <0.132.0>
ERROR: epmd error for host hello-world-server-0.hello-world-nodes.default: timeout (timed out)

2023-05-18 08:45:17.857046+00:00 [error] <0.132.0>     supervisor: {local,rabbit_prelaunch_sup}
2023-05-18 08:45:17.857046+00:00 [error] <0.132.0>     errorContext: start_error
2023-05-18 08:45:17.857046+00:00 [error] <0.132.0>     reason: {epmd_error,"hello-world-server-0.hello-world-nodes.default",
2023-05-18 08:45:17.857046+00:00 [error] <0.132.0>                         timeout}
2023-05-18 08:45:17.857046+00:00 [error] <0.132.0>     offender: [{pid,undefined},
2023-05-18 08:45:17.857046+00:00 [error] <0.132.0>                {id,prelaunch},
2023-05-18 08:45:17.857046+00:00 [error] <0.132.0>                {mfargs,{rabbit_prelaunch,run_prelaunch_first_phase,[]}},
2023-05-18 08:45:17.857046+00:00 [error] <0.132.0>                {restart_type,transient},
2023-05-18 08:45:17.857046+00:00 [error] <0.132.0>                {significant,false},
2023-05-18 08:45:17.857046+00:00 [error] <0.132.0>                {shutdown,5000},
2023-05-18 08:45:17.857046+00:00 [error] <0.132.0>                {child_type,worker}]
2023-05-18 08:45:17.857046+00:00 [error] <0.132.0>
2023-05-18 08:45:17.858578+00:00 [error] <0.130.0>   crasher:
2023-05-18 08:45:17.858578+00:00 [error] <0.130.0>     initial call: application_master:init/4
2023-05-18 08:45:17.858578+00:00 [error] <0.130.0>     pid: <0.130.0>
2023-05-18 08:45:17.858578+00:00 [error] <0.130.0>     registered_name: []
2023-05-18 08:45:17.858578+00:00 [error] <0.130.0>     exception exit: {{shutdown,
2023-05-18 08:45:17.858578+00:00 [error] <0.130.0>                          {failed_to_start_child,prelaunch,
2023-05-18 08:45:17.858578+00:00 [error] <0.130.0>                              {epmd_error,
2023-05-18 08:45:17.858578+00:00 [error] <0.130.0>                                  "hello-world-server-0.hello-world-nodes.default",
2023-05-18 08:45:17.858578+00:00 [error] <0.130.0>                                  timeout}}},
2023-05-18 08:45:17.858578+00:00 [error] <0.130.0>                      {rabbit_prelaunch_app,start,[normal,[]]}}
2023-05-18 08:45:17.858578+00:00 [error] <0.130.0>       in function  application_master:init/4 (application_master.erl, line 142)
2023-05-18 08:45:17.858578+00:00 [error] <0.130.0>     ancestors: [<0.129.0>]
2023-05-18 08:45:17.858578+00:00 [error] <0.130.0>     message_queue_len: 1
2023-05-18 08:45:17.858578+00:00 [error] <0.130.0>     messages: [{'EXIT',<0.131.0>,normal}]
2023-05-18 08:45:17.858578+00:00 [error] <0.130.0>     links: [<0.129.0>,<0.44.0>]
2023-05-18 08:45:17.858578+00:00 [error] <0.130.0>     dictionary: []
2023-05-18 08:45:17.858578+00:00 [error] <0.130.0>     trap_exit: true
2023-05-18 08:45:17.858578+00:00 [error] <0.130.0>     status: running
2023-05-18 08:45:17.858578+00:00 [error] <0.130.0>     heap_size: 376
2023-05-18 08:45:17.858578+00:00 [error] <0.130.0>     stack_size: 28
2023-05-18 08:45:17.858578+00:00 [error] <0.130.0>     reductions: 196
2023-05-18 08:45:17.858578+00:00 [error] <0.130.0>   neighbours:
2023-05-18 08:45:17.858578+00:00 [error] <0.130.0>
2023-05-18 08:45:17.863871+00:00 [notice] <0.44.0> Application rabbitmq_prelaunch exited with reason: {{shutdown,{failed_to_start_child,prelaunch,{epmd_error,"hello-world-server-0.hello-world-nodes.default",timeout}}},{rabbit_prelaunch_app,start,[normal,[]]}}
{"Kernel pid terminated",application_controller,"{application_start_failure,rabbitmq_prelaunch,{{shutdown,{failed_to_start_child,prelaunch,{epmd_error,\"hello-world-server-0.hello-world-nodes.default\",timeout}}},{rabbit_prelaunch_app,start,[normal,[]]}}}"}
Kernel pid terminated (application_controller) ({application_start_failure,rabbitmq_prelaunch,{{shutdown,{failed_to_start_child,prelaunch,{epmd_error,"hello-world-server-0.hello-world-nodes.default",timeout}}},{rabbit_prelaunch_app,start,[normal,[]]}}})

Crash dump is being written to: erl_crash.dump...
  6. kubectl exec -it hello-world-server-0 -- ps aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
rabbitmq       1  0.1  0.0   2616  1816 ?        Ss   08:52   0:00 /bin/sh /opt/rabbitmq/sbin/rabbitmq-server
rabbitmq      11  9.7  0.8 3064792 68928 ?       Sl   08:52   0:01 /usr/local/lib/erlang/erts-13.2/bin/beam.smp -W w -MBas ageffcbf -MHas ageffcbf -MBlmbcs 512 -MHlmbcs 512 -MMmc
rabbitmq      17  0.0  0.0   2504   572 ?        Ss   08:52   0:00 erl_child_setup 1048576
rabbitmq      57  0.0  0.0   3892    88 ?        S    08:52   0:00 /usr/local/lib/erlang/erts-13.2/bin/epmd -daemon
rabbitmq      92  0.0  0.0   3892   828 ?        Ss   08:52   0:00 /usr/local/lib/erlang/erts-13.2/bin/inet_gethost 4
rabbitmq      93  0.0  0.0   4116  1912 ?        S    08:52   0:00 /usr/local/lib/erlang/erts-13.2/bin/inet_gethost 4
rabbitmq      94  0.5  0.0   6004  3868 pts/0    Ss   08:52   0:00 bash
rabbitmq     100  0.0  0.0   7652  3208 pts/0    R+   08:53   0:00 ps aux
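The `ps aux` listing shows that `epmd` (PID 57) and `inet_gethost` are already running during the short window before the container exits. In that window, `kubectl exec` can check name resolution without installing dnsutils: `getent hosts` goes through the C library resolver, which is presumably close to what Erlang's `inet_gethost` helper uses. This is a minimal sketch, assuming `getent` is present in the Ubuntu-based `rabbitmq:3.11.10-management` image; `epmd_is_up` and `check_inside_pod` are hypothetical helper names, and the `epmd` path is copied from the `ps aux` output.

```shell
#!/bin/sh
# Hypothetical helper: reads `epmd -names` output on stdin and succeeds if it
# reports a running epmd, e.g. "epmd: up and running on port 4369 with data:".
# 4369 is epmd's default port; pass another port as $1 if needed.
epmd_is_up() {
  grep -q "up and running on port ${1:-4369}"
}

# Hypothetical helper: while the rabbitmq container is still up, check that
# the node's own hostname resolves and that the local epmd answers.
# The epmd binary path is taken from the `ps aux` output (erts-13.2).
check_inside_pod() {
  pod="$1"; host="$2"
  kubectl exec "$pod" -c rabbitmq -- getent hosts "$host" \
    || echo "DNS lookup failed for $host" >&2
  kubectl exec "$pod" -c rabbitmq -- \
    /usr/local/lib/erlang/erts-13.2/bin/epmd -names | epmd_is_up \
    && echo "epmd is running in $pod"
}
```

For example: `check_inside_pod hello-world-server-0 hello-world-server-0.hello-world-nodes.default`. If `getent` hangs or prints nothing while epmd is up, the timeout is more likely on the DNS side than in epmd itself.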

Version and environment information

  1. kubectl rabbitmq version:
kubectl-rabbitmq v2.2.0
  2. kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.1", GitCommit:"4c9411232e10168d7b050c49a1b59f6df9d7ea4b", GitTreeState:"clean", BuildDate:"2023-04-14T13:21:19Z", GoVersion:"go1.20.3", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.1", GitCommit:"4c9411232e10168d7b050c49a1b59f6df9d7ea4b", GitTreeState:"clean", BuildDate:"2023-04-14T13:14:42Z", GoVersion:"go1.20.3", Compiler:"gc", Platform:"linux/amd64"}

I am trying to install dnsutils in the container so that I can use nslookup to check whether this is a DNS problem. However, the container exits so quickly that dnsutils cannot be installed in time. Any help would be appreciated.
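Because the rabbitmq container crashes before anything can be installed in it, one option is to run the lookup from a separate, short-lived pod that already ships `nslookup`. The name in the error, `hello-world-server-0.hello-world-nodes.default`, is the per-pod DNS record behind the headless Service `hello-world-nodes`; inside a pod it is normally completed through the `svc.cluster.local` search domain in `/etc/resolv.conf`. This is a minimal sketch, assuming the default `cluster.local` cluster domain; the function names `pod_fqdn` and `check_pod_dns` and the pod name `dns-check` are hypothetical, and the image is the one used in the Kubernetes "Debugging DNS Resolution" guide.

```shell
#!/bin/sh
# Hypothetical helper: builds the DNS name a StatefulSet pod gets through its
# headless Service: <pod>.<service>.<namespace>.svc.<cluster-domain>.
# "cluster.local" is assumed as the default cluster domain.
pod_fqdn() {
  pod="$1"; svc="$2"; ns="$3"; domain="${4:-cluster.local}"
  printf '%s.%s.%s.svc.%s\n' "$pod" "$svc" "$ns" "$domain"
}

# Hypothetical helper: resolves that name from a throwaway pod that already
# contains nslookup, so nothing has to be installed in the crashing
# RabbitMQ container. --rm deletes the pod once the command exits.
check_pod_dns() {
  name="$(pod_fqdn "$@")"
  kubectl run dns-check --rm -i --restart=Never \
    --image=registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3 \
    -- nslookup "$name"
}
```

For example: `check_pod_dns hello-world-server-0 hello-world-nodes default`. If the lookup also fails or times out from this pod, that points at cluster DNS (for example CoreDNS) rather than at RabbitMQ itself.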
