Description
Describe the bug
This is similar to #1327, but with CephFS PVCs. It has also been reported at https://stackoverflow.com/questions/67771239/rabbitmq-fails-to-start-with-persistence-storage-on-kubernetes-permission-denie
The pod starts up, but RabbitMQ has no write access to the mnesia folder.
I've deployed the standard example operator and test cluster from: https://rabbitmq.com/kubernetes/operator/quickstart-operator.html
The only modification I've added to the test cluster is that I set the storage-class.
To Reproduce
Steps to reproduce the behavior:
- kubectl apply -f https://github.com/rabbitmq/cluster-operator/releases/latest/download/cluster-operator.yml
- enable persistence with a storage class
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: hello-world
spec:
  persistence:
    storageClassName: nvme-pool-ec62
    storage: 20Gi
- kubectl apply -f rabbitmq.yaml
Expected behavior
- pod, PVC, PV is provisioned
- pod attaches PV
- pod starts
At step 3, the process fails because the RabbitMQ server does not have write access to the persistent volume:
Stream closed EOF for default/hello-world-server-0 (rabbitmq)
rabbitmq 2023-05-23 19:30:58.187362+00:00 [warning] <0.132.0> Failed to write PID file "/var/lib/rabbitmq/mnesia/[email protected]": permission denied
rabbitmq 2023-05-23 19:31:01.039621+00:00 [notice] <0.44.0> Application syslog exited with reason: stopped
rabbitmq 2023-05-23 19:31:01.039898+00:00 [notice] <0.230.0> Logging: switching to configured handler(s); following messages may not be visible in this log output
rabbitmq 2023-05-23 19:31:01.066987+00:00 [notice] <0.230.0> Logging: configured log handlers are now ACTIVE
rabbitmq
rabbitmq BOOT FAILED
rabbitmq ===========
rabbitmq Error during startup: {error,
rabbitmq 2023-05-23 19:31:01.126017+00:00 [error] <0.230.0>
rabbitmq 2023-05-23 19:31:01.126017+00:00 [error] <0.230.0> BOOT FAILED
rabbitmq 2023-05-23 19:31:01.126017+00:00 [error] <0.230.0> ===========
rabbitmq 2023-05-23 19:31:01.126017+00:00 [error] <0.230.0> Error during startup: {error,
rabbitmq 2023-05-23 19:31:01.126017+00:00 [error] <0.230.0> {cannot_create_mnesia_dir,
rabbitmq 2023-05-23 19:31:01.126017+00:00 [error] <0.230.0> "/var/lib/rabbitmq/mnesia/[email protected]/",
rabbitmq 2023-05-23 19:31:01.126017+00:00 [error] <0.230.0> eacces}}
rabbitmq {cannot_create_mnesia_dir,
rabbitmq 2023-05-23 19:31:01.126017+00:00 [error] <0.230.0>
rabbitmq "/var/lib/rabbitmq/mnesia/[email protected]/",
rabbitmq eacces}}
rabbitmq
rabbitmq 2023-05-23 19:31:02.127328+00:00 [error] <0.229.0> crasher:
rabbitmq 2023-05-23 19:31:02.127328+00:00 [error] <0.229.0> initial call: application_master:init/4
rabbitmq 2023-05-23 19:31:02.127328+00:00 [error] <0.229.0> pid: <0.229.0>
rabbitmq 2023-05-23 19:31:02.127328+00:00 [error] <0.229.0> registered_name: []
rabbitmq 2023-05-23 19:31:02.127328+00:00 [error] <0.229.0> exception exit: {{cannot_create_mnesia_dir,
rabbitmq 2023-05-23 19:31:02.127328+00:00 [error] <0.229.0> "/var/lib/rabbitmq/mnesia/[email protected]/",
rabbitmq 2023-05-23 19:31:02.127328+00:00 [error] <0.229.0> eacces},
rabbitmq 2023-05-23 19:31:02.127328+00:00 [error] <0.229.0> {rabbit,start,[normal,]]}}
rabbitmq 2023-05-23 19:31:02.127328+00:00 [error] <0.229.0> in function application_master:init/4 (application_master.erl, line 142)
rabbitmq 2023-05-23 19:31:02.127328+00:00 [error] <0.229.0> ancestors: [<0.228.0>]
rabbitmq 2023-05-23 19:31:02.127328+00:00 [error] <0.229.0> message_queue_len: 1
rabbitmq 2023-05-23 19:31:02.127328+00:00 [error] <0.229.0> messages: [{'EXIT',<0.230.0>,normal}]
rabbitmq 2023-05-23 19:31:02.127328+00:00 [error] <0.229.0> links: [<0.228.0>,<0.44.0>]
rabbitmq 2023-05-23 19:31:02.127328+00:00 [error] <0.229.0> dictionary: []
rabbitmq 2023-05-23 19:31:02.127328+00:00 [error] <0.229.0> trap_exit: true
rabbitmq 2023-05-23 19:31:02.127328+00:00 [error] <0.229.0> status: running
rabbitmq 2023-05-23 19:31:02.127328+00:00 [error] <0.229.0> heap_size: 610
rabbitmq 2023-05-23 19:31:02.127328+00:00 [error] <0.229.0> stack_size: 28
rabbitmq 2023-05-23 19:31:02.127328+00:00 [error] <0.229.0> reductions: 178
rabbitmq 2023-05-23 19:31:02.127328+00:00 [error] <0.229.0> neighbours:
rabbitmq 2023-05-23 19:31:02.127328+00:00 [error] <0.229.0>
rabbitmq 2023-05-23 19:31:02.143479+00:00 [notice] <0.44.0> Application rabbit exited with reason: {{cannot_create_mnesia_dir,"/var/lib/rabbitmq/mnesia/[email protected]/",eacces},{rabbit,start,[normal,]]}}
rabbitmq {"Kernel pid terminated",application_controller,"{application_start_failure,rabbit,{{cannot_create_mnesia_dir,\"/var/lib/rabbitmq/mnesia/[email protected]/\",eacces},{rabbit,start,[normal,]]}}}"}
rabbitmq Kernel pid terminated (application_controller) ({application_start_failure,rabbit,{{cannot_create_mnesia_dir,"/var/lib/rabbitmq/mnesia/[email protected]/", eacces},{rabbit,start,[normal,]]}}})
rabbitmq
rabbitmq Crash dump is being written to: /var/log/rabbitmq/erl_crash.dump...done
Stream closed EOF for default/hello-world-server-0 (setup-container)
The volume that gets created is owned by root by default as with all other PVCs:
psarossy@artemis: ~/ceph/volumes/csi/csi-vol-f6c8aaf0-8c7b-4bdb-a7dd-fb514c9d3639/26f680c7-a460-4789-bb9d-9b085672b406
$ ls -al [16:11:13]
total 0
drwxr-xr-x 2 root root 0 May 23 11:29 .
drwxr-xr-x 3 root root 2 May 23 11:29 ..
If I change the folder ownership to UID/GID 999:999 (rabbitmq:rabbitmq), the pod starts up and works fine.
The StatefulSet is missing an init step that claims the folder before handing over to the non-privileged user that starts the process. Unfortunately, this needs to be fixed in the operator: every pod hits the same issue whenever a new PVC is created, and the operator will (rightfully so) overwrite any manual changes to the configs.
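As a possible interim workaround, the missing ownership step could be sketched through the RabbitmqCluster `spec.override.statefulSet` field, so the operator applies it itself instead of overwriting it. This is an untested sketch, not a confirmed fix. It assumes that the data volume in the generated StatefulSet is named `persistence`, that the `busybox` image can be pulled, and that the cluster allows an init container to run as root. The init container name `fix-mnesia-ownership` is made up for illustration.

```yaml
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: hello-world
spec:
  persistence:
    storageClassName: nvme-pool-ec62
    storage: 20Gi
  override:
    statefulSet:
      spec:
        template:
          spec:
            initContainers:
              # Hypothetical init container: hands the freshly provisioned
              # volume to UID/GID 999 (rabbitmq) before the server starts.
              - name: fix-mnesia-ownership
                image: busybox
                command: ["sh", "-c", "chown -R 999:999 /var/lib/rabbitmq/mnesia"]
                securityContext:
                  runAsUser: 0  # assumption: root is permitted for init containers
                volumeMounts:
                  - name: persistence  # assumption: volume name used by the operator
                    mountPath: /var/lib/rabbitmq/mnesia
```

Because this goes through the custom resource rather than a hand-edited StatefulSet, it should survive reconciliation, but it only works around the ownership problem and does not replace a fix in the operator.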
Version and environment information
- RabbitMQ: 3.11.10
- RabbitMQ Cluster Operator: 2.2.0
- Kubernetes: v1.23.2
- Cloud provider or hardware configuration: baremetal via kubeadm on Dell servers with Ceph Rook