k8s rabbitmq cluster multiple nodes high availability exception #705

@jipeigong

Describe the bug

In a multi-node RabbitMQ cluster on Kubernetes, the client throws exceptions after one node is restarted, so the cluster does not behave as highly available.

To Reproduce

Steps to reproduce the behavior:

  1. Create a three-replica RabbitMQ cluster.
  2. Run a Spring Boot project (latest version) that uses spring-cloud-starter-bus-amqp, and write a simple test program.
  3. Send test messages continuously. (The connection works properly; the client is connected to cluster-0.)
  4. Delete the cluster-0 container. It restarts automatically.
  5. The test program fails with this error:
org.springframework.amqp.AmqpIOException: java.io.IOException
	at org.springframework.amqp.rabbit.support.RabbitExceptionTranslator.convertRabbitAccessException(RabbitExceptionTranslator.java:71) ~[spring-rabbit-2.1.6.RELEASE.jar!/:2.1.6.RELEASE]
	at org.springframework.amqp.rabbit.connection.RabbitAccessor.convertRabbitAccessException(RabbitAccessor.java:116) ~[spring-rabbit-2.1.6.RELEASE.jar!/:2.1.6.RELEASE]
	at org.springframework.amqp.rabbit.core.RabbitTemplate.doExecute(RabbitTemplate.java:2100) ~[spring-rabbit-2.1.6.RELEASE.jar!/:2.1.6.RELEASE]
	at org.springframework.amqp.rabbit.core.RabbitTemplate.execute(RabbitTemplate.java:2047) ~[spring-rabbit-2.1.6.RELEASE.jar!/:2.1.6.RELEASE]
	at org.springframework.amqp.rabbit.core.RabbitTemplate.execute(RabbitTemplate.java:2027) ~[spring-rabbit-2.1.6.RELEASE.jar!/:2.1.6.RELEASE]
	at org.springframework.amqp.rabbit.core.RabbitAdmin.initialize(RabbitAdmin.java:591) ~[spring-rabbit-2.1.6.RELEASE.jar!/:2.1.6.RELEASE]
	at org.springframework.amqp.rabbit.listener.AbstractMessageListenerContainer.attemptDeclarations(AbstractMessageListenerContainer.java:1791) ~[spring-rabbit-2.1.6.RELEASE.jar!/:2.1.6.RELEASE]
	at org.springframework.amqp.rabbit.listener.AbstractMessageListenerContainer.redeclareElementsIfNecessary(AbstractMessageListenerContainer.java:1768) ~[spring-rabbit-2.1.6.RELEASE.jar!/:2.1.6.RELEASE]
	at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer$AsyncMessageProcessingConsumer.initialize(SimpleMessageListenerContainer.java:1195) [spring-rabbit-2.1.6.RELEASE.jar!/:2.1.6.RELEASE]
	at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer$AsyncMessageProcessingConsumer.run(SimpleMessageListenerContainer.java:1041) [spring-rabbit-2.1.6.RELEASE.jar!/:2.1.6.RELEASE]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111-internal]
Caused by: java.io.IOException: null
	at com.rabbitmq.client.impl.AMQChannel.wrap(AMQChannel.java:126) ~[amqp-client-5.4.3.jar!/:5.4.3]
	at com.rabbitmq.client.impl.AMQChannel.wrap(AMQChannel.java:122) ~[amqp-client-5.4.3.jar!/:5.4.3]
	at com.rabbitmq.client.impl.AMQChannel.exnWrappingRpc(AMQChannel.java:144) ~[amqp-client-5.4.3.jar!/:5.4.3]
	at com.rabbitmq.client.impl.ChannelN.queueDeclare(ChannelN.java:962) ~[amqp-client-5.4.3.jar!/:5.4.3]
	at com.rabbitmq.client.impl.ChannelN.queueDeclare(ChannelN.java:52) ~[amqp-client-5.4.3.jar!/:5.4.3]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_111-internal]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_111-internal]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_111-internal]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_111-internal]
	at org.springframework.amqp.rabbit.connection.CachingConnectionFactory$CachedChannelInvocationHandler.invoke(CachingConnectionFactory.java:1140) ~[spring-rabbit-2.1.6.RELEASE.jar!/:2.1.6.RELEASE]
	at com.sun.proxy.$Proxy105.queueDeclare(Unknown Source) ~[na:na]
	at org.springframework.amqp.rabbit.core.RabbitAdmin.declareQueues(RabbitAdmin.java:710) ~[spring-rabbit-2.1.6.RELEASE.jar!/:2.1.6.RELEASE]
	at org.springframework.amqp.rabbit.core.RabbitAdmin.lambda$initialize$12(RabbitAdmin.java:593) ~[spring-rabbit-2.1.6.RELEASE.jar!/:2.1.6.RELEASE]
	at org.springframework.amqp.rabbit.core.RabbitTemplate.invokeAction(RabbitTemplate.java:2135) ~[spring-rabbit-2.1.6.RELEASE.jar!/:2.1.6.RELEASE]
	at org.springframework.amqp.rabbit.core.RabbitTemplate.doExecute(RabbitTemplate.java:2094) ~[spring-rabbit-2.1.6.RELEASE.jar!/:2.1.6.RELEASE]
	... 8 common frames omitted
Caused by: com.rabbitmq.client.ShutdownSignalException: channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - home node '[email protected]' of durable queue 'test1' in vhost '/' is down or inaccessible, class-id=50, method-id=10)
	at com.rabbitmq.utility.ValueOrException.getValue(ValueOrException.java:66) ~[amqp-client-5.4.3.jar!/:5.4.3]
	at com.rabbitmq.utility.BlockingValueOrException.uninterruptibleGetValue(BlockingValueOrException.java:36) ~[amqp-client-5.4.3.jar!/:5.4.3]
	at com.rabbitmq.client.impl.AMQChannel$BlockingRpcContinuation.getReply(AMQChannel.java:494) ~[amqp-client-5.4.3.jar!/:5.4.3]
	at com.rabbitmq.client.impl.AMQChannel.privateRpc(AMQChannel.java:288) ~[amqp-client-5.4.3.jar!/:5.4.3]
	at com.rabbitmq.client.impl.AMQChannel.exnWrappingRpc(AMQChannel.java:138) ~[amqp-client-5.4.3.jar!/:5.4.3]
	... 20 common frames omitted
Caused by: com.rabbitmq.client.ShutdownSignalException: channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - home node '[email protected]' of durable queue 'test1' in vhost '/' is down or inaccessible, class-id=50, method-id=10)
	at com.rabbitmq.client.impl.ChannelN.asyncShutdown(ChannelN.java:516) ~[amqp-client-5.4.3.jar!/:5.4.3]
	at com.rabbitmq.client.impl.ChannelN.processAsync(ChannelN.java:346) ~[amqp-client-5.4.3.jar!/:5.4.3]
	at com.rabbitmq.client.impl.AMQChannel.handleCompleteInboundCommand(AMQChannel.java:178) ~[amqp-client-5.4.3.jar!/:5.4.3]
	at com.rabbitmq.client.impl.AMQChannel.handleFrame(AMQChannel.java:111) ~[amqp-client-5.4.3.jar!/:5.4.3]
	at com.rabbitmq.client.impl.AMQConnection.readFrame(AMQConnection.java:670) ~[amqp-client-5.4.3.jar!/:5.4.3]
	at com.rabbitmq.client.impl.AMQConnection.access$300(AMQConnection.java:48) ~[amqp-client-5.4.3.jar!/:5.4.3]
	at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:597) ~[amqp-client-5.4.3.jar!/:5.4.3]
	... 1 common frames omitted

2021-05-27 08:38:32.158  WARN 1 --- [ntContainer#0-2] o.s.a.r.listener.BlockingQueueConsumer   : Failed to declare queue: test1
2021-05-27 08:38:32.160  WARN 1 --- [ntContainer#0-2] o.s.a.r.listener.BlockingQueueConsumer   : Queue declaration failed; retries left=3

org.springframework.amqp.rabbit.listener.BlockingQueueConsumer$DeclarationException: Failed to declare queue(s):[test1]
	at org.springframework.amqp.rabbit.listener.BlockingQueueConsumer.attemptPassiveDeclarations(BlockingQueueConsumer.java:710) [spring-rabbit-2.1.6.RELEASE.jar!/:2.1.6.RELEASE]
	at org.springframework.amqp.rabbit.listener.BlockingQueueConsumer.passiveDeclarations(BlockingQueueConsumer.java:594) [spring-rabbit-2.1.6.RELEASE.jar!/:2.1.6.RELEASE]
	at org.springframework.amqp.rabbit.listener.BlockingQueueConsumer.start(BlockingQueueConsumer.java:581) [spring-rabbit-2.1.6.RELEASE.jar!/:2.1.6.RELEASE]
	at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer$AsyncMessageProcessingConsumer.initialize(SimpleMessageListenerContainer.java:1196) [spring-rabbit-2.1.6.RELEASE.jar!/:2.1.6.RELEASE]
	at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer$AsyncMessageProcessingConsumer.run(SimpleMessageListenerContainer.java:1041) [spring-rabbit-2.1.6.RELEASE.jar!/:2.1.6.RELEASE]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111-internal]
Caused by: java.io.IOException: null
	at com.rabbitmq.client.impl.AMQChannel.wrap(AMQChannel.java:126) ~[amqp-client-5.4.3.jar!/:5.4.3]
	at com.rabbitmq.client.impl.AMQChannel.wrap(AMQChannel.java:122) ~[amqp-client-5.4.3.jar!/:5.4.3]
	at com.rabbitmq.client.impl.AMQChannel.exnWrappingRpc(AMQChannel.java:144) ~[amqp-client-5.4.3.jar!/:5.4.3]
	at com.rabbitmq.client.impl.ChannelN.queueDeclarePassive(ChannelN.java:1006) ~[amqp-client-5.4.3.jar!/:5.4.3]
	at com.rabbitmq.client.impl.ChannelN.queueDeclarePassive(ChannelN.java:52) ~[amqp-client-5.4.3.jar!/:5.4.3]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_111-internal]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_111-internal]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_111-internal]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_111-internal]
	at org.springframework.amqp.rabbit.connection.CachingConnectionFactory$CachedChannelInvocationHandler.invoke(CachingConnectionFactory.java:1140) ~[spring-rabbit-2.1.6.RELEASE.jar!/:2.1.6.RELEASE]
	at com.sun.proxy.$Proxy105.queueDeclarePassive(Unknown Source) ~[na:na]
	at org.springframework.amqp.rabbit.listener.BlockingQueueConsumer.attemptPassiveDeclarations(BlockingQueueConsumer.java:689) [spring-rabbit-2.1.6.RELEASE.jar!/:2.1.6.RELEASE]
	... 5 common frames omitted
Caused by: com.rabbitmq.client.ShutdownSignalException: channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - home node '[email protected]' of durable queue 'test1' in vhost '/' is down or inaccessible, class-id=50, method-id=10)
	at com.rabbitmq.utility.ValueOrException.getValue(ValueOrException.java:66) ~[amqp-client-5.4.3.jar!/:5.4.3]
	at com.rabbitmq.utility.BlockingValueOrException.uninterruptibleGetValue(BlockingValueOrException.java:36) ~[amqp-client-5.4.3.jar!/:5.4.3]
	at com.rabbitmq.client.impl.AMQChannel$BlockingRpcContinuation.getReply(AMQChannel.java:494) ~[amqp-client-5.4.3.jar!/:5.4.3]
	at com.rabbitmq.client.impl.AMQChannel.privateRpc(AMQChannel.java:288) ~[amqp-client-5.4.3.jar!/:5.4.3]
	at com.rabbitmq.client.impl.AMQChannel.exnWrappingRpc(AMQChannel.java:138) ~[amqp-client-5.4.3.jar!/:5.4.3]
	... 14 common frames omitted
Caused by: com.rabbitmq.client.ShutdownSignalException: channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - home node '[email protected]' of durable queue 'test1' in vhost '/' is down or inaccessible, class-id=50, method-id=10)
	at com.rabbitmq.client.impl.ChannelN.asyncShutdown(ChannelN.java:516) ~[amqp-client-5.4.3.jar!/:5.4.3]
	at com.rabbitmq.client.impl.ChannelN.processAsync(ChannelN.java:346) ~[amqp-client-5.4.3.jar!/:5.4.3]
	at com.rabbitmq.client.impl.AMQChannel.handleCompleteInboundCommand(AMQChannel.java:178) ~[amqp-client-5.4.3.jar!/:5.4.3]
	at com.rabbitmq.client.impl.AMQChannel.handleFrame(AMQChannel.java:111) ~[amqp-client-5.4.3.jar!/:5.4.3]
	at com.rabbitmq.client.impl.AMQConnection.readFrame(AMQConnection.java:670) ~[amqp-client-5.4.3.jar!/:5.4.3]
	at com.rabbitmq.client.impl.AMQConnection.access$300(AMQConnection.java:48) ~[amqp-client-5.4.3.jar!/:5.4.3]
	at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:597) ~[amqp-client-5.4.3.jar!/:5.4.3]
	... 1 common frames omitted

2021-05-27 08:38:34.066  INFO 1 --- [nio-8080-exec-9] com.example.rabbitmqtest.Sender          : do sendMessage test1 connectionTest msg : 8
2021-05-27 08:38:37.129  INFO 1 --- [GCTOtX2p14Viw-2] o.s.a.r.l.SimpleMessageListenerContainer : Restarting Consumer@3f931af3: tags=[[]], channel=null, acknowledgeMode=AUTO local queue size=0
2021-05-27 08:38:37.134  WARN 1 --- [GCTOtX2p14Viw-3] o.s.a.r.listener.BlockingQueueConsumer   : Failed to declare queue: springCloudBus.anonymous.0LZO0zlLQGCTOtX2p14Viw
2021-05-27 08:38:37.135  WARN 1 --- [GCTOtX2p14Viw-3] o.s.a.r.listener.BlockingQueueConsumer   : Queue declaration failed; retries left=3

org.springframework.amqp.rabbit.listener.BlockingQueueConsumer$DeclarationException: Failed to declare queue(s):[springCloudBus.anonymous.0LZO0zlLQGCTOtX2p14Viw]
	at org.springframework.amqp.rabbit.listener.BlockingQueueConsumer.attemptPassiveDeclarations(BlockingQueueConsumer.java:710) [spring-rabbit-2.1.6.RELEASE.jar!/:2.1.6.RELEASE]
	at org.springframework.amqp.rabbit.listener.BlockingQueueConsumer.passiveDeclarations(BlockingQueueConsumer.java:594) [spring-rabbit-2.1.6.RELEASE.jar!/:2.1.6.RELEASE]
	at org.springframework.amqp.rabbit.listener.BlockingQueueConsumer.start(BlockingQueueConsumer.java:581) [spring-rabbit-2.1.6.RELEASE.jar!/:2.1.6.RELEASE]
	at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer$AsyncMessageProcessingConsumer.initialize(SimpleMessageListenerContainer.java:1196) [spring-rabbit-2.1.6.RELEASE.jar!/:2.1.6.RELEASE]
	at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer$AsyncMessageProcessingConsumer.run(SimpleMessageListenerContainer.java:1041) [spring-rabbit-2.1.6.RELEASE.jar!/:2.1.6.RELEASE]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111-internal] 
  6. After cluster-0 finishes restarting, Spring Boot is still connected to the queue on cluster-0, but messages cannot be sent or received normally. At this point, the Overview and Connections pages show cluster-2.

image
image
image
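
To make the failure above easier to read, here is a minimal sketch that pulls the reply code, home node, queue, and vhost out of the broker's `channel.close` text. The class name `HomeNodeDownError` and its fields are hypothetical helpers for illustration; they are not part of the RabbitMQ Java client or Spring AMQP. The sketch only assumes the message has the shape seen in the stack traces above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: parses the "home node ... is down or inaccessible"
// reply text that appears in the ShutdownSignalException messages above.
final class HomeNodeDownError {
    private static final Pattern REPLY = Pattern.compile(
        "reply-code=(\\d+), reply-text=NOT_FOUND - home node '([^']*)' "
        + "of durable queue '([^']*)' in vhost '([^']*)' is down or inaccessible");

    final int replyCode;
    final String homeNode;
    final String queue;
    final String vhost;

    private HomeNodeDownError(int replyCode, String homeNode, String queue, String vhost) {
        this.replyCode = replyCode;
        this.homeNode = homeNode;
        this.queue = queue;
        this.vhost = vhost;
    }

    // Returns the parsed fields, or empty if the message has a different shape.
    static Optional<HomeNodeDownError> parse(String message) {
        Matcher m = REPLY.matcher(message);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new HomeNodeDownError(
            Integer.parseInt(m.group(1)), m.group(2), m.group(3), m.group(4)));
    }

    public static void main(String[] args) {
        String message = "channel error; protocol method: #method<channel.close>("
            + "reply-code=404, reply-text=NOT_FOUND - home node 'rabbit@example-node' "
            + "of durable queue 'test1' in vhost '/' is down or inaccessible, "
            + "class-id=50, method-id=10)";
        parse(message).ifPresent(e -> System.out.println(
            e.replyCode + " queue=" + e.queue + " vhost=" + e.vhost + " node=" + e.homeNode));
    }
}
```

The node name `rabbit@example-node` in `main` is a placeholder, not the node from this report. The parsed fields show that the broker answered the declaration with a 404 because the node hosting durable queue `test1` was unavailable, not because the client's connection to the cluster was lost.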

yaml:

apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  annotations:
    k8s.kuboard.cn/displayName: rabbitmq-cluster-server
    k8s.kuboard.cn/workload: rabbitmq-cluster-server
  labels:
    k8s.kuboard.cn/layer: cloud
    k8s.kuboard.cn/name: rabbitmq-cluster
  namespace: prod
  name: rabbitmq-cluster
spec:
  replicas: 3
  service:
    type: NodePort
  rabbitmq:
    additionalConfig: |
      cluster_partition_handling = pause_minority
      vm_memory_high_watermark_paging_ratio = 0.99
      disk_free_limit.relative = 1.0
      collect_statistics_interval = 10000
  persistence:
    storageClassName: prod-nas
    storage: 100Gi
  resources:
    requests:
      cpu: 200m
      memory: 400Mi
    limits:
      cpu: 300m
      memory: 600Mi

Expected behavior
Spring Boot should reconnect to another node in the cluster and continue running normally.
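
As a rough illustration of the expected failover, the sketch below walks a list of node addresses and picks the first one that a probe reports as reachable. `NodeFailoverSketch`, `firstReachable`, and the `node-N:5672` addresses are hypothetical names for this example; this is not how Spring AMQP or the RabbitMQ Java client implement reconnection. In Spring Boot, a comparable list of nodes can be supplied through the `spring.rabbitmq.addresses` property. Note that, judging by the 404 reply in the logs, reaching another node may not be enough on its own if queue `test1` is hosted only on the node that went down.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical sketch of client-side failover across cluster nodes.
final class NodeFailoverSketch {

    // Returns the first address, in list order, that the probe reports as
    // reachable, or empty if no node is reachable.
    static Optional<String> firstReachable(List<String> addresses,
                                           Predicate<String> isReachable) {
        for (String address : addresses) {
            if (isReachable.test(address)) {
                return Optional.of(address);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> nodes = java.util.Arrays.asList(
            "node-0:5672", "node-1:5672", "node-2:5672");
        // Simulate node-0 being down, as in step 4 of the reproduction.
        Optional<String> target = firstReachable(nodes, a -> !a.startsWith("node-0"));
        System.out.println(target.orElse("no node reachable"));
    }
}
```

The probe is passed in as a `Predicate` so the selection logic can be exercised without a running broker; a real client would replace it with an actual connection attempt.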

Version and environment information

  • RabbitMQ: [latest]
  • RabbitMQ Cluster Operator: [latest]
  • Kubernetes: [e.g. 1.20.4]
