pause_minority Cluster Partition Handling does not work as expected #8111

justsomescripts · 2023-05-05T13:08:50Z

Describe the bug

When the connection of multiple nodes in a RabbitMQ cluster is interrupted at the same time, the partition that has the minority of nodes is still running and accepting messages.

RabbitMQ version: 3.11.13
Erlang version: 25.3
Host: Ubuntu 22.04.2 LTS

Reproduction steps

1. Set cluster_partition_handling to pause_minority
2. Interrupt the connection of two nodes in a three-node-cluster simultaneously for a short period of time
3. Check the status of each node

All nodes still accept messages even though cluster_partition_handling is set to pause_minority and the cluster is partitioned.
After a restart of the single node, the cluster works again as expected.

Expected behavior

RabbitMQ is paused on the single node partition.

Additional context

Output of the first nodes (partition of two)

# rabbitmqctl cluster_status
Cluster status of node rabbit@node01 ...
Basics

Cluster name: rabbit_cluster_dev
Total CPU cores available cluster-wide: 4

Disk Nodes

rabbit@node01
rabbit@node02
rabbit@node03

Running Nodes

rabbit@node01
rabbit@node02

Versions

rabbit@node01: RabbitMQ 3.11.13 on Erlang 25.3
rabbit@node02: RabbitMQ 3.11.13 on Erlang 25.3

CPU Cores

Node: rabbit@node01, available CPU cores: 2
Node: rabbit@node02, available CPU cores: 2

Maintenance status

Node: rabbit@node01, status: not under maintenance
Node: rabbit@node02, status: not under maintenance

Alarms

(none)

Network Partitions

Node rabbit@node01 cannot communicate with rabbit@node03
Node rabbit@node02 cannot communicate with rabbit@node03

Listeners

Node: rabbit@node01, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@node01, interface: [::], port: 8080, protocol: http, purpose: HTTP API
Node: rabbit@node01, interface: [::], port: 443, protocol: https, purpose: HTTP API over TLS (HTTPS)
Node: rabbit@node01, interface: [::], port: 15692, protocol: http/prometheus, purpose: Prometheus exporter API over HTTP
Node: rabbit@node01, interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rabbit@node02, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@node02, interface: [::], port: 8080, protocol: http, purpose: HTTP API
Node: rabbit@node02, interface: [::], port: 443, protocol: https, purpose: HTTP API over TLS (HTTPS)
Node: rabbit@node02, interface: [::], port: 15692, protocol: http/prometheus, purpose: Prometheus exporter API over HTTP
Node: rabbit@node02, interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0

Feature flags

Flag: classic_mirrored_queue_version, state: enabled
Flag: classic_queue_type_delivery_support, state: disabled
Flag: direct_exchange_routing_v2, state: disabled
Flag: drop_unroutable_metric, state: enabled
Flag: empty_basic_get_metric, state: enabled
Flag: feature_flags_v2, state: disabled
Flag: implicit_default_bindings, state: enabled
Flag: listener_records_in_ets, state: disabled
Flag: maintenance_mode_status, state: enabled
Flag: quorum_queue, state: enabled
Flag: stream_queue, state: enabled
Flag: stream_single_active_consumer, state: disabled
Flag: tracking_records_in_ets, state: disabled
Flag: user_limits, state: enabled
Flag: virtual_host_metadata, state: enabled

Output of the 3rd node (partition of one)

# rabbitmqctl cluster_status
Cluster status of node rabbit@node03 ...
Basics

Cluster name: rabbit_cluster_dev
Total CPU cores available cluster-wide: 2

Disk Nodes

rabbit@node01
rabbit@node02
rabbit@node03

Running Nodes

rabbit@node03

Versions

rabbit@node03: RabbitMQ 3.11.13 on Erlang 25.3

CPU Cores

Node: rabbit@node03, available CPU cores: 2

Maintenance status

Node: rabbit@node03, status: not under maintenance

Alarms

(none)

Network Partitions

Node rabbit@node03 cannot communicate with rabbit@node01, rabbit@node02

Listeners

Node: rabbit@node03, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@node03, interface: [::], port: 8080, protocol: http, purpose: HTTP API
Node: rabbit@node03, interface: [::], port: 443, protocol: https, purpose: HTTP API over TLS (HTTPS)
Node: rabbit@node03, interface: [::], port: 15692, protocol: http/prometheus, purpose: Prometheus exporter API over HTTP
Node: rabbit@node03, interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0

Feature flags

Flag: classic_mirrored_queue_version, state: enabled
Flag: classic_queue_type_delivery_support, state: disabled
Flag: direct_exchange_routing_v2, state: disabled
Flag: drop_unroutable_metric, state: enabled
Flag: empty_basic_get_metric, state: enabled
Flag: feature_flags_v2, state: disabled
Flag: implicit_default_bindings, state: enabled
Flag: listener_records_in_ets, state: disabled
Flag: maintenance_mode_status, state: enabled
Flag: quorum_queue, state: enabled
Flag: stream_queue, state: enabled
Flag: stream_single_active_consumer, state: disabled
Flag: tracking_records_in_ets, state: disabled
Flag: user_limits, state: enabled
Flag: virtual_host_metadata, state: enabled
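
For readers comparing the two outputs above, here is a minimal sketch that pulls the "Disk Nodes", "Running Nodes" and "Network Partitions" sections out of the plain-text `rabbitmqctl cluster_status` output. It assumes the section headings appear exactly as pasted in this thread and that an empty section is printed as "(none)", as the Alarms section is. The function names are made up for illustration and are not part of any RabbitMQ tooling.

```python
# Section headings as they appear in the cluster_status output pasted above.
ALL_HEADINGS = {
    "Basics", "Disk Nodes", "Running Nodes", "Versions", "CPU Cores",
    "Maintenance status", "Alarms", "Network Partitions", "Listeners",
    "Feature flags",
}
WANTED = ("Disk Nodes", "Running Nodes", "Network Partitions")


def parse_cluster_status(text):
    """Collect the non-empty lines found under each wanted section heading."""
    sections = {name: [] for name in WANTED}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line in ALL_HEADINGS:
            current = line  # a new section starts here
        elif current in sections:
            sections[current].append(line)
    return sections


def summarize(sections):
    """Report members this node does not see running, plus reported partitions."""
    not_running = sorted(set(sections["Disk Nodes"]) - set(sections["Running Nodes"]))
    # "(none)" is assumed to be the placeholder for an empty section.
    partitions = [l for l in sections["Network Partitions"] if l != "(none)"]
    return {"not_running": not_running, "partitions": partitions}
```

Run against the node03 output, this lists node01 and node02 as not running from node03's point of view. A node that was simply stopped would also show up as not running, so the "Network Partitions" lines are the stronger signal.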

Relevant log

Apr 29 12:42:38 node03 rabbitmq-server[617244]: 2023-04-29 12:42:33.240761+02:00 [error] <0.24425.0> ** Node rabbit@node01 not responding **
Apr 29 12:42:38 node03 rabbitmq-server[617244]: 2023-04-29 12:42:33.240761+02:00 [error] <0.24425.0> ** Removing (timedout) connection **
Apr 29 12:42:38 node03 rabbitmq-server[617244]: 2023-04-29 12:42:33.240761+02:00 [error] <0.24425.0> 
Apr 29 12:42:38 node03 rabbitmq-server[617244]: 2023-04-29 12:42:33.240764+02:00 [error] <0.12131.0> ** Node rabbit@node02 not responding **
Apr 29 12:42:38 node03 rabbitmq-server[617244]: 2023-04-29 12:42:33.240764+02:00 [error] <0.12131.0> ** Removing (timedout) connection **
Apr 29 12:42:38 node03 rabbitmq-server[617244]: 2023-04-29 12:42:33.240764+02:00 [error] <0.12131.0> 
Apr 29 12:42:38 node03 rabbitmq-server[617244]: 2023-04-29 12:42:33.241461+02:00 [info] <0.1156.0> rabbit on node rabbit@node01 down
Apr 29 12:42:38 node03 rabbitmq-server[617244]: 2023-04-29 12:42:33.244855+02:00 [warning] <0.2753.0> Management delegate query returned errors:
Apr 29 12:42:38 node03 rabbitmq-server[617244]: 2023-04-29 12:42:33.244855+02:00 [warning] <0.2753.0> [{<13741.2503.0>,{exit,{nodedown,rabbit@node01},[]}},
Apr 29 12:42:38 node03 rabbitmq-server[617244]: 2023-04-29 12:42:33.244855+02:00 [warning] <0.2753.0>  {<13742.2444.0>,{exit,{nodedown,rabbit@node02},[]}}]
Apr 29 12:42:38 node03 rabbitmq-server[617244]: 2023-04-29 12:42:33.244872+02:00 [warning] <0.2752.0> Management delegate query returned errors:
Apr 29 12:42:38 node03 rabbitmq-server[617244]: 2023-04-29 12:42:33.244872+02:00 [warning] <0.2752.0> [{<13741.2503.0>,{exit,{nodedown,rabbit@node01},[]}},
Apr 29 12:42:38 node03 rabbitmq-server[617244]: 2023-04-29 12:42:33.244872+02:00 [warning] <0.2752.0>  {<13742.2444.0>,{exit,{nodedown,rabbit@node02},[]}}]
Apr 29 12:42:42 node03 rabbitmq-server[617244]: 2023-04-29 12:42:42.493921+02:00 [info] <0.30232.665> accepting AMQP connection <0.30232.665> (10.194.48.77:13672 -> 10.201.32.148:5672)
Apr 29 12:42:43 node03 rabbitmq-server[617244]: 2023-04-29 12:42:43.366651+02:00 [error] <0.723.0> Mnesia(rabbit@node03): ** ERROR ** mnesia_event got {inconsistent_database, running_partitioned_network, rabbit@node01}
Apr 29 12:42:43 node03 rabbitmq-server[617244]: 2023-04-29 12:42:43.366651+02:00 [error] <0.723.0> 
Apr 29 12:42:43 node03 rabbitmq-server[617244]: 2023-04-29 12:42:43.366922+02:00 [info] <0.1156.0> Keeping rabbit@node01 listeners: the node is already back
Apr 29 12:42:43 node03 rabbitmq-server[617244]: 2023-04-29 12:42:43.374562+02:00 [error] <0.723.0> Mnesia(rabbit@node03): ** ERROR ** mnesia_event got {inconsistent_database, running_partitioned_network, rabbit@node02}
Apr 29 12:42:43 node03 rabbitmq-server[617244]: 2023-04-29 12:42:43.374562+02:00 [error] <0.723.0> 
Apr 29 12:42:43 node03 rabbitmq-server[617244]: 2023-04-29 12:42:43.378360+02:00 [info] <0.1156.0> node rabbit@node01 down: net_tick_timeout
Apr 29 12:42:43 node03 rabbitmq-server[617244]: 2023-04-29 12:42:43.378447+02:00 [info] <0.1156.0> rabbit on node rabbit@node02 down
Apr 29 12:42:43 node03 rabbitmq-server[617244]: 2023-04-29 12:42:43.535550+02:00 [info] <0.1156.0> Keeping rabbit@node02 listeners: the node is already back
Apr 29 12:42:43 node03 rabbitmq-server[617244]: 2023-04-29 12:42:43.536331+02:00 [info] <0.1156.0> node rabbit@node02 down: net_tick_timeout
Apr 29 12:42:43 node03 rabbitmq-server[617244]: 2023-04-29 12:42:43.536443+02:00 [info] <0.1156.0> node rabbit@node01 up
Apr 29 12:42:43 node03 rabbitmq-server[617244]: 2023-04-29 12:42:43.536658+02:00 [info] <0.1156.0> node rabbit@node02 up
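
To make the timing in this log easier to read, here is a small sketch that extracts the "not responding", "down" and "up" events per node and measures how long a peer was considered gone. It assumes the journald-prefixed line format shown above. The regular expressions and function names are illustrative only.

```python
import re
from datetime import datetime

# Matches the RabbitMQ part of each line: timestamp, UTC offset, level, pid, message.
LINE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6})[+-]\d{2}:\d{2} "
    r"\[(?P<level>\w+)\] <[\d.]+> (?P<msg>.*)"
)
NOT_RESPONDING = re.compile(r"\*\* Node (?P<node>\S+) not responding \*\*")
UP_DOWN = re.compile(r"node (?P<node>\S+) (?P<state>up|down)\b")


def node_events(log_text):
    """Return (timestamp, node, event) tuples in log order."""
    events = []
    for raw in log_text.splitlines():
        m = LINE.search(raw)
        if not m:
            continue
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f")
        lost = NOT_RESPONDING.match(m["msg"])
        change = UP_DOWN.match(m["msg"])
        if lost:
            events.append((ts, lost["node"], "not responding"))
        elif change:
            events.append((ts, change["node"], change["state"]))
    return events


def outage_seconds(events, node):
    """Seconds between the first 'not responding' and the first 'up' for a node."""
    start = next(ts for ts, n, what in events if n == node and what == "not responding")
    end = next(ts for ts, n, what in events if n == node and what == "up")
    return (end - start).total_seconds()
```

Applied to the excerpt above, both peers go from "not responding" to "up" in roughly ten seconds, which matches the "node is already back" messages.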

Configuration

...
net_ticktime = 20
...
# ====================================================================================
# Clustering
# ====================================================================================
cluster_partition_handling = pause_minority
cluster_formation.peer_discovery_backend = classic_config

cluster_formation.classic_config.nodes.1 = rabbit@node01
cluster_formation.classic_config.nodes.2 = rabbit@node02
cluster_formation.classic_config.nodes.3 = rabbit@node03
...


lukebakken · 2023-05-05T13:24:18Z

Maintainer

Interrupt the connection of two nodes in a three-node-cluster simultaneously for a short period of time

Please provide the exact commands you are using to do this. Right now you're asking us to guess how to reproduce the issue the same way you have.

In addition, please attach your complete configuration file.

1 reply

justsomescripts May 5, 2023
Author

Interrupt the connection of two nodes in a three-node-cluster simultaneously for a short period of time

Please provide the exact commands you are using to do this. Right now you're asking us to guess how to reproduce the issue the same way you have.

In addition, please attach your complete configuration file.

Unfortunately I can't. This was caused by a connection drop in the company network, and I don't know a way to reproduce it reliably locally. Would any additional information help in this case?

michaelklishin · 2023-05-05T13:36:31Z

Maintainer

The cluster is not expected to pause on a single partition since two nodes will be in a majority.

Nodes do not necessarily detect such conditions instantly. The default inactivity timeout is 60 seconds IIRC. It can be configured, but not to a value that's too low (say, not 5s).
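
As a rough illustration of that timeout: the Erlang kernel documentation describes the detection time for an unresponsive peer as falling within net_ticktime plus or minus a quarter of it (with the default tick intensity). The sketch below only does that arithmetic. The function name is made up, and the quarter factor is an assumption that holds for default settings.

```python
def detection_window(net_ticktime):
    """Return (min, max) seconds before an unresponsive peer is treated as down.

    Assumes the default tick intensity, i.e. a slack of one quarter of net_ticktime.
    """
    slack = net_ticktime / 4
    return net_ticktime - slack, net_ticktime + slack


print(detection_window(60))  # (45.0, 75.0) for the default of 60 seconds
print(detection_window(20))  # (15.0, 25.0) for net_ticktime = 20 as configured above
```

So with net_ticktime = 20, a connection drop shorter than about 15 seconds may not be noticed at all, while one of around 25 seconds should always be detected.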

michaelklishin · 2023-05-05T13:40:26Z

Maintainer

In addition, it's important to distinguish scenarios where, say, node A loses connection to B and C. Then it will pause itself as it will be in the minority.
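
A minimal sketch of that minority rule, assuming the documented behaviour that a node pauses when it can see half or fewer of the cluster members, counting itself. This is an illustration in Python, not RabbitMQ's actual code, and the function and parameter names are hypothetical.

```python
def should_pause(cluster_members, reachable_peers, self_name):
    """True if this node, together with the peers it can reach, is not a strict majority."""
    members = set(cluster_members)
    alive = {self_name} | (set(reachable_peers) & members)
    # Half or fewer of the members counts as a minority.
    return len(alive) * 2 <= len(members)


members = ["rabbit@node01", "rabbit@node02", "rabbit@node03"]
print(should_pause(members, [], "rabbit@node03"))                 # True: 1 of 3
print(should_pause(members, ["rabbit@node02"], "rabbit@node01"))  # False: 2 of 3
```

Under this rule node03 in the report above should have paused, while node01 and node02 should have kept running.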

A much trickier situation is when A disconnects from B but not from C. In this case some nodes will try to voluntarily stop using a pretty poorly defined algorithm to "promote" a partial partition to a "full" one. In this case yes, sometimes you may end up with different behaviors but usually that's something that most users would prefer to avoid.
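
To make the distinction concrete, here is a sketch that flags partial partitions in a reachability map: pairs of nodes that cannot reach each other while some third node still reaches both. This is not the algorithm RabbitMQ uses. It only illustrates the A/B/C situation described above, and the data structure and function name are hypothetical.

```python
from itertools import permutations


def partial_partitions(links):
    """links maps each node to the set of peers it can currently reach.

    Returns (a, b, witnesses) for every ordered pair where a cannot reach b
    but at least one other node still reaches both a and b.
    """
    found = []
    for a, b in permutations(links, 2):
        if b in links[a]:
            continue  # a still reaches b, nothing to report
        witnesses = sorted(
            c for c in links
            if c not in (a, b) and a in links[c] and b in links[c]
        )
        if witnesses:
            found.append((a, b, witnesses))
    return found


# A lost B, but C still talks to both: a partial partition.
print(partial_partitions({"A": {"C"}, "B": {"C"}, "C": {"A", "B"}}))
# A lost everyone: a full partition, so nothing is flagged.
print(partial_partitions({"A": set(), "B": {"C"}, "C": {"B"}}))
```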

In RabbitMQ 4.0, these partition handling strategies will go away. The recovery strategy with Khepri will be that of Raft: the majority of nodes keeps going, and a majority of nodes must always be online, or your cluster will lose availability. So, only the connectivity of each replica to the currently elected leader controls whether it is available or needs to recover, if we oversimplify.

justsomescripts · 2023-05-05T13:43:04Z

Author

I still don't get it. Node 3 was disconnected from nodes 1 and 2, so it was not connected to any node at all. Why didn't it pause then? (It had been in this state for 3 days.)

7 replies

lukebakken May 5, 2023
Maintainer

25672 is the port used for distributed Erlang. Please check that.

justsomescripts May 5, 2023
Author

25672 is the port used for distributed Erlang. Please check that.

I checked the history: I tested 25672, so that was just a typo.

After a service restart (systemctl restart rabbitmq-server) of the single partitioned node (03) after 3 days, the cluster was in a working state again, so connections between the servers were possible.

lukebakken May 5, 2023
Maintainer

That's what I was going to suggest, restarting the disconnected node. This is also what is in the docs.

As to why the node wasn't paused, I'm not sure. I'll try to reproduce when I have time.

justsomescripts May 5, 2023
Author

Thanks for looking into this 👍 . Yes, the main issue is that the node isn't paused and therefore still can receive and send messages until manual intervention.

justsomescripts May 5, 2023
Author

I also found an old bug report describing a similar issue (but in an OpenStack cluster):
https://bugzilla.redhat.com/show_bug.cgi?id=1189480

In a three node cluster, configured to auto correct network partitions using pause_minority in cluster_partition_handling, the cluster will solve network partitions by shutting the tcp listener and restarting if the connectivity is regained. If the connectivity is lost for around 60 seconds this process is not initiated and the partition will remain. The impact on OpenStack HA with the mirrored queues across the cluster is that there will be two separate master databases active. A split brain situation, only resolved after restarting the rabbitmq-server on the minority node

justsomescripts · 2023-05-05T13:54:49Z

Author

This is the complete config:

rabbitmq.conf

# ====================================================================================
# Networking
# ====================================================================================
listeners.tcp.default = 5672

tcp_listen_options.backlog        = 4096
tcp_listen_options.nodelay        = true
tcp_listen_options.linger.on      = true
tcp_listen_options.linger.timeout = 0
tcp_listen_options.exit_on_close  = false

ssl_options.versions.1     = tlsv1.3
ssl_options.versions.2     = tlsv1.2
ssl_options.versions.3     = tlsv1.1

management.ssl.port        = 443
management.ssl.ip          = 0.0.0.0
management.ssl.certfile    = /etc/rabbitmq/ssl/*********.crt.pem
management.ssl.keyfile     = /etc/rabbitmq/ssl/*********.key.pem
management.ssl.cacertfile  = /etc/rabbitmq/ssl/****.crt
management.tcp.port        = 8080        # Monitoring endpoint
management.tcp.ip          = 127.0.0.1   # Monitoring endpoint
management.rates_mode      = basic

# ====================================================================================
# Security, Access Control
# ====================================================================================
cluster_name = rabbit_cluster_dev

auth_backends.1 = internal
auth_backends.2 = cache

auth_cache.cached_backend = ldap
auth_cache.cache_ttl      = 3600000

auth_ldap.servers.1                        = ******
auth_ldap.servers.2                        = ******
auth_ldap.use_ssl                          = true
auth_ldap.port                             = 636
auth_ldap.dn_lookup_attribute              = userPrincipalName
auth_ldap.dn_lookup_base                   = DC=**,DC=**,DC=**
auth_ldap.user_dn_pattern                  = ${username}@**.**.**
auth_ldap.ssl_options.cacertfile           = /etc/rabbitmq/ssl/****.crt
auth_ldap.ssl_options.verify               = verify_peer
auth_ldap.ssl_options.fail_if_no_peer_cert = true

# ====================================================================================
# Default User / VHost
# ====================================================================================
default_vhost                   = /
default_user                    = guest
default_pass                    = guest
default_permissions.configure   = .*
default_permissions.read        = .*
default_permissions.write       = .*
default_user_tags.administrator = true

# ====================================================================================
# Resource Limits + Flow Control
# ====================================================================================
vm_memory_high_watermark.relative = 0.7
disk_free_limit.relative          = 2.0

# ====================================================================================
# Logging
# ====================================================================================
log.console = true
log.file    = false

# ====================================================================================
# Clustering
# ====================================================================================
cluster_partition_handling = pause_minority
cluster_formation.peer_discovery_backend = classic_config

cluster_formation.classic_config.nodes.1 = rabbit@node01
cluster_formation.classic_config.nodes.2 = rabbit@node02
cluster_formation.classic_config.nodes.3 = rabbit@node03

# ====================================================================================
# Kernel
# ====================================================================================
net_ticktime = 20

# ====================================================================================
# Prometheus
# ====================================================================================
prometheus.return_per_object_metrics = true

advanced.config

[
  {kernel, [
    {inet_default_connect_options, [{nodelay, true}]},
    {inet_default_listen_options,  [{nodelay, true}]}
  ]},
  {rabbitmq_auth_backend_ldap, [
    {vhost_access_query,    {'or', [
      {'and', [
        {equals, "${vhost}", "/"},
        {'or', [
          {in_group_nested, "CN=****-RMQ-****,OU=RabbitMQ,OU=**,OU=**,OU=**,DC=**,DC=**,DC=**"},
          {in_group_nested, "CN=****-RMQ-****,OU=RabbitMQ,OU=**,OU=**,OU=**,DC=**,DC=**,DC=**"},
          {in_group_nested, "CN=****-RMQ-****,OU=RabbitMQ,OU=**,OU=**,OU=**,DC=**,DC=**,DC=**"},
          {in_group_nested, "CN=****-RMQ-****,OU=RabbitMQ,OU=**,OU=**,OU=**,DC=**,DC=**,DC=**"},
          {in_group_nested, "CN=****-RMQ-****,OU=RabbitMQ,OU=**,OU=**,OU=**,DC=**,DC=**,DC=**"},
          {in_group_nested, "CN=****-RMQ-****,OU=RabbitMQ,OU=**,OU=**,OU=**,DC=**,DC=**,DC=**"},
          {in_group_nested, "CN=****-RMQ-****,OU=RabbitMQ,OU=**,OU=**,OU=**,DC=**,DC=**,DC=**"},
          {in_group_nested, "CN=****-RMQ-monitoring,OU=RabbitMQ,OU=**,OU=**,OU=**,DC=**,DC=**,DC=**"},
          {in_group_nested, "CN=****-RMQ-administrator,OU=RabbitMQ,OU=**,OU=**,OU=**,DC=**,DC=**,DC=**"}
        ]}
      ]},
      {'and', [
        {'not', {equals, "${vhost}", "/"}},
        {'or', [
          {in_group_nested, "CN=****-RMQ-${vhost},OU=RabbitMQ,OU=**,OU=**,OU=**,DC=**,DC=**,DC=**"},
          {in_group_nested, "CN=****-RMQ-monitoring,OU=RabbitMQ,OU=**,OU=**,OU=**,DC=**,DC=**,DC=**"}
        ]}
      ]}
    ]}
    },
    {resource_access_query, {for, [
      {permission, configure, {'or', [
        {'and', [
          {equals, "${vhost}", "/"},
          {match, "${name}", "^****-"},
          {in_group_nested, "CN=****-RMQ-****,OU=RabbitMQ,OU=**,OU=**,OU=**,DC=**,DC=**,DC=**"}
        ]},
        {'and', [
          {equals, "${vhost}", "/"},
          {match, "${name}", "^****-"},
          {in_group_nested, "CN=****-RMQ-****,OU=RabbitMQ,OU=**,OU=**,OU=**,DC=**,DC=**,DC=**"}
        ]},
        {'and', [
          {equals, "${vhost}", "/"},
          {match, "${name}", "^****-"},
          {in_group_nested, "CN=****-RMQ-****,OU=RabbitMQ,OU=**,OU=**,OU=**,DC=**,DC=**,DC=**"}
        ]},
        {'and', [
          {equals, "${vhost}", "/"},
          {match, "${name}", "^****-"},
          {in_group_nested, "CN=****-RMQ-****,OU=RabbitMQ,OU=**,OU=**,OU=**,DC=**,DC=**,DC=**"}
        ]},
        {'and', [
          {equals, "${vhost}", "/"},
          {match, "${name}", "^****-"},
          {in_group_nested, "CN=****-RMQ-****,OU=RabbitMQ,OU=**,OU=**,OU=**,DC=**,DC=**,DC=**"}
        ]},
        {'and', [
          {equals, "${vhost}", "/"},
          {match, "${name}", "^****-"},
          {in_group_nested, "CN=****-RMQ-****,OU=RabbitMQ,OU=**,OU=**,OU=**,DC=**,DC=**,DC=**"}
        ]},
        {'and', [
          {equals, "${vhost}", "/"},
          {match, "${name}", "^****-"},
          {in_group_nested, "CN=****-RMQ-****,OU=RabbitMQ,OU=**,OU=**,OU=**,DC=**,DC=**,DC=**"}
        ]},
          {'not', {equals, "${vhost}", "/"}}
        ]}
      },
      {permission, write, {'or', [
        {'and', [
          {equals, "${vhost}", "/"},
          {match, "${name}", "^****-"},
          {in_group_nested, "CN=****-RMQ-****,OU=RabbitMQ,OU=**,OU=**,OU=**,DC=**,DC=**,DC=**"}
        ]},
        {'and', [
          {equals, "${vhost}", "/"},
          {match, "${name}", "^****-"},
          {in_group_nested, "CN=****-RMQ-****,OU=RabbitMQ,OU=**,OU=**,OU=**,DC=**,DC=**,DC=**"}
        ]},
        {'and', [
          {equals, "${vhost}", "/"},
          {match, "${name}", "^****-"},
          {in_group_nested, "CN=****-RMQ-****,OU=RabbitMQ,OU=**,OU=**,OU=**,DC=**,DC=**,DC=**"}
        ]},
        {'and', [
          {equals, "${vhost}", "/"},
          {match, "${name}", "^****-"},
          {in_group_nested, "CN=****-RMQ-****,OU=RabbitMQ,OU=**,OU=**,OU=**,DC=**,DC=**,DC=**"}
        ]},
        {'and', [
          {equals, "${vhost}", "/"},
          {match, "${name}", "^****-"},
          {in_group_nested, "CN=****-RMQ-****,OU=RabbitMQ,OU=**,OU=**,OU=**,DC=**,DC=**,DC=**"}
        ]},
        {'and', [
          {equals, "${vhost}", "/"},
          {match, "${name}", "^****-"},
          {in_group_nested, "CN=****-RMQ-****,OU=RabbitMQ,OU=**,OU=**,OU=**,DC=**,DC=**,DC=**"}
        ]},
        {'and', [
          {equals, "${vhost}", "/"},
          {match, "${name}", "^****-"},
          {in_group_nested, "CN=****-RMQ-****,OU=RabbitMQ,OU=**,OU=**,OU=**,DC=**,DC=**,DC=**"}
        ]},
          {'not', {equals, "${vhost}", "/"}}
        ]}
      },
      {permission, read, {constant, true}}
    ]}
    },
    {tag_queries,           [
      {monitoring,    {in_group_nested, "CN=****-RMQ-****,OU=RabbitMQ,OU=**,OU=**,OU=**,DC=**,DC=**,DC=**"}},
      {monitoring,    {in_group_nested, "CN=****-RMQ-****,OU=RabbitMQ,OU=**,OU=**,OU=**,DC=**,DC=**,DC=**"}},
      {monitoring,    {in_group_nested, "CN=****-RMQ-****,OU=RabbitMQ,OU=**,OU=**,OU=**,DC=**,DC=**,DC=**"}},
      {monitoring,    {in_group_nested, "CN=****-RMQ-****,OU=RabbitMQ,OU=**,OU=**,OU=**,DC=**,DC=**,DC=**"}},
      {monitoring,    {in_group_nested, "CN=****-RMQ-****,OU=RabbitMQ,OU=**,OU=**,OU=**,DC=**,DC=**,DC=**"}},
      {monitoring,    {in_group_nested, "CN=****-RMQ-****,OU=RabbitMQ,OU=**,OU=**,OU=**,DC=**,DC=**,DC=**"}},
      {monitoring,    {in_group_nested, "CN=****-RMQ-****,OU=RabbitMQ,OU=**,OU=**,OU=**,DC=**,DC=**,DC=**"}},
      {monitoring,    {in_group_nested, "CN=****-RMQ-monitoring,OU=RabbitMQ,OU=**,OU=**,OU=**,DC=**,DC=**,DC=**"}},
      {administrator, {in_group_nested, "CN=****-RMQ-administrator,OU=RabbitMQ,OU=**,OU=**,OU=**,DC=**,DC=**,DC=**"}}
    ]}
  ]}
].

enabled_plugins

[rabbitmq_management,rabbitmq_prometheus,rabbitmq_auth_backend_ldap,rabbitmq_auth_backend_cache].

lukebakken · 2023-05-05T15:08:16Z

Maintainer

@justsomescripts the status output you have provided does not show consistent node names. I see all of these names:

rabbit@tadevrmq03
rabbit@tadevrmq04
rabbit@node03
rabbit@node05
rabbit@node01
rabbit@node02

This doesn't appear to be a 3-node cluster...?

2 replies

justsomescripts May 5, 2023
Author

@justsomescripts the status output you have provided does not show consistent node names. I see all of these names:
rabbit@tadevrmq03
rabbit@tadevrmq04
rabbit@node03
rabbit@node05
rabbit@node01
rabbit@node02
This doesn't appear to be a 3-node cluster...?

Yes, it's a three-node cluster. That happened during cleanup. Here's the correct output:

Node 1

# rabbitmqctl cluster_status
Cluster status of node rabbit@node01 ...
Basics

Cluster name: rabbit_cluster_dev
Total CPU cores available cluster-wide: 4

Disk Nodes

rabbit@node01
rabbit@node02
rabbit@node03

Running Nodes

rabbit@node01
rabbit@node02

Versions

rabbit@node01: RabbitMQ 3.11.13 on Erlang 25.3
rabbit@node02: RabbitMQ 3.11.13 on Erlang 25.3

CPU Cores

Node: rabbit@node01, available CPU cores: 2
Node: rabbit@node02, available CPU cores: 2

Maintenance status

Node: rabbit@node01, status: not under maintenance
Node: rabbit@node02, status: not under maintenance

Alarms

(none)

Network Partitions

Node rabbit@node01 cannot communicate with rabbit@node03
Node rabbit@node02 cannot communicate with rabbit@node03

Listeners

Node: rabbit@node01, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@node01, interface: [::], port: 8080, protocol: http, purpose: HTTP API
Node: rabbit@node01, interface: [::], port: 443, protocol: https, purpose: HTTP API over TLS (HTTPS)
Node: rabbit@node01, interface: [::], port: 15692, protocol: http/prometheus, purpose: Prometheus exporter API over HTTP
Node: rabbit@node01, interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rabbit@node02, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@node02, interface: [::], port: 8080, protocol: http, purpose: HTTP API
Node: rabbit@node02, interface: [::], port: 443, protocol: https, purpose: HTTP API over TLS (HTTPS)
Node: rabbit@node02, interface: [::], port: 15692, protocol: http/prometheus, purpose: Prometheus exporter API over HTTP
Node: rabbit@node02, interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0

Feature flags

Flag: classic_mirrored_queue_version, state: enabled
Flag: classic_queue_type_delivery_support, state: disabled
Flag: direct_exchange_routing_v2, state: disabled
Flag: drop_unroutable_metric, state: enabled
Flag: empty_basic_get_metric, state: enabled
Flag: feature_flags_v2, state: disabled
Flag: implicit_default_bindings, state: enabled
Flag: listener_records_in_ets, state: disabled
Flag: maintenance_mode_status, state: enabled
Flag: quorum_queue, state: enabled
Flag: stream_queue, state: enabled
Flag: stream_single_active_consumer, state: disabled
Flag: tracking_records_in_ets, state: disabled
Flag: user_limits, state: enabled
Flag: virtual_host_metadata, state: enabled

Node 3

# rabbitmqctl cluster_status
Cluster status of node rabbit@node03 ...
Basics

Cluster name: rabbit_cluster_dev
Total CPU cores available cluster-wide: 2

Disk Nodes

rabbit@node01
rabbit@node02
rabbit@node03

Running Nodes

rabbit@node03

Versions

rabbit@node03: RabbitMQ 3.11.13 on Erlang 25.3

CPU Cores

Node: rabbit@node03, available CPU cores: 2

Maintenance status

Node: rabbit@node03, status: not under maintenance

Alarms

(none)

Network Partitions

Node rabbit@node03 cannot communicate with rabbit@node01, rabbit@node02

Listeners

Node: rabbit@node03, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@node03, interface: [::], port: 8080, protocol: http, purpose: HTTP API
Node: rabbit@node03, interface: [::], port: 443, protocol: https, purpose: HTTP API over TLS (HTTPS)
Node: rabbit@node03, interface: [::], port: 15692, protocol: http/prometheus, purpose: Prometheus exporter API over HTTP
Node: rabbit@node03, interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0

Feature flags

Flag: classic_mirrored_queue_version, state: enabled
Flag: classic_queue_type_delivery_support, state: disabled
Flag: direct_exchange_routing_v2, state: disabled
Flag: drop_unroutable_metric, state: enabled
Flag: empty_basic_get_metric, state: enabled
Flag: feature_flags_v2, state: disabled
Flag: implicit_default_bindings, state: enabled
Flag: listener_records_in_ets, state: disabled
Flag: maintenance_mode_status, state: enabled
Flag: quorum_queue, state: enabled
Flag: stream_queue, state: enabled
Flag: stream_single_active_consumer, state: disabled
Flag: tracking_records_in_ets, state: disabled
Flag: user_limits, state: enabled
Flag: virtual_host_metadata, state: enabled

lukebakken May 5, 2023
Maintainer

Please note that I have used the <details> and <summary> tags to put large amounts of output into collapsible sections. This makes it MUCH easier to follow a discussion.

lukebakken · 2023-05-07T14:51:20Z

Maintainer

@justsomescripts I couldn't reproduce what you report using this project - https://github.com/lukebakken/docker-rabbitmq-cluster

When I disconnect a container from the network, RabbitMQ on that node pauses as expected. When I re-connect the container, I can see RabbitMQ try to start.

What I did find, however, is this serious bug - #8114
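
For anyone who wants to repeat a disconnect and reconnect test like the one described above, a minimal sketch follows. It only builds the `docker network disconnect`, `docker exec ... rabbitmqctl cluster_status` and `docker network connect` command lines. The network and container names are placeholders, and nothing here is taken from the linked project.

```python
import subprocess


def partition_commands(network, container):
    """Build argv lists that cut one container off from a Docker network and restore it."""
    return {
        "disconnect": ["docker", "network", "disconnect", network, container],
        # docker exec goes through the local Docker daemon, so it still works while disconnected.
        "status": ["docker", "exec", container, "rabbitmqctl", "cluster_status"],
        "reconnect": ["docker", "network", "connect", network, container],
    }


def run(argv):
    """Run one command and return its standard output; raises if the command fails."""
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout


# Hypothetical names; replace them with the ones from your own setup.
commands = partition_commands("rabbitnet", "rmq2")
```

Calling `run(commands["disconnect"])`, waiting longer than the detection window, and then calling `run(commands["status"])` shows whether the isolated node has paused.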

1 reply

justsomescripts May 7, 2023
Author

Interesting 🤔 . Thanks for looking into this. I'll see if I can replicate this in a Docker / local dev environment when I have time.

QwertyUser17 · 2024-08-28T13:45:48Z

I encountered a similar problem. The cluster_partition_handling = pause_minority setting is enabled, but after disconnecting one of the 3 nodes, it remains reachable, and applications can still connect to it.

rabbitmq.conf: |
      cluster_formation.peer_discovery_backend  = rabbit_peer_discovery_k8s
      cluster_formation.k8s.address_type = hostname
      cluster_formation.k8s.host = kubernetes.default
      cluster_formation.node_cleanup.interval = 10
      cluster_formation.node_cleanup.only_log_warning = true
      cluster_partition_handling = pause_minority
      queue_master_locator=min-masters
      disk_free_limit.absolute = 512MB
      vm_memory_high_watermark.absolute = 1GB
      cluster_name = atrabbit.cag.wargaming.net
      prometheus.return_per_object_metrics = true


apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: rabbitmq
  labels:
    app: rabbitmq
  namespace: rabbitmq
spec:
  serviceName: rabbitmq
  replicas: 3
  updateStrategy:
    type: RollingUpdate
  revisionHistoryLimit: 3
  podManagementPolicy: Parallel
  selector:
    matchLabels:
      app: rabbitmq
  template:
    metadata:
      name: rabbitmq
      labels:
        app: rabbitmq
    spec:
      terminationGracePeriodSeconds: 30
      securityContext:
        runAsUser: 1001
        fsGroup: 1001
      volumes:
        - name: config
          configMap:
            name: rabbitmq-config
            items:
            - key: rabbitmq.conf
              path: rabbitmq.conf
            - key: enabled_plugins
              path: enabled_plugins
        - name: rabbitmq-erlang-cookie
          emptyDir: {}
        - name: erlang-cookie-secret
          secret:
            secretName: rabbitmq-erlang-cookie
            defaultMode: 420
      initContainers:
      - name: erlang-cookie-hack
        image: busybox:latest
        imagePullPolicy: IfNotPresent
        command: ["sh", "-c", "cp /tmp/erlang-cookie-secret/.erlang.cookie /var/lib/rabbitmq/.erlang.cookie && chmod 600 /var/lib/rabbitmq/.erlang.cookie"]
        securityContext:
          privileged: true
        volumeMounts:
        - name: rabbitmq-erlang-cookie
          mountPath: /var/lib/rabbitmq/
        - name: erlang-cookie-secret
          mountPath: /tmp/erlang-cookie-secret/
      containers:
      - name: rabbitmq
        image: rabbitmq:management-alpine
        imagePullPolicy: Always
        volumeMounts:
          - name: config
            mountPath: /etc/rabbitmq
            readOnly: true
          - name: rabbitmq-data
            mountPath: /var/lib/rabbitmq/mnesia
          - name: rabbitmq-erlang-cookie
            mountPath: /var/lib/rabbitmq/
        ports:
          - name: http
            protocol: TCP
            containerPort: 15672
          - name: amqp
            protocol: TCP
            containerPort: 5672
          - name: prometheus
            protocol: TCP
            containerPort: 15692
        env:
          - name: MY_POD_NAME
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: metadata.name
          - name: MY_POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
          - name: RABBITMQ_USE_LONGNAME
            value: "true"
          - name: K8S_SERVICE_NAME
            value: rabbitmq
          - name: RABBITMQ_NODENAME
            value: rabbit@$(MY_POD_NAME).$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE)
          - name: K8S_HOSTNAME_SUFFIX
            value: .$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE)
          - name: RABBITMQ_DEFAULT_USER
            value: admin
          - name: RABBITMQ_DEFAULT_PASS
            value: admin
        resources:
          limits:
            memory: 1.5Gi
          requests:
            memory: 64Mi
            cpu: 0.2
        livenessProbe:
          exec:
            command:
              - sh
              - -ec
              - rabbitmq-diagnostics -q status
        ## Configure RabbitMQ containers' extra options for liveness probe st.2
        ## ref: https://www.rabbitmq.com/monitoring.html#health-checks
          initialDelaySeconds: 40
          periodSeconds: 30
          timeoutSeconds: 20
          failureThreshold: 6
          successThreshold: 1
        readinessProbe:
          exec:
            command:
              - sh
              - -ec
              - rabbitmq-diagnostics -q check_running && rabbitmq-diagnostics -q check_local_alarms
        ## Configure RabbitMQ containers' extra options for readiness probe st.3
        ## ref: https://www.rabbitmq.com/monitoring.html#health-checks
          initialDelaySeconds: 40
          periodSeconds: 30
          timeoutSeconds: 20
          failureThreshold: 3
          successThreshold: 1
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - rabbitmq
              topologyKey: kubernetes.io/hostname

Reproduction steps: Pause any one of the virtual machines in VMware vSphere, wait 30 seconds to 1 minute until the Kubernetes cluster reports the node as unavailable, and then resume the VM. As a result, the RabbitMQ cluster splits into two parts, a 2-node group and a single node, and both parts keep operating in parallel.

The nature of the partition is as follows:

Node	Was partitioned from
[email protected]	[email protected] [email protected]

At the same time, none of the cluster state check commands will indicate that the 2-node group is in the minority. Examples:

/ $ rabbitmq-diagnostics -q alarms
Node [email protected] reported no alarms, local or clusterwide

/ $ rabbitmq-diagnostics check_running
Checking if RabbitMQ is running on node [email protected] ...
RabbitMQ on node [email protected] is fully booted and running
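For reference, the two checks above report alarms and boot state only, while the partition list is part of the cluster_status output (the "Network Partitions" section shown earlier in this thread). A minimal sketch of reading it programmatically follows. It assumes the JSON produced by `--formatter json` exposes a `partitions` map of node name to unreachable peers; that key layout is an assumption to verify against your version, and the helper names are hypothetical.

```python
import json
import subprocess


def partitioned_peers(status: dict) -> dict:
    """Return {node: [peers it reports being partitioned from]}.

    Assumes `status` is the parsed JSON of `cluster_status` and that it
    carries a "partitions" map (an assumption, not a documented contract).
    Nodes with an empty peer list are dropped.
    """
    partitions = status.get("partitions") or {}
    return {node: list(peers) for node, peers in partitions.items() if peers}


def fetch_cluster_status() -> dict:
    """Run the CLI against the local node and parse its JSON output."""
    result = subprocess.run(
        ["rabbitmq-diagnostics", "cluster_status", "--formatter", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Calling `partitioned_peers(fetch_cluster_status())` inside each pod and comparing the results across nodes would show which side of the split each node believes it is on. An empty result on a node does not prove the cluster is healthy, only that this node has not recorded a partition.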

I would expect that in such a scenario, the cluster node that is in the minority would at least shut down its port so that applications can't connect to it, but that's not the case.

Example of checking port availability from inside the container:

/ $ rabbitmq-diagnostics check_port_listener 5672
Asking node [email protected] if there's an active listener on port 5672 ...
A listener for port 5672 is running on node [email protected].

Example of checking port availability from another Kubernetes namespace:

/app # telnet rabbitmq-2.rabbitmq.rabbitmq.svc.cluster.local 5672
Trying 10.0.0.31...
Connected to rabbitmq-2.rabbitmq.rabbitmq.svc.cluster.local.
Escape character is '^]'.

Could you suggest an alternative solution, other than manually restarting the node as mentioned in the documentation, or waiting for version 4.0?

3 replies

michaelklishin Aug 28, 2024
Maintainer

Hijacking an existing discussion is not particularly nice.

It's hard to suggest much without logs from all nodes. Generally, nodes log at the very least:

When peers are down
When peers are back up
When peers are detected as partitioned

I'm also not sure why you verify connectivity on port 5672, that's not the port used for inter-node communication but rather for client connections.
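As a rough illustration of the point about ports: with default settings, inter-node traffic uses the Erlang distribution port (AMQP port + 20000, so 25672, as listed under "Listeners" in the original report) plus epmd on 4369. The sketch below checks plain TCP reachability of those ports from another pod. The function names are hypothetical, and the port numbers assume an unmodified default configuration.

```python
import socket

# Defaults, assuming no custom distribution or epmd port is configured:
# epmd listens on 4369, inter-node communication on AMQP port + 20000.
EPMD_PORT = 4369
DIST_PORT = 25672


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_peer(host: str, ports=(EPMD_PORT, DIST_PORT)) -> dict:
    """Map each inter-node port to whether it is reachable on `host`."""
    return {port: port_open(host, port) for port in ports}
```

For example, `check_peer("rabbitmq-2.rabbitmq.rabbitmq.svc.cluster.local")` run from `rabbitmq-0` would show whether the peer's inter-node ports accept connections. A successful TCP connect only shows that the port is reachable; it does not prove that the Erlang distribution link between the two nodes is established.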

Pausing a VM has more side effects than interrupting its network connections, as extensively described in #10701.

This is not a commonly reported issue and pause_minority is the most widely used option. But when it is tested, it's the network connectivity that is interrupted, not the OS process activity or its CPU. Without logs, I cannot really guess why.
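For readers unfamiliar with the rule behind pause_minority: as documented, a node pauses when it determines that it is in a minority, meaning it can see half or fewer of the cluster's nodes, itself included. The arithmetic can be sketched as below. This only illustrates the documented rule, it is not RabbitMQ's implementation, and the function name is hypothetical.

```python
def should_pause(total_nodes: int, reachable_peers: int) -> bool:
    """Return True if a node is in a minority under the documented rule.

    `reachable_peers` counts the other nodes this node can still see;
    the node itself is added on top. A partition holding exactly half
    of the cluster also counts as a minority.
    """
    visible = reachable_peers + 1
    return visible * 2 <= total_nodes


# Three-node cluster from this thread:
# the isolated node sees only itself, the other two see each other.
isolated = should_pause(total_nodes=3, reachable_peers=0)
pair_member = should_pause(total_nodes=3, reachable_peers=1)
```

With three nodes, `isolated` is `True` and `pair_member` is `False`, which matches the expected behavior described in the report: only the single-node side should pause. The rule depends on the node actually observing its peers as down, which is why a paused VM can behave differently from an interrupted network link.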

You can try autoheal but it is a very aggressive recovery strategy that is difficult to recommend for everyone.

Finally, 4.0 is expected to ship in Sep-Oct but you will have to run Khepri, still a relatively young "feature" FWIW.

michaelklishin Aug 28, 2024
Maintainer

@CrazyMushu can you please move it into a separate discussion and share logs from all nodes there?

michaelklishin Aug 28, 2024
Maintainer

https://youtu.be/y2HAJBiXsw0?feature=shared&t=1232 is one example of pause_minority in action.

michaelklishin · 2024-08-28T14:30:02Z

michaelklishin
Aug 28, 2024
Maintainer

FTR, https://youtu.be/y2HAJBiXsw0?feature=shared&t=1232 is one example of pause_minority in action presented by the core team a while ago.

0 replies
