Queue Federation doesn't work with V2 Classic Queues #8297

jefftrimm · 2023-05-23T15:45:09Z

jefftrimm
May 23, 2023

Describe the bug

A recent upgrade of RabbitMQ from 3.9.x to 3.11.x, together with a switch to V2 classic queues, has broken queue federation between our clusters. Queue federation has worked fine for us for multiple years, with no change in upstreams or federation policy. It seems to be broken with V2 classic mirrored queues.

Reproduction steps

1. Create two clusters.
2. Create a queue with the same name in both clusters as a V2 classic queue.
3. Create the appropriate upstreams and policy to federate the queue.
4. Add a consumer to the second cluster.
5. Publish messages to the queue in the first cluster.

The messages will not be consumed by the consumer in the second cluster.
...

Expected behavior

1. Create two clusters.
2. Create a queue with the same name in both clusters as a V2 classic queue.
3. Create the appropriate upstreams and policy to federate the queue.
4. Add a consumer to the second cluster.
5. Publish messages to the queue in the first cluster.

Messages will be forwarded via federation to the queue in the second cluster, and the consumer in the second cluster will receive and consume them.

Additional context

This appears to have broken with the move to V2 Classic Queues in 3.11.x

Answered by michaelklishin

michaelklishin · 2023-05-23T15:50:33Z

michaelklishin
May 23, 2023
Maintainer

Queue federation does not depend on what storage implementation of classic queues is used. I find it hard to believe that it is CQv2 that are the root cause.

A more likely cause is a common misunderstanding of how queue federation works. Queue federation only moves messages between clusters if there are no local consumers. Federation links are very low (negative) priority consumers that only kick in if the node does not have any local consumers.
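
The consumer-priority behavior described above can be illustrated with a toy model. This is a minimal sketch, not RabbitMQ's actual dispatch code: the `Consumer` struct, the `pick_consumer` function, and the priority value of -1 for the federation link are assumptions made purely for illustration, and the model ignores prefetch and consumer blocking.

```ruby
# Toy model of priority-based delivery; NOT the broker's implementation.
# Consumer, pick_consumer and the -1 priority are hypothetical.
Consumer = Struct.new(:name, :priority)

# Returns the consumer that would receive the next message:
# the one with the highest priority, or nil if there are none.
def pick_consumer(consumers)
  consumers.max_by(&:priority)
end

local = Consumer.new("local-app", 0)
link  = Consumer.new("federation-link", -1)

# With a local consumer present, the federation link never wins.
puts pick_consumer([local, link]).name
# With no local consumers, the link receives the message
# and can move it to the other cluster.
puts pick_consumer([link]).name
```

In this model a federated queue with an active local consumer never hands messages to the link, which matches the "only kick in if there are no local consumers" description.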

3 replies

jefftrimm May 23, 2023
Author

I understand exactly how queue federation works, and we've been using it as such in production for 7 years. It has worked exactly as you describe up until our upgrade from 3.9.X --> 3.11.X, with a change of the queues from V1 to V2...

michaelklishin May 23, 2023
Maintainer

Two members of our team cannot reproduce the issue. We have provided details of how the test was conducted, and evidence of queue federation links transferring messages as expected.

If you want us to continue investigating, please put together an executable way to reproduce, or at least a description similar to what I have, plus logs from all nodes. Guessing is a very expensive way of troubleshooting distributed systems.

lukebakken May 23, 2023
Maintainer

> up until our upgrade from 3.9.X --> 3.11.X, with a change of the queues from V1 to V2

How was the upgrade performed?
How were the queues "changed" from V1 to V2?

Providing a definitions export from all environments would be a good start in helping us assist you.

michaelklishin · 2023-05-23T15:51:58Z

michaelklishin
May 23, 2023
Maintainer

If you want messages to be replicated from cluster A to B, you want exchange federation. If you want them to be unconditionally moved from A to B, you want rabbitmq_shovel. Queue federation only moves messages where there are consumers, and if every federated side always has local consumers, it does not make much sense to use queue federation.
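
For the "unconditionally moved" case, a dynamic shovel is declared as a runtime parameter. The sketch below only builds the parameter value and prints a `rabbitmqctl set_parameter` command. The shovel name `a-to-b`, the helper `shovel_definition`, and the URIs are placeholders chosen for illustration, not a configuration anyone in this thread posted, and it assumes the rabbitmq_shovel plugin is enabled on the node that runs the shovel.

```ruby
require "json"

# Builds the value of a dynamic shovel parameter that moves messages
# from a queue on one cluster to a queue of the same name on another.
def shovel_definition(src_uri:, dest_uri:, queue:)
  {
    "src-uri"    => src_uri,
    "src-queue"  => queue,
    "dest-uri"   => dest_uri,
    "dest-queue" => queue
  }
end

definition = shovel_definition(
  src_uri:  "amqp://localhost:5672",  # assumed address of cluster A
  dest_uri: "amqp://localhost:5673",  # assumed address of cluster B
  queue:    "a.cqv2.1"
)

# "a-to-b" is a hypothetical shovel name.
puts "rabbitmqctl set_parameter shovel a-to-b '#{JSON.generate(definition)}'"
```

Unlike a queue federation link, a shovel defined this way does not take consumers on either side into account.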

kjnilsson · 2023-05-23T16:03:12Z

kjnilsson
May 23, 2023
Maintainer

I just tested this and it (the instructions in "Expected behavior") seems to work just fine.

I think we'd need very detailed instructions with the exact policies / parameters etc to investigate further.

michaelklishin · 2023-05-23T16:13:06Z

michaelklishin
May 23, 2023
Maintainer

Here are some specific steps that can be used to reproduce:

Given two nodes, A (standard ports) and B (AMQP port: 5673, HTTP API: 15673) with the following
setup:

# config.a.conf
classic_queue.default_version = 2

# config.b.conf
classic_queue.default_version = 2

management.tcp.port = 15673

# node A
rabbitmqadmin --port 15672 list queues

+----------+----------+
|   name   | messages |
+----------+----------+
| a.cqv2.1 | 0        |
+----------+----------+

rabbitmqadmin --port 15672 list parameters

+-------+----------------+---------------------+------------------------------------------------------------------------------------+
| vhost |      name      |      component      |                                       value                                        |
+-------+----------------+---------------------+------------------------------------------------------------------------------------+
| /     | localhost-5673 | federation-upstream | {"ack-mode": "on-confirm", "trust-user-id": false, "uri": "amqp://localhost:5673"} |
+-------+----------------+---------------------+------------------------------------------------------------------------------------+

rabbitmqadmin --port 15672 list policies

+-------+--------------+----------------+------------------------------------+------------+----------+
| vhost |     name     |    apply-to    |             definition             |  pattern   | priority |
+-------+--------------+----------------+------------------------------------+------------+----------+
| /     | q.federation | classic_queues | {"federation-upstream-set": "all"} | a\.cqv2\.* | 0        |
+-------+--------------+----------------+------------------------------------+------------+----------+

# node B
rabbitmqadmin --port 15673 list queues
+----------+----------+
|   name   | messages |
+----------+----------+
| a.cqv2.1 | 0        |
+----------+----------+

rabbitmqadmin --port 15673 list connections
+-----------------------------------+-------+----------+
|               name                | user  | channels |
+-----------------------------------+-------+----------+
| 127.0.0.1:54941 -> 127.0.0.1:5673 | guest | 1        |
+-----------------------------------+-------+----------+
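
The upstream and policy listed for node A could be created through the management HTTP API, roughly as sketched below. The helper name `api_put` is an assumption, and the requests are only built here, not sent. The values are copied from the tables above, and `guest`/`guest` are the default credentials used elsewhere in this thread.

```ruby
require "json"
require "net/http"

# Builds (but does not send) a PUT request for the management HTTP API.
# api_put is a hypothetical helper name.
def api_put(path, body)
  req = Net::HTTP::Put.new(path, { "Content-Type" => "application/json" })
  req.basic_auth("guest", "guest")
  req.body = JSON.generate(body)
  req
end

# Federation upstream on node A pointing at node B ("%2F" is the "/" vhost).
upstream_body = {
  "value" => {
    "uri" => "amqp://localhost:5673",
    "ack-mode" => "on-confirm",
    "trust-user-id" => false
  }
}
upstream = api_put("/api/parameters/federation-upstream/%2F/localhost-5673", upstream_body)

# Policy that federates classic queues whose names match the pattern.
policy_body = {
  "pattern" => 'a\.cqv2\.*',
  "definition" => { "federation-upstream-set" => "all" },
  "apply-to" => "classic_queues",
  "priority" => 0
}
policy = api_put("/api/policies/%2F/q.federation", policy_body)

# To apply both to node A (management API on port 15672):
#   Net::HTTP.start("localhost", 15672) do |http|
#     [upstream, policy].each { |req| http.request(req) }
#   end
```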

I can observe a federation link from A to B:

xh --json http://guest:guest@localhost:15673/api/connections
HTTP/1.1 200 OK
# [elided for brevity]
[
    {
# elided for brevity
        "channels": 1,
        "client_properties": {
            "capabilities": {
                "authentication_failure_close": true,
                "basic.nack": true,
                "connection.blocked": true,
                "consumer_cancel_notify": true,
                "exchange_exchange_bindings": true,
                "publisher_confirms": true
            },
            "connection_name": "Federation link (upstream: localhost-5673, policy: q.federation)",
            "copyright": "Copyright (c) 2007-2023 VMware, Inc. or its affiliates.",
            "information": "Licensed under the MPL.  See https://www.rabbitmq.com/",
            "platform": "Erlang",
            "product": "RabbitMQ",
            "version": "3.13.0"
        },
        "connected_at": 1684857557593,
# elided for brevity
        "host": "127.0.0.1",
        "name": "127.0.0.1:54941 -> 127.0.0.1:5673",
        "node": "hare@sunnyside",
# elided for brevity
        "state": "running",
        "timeout": 10,
        "type": "network",
        "user": "guest",
        "user_provided_name": "Federation link (upstream: localhost-5673, policy: q.federation)",
        "user_who_performed_action": "guest",
        "vhost": "/"
    }
]
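
Federation links can be picked out of the `/api/connections` response by their connection name. The following is a small sketch of that filter. The function name `federation_links` is an assumption, and the sample input is a hand-trimmed version of the response above plus one made-up non-federation connection.

```ruby
require "json"

# Returns the connections whose client-provided name marks them as
# federation links. federation_links is a hypothetical helper name.
def federation_links(connections_json)
  JSON.parse(connections_json).select do |conn|
    name = conn.dig("client_properties", "connection_name").to_s
    name.start_with?("Federation link")
  end
end

# Trimmed sample; the second connection is invented for illustration.
sample = <<~JSON
  [
    {"name": "127.0.0.1:54941 -> 127.0.0.1:5673", "state": "running",
     "client_properties": {"connection_name": "Federation link (upstream: localhost-5673, policy: q.federation)"}},
    {"name": "127.0.0.1:55000 -> 127.0.0.1:5673", "state": "running",
     "client_properties": {}}
  ]
JSON

federation_links(sample).each { |conn| puts "#{conn["name"]} (#{conn["state"]})" }
```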

I open a publishing connection to node B and a consuming one to node A:

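require "bunny"

# c72 connects to node A (downstream), c73 to node B (upstream)
c72 = Bunny.new(port: 5672); c72.start
c73 = Bunny.new(port: 5673); c73.start

ch1 = c72.create_channel
ch2 = c73.create_channel

q1 = ch1.queue("a.cqv2.1", durable: true)
q2 = ch2.queue("a.cqv2.1", durable: true)

# consume on node A
q1.subscribe { |*delivery| puts(delivery) }

# publish on ch2 (node B), for example via the default exchange
ch2.default_exchange.publish("hello", routing_key: q2.name)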

michaelklishin · 2023-05-23T17:47:27Z

michaelklishin
May 23, 2023
Maintainer

Here is some evidence that queue federation works as expected with both CQv1 and CQv2 with the setup above (or a slightly modified version, to test both CQv1 and CQv2 using a single pair of nodes).

Evidence for CQv1

Below you see a Federation link connection that has a non-zero transfer rate, and two queues with identical names, one on side B (the upstream), and another on side A (the downstream), having
comparable egress and ingress rates, respectively:

The above screenshots are with CQv1:

Evidence for CQv2

Now CQv2:

And the same screenshots with non-zero rates as I was publishing 2M messages to the upstream queue:

michaelklishin · 2023-05-24T10:25:00Z

michaelklishin
May 24, 2023
Maintainer

Having tested a few different scenarios makes me fairly certain that the behavior comes down to
queue federation link pausing: federation links react to certain queue metrics and can pause
themselves.

This paused state can be easily observed only by inspecting the number of consumers on the
upstream queue. Federation link state does not reflect that, and we did not log any pause/unpause state transitions even at debug level (we will once #8282 is finished and merged).

While CQv2 is very unlikely to report the relevant metrics differently, we will take a look.
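
Inspecting the number of consumers on the upstream queue can be scripted against the management HTTP API. The sketch below is one way to do it, under stated assumptions: the helper names `consumer_count` and `fetch_queue` are hypothetical, the sample body is hand-written, and reading a missing link consumer as a paused link is an interpretation of the comment above, not something the API reports directly.

```ruby
require "json"
require "net/http"
require "uri"

# Extracts the consumer count from a /api/queues/{vhost}/{name} response body.
def consumer_count(queue_json)
  JSON.parse(queue_json).fetch("consumers")
end

# Fetches the queue description from the management HTTP API
# ("%2F" is the default "/" vhost).
def fetch_queue(host:, port:, queue:, user: "guest", password: "guest")
  uri = URI("http://#{host}:#{port}/api/queues/%2F/#{queue}")
  req = Net::HTTP::Get.new(uri)
  req.basic_auth(user, password)
  Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }.body
end

# Hand-written sample of the relevant part of a response.
sample = '{"name": "a.cqv2.1", "vhost": "/", "consumers": 0}'
puts "consumers on upstream queue: #{consumer_count(sample)}"

# Against a live upstream node (node B in the setup above):
#   consumer_count(fetch_queue(host: "localhost", port: 15673, queue: "a.cqv2.1"))
```

If the federation link connection is still listed as running but the upstream queue's consumer count does not include it, that would be consistent with a paused link.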

1 reply

michaelklishin May 24, 2023
Maintainer

Relevant: #8321
