Message duplication #3064

@davidich

Description

I'm experiencing message duplication, as if a queue could be bound to an exchange twice. The consumer ack rate is twice the publish rate while the redelivered rate is 0.
The image shows the symptoms before 8:22: the green line is twice as high as the yellow one, i.e. consumers receive the same message twice at the same time.
[Image: RabbitMessageDuplicates]
Setup details:
RabbitMQ version: 3.8.14
One publisher -> Fanout exchange -> One quorum queue -> Four consumers.
Every consumer creates 10 channels with Prefetch Count = 5.
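
To make the numbers in this setup concrete, here is a minimal Python sketch of the unacked-delivery ceiling it implies. It assumes one consumer per channel with the prefetch applied per consumer (an assumption, the report does not say how prefetch is scoped), and all function names are hypothetical helpers, not part of any RabbitMQ client.

```python
from typing import Optional

# Values taken from the setup described above.
CONSUMER_INSTANCES = 4
CHANNELS_PER_INSTANCE = 10
PREFETCH_COUNT = 5


def per_instance_window(channels: int = CHANNELS_PER_INSTANCE,
                        prefetch: int = PREFETCH_COUNT) -> int:
    """Maximum unacked deliveries one consumer instance can hold.

    Assumes one consumer per channel (not stated in the report).
    """
    return channels * prefetch


def total_window(instances: int = CONSUMER_INSTANCES) -> int:
    """Maximum unacked deliveries across all consumer instances."""
    return instances * per_instance_window()


def full_windows(unacked: int) -> Optional[int]:
    """Return how many whole per-instance windows `unacked` equals,
    or None if it is not an exact multiple of one window."""
    window = per_instance_window()
    if unacked % window != 0:
        return None
    return unacked // window
```

With these values, one instance can hold at most 50 unacked deliveries and the four instances together at most 200. `full_windows` can be applied to an Unacked counter read from the management UI to see whether it corresponds to a whole number of instance windows.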

The problem occurs after a consumer service restart (it is a Docker Swarm service with 4 instances). I noticed that the more consumers I have, the easier it is to reproduce the issue. With 4 consumer instances, I can reproduce the issue in 5 out of 10 restarts, while with 2 consumers I was able to reproduce it in 2 out of 10 restarts. With 1 consumer instance, I couldn't reproduce it in 10 restarts.

Some other oddities in RabbitMQ behavior I noticed during the issue:

  • Once message duplication happens, I always have stuck Unacked messages. The number of unacked messages doesn't change over time, and the value is always a multiple of 50 (10 channels * 5 prefetch count). I observed this correlation every time I saw message duplication on the consumer side (10+ times).
  • When duplicate messages are received, both copies have the redelivered property = false (as if each were delivered for the first time).
  • In the management UI, all stuck unacked messages are shown only at the queue and connection level; on the Channels page, all Unacked counters are 0.
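
For anyone trying to confirm the second symptom on their own consumers, the following is a rough Python sketch of a consumer-side audit that separates ordinary redeliveries from duplicates that arrive with redelivered = false. It assumes the publisher sets a unique message id on every message (the report does not say whether it does), and `DeliveryAudit` is a hypothetical helper, not part of any client library.

```python
from collections import Counter
from typing import Dict, Hashable


class DeliveryAudit:
    """Classifies deliveries by message id and the `redelivered` flag.

    Assumption: every message carries a unique id set by the publisher.
    """

    def __init__(self) -> None:
        self._seen: Dict[Hashable, int] = {}  # message id -> times seen
        self.counts: Counter = Counter()      # classification -> count

    def record(self, message_id: Hashable, redelivered: bool) -> str:
        previous = self._seen.get(message_id, 0)
        self._seen[message_id] = previous + 1
        if redelivered:
            # The broker itself marked this as a repeat delivery.
            kind = "redelivery"
        elif previous == 0:
            kind = "first"
        else:
            # Seen before, yet the broker claims a first delivery:
            # this matches the symptom described in this report.
            kind = "unflagged_duplicate"
        self.counts[kind] += 1
        return kind


audit = DeliveryAudit()
audit.record("msg-1", redelivered=False)           # "first"
print(audit.record("msg-1", redelivered=False))    # unflagged_duplicate
```

Note that this audit only sees deliveries within one process. With several consumer instances, the two copies may land on different instances, so the seen-id map would have to live in a shared store to catch every duplicate.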

The problem went away after I renamed the exchange and queue: I appended v2 to their previous names.
With the new exchange and queue, I restarted my consumers 10 times and didn't see the issue. My "bad" queue and exchange pair is still on the server, and after I re-pointed my producer and consumers to the original queue, I was able to reproduce the issue in 3 out of 5 restarts. So I believe this is specific to a particular instance of a queue or exchange. I also believe this started after an incident in our prod environment: one host server in the Swarm cluster ran out of available memory. The DevOps team killed the app with the memory leak, and RabbitMQ recovered on its own. At that moment we started to see stuck unacked messages for the first time.
Also, the DevOps team confirmed that the metadata of our old and new exchanges and old and new queues in the DETS file is identical.

Some additional details about the RabbitMQ setup:
CONFIG:
loopback_users.admin = false
cluster_formation.peer_discovery_backend = classic_config
cluster_formation.node_cleanup.only_log_warning = true
cluster_formation.classic_config.nodes.1 = rabbit@rabbitmq-1
cluster_formation.classic_config.nodes.2 = rabbit@rabbitmq-2
cluster_formation.classic_config.nodes.3 = rabbit@rabbitmq-3
cluster_formation.classic_config.nodes.4 = rabbit@rabbitmq-4
cluster_partition_handling = autoheal
# Flow control is triggered if memory usage is above 80%.
vm_memory_high_watermark.relative = 0.8
# Flow control is triggered if free disk space is below 5GB.
disk_free_limit.absolute = 5GB
prometheus.return_per_object_metrics = true
Plugins
[rabbitmq_prometheus,
rabbitmq_management,
rabbitmq_federation,
rabbitmq_federation_management,
rabbitmq_shovel,
rabbitmq_shovel_management].

Leader: rabbit@rabbitmq-1
Online: rabbit@rabbitmq-2, rabbit@rabbitmq-1, rabbit@rabbitmq-3
Members: rabbit@rabbitmq-2, rabbit@rabbitmq-1, rabbit@rabbitmq-3

Will be happy to provide any further details.

Thanks,
Alex
