Replies: 8 comments 5 replies
-
You have ~10,000 quorum queues, which is already stretching it, and judging by their names you probably have high quorum queue churn as well? Combined with connection churn, you are most likely overloading the Erlang distribution links (communication within the cluster). Step one is to reduce this load, probably by reconsidering your topology (the number and type of queues) and preventing connection churn.
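For illustration, connection churn typically comes from opening a new connection (and channel) per operation. The sketch below, in Python with pika, shows the long-lived-connection pattern instead; the host and queue names are placeholders, not taken from this thread.

```python
# Minimal sketch (Python + pika): open one connection and channel at process
# start and reuse them for many publishes, instead of connecting per message,
# which is a common source of connection/channel churn.
import pika

params = pika.ConnectionParameters(host="rabbitmq-cluster")  # hypothetical host
connection = pika.BlockingConnection(params)   # open once, at startup
channel = connection.channel()                 # reuse for the process lifetime

channel.queue_declare(queue="telemetry", durable=True)

for i in range(1000):
    channel.basic_publish(exchange="",
                          routing_key="telemetry",
                          body=f"reading {i}".encode())

connection.close()                             # close once, at shutdown
```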
-
The queues are static, not dynamic; they are never deleted, so there is no queue churn. The queue names contain a UUID, but the queues themselves are static in nature.
-
I see a lot of
The above happened when a consumer tried to start. My guess is that your cluster is overloaded. @ankitgr8 if you have a support contract with Broadcom for RabbitMQ, please use the official support channels. cc @kjnilsson
-
We had a 3-node cluster with a 1:1 mapping of pod to Kubernetes node, i.e. each pod ran on its own dedicated node. With that configuration we were seeing the issue mentioned above. After moving all pods onto a single Kubernetes node, we did not encounter any issue. What I am not clear about is this: the Erlang inter-node (in this case inter-pod, since all pods are on the same Kubernetes node) communication is still happening, and all other configuration and load remain the same, so why do we not see any issue? Once we move the pods across Kubernetes nodes, what changes in the Erlang communication? Does the "Inter-node Communication Buffer Size" have any role to play? I can see these warnings in my logs: "rabbit_sysmon_handler busy_dist_port".
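For reference, busy_dist_port warnings mean an Erlang distribution buffer filled up and processes writing to that cluster link were briefly suspended. The buffer size is controlled by the RABBITMQ_DISTRIBUTION_BUFFER_SIZE environment variable (in kilobytes, default 128000). A minimal sketch, assuming the deployment reads rabbitmq-env.conf; the value below is only an illustrative increase, not a recommendation:

```
# rabbitmq-env.conf (sketch)
# Raise the Erlang distribution buffer from the default 128000 kB.
RABBITMQ_DISTRIBUTION_BUFFER_SIZE=192000
```

Raising the buffer only hides the symptom if the underlying inter-node traffic (queue and connection churn) is not reduced.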
-
Also, will Streams perform better than quorum queues at such a scale (number of queues)? We are aiming to have 100k quorum queues.
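For context, a stream is declared through the same AMQP client API as a quorum queue; only the "x-queue-type" argument differs. Below is a minimal sketch in Python with pika, with hypothetical queue names; whether a few streams can replace tens of thousands of quorum queues depends on the consumption pattern, not on the declaration itself.

```python
# Sketch (Python + pika): a quorum queue and a stream differ only in the
# "x-queue-type" argument at declaration time. Queue names are hypothetical.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()

channel.queue_declare(queue="per-device-queue",
                      durable=True,
                      arguments={"x-queue-type": "quorum"})

channel.queue_declare(queue="device-telemetry-stream",
                      durable=True,
                      arguments={"x-queue-type": "stream"})

# Stream consumers must use manual acknowledgements and a prefetch limit,
# and can attach at a position via the "x-stream-offset" consumer argument.
channel.basic_qos(prefetch_count=100)
connection.close()
```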
-
At a high level, the requirements are: we have devices which report data, and the count of these devices can go up to 20k.
-
@mkuratczyk I have shared the high-level requirements above. Any guidance on the pattern we should follow in RabbitMQ?
-
@mkuratczyk Thanks for the reply. There is no issue on the publisher side; the publishers can send data to a single queue. The issue is at the consumer end only.
So, as per your suggestion, we can use a consistent hash exchange that routes messages to 200 queues, and the messages from all 20k devices are distributed among these 200 queues. The only issue we have is that if some messages in a queue take time to process, they delay the processing of messages from another device that publishes to the same queue. This was one of the reasons for segregating the queues per device.
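For illustration, here is a minimal sketch of the consistent hash exchange pattern discussed above, in Python with pika. It assumes the rabbitmq_consistent_hash_exchange plugin is enabled, uses only four shard queues instead of 200, and all exchange/queue names and the device ID are hypothetical.

```python
# Sketch: spreading messages from many devices across a fixed set of shard
# queues with the consistent hash exchange (requires the
# rabbitmq_consistent_hash_exchange plugin). Names are hypothetical.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()

# The exchange hashes on the routing key by default.
channel.exchange_declare(exchange="devices.hash",
                         exchange_type="x-consistent-hash",
                         durable=True)

for i in range(4):  # the thread discusses ~200 shard queues; 4 for brevity
    queue = f"device-shard-{i}"
    channel.queue_declare(queue=queue,
                          durable=True,
                          arguments={"x-queue-type": "quorum"})
    # The binding key is a weight, not a pattern: "1" gives each queue an equal share.
    channel.queue_bind(queue=queue, exchange="devices.hash", routing_key="1")

# Publishing with the device ID as the routing key keeps all messages from
# one device on the same shard queue, preserving per-device ordering.
channel.basic_publish(exchange="devices.hash",
                      routing_key="device-12345",
                      body=b'{"temperature": 21.5}')
connection.close()
```

One partial mitigation for the slow-message concern is to run several consumers per shard queue with a small prefetch, so a slow message occupies one consumer rather than blocking the whole queue.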
-
Below is an image from our setup where the memory of two pods suddenly went up and came down only when we restarted the pods.
During this time there were a lot of AMQP connection errors, and we believe these error connections caused heavy channel and connection churn, which may be the reason for the low available memory. We have seen this only on the rabbitmq-cluster-0 and rabbitmq-cluster-1 nodes. The memory breakdown does show the memory consumption under "other_proc", which points to internal rabbitmq events being generated due to the channel/connection churn.
What we are looking for is the reason why the connections were getting disconnected.
Attached are the logs for all three nodes of the RabbitMQ cluster:
rabbitmq-cluster-0.tar.gz
rabbitmq-cluster-0-1.tar.gz
rabbitmq-cluster-1.tar.gz
rabbitmq-cluster-2.tar.gz