HTTP API GET '/queues' response delayed by up to 40+ seconds when 1 of 3 cluster nodes goes down (quorum queues) #6048
RabbitMQ 3.10.6, Erlang 23.3.4.7. 3-node cluster, quorum queues. I've run into an issue with a RabbitMQ cluster (3 nodes), quorum queues and the management plugin's REST API. When all nodes are up and running, a GET request for the queue list (GET /api/queues) takes at most 300 ms. If we shut down one of the cluster nodes completely (together with its OS), the same request starts to hang for up to 40 seconds. After a short investigation I figured out:
Steps to reproduce:
3) Stop the OS on one of the nodes (zmpha-wfan-10024-master1.wflab.io, for example), then perform an HTTP GET request against any other node. This behavior blocks some pretty standard maintenance operations, because in our scenario we need to gather queue statistics periodically.
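For anyone who wants to put numbers on the delay, here is a minimal timing sketch using only the Python standard library. The host name, the credentials and the management port 15672 are assumptions for illustration and are not taken from the report above; `build_queues_request` and `timed_fetch` are hypothetical helper names.

```python
import base64
import time
import urllib.request


def build_queues_request(host, user, password, port=15672):
    """Build a GET /api/queues request with HTTP basic auth.

    The port is an assumption (the usual management plugin default);
    adjust it to match your listener configuration.
    """
    url = f"http://{host}:{port}/api/queues"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def timed_fetch(request, timeout=60.0, opener=urllib.request.urlopen):
    """Send one request and return (elapsed_seconds, body_bytes).

    `timeout` is the client-side limit; keep it above the delay you
    want to observe, otherwise the call raises before the reply arrives.
    """
    start = time.monotonic()
    with opener(request, timeout=timeout) as response:
        body = response.read()
    return time.monotonic() - start, body
```

Calling `timed_fetch(build_queues_request("surviving-node.example", "user", "password"))` once with all nodes up and once after stopping a peer should make the difference visible. The node name and credentials in that call are placeholders.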
Replies: 1 comment 24 replies
This works as expected.

When you query all queues or all connections or other "all things" endpoints, the node that handles the request aggregates the results from its peers and returns them to the client. There is a certain timeout involved. If one node is disconnected without being shut down, all requests to it from its peers will block until they time out.

Monitoring using GET /api/queues, GET /api/connections and, in late 2022, using HTTP API queries at all is wrong. It is very common to see people use those endpoints to get a single field from a single object. That's really wasteful, as most of the metrics returned are not used at all.

This problem is not present in the Prometheus endpoint because every node returns only its own stats, and the aggregation is done at a later point by tools such as Grafana. Prometheus endpoint scraping is recommended for other reasons which are documented in the RabbitMQ guide on monitoring.
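To illustrate the per-node approach, here is a rough sketch that scrapes each node's own Prometheus endpoint and adds the numbers up on the client side, so an unreachable node costs at most one short timeout instead of stalling the whole query. It is not how Prometheus or Grafana do it internally. The port 15692, the `/metrics` path and the metric name `rabbitmq_queue_messages_ready` are assumptions about a default rabbitmq_prometheus setup; `sum_metric` and `scrape_nodes` are hypothetical helper names.

```python
import urllib.request


def sum_metric(text, name):
    """Sum every sample of metric `name` in Prometheus text format."""
    total = 0.0
    for line in text.splitlines():
        if not line.startswith(name):
            continue  # comments (# HELP / # TYPE) and other metrics
        rest = line[len(name):]
        if rest.startswith("{"):
            # Skip the label set; label values may contain spaces.
            rest = rest[rest.rfind("}") + 1:]
        elif not rest.startswith(" "):
            continue  # a longer metric name that shares this prefix
        fields = rest.split()
        if fields:
            total += float(fields[0])
    return total


def _http_get(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def scrape_nodes(hosts, name, port=15692, timeout=5.0, fetch=_http_get):
    """Ask each node for its own stats; return (per_node_totals, failed_hosts).

    Port and path are assumed defaults. A node that is down or
    unreachable is recorded in `failed_hosts` after `timeout` seconds.
    """
    totals, failed = {}, []
    for host in hosts:
        url = f"http://{host}:{port}/metrics"
        try:
            totals[host] = sum_metric(fetch(url, timeout), name)
        except OSError:  # covers URLError and socket timeouts
            failed.append(host)
    return totals, failed
```

Each request here is answered from a single node's local data, so the worst case per node is bounded by the client's own `timeout` rather than by inter-node calls.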