Cluster with all Quorum Queues performance limits #8113
Replies: 5 comments 14 replies
-
40K-50K messages a second per quorum queue is reasonable. You can grow up to a point by using more queues. With PerfTest, that may require a different set of queue name patterns. Network I/O, disk I/O, and CPU load can all be good indicators. Queue leader distribution can help, too. If you need more than 50K per queue, you can always use streams. You can reach several million messages a second with a few streams or superstreams with three replicas. Note that streams in part achieve this via parallelism: clients connect to ALL replicas when they can. If all connections go via a load balancer, the results will differ.
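For instance, a minimal PerfTest sketch that declares a set of quorum queues via a queue name pattern and spreads producers and consumers across them (every value here is an illustrative placeholder, not a setting taken from this thread):

```
# Placeholder URI, queue count and rates -- adjust to your environment.
# Invoke PerfTest however it is installed (Docker image, or
# bin/runjava com.rabbitmq.perf.PerfTest from the distribution).
perf-test \
  --uri amqp://user:pass@rabbitmq:5672 \
  --queue-pattern 'qq-%d' --queue-pattern-from 1 --queue-pattern-to 50 \
  --quorum-queue \
  --producers 50 --consumers 50 \
  --rate 1000 --confirm 100
```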
-
Hey Michael, thanks for your reply. 40-50K with X queues: I would scale the StatefulSet to X replicas, and each replica would create its own queue.
I looked at network, disk and CPU; all look healthy, which is why I'm a bit puzzled as to what is hitting the limit.
-
Thanks, the clients are distributed through the cluster DNS.
We are not looking to use streams; we are trying to understand why the RabbitMQ queues are not providing more performance given no signs of resource exhaustion.
On Sat, 6 May 2023 at 15:53, Michael Klishin wrote:
If clients predominantly connect to one node, it won't help to have more nodes. Streams do not have this problem when clients can discover and connect to the specific nodes they need. The stream protocol has this feature, but none of the messaging protocols RabbitMQ supports do.
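One way to check whether connections and queue leaders really are skewed toward a single node, as a sketch using the standard CLI tools (the `leader` queue info item assumes a reasonably recent RabbitMQ release):

```
# Where do client connections land? One line per connection, with its node.
rabbitmqctl list_connections node

# Where do the quorum queue leaders live?
rabbitmqctl list_queues name type leader

# If leaders have piled up on one node, spread them across the cluster.
rabbitmq-queues rebalance quorum
```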
-
It looks like you're connecting to a load balancer; more than likely this results in all queue leaders running on the same node. @michaelklishin alluded to that in this comment. I recommend taking the load balancer out of the equation and pointing your connections at the individual nodes, something like this:
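Something along these lines, shown here as one PerfTest instance per cluster node so that connections, and therefore the leaders of the queues they declare, end up spread across the nodes (the hostnames are hypothetical Kubernetes pod/headless-service addresses, not taken from this thread):

```
# Point each PerfTest instance (or pod) at a specific node instead of the
# load-balanced address. Hostnames and credentials below are made up.
perf-test --uri amqp://user:pass@my-rabbit-server-0.my-rabbit-nodes:5672 ...
perf-test --uri amqp://user:pass@my-rabbit-server-1.my-rabbit-nodes:5672 ...
perf-test --uri amqp://user:pass@my-rabbit-server-2.my-rabbit-nodes:5672 ...
```

The same idea applies to your own applications: give clients the individual node addresses rather than a single load balancer endpoint.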
-
Having a single metric to express "the server is overloaded" would be awesome, but would be extremely hard, if not impossible. Based on your description, I'm fairly sure the bottleneck for you is the shared write-ahead log (WAL).

I'm not clear on what your expectation is here. If you thought that since you can do 30k/s with a single queue you should be able to do 300k/s with 10 queues, then that's not something RabbitMQ can do (it'd be nice though! ;) ). We have a PoC branch with multiple WAL files (a queue, when it's declared, is assigned to one of them). This could help with scaling, but we need to dig it up and test again. It will also require more work, as we can foresee all kinds of additional challenges (users trying to change the number of WAL files up or down, etc).

We keep improving performance in many areas and would love to do that based on a real use case. If you can provide the details of a real workload you are planning to run (or wish RabbitMQ could handle), please share it.
-
Hi! We are running a cluster with around 50 queues and using PerfTest to test the limits of a given cluster setup, trying to understand which metrics will indicate that our cluster has no more capacity.
So far, we have been unable to find clear metrics showing the cluster reaching its limit, aside from the side effect of rising latency in the PerfTest numbers.
Setup:
c5.4xlarge
Cluster config:
PerfTest Config
With this setup we can reach around 30-40K msg/s, but even with more PerfTest pods we can't seem to get more throughput, and our PerfTest latencies start to go through the roof.
We tried:
Any help would be appreciated.
A bit of context: we are essentially trying to run RabbitMQ as a platform for our teams, but we have so far found no way of knowing when the cluster is reaching its limits, and we are struggling quite a bit to guarantee or tune performance in this multi-tenancy scenario (e.g. a client pushes fat messages, or suddenly shoves a lot of data (100K/s) at the cluster, etc.). So we are exploring multi-cluster and other options, but at its core we are struggling to get good observability into our cluster: every time we need to analyze good or bad performance, we have to dive into how each consumer/producer is using the cluster (is it using confirms, acks, etc.).
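For reference, that per-channel information (confirms, unacknowledged messages, prefetch) can also be pulled from the broker itself rather than from each application; a sketch assuming a reasonably recent rabbitmqctl:

```
# List channels with the fields relevant to publisher confirms and consumer acks.
rabbitmqctl list_channels name user confirm \
  messages_unconfirmed messages_unacknowledged prefetch_count consumer_count
```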