Replies: 7 comments
-
|
What's the definition of that shovel, and what's going on in the source queue (number of messages, size of those messages)?
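One way to gather that information is via the management HTTP API, assuming the management plugin is enabled. A minimal sketch; the endpoint, credentials, and the queue name `my-source-queue` below are placeholder assumptions:

```python
import requests

# Placeholders: adjust host, credentials, vhost and queue name to your setup.
BASE = "http://localhost:15672/api"  # management plugin endpoint
AUTH = ("guest", "guest")
VHOST = "%2F"  # URL-encoded default vhost "/"

# Dynamic shovel definitions are stored as runtime parameters under the
# "shovel" component, so this lists every shovel and its definition.
for s in requests.get(f"{BASE}/parameters/shovel", auth=AUTH).json():
    print(s["name"], s["value"])

# Depth and total payload size of the source queue.
q = requests.get(f"{BASE}/queues/{VHOST}/my-source-queue", auth=AUTH).json()
print(q["messages"], "messages,", q["message_bytes"], "bytes")
```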
-
A few more questions:
Things you can try:
-
The source queue type is classic. It's automatically created by the shovel and is thus empty on start. Thanks for the proposals; we will try them all and report back.
-
For starters, all shovels are hosted on a single node, and each shovel is a small app that consumes from a source and re-publishes to a destination. You haven't mentioned how many shovels there are in the system.

Shovels consume messages and keep them in memory until they can republish them and confirm them back to the source queue. By default, shovels use a prefetch of 1000, meaning up to 1000 messages will be kept in memory per shovel. 1000 messages * 5000 shovels * 1 kiB per message would require roughly 5 GiB of memory just for keeping the message payloads in memory, ignoring all other protocol metadata and shovel worker state. If for any reason publishing is much slower than consumption (which is the case when consumption is local but publishing is remote), you will have a constant in-memory "backlog".

Use a much lower prefetch (e.g. 10-30 or something like that; there are usually NO reasons to use the extremely conservative value of 1) by explicitly setting the prefetch count (src-prefetch-count for dynamic shovels) in the shovel definition.
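As an illustration, a dynamic shovel with a low prefetch could be declared through the management HTTP API roughly like this. A minimal sketch; the URIs, queue names, shovel name, and the value 20 are placeholder assumptions:

```python
import requests

BASE = "http://localhost:15672/api"
AUTH = ("guest", "guest")
VHOST = "%2F"  # URL-encoded default vhost "/"

# A dynamic shovel is declared as a runtime parameter. The key part here
# is "src-prefetch-count": it caps how many unconfirmed messages the
# shovel holds in memory at any one time.
definition = {
    "value": {
        "src-protocol": "amqp091",
        "src-uri": "amqp://localhost",
        "src-queue": "source-queue",
        "dest-protocol": "amqp091",
        "dest-uri": "amqp://remote-host",
        "dest-queue": "destination-queue",
        "src-prefetch-count": 20,
    }
}
resp = requests.put(
    f"{BASE}/parameters/shovel/{VHOST}/my-shovel", json=definition, auth=AUTH
)
resp.raise_for_status()
```

The same src-prefetch-count key can be set when declaring the shovel via `rabbitmqctl set_parameter shovel` or the management UI.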
-
Thanks for the explanation. In this case it is all about a single shovel that makes the difference: altogether we run 7 shovels on the cluster with similar or smaller workloads without any issues, and adding that 11th shovel breaks it all.
-
We have just upgraded the cluster to 4.2.0 and switched to khepri_db (the Khepri metadata store). The issue is gone without any other change.
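For reference, that switch is done by enabling the khepri_db feature flag. A minimal sketch wrapping rabbitmqctl from Python, assuming rabbitmqctl is on the PATH:

```python
import subprocess

# khepri_db is enabled through the feature flag mechanism. The switch
# migrates the metadata store and is generally not reversible, so try it
# on a test cluster first.
subprocess.run(
    ["rabbitmqctl", "enable_feature_flag", "khepri_db"],
    check=True,
)
```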

-
Describe the bug
Soon after a shovel is created, memory consumption on the node starts to grow. It leads to the high memory watermark being hit and never recovers on its own. The only way to "fix" it is by deleting the shovel AND restarting the affected node(s).
Reproduction steps
We can trigger it on a production installation but nowhere else. It is unclear which piece of the setup is triggering it.
Expected behavior
We expect the memory consumption to stay normal.
Additional context
We are happy to temporarily recreate the issue in our environment and provide whatever further insights are helpful.
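For anyone reproducing this, a rough sketch of sampling the per-node memory breakdown via the management HTTP API while the shovel runs; the node name, credentials, sample count, and polling interval are assumptions:

```python
import time
import requests

BASE = "http://localhost:15672/api"
AUTH = ("guest", "guest")
NODE = "rabbit@node1"  # hypothetical node name; adjust to your cluster

# Sample the per-node memory breakdown a few times so the category that
# grows (e.g. binary, queue_procs, other_proc) can be identified.
for _ in range(10):
    mem = requests.get(f"{BASE}/nodes/{NODE}/memory", auth=AUTH).json()
    breakdown = mem["memory"]
    numeric = {k: v for k, v in breakdown.items() if isinstance(v, int)}
    top = sorted(numeric.items(), key=lambda kv: kv[1], reverse=True)[:5]
    print(top)
    time.sleep(30)
```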