[Questions] What exactly triggers a consumer acknowledgement timeout? #12446

lukaseckert · 2024-10-04T12:09:30Z

Hi,

I'm seeing a delivery acknowledgement timeout on a server every couple of days and have not been able to manually reproduce such a timeout with an intentionally blocked consumer. Therefore my question is, what exactly triggers the timeout (if not a sleeping consumer like in the code below)?

Please see the following RabbitMQ screenshot:

The queue has a consumer timeout of 120 seconds.
Marker (1): A single message is delivered with manual ack. The consumer then enters Thread.sleep() for 10 minutes before acking. See code below.
Marker (2): For the next 10 minutes, there are no consumer acks.
Marker (3): The unacked message stays unacked for at least 10 minutes. No consumer timeout appears.

Community Support Policy

I have read RabbitMQ's Community Support Policy
I agree to provide all relevant information (versions, logs, rabbitmq-diagnostics output, detailed reproduction steps)

RabbitMQ version used

4.0.2

Erlang version used

26.2.x

Operating system (distribution) used

Ubuntu 24

How is RabbitMQ deployed?

Community Docker image

rabbitmq-diagnostics status output

Logs from node 1 (with sensitive values edited out)

(Note: the timeout in these logs is the production value of 1800000 ms. The screenshot above shows my local setup, where I tried and failed to reproduce the error with a shorter timeout.)

2024-09-23 06:58:01.119848+02:00 [warning] <0.10612.1710> Consumer 111941 on channel 4 has timed out waiting for delivery acknowledgement. Timeout used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more

2024-09-23 06:58:01.120335+02:00 [error] <0.10612.1710> Channel error on connection <0.10594.1710> (172.18.0.16:54664 -> 172.18.0.2:5673, vhost: '/', user: 'binary-service'), channel 4:

2024-09-23 06:58:01.120335+02:00 [error] <0.10612.1710> operation none caused a channel exception precondition_failed: delivery acknowledgement on channel 4 timed out. Timeout value used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more

Logs from node 2 (if applicable, with sensitive values edited out)

No response

Logs from node 3 (if applicable, with sensitive values edited out)

No response

rabbitmq.conf

loopback_users.guest = false
listeners.tcp.default = 5672
channel_max = 1000
listeners.ssl.default = 5673

# to make the default setting "vm_memory_high_watermark.relative = 0.4" work,
# rabbitmq needs to be made aware of the cgroup memory limit:
total_memory_available_override_value = 1024MB
#... various TLS settings
# own settings
management.load_definitions = ...
tcp_listen_options.keepalive = true

Steps to deploy RabbitMQ cluster

docker run Commands

Steps to reproduce the behavior in question

Currently I'm not aware of a way to manually reproduce the timeout, however the timeout is encountered every couple of days for a long-running application instance.

advanced.config

No response

Application code

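import java.io.IOException;

import com.rabbitmq.client.Channel;
import org.springframework.amqp.rabbit.annotation.RabbitHandler;
import org.springframework.amqp.rabbit.annotation.RabbitListener;
import org.springframework.amqp.support.AmqpHeaders;
import org.springframework.messaging.handler.annotation.Header;
import org.springframework.messaging.handler.annotation.Payload;

// Reproduction attempt: hold one delivery unacked for longer than the consumer timeout.
// `log` and UserMessageBody are defined elsewhere in the application.
@RabbitListener(
    queues = "#{userQueue.name}",
    id = "liveSyncQueue",
    autoStartup = "false",
    exclusive = true,
    ackMode = "MANUAL")   // <---- manual ack mode
public class RabbitController {

    @RabbitHandler
    public void receiveUserSyncMessage(
        Channel channel,
        @Header(AmqpHeaders.DELIVERY_TAG) long tag,
        @Payload UserMessageBody userMessageBody
    ) throws IOException {
        try {
            log.info("Sleep after user message");
            Thread.sleep(10 * 60 * 1000); // 10 min sleep
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new RuntimeException(e);
        }
        // processing code
        channel.basicAck(tag, false); // manual ack of this single delivery
    }
}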

Kubernetes deployment file

No response

Answered by michaelklishin

Oct 5, 2024


mkuratczyk (Maintainer) · 2024-10-04T13:24:08Z

First of all, it's important to keep in mind that this timeout is not accurate. In fact, we know it usually takes 1 minute more than the configured value to trigger the timeout. We haven't fixed that because it's not important - this timeout is just a safety precaution to prevent broken consumers hanging for a very long time, so 1 minute doesn't make a difference.

Having said that, for you it's not about just 1 additional minute. I just tried with https://perftest.rabbitmq.com/ and everything works as expected (with the +1 minute caveat). If I run this:

$ date; perf-test --exclusive -qa x-consumer-timeout=60000 -C 1 -D 1 -L 600000000

2 minutes later I see the consumer timeout logged.

Does it work for you with perf-test?
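A small sketch of why a periodically checked timeout can fire late, consistent with the "60000 ms configured, logged about 2 minutes later" observation above. This is a toy model, not RabbitMQ's code: the class and method names are hypothetical, and the 60-second check interval is an assumption chosen to match the "+1 minute" remark.

```java
/** Toy model of a tick-based ack timeout check; NOT RabbitMQ's actual implementation. */
final class TickTimeoutModel {
    private TickTimeoutModel() {}

    /**
     * Returns the time (ms) of the first periodic check at which a delivery
     * that is still unacked has been outstanding for at least timeoutMs.
     * Checks are assumed to run at multiples of tickMs (an assumption).
     */
    static long firstDetectionMs(long deliveredAtMs, long timeoutMs, long tickMs) {
        if (deliveredAtMs < 0 || timeoutMs <= 0 || tickMs <= 0) {
            throw new IllegalArgumentException("times must be non-negative, intervals positive");
        }
        long deadline = deliveredAtMs + timeoutMs;
        // Round the deadline up to the next check boundary.
        long ticks = (deadline + tickMs - 1) / tickMs;
        return ticks * tickMs;
    }

    public static void main(String[] args) {
        // Delivery shortly after start, 60 s timeout, checks assumed every 60 s.
        long detected = firstDetectionMs(1_000, 60_000, 60_000);
        System.out.println("timeout noticed at " + detected + " ms");
    }
}
```

Under this model the extra delay is always less than one check interval, which is why it does not matter for a safety-net timeout.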

0 replies

michaelklishin (Maintainer) · 2024-10-05T00:37:43Z

@lukaseckert the relevant doc section has an answer:

If a consumer does not ack its delivery for more than the timeout value (30 minutes by default),
its channel will be closed with a PRECONDITION_FAILED channel exception.

Here is where it happens in the code. There's nothing particularly creative there, it's just a timer which is not perfectly precise but for what this feature does, it is perfectly fine.

1 reply

lukaseckert Oct 8, 2024
Author

Thanks, marking this as an answer, the key is the ...when is_integer(Timeout)... part :)

lukaseckert (Author) · 2024-10-07T12:13:00Z

Hi @mkuratczyk and @michaelklishin,
thank you for the quick response. The perf-test tool was a valuable hint (as you said, it does trigger the consumer timeout). After two days of debugging I still couldn't find the difference between the two apps, until I compared the TCP traffic in Wireshark frame by frame and saw that the perf-test tool sends this in the Queue.Declare command:
Arguments: x-consumer-timeout (long int): 60000
while the Spring app sends:
Arguments: x-consumer-timeout (string): 60000
🤦‍♂️

While this is completely my mistake, may I suggest adding a note to the docs that any non-integer value for the x-consumer-timeout property is silently ignored, or logging a warning when a non-integer value is set for it?
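To make the integer-versus-string difference concrete, here is a minimal plain-Java sketch of a queue argument map built both ways, with a client-side check that loosely mirrors the broker's is_integer(Timeout) guard mentioned earlier in the thread. The class and helper names are hypothetical. The sketch assumes that Byte, Short, Integer and Long values are encoded as AMQP integer types by the client and that a String stays a string. In a Spring AMQP app, a map like this would typically be the arguments parameter of the Queue constructor.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative helper (hypothetical name): builds and checks queue arguments. */
final class ConsumerTimeoutArgs {
    static final String KEY = "x-consumer-timeout";

    private ConsumerTimeoutArgs() {}

    /** True when the value is one of Java's integral wrapper types (assumed to map to AMQP integers). */
    static boolean isIntegerTyped(Object value) {
        return value instanceof Byte
            || value instanceof Short
            || value instanceof Integer
            || value instanceof Long;
    }

    /** Builds a queue argument map containing only the consumer timeout. */
    static Map<String, Object> withTimeout(Object timeout) {
        Map<String, Object> args = new LinkedHashMap<>();
        args.put(KEY, timeout);
        return args;
    }

    public static void main(String[] unused) {
        Map<String, Object> good = withTimeout(60000);   // integer: takes effect
        Map<String, Object> bad = withTimeout("60000");  // string: silently ignored per this thread
        System.out.println("integer arg ok: " + isIntegerTyped(good.get(KEY)));
        System.out.println("string arg ok:  " + isIntegerTyped(bad.get(KEY)));
    }
}
```

A check like this at startup would have flagged the misconfigured queue without needing a packet capture.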

I think this is especially important because there is no way to see the difference in the management UI:

On one hand, the correct (integer) configuration of the perf-test tool is incorrectly displayed as "string" in the UI when hovering over the queue argument:
And on the other hand, the wrong (string) consumer timeout is incorrectly displayed as a regular consumer timeout in the consumer list although it has no effect:

1 reply

mkuratczyk Oct 8, 2024
Maintainer

Yeah, we should either reject an incorrect type or convert it and make it work. I can look into this at some point.
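Until the broker rejects or converts such values, an application can apply the same "reject or convert" idea on its own side before declaring the queue. The following is only a sketch of that idea in plain Java. The class name and the exact rules (numeric strings are converted, negative values and other types are rejected) are assumptions for illustration, not RabbitMQ behavior.

```java
/** Client-side normalizer (hypothetical): converts or rejects a configured timeout value. */
final class TimeoutArgNormalizer {
    private TimeoutArgNormalizer() {}

    /** Returns the timeout in milliseconds as a long, or throws if it cannot be one. */
    static long toMillis(Object raw) {
        long value;
        if (raw instanceof Byte || raw instanceof Short
                || raw instanceof Integer || raw instanceof Long) {
            value = ((Number) raw).longValue();
        } else if (raw instanceof String) {
            try {
                // Convert strings such as "60000", e.g. values read from a properties file.
                value = Long.parseLong(((String) raw).trim());
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException("not an integer timeout: " + raw, e);
            }
        } else {
            throw new IllegalArgumentException("unsupported timeout type: "
                + (raw == null ? "null" : raw.getClass().getName()));
        }
        if (value < 0) {
            throw new IllegalArgumentException("timeout must not be negative: " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        // Both forms end up as a long that can be put into the queue arguments.
        System.out.println(toMillis(60000));
        System.out.println(toMillis("60000"));
    }
}
```

Putting the returned long into the argument map ensures the value goes over the wire as an integer, whatever form the configuration source used.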
