[Enhancement] Pop Long-polling Not Awakened for V1 Retry Messages #9632

@KingCide

Before Creating the Enhancement Request

  • I have confirmed that this should be classified as an enhancement rather than a bug/feature.

Summary

Pop long-polling is not awakened for V1 retry messages, causing a significant delay before retry messages are consumed.

Motivation

Description

In Pop consumption mode, when a message is not acknowledged (ACKed) within its invisible time, it is moved to a retry topic for later consumption. The system is expected to wake up any waiting long-polling requests for the original topic so the message can be re-consumed promptly.

Currently, this wake-up mechanism works correctly for V2 retry topics (%RETRY%group+topic) because the original topic and group can be reliably parsed.

However, for V1 retry topics (%RETRY%group_topic), the broker fails to parse the original topic name from the retry topic. As a result, the notifyMessageArrivingWithRetryTopic method cannot identify and awaken the correct long-polling request. This forces the consumer's long-polling request to wait until it times out (controlled by BROKER_SUSPEND_MAX_TIME_MILLIS, typically 15 seconds), introducing a significant delay in message retries.
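To illustrate why the V1 name cannot be parsed back reliably, here is a minimal sketch. The helpers `buildV1`, `buildV2`, and `parseTopicFromV2` are hypothetical stand-ins that only follow the two name formats quoted above; they are not the broker's actual code. The sketch also assumes that `+` cannot appear in a topic or group name while `_` can.

```java
class RetryTopicFormats {
    static final String RETRY_PREFIX = "%RETRY%";

    // V1 format from this issue: %RETRY%group_topic
    static String buildV1(String topic, String group) {
        return RETRY_PREFIX + group + "_" + topic;
    }

    // V2 format from this issue: %RETRY%group+topic
    static String buildV2(String topic, String group) {
        return RETRY_PREFIX + group + "+" + topic;
    }

    // Recovers the original topic from a V2 name by splitting on the first '+'
    // after the prefix. Returns null if no '+' separator is present.
    static String parseTopicFromV2(String retryTopic) {
        int idx = retryTopic.indexOf('+', RETRY_PREFIX.length());
        return idx < 0 ? null : retryTopic.substring(idx + 1);
    }

    public static void main(String[] args) {
        // Two different (topic, group) pairs collapse to the same V1 name,
        // because '_' is also legal inside topic and group names.
        System.out.println(buildV1("order_events", "billing"));  // %RETRY%billing_order_events
        System.out.println(buildV1("events", "billing_order"));  // %RETRY%billing_order_events

        // The same pairs stay distinguishable in V2.
        System.out.println(parseTopicFromV2(buildV2("order_events", "billing")));  // order_events
        System.out.println(parseTopicFromV2(buildV2("events", "billing_order")));  // events
    }
}
```

Because the V1 separator is ambiguous, splitting the string alone is not enough; the broker needs extra information about which (topic, group) pairs actually exist.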

Steps to Reproduce

  1. Configure and start the broker with the following settings:
     • enableRetryTopicV2 = false (to use the V1 retry topic format)
     • popConsumerKVServiceEnable = true (or popConsumerFSServiceInit = true)
  2. Producer: Send a batch of messages (e.g., 32 messages) to a normal topic, referred to here as TopicA.
  3. Consumer: Use a Pop-based consumer (e.g., PushConsumer) to subscribe to TopicA.
  4. Simulate Failure: In the message listener, do not return CONSUME_SUCCESS for the received messages. For example, return RECONSUME_LATER or simply don't ACK them, causing them to expire and be sent to the retry topic (%RETRY%YourConsumerGroup_TopicA).
  5. Observe: Do not send any new messages to TopicA. Monitor the consumer logs.

Expected Behavior

When the invisible time for a message expires and it's moved to the V1 retry topic, the long-polling request waiting for messages on TopicA should be awakened immediately. The consumer should receive the retry message promptly (e.g., within 1-2 seconds after the invisible time + retry delay).

Actual Behavior

The long-polling request is not awakened by the arrival of the retry message. It remains suspended until the long-poll timeout is reached (approx. 15 seconds). Only after the timeout does the client re-initiate the poll request and finally fetch the message from the retry queue.

Log Evidence:

  • Message first nack'd time: 2025-08-22 16:05:21,414
  • Message revived and written to retry topic (from mqadmin topicStatus): 2025-08-22 16:05:33,497
  • Consumer receives the retry message: 2025-08-22 16:05:36,385
  • Observed Delay: ~15 seconds from the initial NACK, matching the long-poll timeout.

This is further confirmed by a second experiment: if new normal messages are continuously sent to TopicA, the retry messages are consumed much faster. This proves that the arrival of new normal messages is waking up the long-poll, which then happens to pick up the waiting retry messages. The retry message itself is not triggering the wake-up.
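For reference, the intervals between the three timestamps in the log evidence can be computed as follows. This is only an arithmetic sketch over the values quoted above; the class and method names are hypothetical.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class RetryTimeline {
    // Matches the timestamp layout used in the log evidence, e.g. 2025-08-22 16:05:21,414
    static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Milliseconds elapsed between two log timestamps.
    static long millisBetween(String from, String to) {
        return Duration.between(LocalDateTime.parse(from, FMT),
                                LocalDateTime.parse(to, FMT)).toMillis();
    }

    public static void main(String[] args) {
        String nack = "2025-08-22 16:05:21,414";     // message first nack'd
        String revived = "2025-08-22 16:05:33,497";  // written to retry topic
        String received = "2025-08-22 16:05:36,385"; // consumer receives retry message

        System.out.println(millisBetween(nack, revived));     // 12083
        System.out.println(millisBetween(revived, received)); // 2888
        System.out.println(millisBetween(nack, received));    // 14971
    }
}
```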

Describe the Solution You'd Like

The issue lies in PopLongPollingService.notifyMessageArrivingWithRetryTopic. For V1 retry topics, it cannot resolve the original topic.

We can enhance this method by implementing a reverse-lookup mechanism using the topicCidMap, which stores active (topic, consumer_group) mappings.

Logic:

  1. When a message arrives in a V1 retry topic, iterate through the entries in PopLongPollingService.topicCidMap.
  2. For each (topic, cid) pair, reconstruct the potential V1 retry topic name using KeyBuilder.buildPopRetryTopicV1(topic, cid).
  3. Compare this reconstructed name with the incoming retry topic name.
  4. If a unique match is found, we have successfully identified the original topic. Use this original topic to notify the long-polling service.
  5. If multiple matches are found, or no match is found, fall back to the current behavior (using the retry topic name) to avoid incorrect notifications.
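The steps above can be sketched roughly as follows. This is a standalone illustration, not a patch against the broker: `topicCidMap` is modeled here as a plain topic-to-groups map (the real field's type may differ), `buildPopRetryTopicV1` is a local stand-in for `KeyBuilder.buildPopRetryTopicV1(topic, cid)` that follows the `%RETRY%group_topic` format from this issue, and `resolveOriginalTopic` is a hypothetical helper name.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class V1RetryTopicResolver {
    static final String RETRY_PREFIX = "%RETRY%";

    // Stand-in for KeyBuilder.buildPopRetryTopicV1(topic, cid) (assumed format).
    static String buildPopRetryTopicV1(String topic, String cid) {
        return RETRY_PREFIX + cid + "_" + topic;
    }

    // Returns the original topic only when exactly one (topic, cid) pair
    // rebuilds to the incoming retry topic name; otherwise returns empty so
    // the caller can fall back to the current behavior.
    static Optional<String> resolveOriginalTopic(String retryTopic,
                                                 Map<String, Set<String>> topicCidMap) {
        String match = null;
        for (Map.Entry<String, Set<String>> entry : topicCidMap.entrySet()) {
            String topic = entry.getKey();
            for (String cid : entry.getValue()) {
                if (buildPopRetryTopicV1(topic, cid).equals(retryTopic)) {
                    if (match != null) {
                        return Optional.empty(); // more than one match: ambiguous
                    }
                    match = topic;
                }
            }
        }
        return Optional.ofNullable(match); // empty when nothing matched
    }

    public static void main(String[] args) {
        Map<String, Set<String>> topicCidMap = new ConcurrentHashMap<>();
        topicCidMap.computeIfAbsent("TopicA", k -> new HashSet<>()).add("YourConsumerGroup");

        // Unique match: the long-poll on TopicA can be notified.
        System.out.println(
            resolveOriginalTopic("%RETRY%YourConsumerGroup_TopicA", topicCidMap)); // Optional[TopicA]

        // Unknown retry topic: no match, so fall back.
        System.out.println(
            resolveOriginalTopic("%RETRY%OtherGroup_TopicB", topicCidMap)); // Optional.empty
    }
}
```

Returning an empty result for both the no-match and the multiple-match case keeps the fallback path in one place. The lookup is linear in the number of active (topic, cid) pairs, so a real implementation might want to cache resolved names if this runs on every retry message arrival.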

Describe Alternatives You've Considered

Switching to popKV may be an alternative.

Additional Context

No response
