Add partition-level idle detection to prevent message loss on ownership transfer #39
base: master
Co-authored-by: nodece <16235121+nodece@users.noreply.github.com>
…t magic numbers Co-authored-by: nodece <16235121+nodece@users.noreply.github.com>
… idle detection Co-authored-by: nodece <16235121+nodece@users.noreply.github.com>
… improve test reliability Co-authored-by: nodece <16235121+nodece@users.noreply.github.com>
@copilot
…o-op detector pattern Co-authored-by: nodece <16235121+nodece@users.noreply.github.com>
…ctor reconnect Co-authored-by: nodece <16235121+nodece@users.noreply.github.com>
Done in commits 33e09e7 and 3891e64:
Problem
Consumers permanently stop receiving messages when topic ownership transfers (broker failover, unload) and the CommandCloseConsumer notification fails to arrive due to network issues or broker crashes. The current connection-level ping-pong only validates TCP connectivity, not topic ownership.
Changes
Core Implementation
PartitionConsumerIdleDetector: Tracks consumer activity (receive, ack, nack, redeliver) and triggers a lookup when the consumer has been idle for 30s (configurable). Reconnects with cleanup if ownership changed.
Configuration (ConsumerConfigurationData)
consumerIdleTimeoutMs (default: 30000, set to 0 to disable)
Integration (ConsumerImpl)
Activity markers: internalReceive(), doAcknowledge(), negativeAcknowledge(), redeliverUnacknowledgedMessages()
Helper methods: getUnAckedMessageTracker(), getAcknowledgmentsGroupingTracker(), incrementConsumerEpoch()
Example Usage
Test Coverage
Original prompt
Add partition-level idle detection for consumer to prevent permanent message loss after topic ownership transfer
Problem
When topic ownership transfers (e.g., broker failover, topic unload), some partition consumers may permanently fail to receive messages because:
The CommandCloseConsumer notification does not arrive
Impact: Silent data loss, requires manual intervention, affects production availability
Solution
Implement partition-level idle detection + automatic reconnect mechanism in the consumer:
Key Features
Activity tracking on receive (internalReceive()), acknowledge (doAcknowledge()), and negative acknowledge (negativeAcknowledge())
Implementation Requirements
1. Create PartitionConsumerIdleDetector.java
Location: pulsar-client/src/main/java/org/apache/pulsar/client/impl/PartitionConsumerIdleDetector.java
Key Methods:
markActive() - Called when consumer has activity
checkIdleAndReconnectIfNeeded() - Periodic check for idle state
verifyTopicOwnership() - Lookup to verify if broker changed
reconnectWithCleanup() - Reconnect + cleanup logic
Cleanup Logic:
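The four key methods listed above could fit together roughly as in the sketch below. It is a simplified, synchronous illustration and not the class from this PR: the class name `IdleDetectorSketch`, the injected clock, the `ownerLookup` supplier (standing in for a broker lookup), and the `cleanup` callback (standing in for whatever tracker and epoch cleanup the real detector performs) are all assumptions. Restarting the idle window after an unchanged lookup is also an assumption.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical sketch of the idle-detection flow; not the PR's actual class. */
class IdleDetectorSketch {
    private final long idleTimeoutMs;           // 0 (or less) disables detection
    private final LongSupplier clockMs;         // injected clock (assumption, eases testing)
    private final Supplier<String> ownerLookup; // stand-in for a topic ownership lookup
    private final Runnable cleanup;             // stand-in for cleanup before reconnecting
    private final AtomicLong lastActivityMs;
    private volatile String connectedBroker;

    IdleDetectorSketch(long idleTimeoutMs, LongSupplier clockMs, String connectedBroker,
                       Supplier<String> ownerLookup, Runnable cleanup) {
        this.idleTimeoutMs = idleTimeoutMs;
        this.clockMs = clockMs;
        this.connectedBroker = connectedBroker;
        this.ownerLookup = ownerLookup;
        this.cleanup = cleanup;
        this.lastActivityMs = new AtomicLong(clockMs.getAsLong());
    }

    /** Called on receive, ack, nack, or redeliver. */
    void markActive() {
        lastActivityMs.set(clockMs.getAsLong());
    }

    /** Periodic check; returns true only if a reconnect was triggered. */
    boolean checkIdleAndReconnectIfNeeded() {
        if (idleTimeoutMs <= 0) {
            return false; // detection disabled
        }
        if (clockMs.getAsLong() - lastActivityMs.get() < idleTimeoutMs) {
            return false; // recent activity, nothing to verify
        }
        String currentOwner = verifyTopicOwnership();
        if (currentOwner.equals(connectedBroker)) {
            markActive(); // assumption: restart the idle window when ownership is unchanged
            return false;
        }
        reconnectWithCleanup(currentOwner);
        return true;
    }

    /** Looks up which broker currently owns the topic. */
    String verifyTopicOwnership() {
        return ownerLookup.get();
    }

    /** Runs cleanup, then switches to the new owner. */
    void reconnectWithCleanup(String newOwner) {
        cleanup.run();
        connectedBroker = newOwner;
        markActive();
    }

    String connectedBroker() {
        return connectedBroker;
    }
}
```

Injecting the clock and the lookup keeps the idle logic testable without a broker. A production detector would most likely perform the lookup asynchronously rather than blocking the checking thread.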
2. Modify ConsumerImpl.java
Add fields:
Integrate in constructor:
Add activity markers:
Expose helper methods (package-private for IdleDetector):
Clean up in closeAsync():
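The integration steps above (a detector field, wiring in the constructor, activity markers, and cleanup on close) might look like this in miniature. The sketch does not use the Pulsar client API: `ConsumerIntegrationSketch`, its method names, and the `onIdle` callback are hypothetical stand-ins for `ConsumerImpl` and the detector, and the choice of a single daemon scheduler thread is an assumption.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a consumer; it does not use the Pulsar client API. */
class ConsumerIntegrationSketch implements AutoCloseable {
    private final AtomicLong lastActivityNanos = new AtomicLong(System.nanoTime());
    private final ScheduledExecutorService scheduler; // null when detection is disabled
    private final ScheduledFuture<?> idleCheckTask;   // null when detection is disabled

    ConsumerIntegrationSketch(long idleTimeoutMs, Runnable onIdle) {
        if (idleTimeoutMs > 0) {
            scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "idle-check");
                t.setDaemon(true); // do not keep the JVM alive for idle checks
                return t;
            });
            // Periodically compare elapsed idle time against the timeout.
            idleCheckTask = scheduler.scheduleWithFixedDelay(() -> {
                long idleMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - lastActivityNanos.get());
                if (idleMs >= idleTimeoutMs) {
                    onIdle.run();
                }
            }, idleTimeoutMs, idleTimeoutMs, TimeUnit.MILLISECONDS);
        } else {
            scheduler = null;
            idleCheckTask = null;
        }
    }

    // Activity markers: every consumer operation refreshes the timestamp.
    void receive() { markActive(); }
    void acknowledge() { markActive(); }
    void negativeAcknowledge() { markActive(); }

    private void markActive() {
        lastActivityNanos.set(System.nanoTime());
    }

    long lastActivityNanos() {
        return lastActivityNanos.get();
    }

    boolean isIdleCheckScheduled() {
        return idleCheckTask != null && !idleCheckTask.isCancelled();
    }

    /** Mirrors the "clean up on close" step: stop the periodic check. */
    @Override
    public void close() {
        if (idleCheckTask != null) {
            idleCheckTask.cancel(false);
        }
        if (scheduler != null) {
            scheduler.shutdownNow();
        }
    }
}
```

Cancelling the task on close matters because a check that fires after the consumer is closed could otherwise trigger a lookup or reconnect for a consumer that no longer exists.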
3. Modify ConsumerConfigurationData.java
Add configuration fields:
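A minimal sketch of how such a field could behave, based only on the documented default (30000) and the "set to 0 to disable" rule. The class `ConsumerIdleConfigSketch`, the constant name, the accessor names, and the rejection of negative values are assumptions for illustration; the field in the real `ConsumerConfigurationData` may be declared differently.

```java
/** Hypothetical configuration holder; not the real ConsumerConfigurationData. */
class ConsumerIdleConfigSketch {
    static final long DEFAULT_CONSUMER_IDLE_TIMEOUT_MS = 30_000L;

    private long consumerIdleTimeoutMs = DEFAULT_CONSUMER_IDLE_TIMEOUT_MS;

    long getConsumerIdleTimeoutMs() {
        return consumerIdleTimeoutMs;
    }

    void setConsumerIdleTimeoutMs(long consumerIdleTimeoutMs) {
        // Assumption: negative values are rejected; 0 is the documented "disabled" value.
        if (consumerIdleTimeoutMs < 0) {
            throw new IllegalArgumentException("consumerIdleTimeoutMs must be >= 0");
        }
        this.consumerIdleTimeoutMs = consumerIdleTimeoutMs;
    }

    /** Idle detection runs only for a positive timeout. */
    boolean isIdleDetectionEnabled() {
        return consumerIdleTimeoutMs > 0;
    }
}
```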