[fix][client] Prevent epoch race in MultiTopicsConsumer batch receive#25210
ChimdumebiNebolisa wants to merge 9 commits into apache:master
@ChimdumebiNebolisa Please add the following content to your PR description and select a checkbox:
fa110d7 to 9602f5c
5ed0fda to a83cb0d
@@ -0,0 +1,185 @@
# Investigation: GitHub issue #25204 – MultiTopicsConsumer receives message with older ConsumerEpoch after redeliverUnacknowledgedMessages()
You should not commit the investigation document to the code base. You can post it as a Gist and share the link in the PR description.
 * validations, so the epoch changes mid-batch. Without the fix, exactly one message is delivered
 * (bug). The test asserts totalDelivered != 1.
 */
@Test(groups = "flaky")
Why is it flaky? If the bug is fixed, this test should not be flaky
Field consumersField = MultiTopicsConsumerImpl.class.getDeclaredField("consumers");
consumersField.setAccessible(true);
java.util.Map<String, ConsumerImpl<byte[]>> consumersMap =
        (java.util.Map<String, ConsumerImpl<byte[]>>) consumersField.get(multiConsumer);
Don't use reflection. consumers is a protected field, so you can access it directly.
In addition, AI tends to use fully qualified type names (java.util.Map) rather than Map with import java.util.Map. You can customize AGENTS.md or manually review the AI-generated code.
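A minimal sketch of that suggestion, using hypothetical stand-in classes rather than the real Pulsar types. It assumes the test lives in the same package as the class that declares the field, which holds here since both the test and MultiTopicsConsumerImpl are in org.apache.pulsar.client.impl:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for MultiTopicsConsumerImpl; only the field visibility matters here.
class StandInMultiTopicsConsumer {
    protected final Map<String, String> consumers = new ConcurrentHashMap<>();
}

// Hypothetical test helper declared in the same package as the class above.
class DirectFieldAccessSketch {
    static int consumerCount(StandInMultiTopicsConsumer multiConsumer) {
        // No getDeclaredField/setAccessible needed: protected members are
        // also accessible to every class in the same package.
        Map<String, String> consumersMap = multiConsumer.consumers;
        return consumersMap.size();
    }
}
```

The import also keeps the declaration short (Map instead of java.util.Map), which addresses the second point.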
Thanks for the review. I’ve updated the test to address the feedback:
The remaining reflection is only for private members with no stable test seam ( |
Fixes #25204

Motivation
MultiTopicsConsumer processes a received batch by validating the consumer epoch per message. If redeliverUnacknowledgedMessages() runs concurrently, it can increment consumerEpoch while the batch loop is still iterating, which can produce a mixed outcome where part of the batch is accepted and the rest is filtered. This matches the behavior reported in #25204.

Modifications
incomingQueueLock across the entire batch loop and the incomingMessages.size() read, so consumerEpoch cannot change mid-batch.
MultiTopicsConsumerEpochRaceTest to assert the post-fix invariant: acceptedByEpochCount == 2)

Verifying this change
flaky, so it must not be excluded):
mvn --% -pl pulsar-client "-Dtest=org.apache.pulsar.client.impl.MultiTopicsConsumerEpochRaceTest" "-DexcludedGroups=quarantine" test
mvn -pl pulsar-client test

Does this pull request potentially affect one of the following parts
The threading model

Documentation
doc
doc-required
doc-not-needed
doc-complete

Matching PR in forked repository
PR in forked repository: ChimdumebiNebolisa#2
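
The race described under Motivation, and the effect of holding the lock across the whole batch loop, can be illustrated with a small standalone sketch. This is not the code from the PR: EpochBatchSketch, acceptBatch, the afterFirstMessage hook, and demo are hypothetical names, and the sketch assumes the epoch increment takes the same lock as the batch loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the consumer; not the real MultiTopicsConsumerImpl.
class EpochBatchSketch {
    private final ReentrantLock incomingQueueLock = new ReentrantLock();
    private long consumerEpoch = 0; // guarded by incomingQueueLock

    // Models the epoch bump done by redelivery (assumed to take the lock).
    void redeliverUnacknowledgedMessages() {
        incomingQueueLock.lock();
        try {
            consumerEpoch++;
        } finally {
            incomingQueueLock.unlock();
        }
    }

    // Validates each message epoch while holding the lock for the entire loop,
    // so consumerEpoch cannot change between the first and last message.
    List<Long> acceptBatch(List<Long> messageEpochs, Runnable afterFirstMessage) {
        List<Long> accepted = new ArrayList<>();
        incomingQueueLock.lock();
        try {
            boolean first = true;
            for (long messageEpoch : messageEpochs) {
                if (messageEpoch >= consumerEpoch) {
                    accepted.add(messageEpoch);
                }
                if (first) {
                    afterFirstMessage.run(); // test hook: try to race mid-batch
                    first = false;
                }
            }
        } finally {
            incomingQueueLock.unlock();
        }
        return accepted;
    }

    private static void joinQuietly(Thread thread, long millis) {
        try {
            thread.join(millis); // 0 means wait until the thread finishes
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Starts a concurrent redelivery after the first message of a two-message batch.
    static List<Long> demo() {
        EpochBatchSketch consumer = new EpochBatchSketch();
        Thread redeliver = new Thread(consumer::redeliverUnacknowledgedMessages);
        List<Long> accepted = consumer.acceptBatch(List.of(0L, 0L), () -> {
            redeliver.start();
            joinQuietly(redeliver, 50); // redelivery is blocked on the lock
        });
        joinQuietly(redeliver, 0); // it completes only after the batch is done
        return accepted;
    }
}
```

Because the redelivery thread blocks on incomingQueueLock until the loop finishes, both messages in the batch are validated against the same epoch and demo() returns two accepted messages. If the lock were taken per message instead, the increment could land between the two checks and the second message would be filtered, which is the mixed outcome the PR describes.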