-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[fix][broker] Remove consumer from cache if closed before creation #24638
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| response = getResponse(); | ||
| assertEquals(response.getClass(), CommandError.class); | ||
| assertEquals(((CommandError) response).getRequestId(), 3); | ||
| // We should not receive response for 1st consumer, since it was cancelled by the close |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to respond to the client even if the subscribe request was cancelled by a close command, otherwise the client will be left hanging.
This is a TODO.
lhotari
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This happens because the consumer remains in the connection's cache even after it is closed before the subscription is fully established. As a result, any new subscription attempt with the same consumer ID fails with
A failed consumer with id is already present on the connection.
I think this PR makes sense to me, but I'm just wondering why would a client perform a new subscription attempt with the same consumer ID if it already closed the consumer? Explaining the scenario in the PR description would be useful context.
pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java
Show resolved
Hide resolved
@lhotari Yes, that's correct. I guess that the issue occurs when:
The producer has same bugfix: https://github.com/apache/pulsar/blob/v3.0.12/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L2073 |
|
I personally think we don't need to remove the Consumer Future unless it's done. |
This is an async(improvement) issue and is not related to this PR. When the consumer sends a close request, the broker will complete the future. We must remove the future at this point; Otherwise, the broker cache retains the failed consumer, and subsequent subscribe requests will hit the error: Please see #24638 (comment). |
@nodece I think that we need @poorbarcode to explain why #22283 was needed since that seems to intentionally change the behavior and let the failed consumer future remain in the map. |
|
@nodece @poorbarcode Instead of leaving the failed future in the map to prevent race conditions, perhaps |
Once a subscription (including close) has completed, the consumer can reuse the same consumer ID (final value) to send a new subscription request. The main goal here seems to be preventing a consumer from sending multiple concurrent subscription requests. However, the old code also blocks reuse of the same consumer ID even after the subscription has done, which looks more like a bug since at that point the consumer should be allowed to resume with the same ID. I also noticed that we already handle similar concurrency cases in the ledger module, so we could apply the same approach here for |
Signed-off-by: Zixuan Liu <nodeces@gmail.com>
c726ab7 to
df530aa
Compare
|
A non-durable cursor bugfix: #24643 |
|
/pulsarbot rerun-failure-checks |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #24638 +/- ##
============================================
- Coverage 74.28% 74.28% -0.01%
+ Complexity 33164 32784 -380
============================================
Files 1882 1882
Lines 146855 146858 +3
Branches 16867 16867
============================================
- Hits 109094 109089 -5
+ Misses 29088 29087 -1
- Partials 8673 8682 +9
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
Could you write a test for this case? |
@poorbarcode We only need to verify the logic in The existing test |
pulsar-broker/src/test/java/org/apache/pulsar/broker/service/ServerCnxTest.java
Show resolved
Hide resolved
Signed-off-by: Zixuan Liu <nodeces@gmail.com>
|
/pulsarbot rerun-failure-checks |
|
Thanks for the review and discussion. After re-evaluating the logic, I realized that the current behavior is correct: The consumer should only be removed from the cache once the subscription completes successfully. Removing it earlier could lead to inconsistent states. I'll close this PR. |
Motivation
When a consumer sends a subscribe command and then immediately sends a close command while the subscription is still in progress, the broker logs messages like:
This happens because the consumer remains in the connection's cache even after it is closed before the subscription is fully established. As a result, any new subscription attempt with the same consumer ID fails with
A failed consumer with id is already present on the connection.This regression was introduced by #22283.
Modifications
handleCloseConsumer.Documentation
docdoc-requireddoc-not-neededdoc-completeMatching PR in forked repository
PR in forked repository: