-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
(2.14) reset invalid consumer state to stream state on startup #7692
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Is this with If it's not configured it could be the stream and consumer are properly in sync, a VM crash follows, and some of the stream's messages at the tail revert while the consumer state remains ahead. Fully resetting the consumer state to a recalculated starting sequence is not the correct thing to do (might only work for a WorkQueue, but don't think it's fully correct there either). For example, imagine a scenario where a stream has 100 messages, and those 100 messages have been sent to the consumer and they are all pending. The next sequence is Instead, the state should be looked at and compared to the highest stream sequence:
I'd recommend to start with a test that only uses normal JetStream APIs to get the system in a state like described above, then shutdown the server and remove |
|
I'd also recommend to open an issue and discuss there first before opening a PR like this. Otherwise you're spending time on code that might not be suitable, and we need to do multiple code review cycles, instead of discussing the desired approach first. See also the contributing guide: CONTRIBUTING.md |
16335ad to
fe19965
Compare
|
Hi @MauriceVanVeen, thank you for your patience and time. I opened this PR primarily for discussion purposes. I initially wanted to create an issue or proposal in the repository, but wasn’t able to do that at the same time. I agree that we should first discuss the approach and then move on to implementation. To that end, I’ve opened a proposal here #7693, and we can use it to discuss the approach or any other related points. |
e262676 to
634b7e3
Compare
236f8b3 to
912caba
Compare
|
Hi @MauriceVanVeen , thank you for the review. I’ve addressed all the comments you mentioned and have added the test case as well. Please take a look at the updated PR and let me know if there’s anything you’d like to change. |
MauriceVanVeen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some final comments, think we're nearly there.
Also, could you edit your commits to also contain the Signed-off-by to pass CI?
See also: CONTRIBUTING.md#sign-off
912caba to
be4de2b
Compare
|
@MauriceVanVeen , I’ve pushed the updated changes. It looks like the sign-off on the second commit didn’t happen while resolving conflicts, could you please check now? |
Signed-off-by: Tilak Raj <[email protected]>
be4de2b to
eaa2cb3
Compare
MauriceVanVeen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you!
This PR addresses scenarios where the stream state and consumer state diverge. Such discrepancies can occur in real-world situations, for example after a server or VM crash. While we don’t currently have deterministic steps to reproduce this issue, it’s reasonable to expect cases where a consumer’s sequence ends up ahead of the stream’s sequence.
In customer environments , especially where teams cannot directly interact with or manually intervene in systems , the system should be able to recover automatically. In these situations, the consumer state itself is not corrupted but may reference an invalid sequence. This change allows the consumer to fall back to the correct stream sequence so it can continue functioning, rather than getting stuck and stopping message consumption.
This PR is open for discussion, and feedback is welcome on whether these changes are appropriate, should be adjusted, or belong elsewhere.
Resolves #7693
Signed-off-by: Tilak Raj [email protected]