Skip to content

Comments

Rollback the consensus module leader's next committed session id#1918

Draft
marc-adaptive wants to merge 9 commits intomasterfrom
fix/next-committed-session-id
Draft

Rollback the consensus module leader's next committed session id#1918
marc-adaptive wants to merge 9 commits intomasterfrom
fix/next-committed-session-id

Conversation

@marc-adaptive
Copy link
Contributor

This PR is to fix ConsensusModuleAgent#nextCommittedSessionId.

Previously the leader would bump nextCommittedSessionId once message was appended to log, which doesn't guarantee the session open is committed. In the case session opens are not committed and the log is truncated, a leader becoming a follower could have an incorrect nextCommittedSessionId resulting in an inconsistent snapshot across nodes.

Feel free to close if I'm wrong, misunderstanding something, or there's a better approach.

@vyazelenko vyazelenko requested review from mikeb01 and mjpt777 January 11, 2026 20:06
session.loadSnapshotState(correlationId, openedPosition, timeOfLastActivity, closeReason);

addSession(session);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marc-adaptive Could you run me though the logic of remove this? It is because the consensus module state values are now guaranteed to be higher than the ids of the stored cluster sessions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that after loading a snapshot stored from an older cluster into a cluster of this version that the nextSessionId/nextCommitedSessionId could be incorrect?

Copy link
Contributor Author

@marc-adaptive marc-adaptive Jan 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how we could end up in that code path as snapshotted/bootstrapped session id value should always be >= values in sessions. But I may be wrong and it doesn't hurt to have this check so I will reverted this. I didn't consider case of older clusters. Thanks for catching.

Considering the cases

  • leader replay, live follower, follower replay, follower catchup - nextSessionId and nextCommittedSessionId will be the same. Loaded from snapshot/bootstrapper, #onReplaySessionOpen can increment, and value is snapshotted
  • live leader - nextSessionId and nextCommittedSessionId will diverge during the leadership term. Both loaded from snapshot/bootstrapper. #onSessionConnect will increment nextSessionId. #sweepUncommittedEntriesTo and #restoreUncommittedEntries can increment nextCommittedSessionId. nextCommittedSessionId is snapshotted. And on new election nextSessionId is brought down to nextCommittedSessionId.

@marc-adaptive marc-adaptive force-pushed the fix/next-committed-session-id branch 2 times, most recently from 32948bf to a6fd3ef Compare January 19, 2026 23:10
@marc-adaptive marc-adaptive marked this pull request as draft January 20, 2026 01:40
@marc-adaptive
Copy link
Contributor Author

This approach rolls back nextSessionId to nextCommittedSessionId when leadership term is over. Am unsure if this could be problematic if an authenticator assumes unique session ids. The other approach is to not roll back nextSessionId and update nextSessionId and nextCommittedSessionId independently in onReplaySessionOpen.

…der would bump nextCommittedSessionId once message was appended to log, which doesn't guarantee the session open is committed. In the case session opens are not committed and the log is truncated, a leader becoming a follower could have an incorrect nextCommittedSessionId resulting in an inconsistent snapshot across nodes.
…that we are storing the current state prior to a transition rather that a specific state
@marc-adaptive marc-adaptive force-pushed the fix/next-committed-session-id branch from a6fd3ef to de66910 Compare January 28, 2026 14:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants