Rollback the consensus module leader's next committed session id #1918
marc-adaptive wants to merge 9 commits into master.
session.loadSnapshotState(correlationId, openedPosition, timeOfLastActivity, closeReason);
addSession(session);
@marc-adaptive Could you run me through the logic of removing this? Is it because the consensus module state values are now guaranteed to be higher than the IDs of the stored cluster sessions?
Is it possible that, after loading a snapshot taken by an older version of the cluster into a cluster running this version, nextSessionId/nextCommittedSessionId could be incorrect?
I don't see how we could end up in that code path, as the snapshotted/bootstrapped session ID value should always be >= the IDs of the stored sessions. But I may be wrong, and it doesn't hurt to have this check, so I have reverted this. I didn't consider the case of older clusters. Thanks for catching it.
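A minimal sketch of the kind of defensive check being discussed, assuming it simply moves both counters past any larger session ID found while loading sessions from a snapshot. The class `SnapshotSessionLoader` and its method are hypothetical illustrations, not Aeron's actual `ConsensusModuleAgent` code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: counters restored from a snapshot, plus a guard that
// keeps them ahead of every session loaded from that same snapshot.
final class SnapshotSessionLoader
{
    long nextSessionId;
    long nextCommittedSessionId;
    final List<Long> loadedSessionIds = new ArrayList<>();

    SnapshotSessionLoader(final long snapshotNextSessionId, final long snapshotNextCommittedSessionId)
    {
        this.nextSessionId = snapshotNextSessionId;
        this.nextCommittedSessionId = snapshotNextCommittedSessionId;
    }

    void onLoadClusterSession(final long clusterSessionId)
    {
        // Normally a no-op, because the snapshotted counters should already be
        // greater than every stored session ID. It only takes effect if a
        // snapshot (e.g. from an older version) has counters that lag behind.
        nextSessionId = Math.max(nextSessionId, clusterSessionId + 1);
        nextCommittedSessionId = Math.max(nextCommittedSessionId, clusterSessionId + 1);
        loadedSessionIds.add(clusterSessionId);
    }
}
```

With consistent snapshot state the guard never fires; it is cheap insurance for the older-snapshot case raised above.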
Considering the cases:
- leader replay, live follower, follower replay, follower catchup: nextSessionId and nextCommittedSessionId will be the same. Both are loaded from the snapshot/bootstrapper, #onReplaySessionOpen can increment them, and the value is snapshotted.
- live leader: nextSessionId and nextCommittedSessionId will diverge during the leadership term. Both are loaded from the snapshot/bootstrapper. #onSessionConnect increments nextSessionId, while #sweepUncommittedEntriesTo and #restoreUncommittedEntries can increment nextCommittedSessionId. nextCommittedSessionId is snapshotted, and on a new election nextSessionId is brought back down to nextCommittedSessionId.
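The two cases above can be modelled with a small counter class. This is a hypothetical sketch, not Aeron code: `onSessionOpenCommitted` stands in for the role the comment attributes to #sweepUncommittedEntriesTo, and #restoreUncommittedEntries is not modelled.

```java
// Hypothetical model of the nextSessionId / nextCommittedSessionId lifecycle.
final class SessionIdCounters
{
    private long nextSessionId;
    private long nextCommittedSessionId;

    // Both counters start from the snapshot/bootstrapper value.
    SessionIdCounters(final long fromSnapshot)
    {
        nextSessionId = fromSnapshot;
        nextCommittedSessionId = fromSnapshot;
    }

    // Live leader: allocate an ID on connect; the session open is not yet committed.
    long onSessionConnect()
    {
        return nextSessionId++;
    }

    // Live leader: the session-open entry has reached the commit position.
    void onSessionOpenCommitted(final long clusterSessionId)
    {
        nextCommittedSessionId = Math.max(nextCommittedSessionId, clusterSessionId + 1);
    }

    // Replay and follower paths: both counters advance together.
    void onReplaySessionOpen(final long clusterSessionId)
    {
        nextSessionId = Math.max(nextSessionId, clusterSessionId + 1);
        nextCommittedSessionId = Math.max(nextCommittedSessionId, clusterSessionId + 1);
    }

    // New election: discard IDs handed out for session opens that never committed.
    void onElectionStart()
    {
        nextSessionId = nextCommittedSessionId;
    }

    // Only the committed value goes into the snapshot.
    long snapshotValue()
    {
        return nextCommittedSessionId;
    }

    long nextSessionId()
    {
        return nextSessionId;
    }
}
```

Because only the committed value is snapshotted, every node that has applied the same committed log writes the same value, regardless of how many uncommitted connects a leader accepted.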
Branch updated from 32948bf to a6fd3ef.
This approach rolls back nextSessionId to nextCommittedSessionId when the leadership term is over. I am unsure whether this could be problematic if an authenticator assumes unique session IDs. The alternative approach is to not roll back nextSessionId, and instead update nextSessionId and nextCommittedSessionId independently in onReplaySessionOpen.
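To make the authenticator concern concrete, here is a hypothetical sketch of one node that leads, loses an uncommitted session open in an election, and is then leader again. `RollbackIdReuse` and `UniqueIdTrackingAuthenticator` are illustrative names only; the latter is not Aeron's `Authenticator` interface.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical: connect -> election (rollback) -> connect, on the same node.
final class RollbackIdReuse
{
    static long[] idsAcrossRollback(final long nextCommittedSessionId)
    {
        long nextSessionId = nextCommittedSessionId;

        // A client connects while this node leads; the session open never commits.
        final long beforeElection = nextSessionId++;

        // New election: nextSessionId is rolled back to nextCommittedSessionId.
        nextSessionId = nextCommittedSessionId;

        // This node becomes leader again and another client connects.
        final long afterElection = nextSessionId++;

        return new long[]{ beforeElection, afterElection };
    }
}

// Hypothetical authenticator that assumes every session ID it sees is new.
final class UniqueIdTrackingAuthenticator
{
    private final Set<Long> seenSessionIds = new HashSet<>();

    // Returns false if this session ID has been seen before.
    boolean recordSessionId(final long sessionId)
    {
        return seenSessionIds.add(sessionId);
    }
}
```

Under the rollback approach both connects receive the same ID, so an authenticator keyed on session ID would see a repeat. The alternative of never rolling back nextSessionId avoids this reuse.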
…that we are storing the current state prior to a transition rather than a specific state
Branch updated from a6fd3ef to de66910.
This PR fixes ConsensusModuleAgent#nextCommittedSessionId.
Previously, the leader would bump nextCommittedSessionId as soon as the session-open message was appended to the log, which does not guarantee that the session open is committed. If those session opens are not committed and the log is truncated, a leader that becomes a follower could hold an incorrect nextCommittedSessionId, resulting in inconsistent snapshots across nodes.
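A toy model contrasting the old behaviour (bump on append) with the fix (bump on commit). It is a hypothetical sketch: the log is a list of session-open IDs, the commit position is an index into it, and truncation drops the uncommitted tail when the leader steps down. None of these names come from Aeron.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical leader that tracks nextCommittedSessionId either on append or on commit.
final class CommitTrackingLeader
{
    private final List<Long> log = new ArrayList<>(); // session-open entries, index = position
    private final boolean bumpOnAppend;
    private int commitPosition = 0; // entries [0, commitPosition) are committed
    long nextSessionId;
    long nextCommittedSessionId;

    CommitTrackingLeader(final long initialSessionId, final boolean bumpOnAppend)
    {
        this.nextSessionId = initialSessionId;
        this.nextCommittedSessionId = initialSessionId;
        this.bumpOnAppend = bumpOnAppend;
    }

    void appendSessionOpen()
    {
        final long id = nextSessionId++;
        log.add(id);
        if (bumpOnAppend)
        {
            // Old behaviour: treats "appended" as if it meant "committed".
            nextCommittedSessionId = id + 1;
        }
    }

    void commitTo(final int position)
    {
        // Fixed behaviour: advance only for entries that reached the commit position.
        for (int i = commitPosition; i < position; i++)
        {
            nextCommittedSessionId = Math.max(nextCommittedSessionId, log.get(i) + 1);
        }
        commitPosition = position;
    }

    // Leader steps down and its uncommitted tail is truncated.
    void becomeFollowerAndTruncate()
    {
        log.subList(commitPosition, log.size()).clear();
    }
}
```

With two appended opens (IDs 0 and 1) and only the first committed, a node that only ever saw committed entries would snapshot 1. After truncation the bump-on-append leader still holds 2, while the bump-on-commit leader holds 1, matching the rest of the cluster.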
Feel free to close if I'm wrong, misunderstanding something, or there's a better approach.