Cluster.NoLeaderException after cluster recovery #14446

krivulcik · 2022-06-27T09:31:01Z


We've had one node lose data.
We noticed that the recovery didn't go well (I'll report with some details and questions later), so we need to remove it from the cluster.
Any cluster operation currently ends with the following exception:

Raven.Client.Exceptions.Cluster.NoLeaderException: This node is elected to be the leader, but didn't took office yet.
   at Raven.Server.Web.RequestHandler.RedirectToLeader() in C:\Builds\RavenDB-Stable-5.3\53023\src\Raven.Server\Web\RequestHandler.cs:line 736
   at Raven.Server.Documents.Handlers.Admin.RachisAdminHandler.DeleteNode() in C:\Builds\RavenDB-Stable-5.3\53023\src\Raven.Server\Documents\Handlers\Admin\RachisAdminHandler.cs:line 549
   at Raven.Server.Routing.RequestRouter.HandlePath(RequestHandlerContext reqCtx) in C:\Builds\RavenDB-Stable-5.3\53023\src\Raven.Server\Routing\RequestRouter.cs:line 365
   at Raven.Server.RavenServerStartup.RequestHandler(HttpContext context) in C:\Builds\RavenDB-Stable-5.3\53023\src\Raven.Server\RavenServerStartup.cs:line 240

Node A is the node to remove. It is offline and configured to be Watcher. When I start the RavenDB service on the server, the situation doesn't change.
Node C is currently the Leader, node B is Member.
When I stop node C, it is elected to be the leader again.
When I try to force node C to step down, the same exception is returned.

As usual under increased load, voting is constantly in progress, and the Term increases steadily (by one every couple of seconds).

What is the best way to recover from this situation?
We need:
a) to remove node A from the cluster.
b) to stabilize the cluster topology.

We also see the following:

The connection with node B was suddenly broken..
System.IO.InvalidDataException: Expected to get type of 'AppendEntriesResponse' message, but got 'Error' message.
 ---> System.Exception: {"Type":"Error","ExceptionType":"RachisInvalidOperationException","Message":"FATAL ERROR: got an append entries request with index=2,460,346 term=324,057 while my term for this index is 324,056. (last commit index=2,460,355 with term=324,056), this means something went wrong badly.","Exception":"Raven.Server.Rachis.RachisInvalidOperationException: FATAL ERROR: got an append entries request with index=2,460,346 term=324,057 while my term for this index is 324,056. (last commit index=2,460,355 with term=324,056), this means something went wrong badly.\r\n   at Raven.Server.Rachis.RachisInvalidOperationException.Throw(String msg) in C:\\Builds\\RavenDB-Stable-5.3\\53023\\src\\Raven.Server\\Rachis\\Exceptions.cs:line 36\r\n   at Raven.Server.Rachis.RachisConsensus.ThrowFatalError(RachisEntry firstEntry, Nullable`1 myTermForTheIndex, Int64 lastCommitIndex, Int64 lastCommitTerm) in C:\\Builds\\RavenDB-Stable-5.3\\53023\\src\\Raven.Server\\Rachis\\RachisConsensus.cs:line 1683\r\n   at Raven.Server.Rachis.RachisConsensus.AppendToLog(ClusterOperationContext context, List`1 entries) in C:\\Builds\\RavenDB-Stable-5.3\\53023\\src\\Raven.Server\\Rachis\\RachisConsensus.cs:line 1604\r\n   at Raven.Server.Rachis.Follower.ApplyLeaderStateToLocalState(Stopwatch sp, ClusterOperationContext context, List`1 entries, AppendEntries appendEntries) in C:\\Builds\\RavenDB-Stable-5.3\\53023\\src\\Raven.Server\\Rachis\\Follower.cs:line 291\r\n   at Raven.Server.Rachis.Follower.FollowerSteadyState() in C:\\Builds\\RavenDB-Stable-5.3\\53023\\src\\Raven.Server\\Rachis\\Follower.cs:line 152\r\n   at Raven.Server.Rachis.Follower.Run(Object obj) in C:\\Builds\\RavenDB-Stable-5.3\\53023\\src\\Raven.Server\\Rachis\\Follower.cs:line 1131"}
   --- End of inner exception stack trace ---
   at Raven.Server.Rachis.Remote.RemoteConnection.ThrowUnexpectedMessage(String type, String expectedType, BlittableJsonReaderObject json) in C:\Builds\RavenDB-Stable-5.3\53023\src\Raven.Server\Rachis\Remote\RemoteConnection.cs:line 478
   at Raven.Server.Rachis.Remote.RemoteConnection.ValidateMessage(String expectedType, BlittableJsonReaderObject json) in C:\Builds\RavenDB-Stable-5.3\53023\src\Raven.Server\Rachis\Remote\RemoteConnection.cs:line 466
   at Raven.Server.Rachis.Remote.RemoteConnection.Read[T](JsonOperationContext context) in C:\Builds\RavenDB-Stable-5.3\53023\src\Raven.Server\Rachis\Remote\RemoteConnection.cs:line 319
   at Raven.Server.Rachis.FollowerAmbassador.Run() in C:\Builds\RavenDB-Stable-5.3\53023\src\Raven.Server\Rachis\FollowerAmbassador.cs:line 318

Formatted/unescaped inner exception:

Raven.Server.Rachis.RachisInvalidOperationException: FATAL ERROR: got an append entries request with index=2,460,346 term=324,057 while my term for this index is 324,056. (last commit index=2,460,355 with term=324,056), this means something went wrong badly.
   at Raven.Server.Rachis.RachisInvalidOperationException.Throw(String msg) in C:\Builds\RavenDB-Stable-5.3\53023\src\Raven.Server\Rachis\Exceptions.cs:line 36
   at Raven.Server.Rachis.RachisConsensus.ThrowFatalError(RachisEntry firstEntry, Nullable`1 myTermForTheIndex, Int64 lastCommitIndex, Int64 lastCommitTerm) in C:\Builds\RavenDB-Stable-5.3\53023\src\Raven.Server\Rachis\RachisConsensus.cs:line 1683
   at Raven.Server.Rachis.RachisConsensus.AppendToLog(ClusterOperationContext context, List`1 entries) in C:\Builds\RavenDB-Stable-5.3\53023\src\Raven.Server\Rachis\RachisConsensus.cs:line 1604
   at Raven.Server.Rachis.Follower.ApplyLeaderStateToLocalState(Stopwatch sp, ClusterOperationContext context, List`1 entries, AppendEntries appendEntries) in C:\Builds\RavenDB-Stable-5.3\53023\src\Raven.Server\Rachis\Follower.cs:line 291
   at Raven.Server.Rachis.Follower.FollowerSteadyState() in C:\Builds\RavenDB-Stable-5.3\53023\src\Raven.Server\Rachis\Follower.cs:line 152
   at Raven.Server.Rachis.Follower.Run(Object obj) in C:\Builds\RavenDB-Stable-5.3\53023\src\Raven.Server\Rachis\Follower.cs:line 1131


krivulcik · 2022-06-27T11:46:06Z

Author

After further investigation, it seems that the source of the issue is that node B is in an inconsistent state (index/term mismatch).

Because of this, it won't accept the no-op Raft message from node C, which would change C's state from LeaderElect to Leader.
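
To illustrate why node B rejects the message, here is a minimal sketch of the kind of consistency check a Raft follower performs, using the numbers from the error above. It is a simplified model in Python, not RavenDB's actual Rachis code; the function name `check_append` and its return values are hypothetical.

```python
from typing import Optional


def check_append(entry_index: int, entry_term: int,
                 local_term_at_index: Optional[int],
                 last_commit_index: int) -> str:
    """Classify an incoming log entry against the follower's local log.

    Simplified model of a Raft follower; all names are illustrative.
    """
    if local_term_at_index is None:
        # The follower has no entry at this index yet, so it can append.
        return "append"
    if local_term_at_index == entry_term:
        # Same index and term: the follower already holds this entry.
        return "already-present"
    if entry_index <= last_commit_index:
        # A committed entry must never change, so a differing term is fatal.
        return "fatal"
    # Uncommitted conflict: the follower may truncate its log and append.
    return "truncate-and-append"


# Values taken from the error message reported by node B.
result = check_append(entry_index=2_460_346, entry_term=324_057,
                      local_term_at_index=324_056,
                      last_commit_index=2_460_355)
print(result)  # prints: fatal
```

Under this model, index 2,460,346 lies below B's last commit index 2,460,355, yet the terms differ, so the follower cannot resolve the conflict by truncating its log.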

The suggested fix described at https://issues.hibernatingrhinos.com/issue/RavenDB-18590#focus=Comments-67-356384.0-0 is the following:

1. Demote the problematic node to watcher (B in this case)
2. On this node, request a snapshot by running server.serverStore.RequestSnapshot() in the JS admin console
3. Promote it back to member

However, it's not possible to demote node B, as only the leader can do that, and C is not the leader yet.

Is it possible to either forcefully demote a node or force a node to become the leader?

Alternatively, is there another way to bring the cluster into a state from which we can get out of this situation?

The cluster is currently in the following state:
A: Watcher, turned off.
B: Member
C: LeaderElect

The cluster diagram shows node A to be active and also the connection between C and A to be active, even though node A doesn't run. Should this be the case?

0 replies

garayx · 2022-06-27T13:33:07Z

Maintainer

Hi

Are you running with encryption? If not, can you try stopping the RavenDB service on node A, renaming (or deleting) the system database folder, and starting the service again? This should replicate the system database from scratch. You can then elect a leader and should be able to request a snapshot by running server.serverStore.RequestSnapshot() in the JS admin console.
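
The rename step could be sketched roughly as follows. This is only an illustration in Python: the helper name `set_aside_system_folder` is hypothetical, the folder name "System" under the data directory is an assumption, and the RavenDB service must already be stopped before running anything like this.

```python
import time
from pathlib import Path


def set_aside_system_folder(data_dir: Path) -> Path:
    """Rename <data_dir>/System to a timestamped backup name and return it.

    Assumes the RavenDB service on this node is already stopped and that
    the system database lives in a folder named "System" (an assumption).
    """
    system_dir = data_dir / "System"
    if not system_dir.is_dir():
        raise FileNotFoundError(f"no system folder at {system_dir}")
    backup_dir = data_dir / f"System.bak-{time.strftime('%Y%m%d-%H%M%S')}"
    # Renaming instead of deleting keeps the old state available for inspection.
    system_dir.rename(backup_dir)
    return backup_dir
```

Renaming rather than deleting leaves the old folder in place under a new name, so it can still be examined or restored if the node does not rejoin as expected.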

8 replies

krivulcik Jun 28, 2022
Author

I have an additional question, @karmeli87, @garayx.

Is the above process (stop the node, remove the system database, start the node) safe, in the sense that the cluster will restore the database from healthy nodes? We suspect that the cluster isn't behaving quite as expected, so we'd like to rule out system database inconsistencies as the cause.

Resetting the system database on the last remaining node (C) would likely ensure that it is synchronized across the cluster, even though that should already be the case: the system database on node B was restored from the database on node C.

However, I want to check that triggering this shouldn't have other adverse effects under otherwise normal operating conditions.

karmeli87 Jun 28, 2022
Maintainer

Given that the cluster was broken, it is possible that the database topology was not consistent on all nodes and could not be updated because of the issue above.

In normal operation, rebuilding the database would cause that node to move into rehab, because the cluster observer would notice the large difference in document count.
The problem is that the cluster observer can only make decisions when there is a leader, which wasn't the case for you.
However, there can still be a race between the node moving into rehab and it talking to the app. In that case, a better option might be to remove that node from the database group and re-add it (note: not to be confused with cluster nodes).

Also, your current cluster topology is 2 Members and 1 Watcher, which means that if one of the member nodes fails (as happened with B), you will not be able to elect a leader, because that requires a majority. In your case, both member nodes must be functional.
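
The majority arithmetic can be sketched like this. It is a small Python illustration with hypothetical function names, assuming that only Members vote and Watchers do not:

```python
def majority(voting_members: int) -> int:
    """Smallest number of votes that is more than half of the voters."""
    return voting_members // 2 + 1


def tolerated_failures(voting_members: int) -> int:
    """How many voters can be down while a leader can still be elected."""
    return voting_members - majority(voting_members)


# Watchers do not vote, so only Members count here.
for members in (2, 3):
    print(members, majority(members), tolerated_failures(members))
# prints:
# 2 2 0
# 3 2 1
```

With 2 Members the majority is 2, so no member failure can be tolerated; with 3 Members one can fail and a leader can still be elected.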

Hope it makes sense :-)

krivulcik Jun 28, 2022
Author

Yes, that makes sense.

Having node A as a watcher was supposed to be a temporary state, and the problem with the leader being unable to take office started soon after demoting node A. I believe that node A was the leader before, and after stepping down and demoting it, the cluster got to the state described earlier.

After we provision the new server, we'll add it to the database groups and wait for the replication. Since it will be brand new, the recovery should occur without inconsistency issues, I hope.

But I want to confirm again: If I were to remove the system database on the currently healthy node C, it should recover the data from node B without any adverse effects, right? The current topology is that node A is offline, and nodes B and C are nominally operational on both the cluster level and database level for all databases.

garayx Jun 28, 2022
Maintainer

When you remove the system folder and turn the node back on, the leader will send a snapshot of the cluster to that node.

krivulcik Jun 28, 2022
Author

Great, thanks for the information, Egor.
