openraft/src/docs/faq/faq.md: 29 additions & 0 deletions
@@ -86,6 +86,32 @@ See: [`leader-id`](`crate::docs::data::leader_id`) for details.
Excessive error logging, like `ERROR openraft::replication: 248: RPCError err=NetworkError: ...`, occurs when a follower node becomes unresponsive. To alleviate this, implement a mechanism within [`RaftNetwork`][] that returns an [`Unreachable`][] error instead of a [`NetworkError`][] when immediate replication retries to the affected node are not advised.


### How to detect which nodes are currently down or unreachable?

To monitor node availability in your Raft cluster, use [`RaftMetrics`][] from
the leader node via [`Raft::metrics()`][]. This provides real-time visibility
into node reachability without requiring membership changes.
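
As a rough illustration of what a metrics snapshot read on the leader can tell you, the sketch below flags nodes whose replicated log index trails the leader's last log index by more than a threshold. It does not call openraft: `MetricsSnapshot`, the trimmed `LogId`, `lagging_nodes`, `sample_snapshot`, and the `max_lag` threshold are hypothetical stand-ins built on `std` only, and the sketch assumes the replication map is absent on non-leader nodes.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins for illustration only; these are NOT openraft's types.
type NodeId = u64;

/// Trimmed-down log ID: only the index is needed to measure lag.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct LogId {
    index: u64,
}

/// Assumed shape of the two metrics fields this sketch reads.
struct MetricsSnapshot {
    /// Index of the last log entry on the node that produced the snapshot.
    last_log_index: Option<u64>,
    /// Last replicated log ID per target node; assumed `None` on non-leaders.
    replication: Option<BTreeMap<NodeId, Option<LogId>>>,
}

/// Returns the IDs of nodes whose replication trails `last_log_index`
/// by more than `max_lag` entries, in ascending node-ID order.
fn lagging_nodes(metrics: &MetricsSnapshot, max_lag: u64) -> Vec<NodeId> {
    // Without a last log index or a replication map there is nothing to compare.
    let (Some(last), Some(replication)) =
        (metrics.last_log_index, metrics.replication.as_ref())
    else {
        return Vec::new();
    };

    let mut lagging = Vec::new();
    for (node_id, matched) in replication {
        let lag = match matched {
            Some(log_id) => last.saturating_sub(log_id.index),
            // Nothing replicated yet: count every entry up to `last` as missing.
            None => last.saturating_add(1),
        };
        if lag > max_lag {
            lagging.push(*node_id);
        }
    }
    lagging
}

/// Builds a made-up snapshot with four replication targets.
fn sample_snapshot() -> MetricsSnapshot {
    let mut replication = BTreeMap::new();
    replication.insert(1, Some(LogId { index: 100 }));
    replication.insert(2, Some(LogId { index: 98 }));
    replication.insert(3, Some(LogId { index: 40 }));
    replication.insert(4, None);
    MetricsSnapshot {
        last_log_index: Some(100),
        replication: Some(replication),
    }
}
```

With the sample data, `lagging_nodes(&sample_snapshot(), 10)` reports nodes 3 and 4. The threshold is application-specific, and a large lag only suggests a problem: a slow but reachable node can lag as well.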

There are two primary approaches to detect unreachable nodes:

**Method 1: Monitor replication lag**
Check the field [`RaftMetrics::replication`][], which contains a
`BTreeMap<NodeId, Option<LogId>>` showing the last replicated log for each node.
If a node's replication significantly lags behind
[`RaftMetrics::last_log_index`][], it indicates replication issues and the node