@@ -36,48 +36,88 @@ process again.
[[modules-discovery-troubleshooting]]
==== Troubleshooting discovery

- In most cases, the discovery process completes quickly, and the master node
- remains elected for a long period of time. If the cluster has no master for
- more than a few seconds or the master is unstable, the logs for each node will
- contain information explaining why:
+ In most cases, the discovery and election process completes quickly, and the
+ master node remains elected for a long period of time.

- * All nodes repeatedly log messages indicating that a master cannot be
- discovered or elected using a logger called
+ If your cluster doesn't have a stable master, many of its features won't work
+ correctly and {es} will report errors to clients and in its logs. You must fix
+ the master node's instability before addressing these other issues. It will not
+ be possible to solve any other issues while there is no elected master node or
+ the elected master node is unstable.
+
+ If your cluster has a stable master but some nodes can't discover or join it,
+ these nodes will report errors to clients and in their logs. You must address
+ the obstacles preventing these nodes from joining the cluster before addressing
+ other issues. It will not be possible to solve any other issues reported by
+ these nodes while they are unable to join the cluster.
+
+ If the cluster has no elected master node for more than a few seconds, the
+ master is unstable, or some nodes are unable to discover or join a stable
+ master, then {es} will record information in its logs explaining why. If the
+ problems persist for more than a few minutes, {es} will record additional
+ information in its logs. To properly troubleshoot discovery and election
+ problems, collect and analyse logs covering at least five minutes from all
+ nodes.
+
+ The following sections describe some common discovery and election problems.
+
+ ===== No master is elected
+
+ When a node wins the master election, it logs a message containing
+ `elected-as-master` and all nodes log a message containing
+ `master node changed` identifying the new elected master node.
+
+ If there is no elected master node and no node can win an election, all
+ nodes will repeatedly log messages about the problem using a logger called
`org.elasticsearch.cluster.coordination.ClusterFormationFailureHelper`. By
default, this happens every 10 seconds.

- * If a node wins the election, it logs a message containing
- `elected-as-master`. If this happens repeatedly, the master node is unstable.
+ Master elections only involve master-eligible nodes, so focus on the logs from
+ master-eligible nodes in this situation. These nodes' logs will indicate the
+ requirements for a master election, such as the discovery of a certain set of
+ nodes.

- * When a node discovers the master or believes the master to have failed, it
- logs a message containing `master node changed`.
+ If the logs indicate that {es} can't discover enough nodes to form a quorum,
+ you must address the reasons preventing {es} from discovering the missing
+ nodes. The missing nodes are needed to reconstruct the cluster metadata.
+ Without the cluster metadata, the data in your cluster is meaningless. The
+ cluster metadata is stored on a subset of the master-eligible nodes in the
+ cluster. If a quorum can't be discovered, the missing nodes were the ones
+ holding the cluster metadata.

- * If a node is unable to discover or elect a master for several minutes, it
- starts to report additional details about the failures in its logs. Be sure to
- capture log messages covering at least five minutes of discovery problems.
+ Ensure there are enough nodes running to form a quorum and that every node can
+ communicate with every other node over the network. {es} will report additional
+ details about network connectivity if the election problems persist for more
+ than a few minutes. If you can't start enough nodes to form a quorum, start a
+ new cluster and restore data from a recent snapshot. Refer to
+ <<modules-discovery-quorums>> for more information.
+
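The quorum condition discussed above can be sketched in a few lines of Python. This is an illustrative simplification and not part of the change under review: it assumes a quorum means a strict majority of a single voting configuration, and the function name `has_quorum` and the node names are hypothetical. Refer to <<modules-discovery-quorums>> for the authoritative rules.

```python
def has_quorum(voting_configuration, discovered_nodes):
    """Return True if the discovered nodes include a strict majority
    of the voting configuration (simplifying assumption)."""
    voting = set(voting_configuration)
    present = voting & set(discovered_nodes)
    # Strict majority: more than half of the voting nodes are present.
    return 2 * len(present) > len(voting)


# Hypothetical three-node voting configuration.
voting = ["master-a", "master-b", "master-c"]
print(has_quorum(voting, ["master-a"]))              # one of three discovered
print(has_quorum(voting, ["master-a", "master-c"]))  # two of three discovered
```

Nodes outside the voting configuration do not count toward the majority in this sketch, which is why discovering additional data-only nodes would not help an election succeed.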
+ If the logs indicate that {es} _has_ discovered a possible quorum of nodes, the
+ typical reason that the cluster can't elect a master is that one of the other
+ nodes can't discover a quorum. Inspect the logs on the other master-eligible
+ nodes and ensure that they have all discovered enough nodes to form a quorum.
+
+ ===== Master is elected but unstable
+
+ When a node wins the master election, it logs a message containing
+ `elected-as-master`. If this happens repeatedly, the elected master node is
+ unstable. In this situation, focus on the logs from the master-eligible nodes
+ to understand why the election winner stops being the master and triggers
+ another election.
+
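One way to spot repeated elections in collected logs is to count lines containing `elected-as-master` for each node. The Python sketch below is a hedged illustration only: the helper names, the `threshold` default, and the sample log lines are hypothetical placeholders, not real {es} log output. Only the `elected-as-master` marker comes from the text above.

```python
from collections import Counter

ELECTED_MARKER = "elected-as-master"


def count_elections(log_lines_by_node):
    """Count, per node, the log lines that contain the election marker."""
    counts = Counter()
    for node, lines in log_lines_by_node.items():
        counts[node] = sum(1 for line in lines if ELECTED_MARKER in line)
    return counts


def looks_unstable(counts, threshold=2):
    """Heuristic (assumption): treat `threshold` or more election wins
    within the collected log window as a sign of an unstable master."""
    return sum(counts.values()) >= threshold


# Placeholder log lines; real log lines have a different layout.
logs = {
    "master-a": ["placeholder: elected-as-master", "placeholder: other message"],
    "master-b": ["placeholder: elected-as-master"],
}
counts = count_elections(logs)
print(counts["master-a"], counts["master-b"], looks_unstable(counts))
```

A count of one election win across the whole window is the normal case; the interesting logs are those written just before each additional win.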
+ ===== Node cannot discover or join stable master
+
+ If there is a stable elected master but a node can't discover or join its
+ cluster, it will repeatedly log messages about the problem using the
+ `ClusterFormationFailureHelper` logger. Other log messages on the affected node
+ and the elected master may provide additional information about the problem.
+
+ ===== Node joins cluster and leaves again
+
+ If a node joins the cluster but {es} determines it to be faulty then it will be
+ removed from the cluster again. See <<cluster-fault-detection-troubleshooting>>
+ for more information.

- If your cluster doesn't have a stable master, many of its features won't work
- correctly. The cluster may report many kinds of error to clients and in its
- logs. You must fix the master node's instability before addressing these other
- issues. It will not be possible to solve any other issues while the master node
- is unstable.
-
- The logs from the `ClusterFormationFailureHelper` may indicate that a master
- election requires a certain set of nodes and that it has not discovered enough
- nodes to form a quorum. If so, you must address the reason preventing {es} from
- discovering the missing nodes. The missing nodes are needed to reconstruct the
- cluster metadata. Without the cluster metadata, the data in your cluster is
- meaningless. The cluster metadata is stored on a subset of the master-eligible
- nodes in the cluster. If a quorum cannot be discovered then the missing nodes
- were the ones holding the cluster metadata. If you cannot bring the missing
- nodes back into the cluster, start a new cluster and restore data from a recent
- snapshot. Refer to <<modules-discovery-quorums>> for more information.
-
- The logs from the `ClusterFormationFailureHelper` may also indicate that it has
- discovered a possible quorum of master-eligible nodes. If so, the usual reason
- that the cluster cannot elect a master is that one of the other nodes cannot
- discover a quorum. Inspect the logs on the other master-eligible nodes and
- ensure that every node has discovered a quorum.

[[built-in-hosts-providers]]
==== Seed hosts providers