@@ -975,51 +975,52 @@ rabbitmqctl cluster_status
975975# => ...done.
976976```
977977
978- ## Forcing Node Boot in Case of Unavailable Peers {#forced-boot}
979978
980- In some cases the last node to go
981- offline cannot be brought back up. It can be removed from the
982- cluster using the ` forget_cluster_node ` [ rabbitmqctl] ( ./cli ) command.
979+ ## How to Remove a Node from the Cluster {#removing-nodes}
983980
984- Alternatively ` force_boot ` [ rabbitmqctl] ( ./cli ) command can be used
985- on a node to make it boot without trying to sync with any
986- peers (as if they were last to shut down). This is
987- usually only necessary if the last node to shut down or a
988- set of nodes will never be brought back online.
981+ Sometimes it is necessary to remove a node from the cluster.
982+
983+ The sequence of actions will be slightly different for the following
984+ most common scenarios:
989985
990- ## Breaking Up a Cluster {#removing-nodes}
986+ * The node is online and reachable
987+ * The node is offline and cannot be recovered
991988
992- Sometimes it is necessary to remove a node from a
993- cluster. The operator has to do this explicitly using a
994- ` rabbitmqctl ` command.
989+ In addition, some [ peer discovery mechanisms] ( ./cluster-formation )
990+ support node health checks and [ forced removal of nodes] ( ./cluster-formation#node-health-checks-and-cleanup ) not known to the discovery backend.
995991
996- Some [ peer discovery mechanisms] ( ./cluster-formation )
997- support node health checks and forced
998- removal of nodes not known to the discovery backend. That feature is
999- opt-in (deactivated by default).
992+ That feature is opt-in (deactivated by default).
1000993
1001- We first remove ` rabbit@rabbit3 ` from the cluster, returning it to
1002- independent operation. To do that, on ` rabbit@rabbit3 ` we
1003- stop the RabbitMQ application, reset the node, and restart the
1004- RabbitMQ application.
994+ Continuing with the three node cluster example used in this guide,
995+ let's demonstrate how to remove ` rabbit@rabbit3 ` from the cluster, returning it to
996+ independent operation.
997+
998+ ### Removal of a Reachable Node
999+
1000+ The first step before removing a node from the cluster is to stop it:
10051001
10061002``` bash
10071003# on rabbit3
10081004rabbitmqctl stop_app
10091005# => Stopping node rabbit@rabbit3 ...done.
1006+ ```
10101007
1011- rabbitmqctl reset
1012- # => Resetting node rabbit@rabbit3 ...done.
1013- rabbitmqctl start_app
1014- # => Starting node rabbit@rabbit3 ...done.
1008+ Then use ` rabbitmqctl forget_cluster_node ` on another node
1009+ and specify the node to remove as ** the first positional argument** :
1010+
1011+ ``` bash
1012+ # on rabbit2
1013+ rabbitmqctl forget_cluster_node rabbit@rabbit3
1014+ # => Removing node rabbit@rabbit3 from cluster ...
10151015```
10161016
1017- Note that it would have been equally valid to list
1018- ` rabbit@rabbit3 ` as a node.
1017+ Running the
10191018
1019+ ``` shell
1020+ rabbitmq-diagnostics cluster_status
1021+ ```
10201022
1021- Running the <i >cluster_status</i > command on the nodes confirms
1022- that ` rabbit@rabbit3 ` now is no longer part of
1023+ command on the nodes confirms that ` rabbit@rabbit3 ` is no longer part of
10231024the cluster and operates independently:
10241025
10251026``` bash
@@ -1036,17 +1037,32 @@ rabbitmqctl cluster_status
10361037# => [{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2]}]},
10371038# => {running_nodes,[rabbit@rabbit1,rabbit@rabbit2]}]
10381039# => ...done.
1040+ ```
1041+
1042+ Now node ` rabbit@rabbit3 ` can be decommissioned, or reset and started as
1043+ a standalone node:
1044+
10391045
1046+ ``` shell
10401047# on rabbit3
1048+ rabbitmqctl reset
1049+
1050+ rabbitmqctl start_app
1051+ # => Starting node rabbit@rabbit3 ...
1052+
10411053rabbitmqctl cluster_status
10421054# => Cluster status of node rabbit@rabbit3 ...
10431055# => [{nodes,[{disc,[rabbit@rabbit3]}]},{running_nodes,[rabbit@rabbit3]}]
10441056# => ...done.
10451057```
10461058
1047- We can also remove nodes remotely. This is useful, for example, when
1048- having to deal with an unresponsive node. We can for example remove
1049- ` rabbit@rabbit1 ` from ` rabbit@rabbit2 ` .
1059+ Nodes can be removed remotely, that is, from a different host, as long as CLI tools
1060+ on said host can [ connect and authenticate] ( ./cli ) to the target node.
1061+
1062+ This can be useful, for example, when having to deal with a host that cannot be accessed.
1063+
1064+ In the rest of this example, ` rabbit@rabbit1 ` will be removed from its remaining
1065+ two node cluster with ` rabbit@rabbit2 ` :
10501066
10511067``` bash
10521068# on rabbit1
@@ -1059,16 +1075,32 @@ rabbitmqctl forget_cluster_node rabbit@rabbit1
10591075# => ...done.
10601076```
10611077
1062- Note that ` rabbit1 ` still thinks it's clustered with
1078+ ### Removal of Stopped Nodes and Their Revival
1079+
1080+ ::: important
1081+
1082+ A node that was removed from the cluster when stopped with ` rabbitmqctl stop_app `
1083+ must be either reset or decommissioned. If started without a reset,
1084+ it won't be able to rejoin its original cluster.
1085+
1086+ :::
1087+
1088+ At this point ` rabbit1 ` still thinks it is clustered with
10631089` rabbit2 ` , and trying to start it will result in an
1064- error. We will need to reset it to be able to start it again.
1090+ error because the rest of the cluster no longer considers it to be a known member:
10651091
10661092``` bash
10671093# on rabbit1
10681094rabbitmqctl start_app
10691095# => Starting node rabbit@rabbit1 ...
10701096# => Error: inconsistent_cluster: Node rabbit@rabbit1 thinks it's clustered with node rabbit@rabbit2, but rabbit@rabbit2 disagrees
1097+ ```
1098+
1099+ In order to completely detach it from the cluster, such a
1100+ stopped node must be reset:
1101+
10711102
1103+ ``` shell
10721104rabbitmqctl reset
10731105# => Resetting node rabbit@rabbit1 ...done.
10741106
@@ -1078,7 +1110,7 @@ rabbitmqctl start_app
10781110```
10791111
10801112The ` cluster_status ` command now shows all three nodes
1081- operating as independent RabbitMQ brokers :
1113+ operating as independent RabbitMQ nodes (single node clusters) :
10821114
10831115``` bash
10841116# on rabbit1
@@ -1117,18 +1149,48 @@ rabbitmqctl start_app
11171149# => Starting node rabbit@rabbit2 ...done.
11181150```
11191151
1152+ ### Removal of Unresponsive Nodes
1153+
1154+ When the target node is not running, it can still be removed from the cluster
1155+ using ` rabbitmqctl forget_cluster_node ` :
1156+
1157+ ``` bash
1158+ # Tell rabbit@rabbit1 to permanently remove rabbit@rabbit2
1159+ rabbitmqctl forget_cluster_node -n rabbit@rabbit1 rabbit@rabbit2
1160+ # => Removing node rabbit@rabbit2 from cluster ...
1161+ # => ...done.
1162+ ```
1163+
1164+ ### What Happens to Quorum Queue and Stream Replicas?
1165+
1166+ When a node is removed from the cluster using CLI tools, all [ quorum queue] ( ./quorum-queues#replica-management )
1167+ and [ stream replicas] ( ./streams#replica-management ) on the node will be removed,
1168+ even if that means that queues and streams would temporarily have an even number (e.g. two) of replicas.
1169+
1170+ ### Node Removal is Explicit (Manual) or Opt-in
1171+
11201172Besides ` rabbitmqctl forget_cluster_node ` and the automatic cleanup of unknown nodes
11211173by some [ peer discovery] ( ./cluster-formation ) plugins, there are no scenarios
11221174in which a RabbitMQ node will permanently remove its peer node from a cluster.
11231175
1124- ### How to Reset a Node {#resetting-nodes}
11251176
1126- Sometimes it may be necessary to reset a node (wipe all of its data) and later make it rejoin the cluster.
1127- Generally speaking, there are two possible scenarios: when the node is running, and when the node cannot start
1128- or won't respond to CLI tool commands e.g. due to an issue such as [ ERL-430] ( https://bugs.erlang.org/browse/ERL-430 ) .
1177+
1178+ ## How to Reset a Node {#resetting-nodes}
1179+
1180+ ::: danger
11291181
11301182Resetting a node will delete all of its data, cluster membership information, configured [ runtime parameters] ( ./parameters ) ,
1131- users, virtual hosts and any other node data. It will also permanently remove the node from its cluster.
1183+ users, virtual hosts and any other node data. It will also alter its internal identity.
1184+
1185+ :::
1186+
1187+ Sometimes it may be necessary to reset a node (see below for what exactly this means),
1188+ and later make it rejoin the cluster as a new node.
1189+
1190+ Generally speaking, there are two possible scenarios: when the node is running, and when the node cannot start
1191+ or won't respond to CLI tool commands for any reason.
1192+
1193+ ### Reset a Running and Responsive Node
11321194
11331195To reset a running and responsive node, first stop RabbitMQ on it using ` rabbitmqctl stop_app `
11341196and then reset it using ` rabbitmqctl reset ` :
@@ -1141,16 +1203,47 @@ rabbitmqctl reset
11411203# => Resetting node rabbit@rabbit1 ...done.
11421204```
11431205
1206+ ::: info
1207+
1208+ If the reset node is online and its cluster peers are reachable, the node
1209+ will first try to permanently remove itself from its cluster.
1210+
1211+ :::
1212+
1213+ ### Reset an Unresponsive Node
1214+
11441215In case of a non-responsive node, it must be stopped first using any means necessary.
11451216For nodes that fail to start this is already the case. Then [ override] ( ./relocate )
1146- the node's data directory location or [ re ] move the existing data store. This will make the node
1217+ the node's data directory location or remove the existing data store. This will make the node
11471218start as a blank one. It will have to be instructed to [ rejoin its original cluster] ( #cluster-formation ) , if any.
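
As a more conservative variant of removing the data store, the directory can be renamed so that the old data stays available for inspection. The sketch below is illustrative only: `set_aside_data_dir` is a hypothetical helper, and the data directory path must be supplied by the operator because its location depends on the installation. It assumes the node has already been stopped.

```bash
# Sketch: move a stopped node's data directory out of the way so the
# node starts as a blank one. The helper name is hypothetical and the
# directory path is an assumption supplied by the caller.

# set_aside_data_dir <data-directory>
# Renames the directory with a timestamp suffix and prints the new path.
set_aside_data_dir() {
  if [ -z "$1" ] || [ ! -d "$1" ]; then
    echo "set_aside_data_dir: '$1' is not a directory" >&2
    return 1
  fi
  # Strip a trailing slash, then append a timestamped suffix.
  backup="${1%/}.bak-$(date +%Y%m%d%H%M%S)"
  if [ -e "$backup" ]; then
    echo "set_aside_data_dir: '$backup' already exists" >&2
    return 1
  fi
  mv -- "$1" "$backup" && printf '%s\n' "$backup"
}
```

After the directory has been moved, the node starts with an empty data store and has to be instructed to rejoin its original cluster, as described above.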
11481219
1149- A node that's been reset and rejoined its original cluster will sync all virtual hosts, users, permissions
1150- and topology (queues, exchanges, bindings), runtime parameters and policies. [ Quorum queue] ( ./quorum-queues )
1151- contents will be replicated if the node will be selected to host a replica.
1220+ ### Resetting a Node to Re-add It as a Brand New Node to Its Original Cluster
1221+
1222+ A reset node that was [ removed from the cluster] ( #removing-nodes ) can be re-added to its original
1223+ cluster as a brand new node.
1224+
1225+ In that case it will sync all virtual hosts, users, permissions and topology (queues, exchanges, bindings),
1226+ runtime parameters and policies.
1227+
1228+ For [ quorum queue] ( ./quorum-queues ) and [ stream] ( ./streams ) contents to be replicated to the new [ re] added node,
1229+ the node must be added to the list of nodes to place replicas on using ` rabbitmq-queues grow ` .
1230+
11521231Non-replicated queue contents on a reset node will be lost.
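
The replica placement step mentioned above can be sketched as a small wrapper around `rabbitmq-queues grow`. The function name `grow_replicas_onto` and the `RABBITMQ_QUEUES` override variable are hypothetical. The sketch assumes that the command takes the target node followed by a selection strategy, `all` or `even`, and it is worth confirming this against `rabbitmq-queues help grow` for the version in use.

```bash
# Sketch: place replicas on a [re]added node using `rabbitmq-queues grow`.
# RABBITMQ_QUEUES is a hypothetical override (not a RabbitMQ feature)
# that allows a dry run without a live cluster.

# grow_replicas_onto <node> [all|even]
grow_replicas_onto() {
  if [ -z "$1" ]; then
    echo "usage: grow_replicas_onto <node> [all|even]" >&2
    return 2
  fi
  # Default to adding a replica for every matching queue.
  strategy="${2:-all}"
  case "$strategy" in
    all|even) ;;
    *) echo "strategy must be 'all' or 'even'" >&2; return 2 ;;
  esac
  "${RABBITMQ_QUEUES:-rabbitmq-queues}" grow "$1" "$strategy"
}
```

For example, `grow_replicas_onto rabbit@rabbit3` would run `rabbitmq-queues grow rabbit@rabbit3 all` once the node has rejoined the cluster.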
11531232
1233+
1234+ ## Forcing Node Boot in Case of Unavailable Peers {#forced-boot}
1235+
1236+ In some cases the last node to go
1237+ offline cannot be brought back up. It can be removed from the
1238+ cluster using the ` forget_cluster_node ` [ rabbitmqctl] ( ./cli ) command.
1239+
1240+ Alternatively ` force_boot ` [ rabbitmqctl] ( ./cli ) command can be used
1241+ on a node to make it boot without trying to sync with any
1242+ peers (as if they were last to shut down). This is
1243+ usually only necessary if the last node to shut down or a
1244+ set of nodes will never be brought back online.
1245+
1246+
11541247## Upgrading clusters {#upgrading}
11551248
11561249You can find instructions for upgrading a cluster in [ the upgrade