Commit fa2086a

Update formatting and clarify instructions in troubleshooting guide
1 parent d91a54a commit fa2086a

1 file changed, +9 -9 lines changed

support/azure/virtual-machines/linux/troubleshoot-rhel-pacemaker-cluster-services-resources-startup-issues.md

Lines changed: 9 additions & 9 deletions
@@ -667,9 +667,9 @@ Once the fencing operation is complete, the affected node typically doesn't rejo

 ### Cause for scenario 5

-After the node was fenced, rebooted, and restarted its cluster services, it subsequently received a message stating "We were allegedly just fenced", which caused it to shut down its pacemaker and corosync services and prevented the cluster from starting. Node1 initiated a STONITH action against node2, and at `03:27:23`, when the network issue was resolved, node2 rejoined the corosync membership. Consequently, a new two-node membership was established, as shown in `/var/log/messages` for node1.
+After the node was fenced, rebooted, and restarted its cluster services, it subsequently received a message stating `We were allegedly just fenced`, which caused it to shut down its pacemaker and corosync services and prevented the cluster from starting. Node1 initiated a STONITH action against node2, and at `03:27:23`, when the network issue was resolved, node2 rejoined the corosync membership. Consequently, a new two-node membership was established, as shown in `/var/log/messages` for node1.

-```bash
+```output
 Feb 20 03:26:56 node1 corosync[1722]: [TOTEM ] A processor failed, forming new configuration.
 Feb 20 03:27:23 node1 corosync[1722]: [TOTEM ] A new membership (1.116f4) was formed. Members left: 2
 Feb 20 03:27:24 node1 corosync[1722]: [QUORUM] Members[1]: 1
@@ -684,14 +684,14 @@ Feb 20 03:27:25 node1 corosync[1722]: [MAIN ] Completed service synchronizatio

 node1 received confirmation that node2 had been successfully rebooted, as shown in `/var/log/messages` for node1.

-```bash
+```output
 Feb 20 03:27:46 node1 pacemaker-fenced[1736]: notice: Operation 'reboot' [43895] (call 28 from pacemaker-controld.1740) targeting node2 using xvm2 returned 0 (OK)
 ```

 To fully complete the STONITH action, the system needed to deliver the confirmation message to every node. Because node2 rejoined the group at `03:27:25` and no new membership excluding node2 had formed yet (the token and consensus timeouts had not expired), the confirmation message was delayed until node2 restarted its cluster services after boot. Upon receiving the message, node2 recognized that it had been fenced and shut down its services, as shown in the following logs:

-`/var/log/messages' in node1:
-```bash
+`/var/log/messages` in node1:
+```output
 Feb 20 03:29:02 node1 corosync[1722]: [TOTEM ] A processor failed, forming new configuration.
 Feb 20 03:29:10 node1 corosync[1722]: [TOTEM ] A new membership (1.116fc) was formed. Members joined: 2 left: 2
 Feb 20 03:29:10 node1 corosync[1722]: [QUORUM] Members[2]: 1 2
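The timing argument in this hunk depends on the corosync totem token and consensus timeouts. One way to inspect them on a running node is `corosync-cmapctl`. The sketch below assumes that its output uses the `key (type) = value` line form and the `runtime.config.totem.token` and `runtime.config.totem.consensus` key names; the helper name `totem_timeouts` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch (assumption): filter `corosync-cmapctl` output, read on stdin,
# down to the totem token and consensus timeouts (values in milliseconds).
# The runtime.config.totem.* key names and the "key (type) = value" line
# format are assumptions, not something this guide states.

# totem_timeouts: prints one "key=value" line per matching timeout,
# in the same order as the input.
totem_timeouts() {
    awk -F' = ' '
        /^runtime\.config\.totem\.(token|consensus) / {
            # $1 is "key (type)"; keep only the key part.
            split($1, key, " ")
            print key[1] "=" $2
        }'
}
```

For example, `corosync-cmapctl | totem_timeouts` on a cluster node would print the two timeouts. The trailing space in the pattern keeps related keys such as `runtime.config.totem.token_retransmit` out of the result.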
@@ -704,18 +704,18 @@ Feb 20 03:29:11 node1 corosync[1722]: [QUORUM] Members[1]: 1
 Feb 20 03:29:11 node1 corosync[1722]: [MAIN ] Completed service synchronization, ready to provide service.
 ```

-`/var/log/messages' in node2:
-```bash
+`/var/log/messages` in node2:
+```output
 Feb 20 03:29:11 [1155] node2 corosync notice [TOTEM ] A new membership (1.116fc) was formed. Members joined: 1
 Feb 20 03:29:11 [1155] node2 corosync notice [QUORUM] Members[2]: 1 2
 Feb 20 03:29:09 node2 pacemaker-controld [1323] (tengine_stonith_notify) crit: We were allegedly just fenced by node1 for node1!
 ```

 ### Resolution for scenario 5

-Configure a startup delay for the corosync service. This pause provides sufficient time for a new CPG membership to form and excluding the fenced node, so that the STONITH reboot process can complete by ensuring the completion message reaches all nodes in the membership.
+Configure a startup delay for the corosync service. This pause gives a new CPG (Closed Process Group) membership that excludes the fenced node enough time to form, so that the STONITH completion message can reach every node in the membership and the reboot operation can complete.

-To achieve this, executing the following commands:
+To achieve this, execute the following commands:

 1. Put the cluster into maintenance mode:

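The resolution configures a startup delay for corosync. On a systemd-based system such as RHEL, one common way to do that is a unit drop-in that runs `/bin/sleep` as an `ExecStartPre` command. The sketch below is an assumption, not this guide's exact procedure (its remaining steps are outside this diff): the function name `write_corosync_delay`, the drop-in file name `10-startup-delay.conf`, and the 60-second default are hypothetical values chosen for illustration.

```bash
#!/usr/bin/env bash
# Sketch (assumption): delay corosync start-up with a systemd drop-in.
# The function name, drop-in file name, and default delay are hypothetical.

# write_corosync_delay [DIR] [SECONDS]
# Writes DIR/10-startup-delay.conf containing an ExecStartPre sleep.
write_corosync_delay() {
    local dropin_dir="${1:-/etc/systemd/system/corosync.service.d}"
    local delay="${2:-60}"

    # Reject anything that is not a whole number of seconds.
    case "$delay" in
        ''|*[!0-9]*)
            echo "delay must be a whole number of seconds" >&2
            return 1
            ;;
    esac

    mkdir -p "$dropin_dir"
    # ExecStartPre runs before ExecStart, so corosync itself starts
    # only after the sleep finishes.
    printf '[Service]\nExecStartPre=/bin/sleep %s\n' "$delay" \
        > "$dropin_dir/10-startup-delay.conf"
}
```

Run as root, presumably while the cluster is in maintenance mode as step 1 indicates, a call such as `write_corosync_delay /etc/systemd/system/corosync.service.d 60` followed by `systemctl daemon-reload` would make systemd load the drop-in, and `systemctl cat corosync` shows the merged unit. Because `ExecStartPre` is part of the unit's start-up, the sleep counts toward `TimeoutStartSec`, so the delay should stay below that timeout.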