### Cause for scenario 5

After the node was fenced, rebooted, and restarted its cluster services, it subsequently received a message stating `We were allegedly just fenced`, which caused it to shut down its pacemaker and corosync services and prevented the cluster from starting. Node1 initiated a STONITH action against node2, and at `03:27:23`, when the network issue was resolved, node2 rejoined the corosync membership. Consequently, a new two-node membership was established, as shown in `/var/log/messages` for node1.

```output
Feb 20 03:26:56 node1 corosync[1722]: [TOTEM ] A processor failed, forming new configuration.
Feb 20 03:27:23 node1 corosync[1722]: [TOTEM ] A new membership (1.116f4) was formed. Members left: 2
Feb 20 03:27:24 node1 corosync[1722]: [QUORUM] Members[1]: 1
...
Feb 20 03:27:25 node1 corosync[1722]: [MAIN ] Completed service synchronization, ready to provide service.
```

Node1 received confirmation that node2 had been successfully rebooted, as shown in `/var/log/messages` for node1.

```output
Feb 20 03:27:46 node1 pacemaker-fenced[1736]: notice: Operation 'reboot' [43895] (call 28 from pacemaker-controld.1740) targeting node2 using xvm2 returned 0 (OK)
```

To fully complete the STONITH action, the system needed to deliver the confirmation message to every node. Since node2 rejoined the group at `03:27:25`, and no new membership excluding node2 had yet been formed because the token and consensus timeouts hadn't expired, the confirmation message was delayed until node2 restarted its cluster services after boot. Upon receiving the message, node2 recognized that it had been fenced and consequently shut down its services, as shown:

`/var/log/messages` in node1:

```output
Feb 20 03:29:02 node1 corosync[1722]: [TOTEM ] A processor failed, forming new configuration.
Feb 20 03:29:10 node1 corosync[1722]: [TOTEM ] A new membership (1.116fc) was formed. Members joined: 2 left: 2
Feb 20 03:29:10 node1 corosync[1722]: [QUORUM] Members[2]: 1 2
...
Feb 20 03:29:09 node2 pacemaker-controld [1323] (tengine_stonith_notify) crit: We were allegedly just fenced by node1 for node1!
```
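
The length of this window is determined by the corosync token and consensus timeouts. As a quick check (a minimal sketch using standard corosync tooling; run it on any cluster node), you can read the effective values from the runtime configuration:

```bash
# Show the effective totem token and consensus timeouts (in milliseconds).
# A new membership that excludes a failed node can't form until these timers expire.
corosync-cmapctl | grep -E "totem\.(token|consensus)"
```
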
### Resolution for scenario 5

Configure a startup delay for the corosync service. This pause provides sufficient time for a new CPG (Closed Process Group) membership to form that excludes the fenced node, so that the STONITH reboot process can complete by ensuring that the completion message reaches all nodes in the membership.

To achieve this, run the following commands:
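
The following is a minimal sketch of one common way to add such a delay, using a systemd drop-in for the corosync unit. The 60-second value is an illustrative assumption; size it to exceed the cluster's token plus consensus timeouts.

```bash
# Create a drop-in directory for the corosync unit (assumes a systemd-based RHEL system).
mkdir -p /etc/systemd/system/corosync.service.d

# Delay corosync startup; the 60-second sleep is an illustrative value.
cat > /etc/systemd/system/corosync.service.d/override.conf <<'EOF'
[Service]
ExecStartPre=/bin/sleep 60
EOF

# Reload systemd so that the drop-in takes effect on the next start.
systemctl daemon-reload
```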