Commit e5c493f

Merge pull request #303689 from dennispadia/dp-sbdchange
Update SBD_DELAY_START, TimeoutSec, stonith-timeout value in pacemaker setup. Change in order constraint for HANA on ANF
2 parents b6e66b2 + ee51a70 commit e5c493f

3 files changed

+184
-153
lines changed

articles/sap/workloads/high-availability-guide-rhel-pacemaker.md

Lines changed: 27 additions & 22 deletions
@@ -43,12 +43,12 @@ Read the following SAP Notes and articles first:
4343
## Overview
4444

4545
> [!IMPORTANT]
46-
> Pacemaker clusters that span multiple Virtual networks(VNets)/subnets are not covered by standard support policies.
46+
> Pacemaker clusters that span multiple Virtual networks(VNets)/subnets aren't covered by standard support policies.
4747
4848
There are two options available on Azure for configuring the fencing in a pacemaker cluster for RHEL: Azure fence agent, which restarts a failed node via the Azure APIs, or you can use SBD device.
4949

5050
> [!IMPORTANT]
51-
> In Azure, RHEL high availability cluster with storage based fencing (fence_sbd) uses software-emulated watchdog. It is important to review [Software-Emulated Watchdog Known Limitations](https://access.redhat.com/articles/7034141) and [Support Policies for RHEL High Availability Clusters - sbd and fence_sbd](https://access.redhat.com/articles/2800691) when selecting SBD as the fencing mechanism.
51+
> In Azure, RHEL high availability cluster with storage based fencing (fence_sbd) uses software-emulated watchdog. It's important to review [Software-Emulated Watchdog Known Limitations](https://access.redhat.com/articles/7034141) and [Support Policies for RHEL High Availability Clusters - sbd and fence_sbd](https://access.redhat.com/articles/2800691) when selecting SBD as the fencing mechanism.
5252
5353
### Use an SBD device
5454

@@ -66,9 +66,9 @@ You can configure the SBD device by using either of two options:
6666
![Diagram of pacemaker with iSCSI target server as SBD device in RHEL](./media/high-availability-guide-suse-pacemaker/pacemaker.png)
6767

6868
> [!IMPORTANT]
69-
> When you're planning to deploy and configure Linux pacemaker cluster nodes and SBD devices, do not allow the routing between your virtual machines and the VMs that are hosting the SBD devices to pass through any other devices, such as a [network virtual appliance (NVA)](https://azure.microsoft.com/solutions/network-appliances/).
69+
> When you're planning to deploy and configure Linux pacemaker cluster nodes and SBD devices, don't allow the routing between your virtual machines and the VMs that are hosting the SBD devices to pass through any other devices, such as a [network virtual appliance (NVA)](https://azure.microsoft.com/solutions/network-appliances/).
7070
>
71-
> Maintenance events and other issues with the NVA can have a negative impact on the stability and reliability of the overall cluster configuration. For more information, see [user-defined routing rules](../../virtual-network/virtual-networks-udr-overview.md).
71+
> Maintenance events and other issues with the NVA can have a negative effect on the stability and reliability of the overall cluster configuration. For more information, see [user-defined routing rules](../../virtual-network/virtual-networks-udr-overview.md).
7272
7373
* SBD with Azure shared disk
7474

@@ -105,7 +105,7 @@ You first need to create the iSCSI target virtual machines. You can share iSCSI
105105

106106
1. Deploy virtual machines that run on supported RHEL OS version, and connect to them via SSH. The VMs don't have to be of large size. VM sizes such as Standard_E2s_v3 or Standard_D2s_v3 are sufficient. Be sure to use Premium storage for the OS disk.
107107

108-
2. It isn't necessary to use RHEL for SAP with HA and Update Services, or RHEL for SAP Apps OS image for the iSCSI target server. A standard RHEL OS image can be used instead. However, be aware that the support life cycle varies between different OS product releases.
108+
2. It isn't necessary to use RHEL for SAP with HA and Update Services, or RHEL for SAP Apps OS image for the iSCSI target server. A standard RHEL OS image can be used instead. However, the support life cycle varies between different OS product releases.
109109

110110
3. Run following commands on all iSCSI target virtual machines.
111111

@@ -376,7 +376,8 @@ On the cluster nodes, connect and discover iSCSI device that was created in the
376376
[...]
377377
SBD_STARTMODE=always
378378
[...]
379-
SBD_DELAY_START=yes
379+
# # In some cases, a longer delay than the default "msgwait" seconds is needed. So, set a specific delay value, in seconds. See, `man sbd` for more information.
380+
SBD_DELAY_START=216
380381
[...]
381382
```
382383
@@ -397,12 +398,13 @@ On the cluster nodes, connect and discover iSCSI device that was created in the
397398
398399
```bash
399400
sudo mkdir /etc/systemd/system/sbd.service.d
400-
echo -e "[Service]\nTimeoutSec=144" | sudo tee /etc/systemd/system/sbd.service.d/sbd_delay_start.conf
401+
echo -e "[Service]\nTimeoutSec=259" | sudo tee /etc/systemd/system/sbd.service.d/sbd_delay_start.conf
401402
sudo systemctl daemon-reload
402403
403404
systemctl show sbd | grep -i timeout
404-
# TimeoutStartUSec=2min 24s
405-
# TimeoutStopUSec=2min 24s
405+
# TimeoutStartUSec=4min 19s
406+
# TimeoutStopUSec=4min 19s
407+
# TimeoutAbortUSec=4min 19s
406408
```
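Review note: the expected `systemctl show` output in this hunk changes from `2min 24s` to `4min 19s` because `TimeoutSec` goes from 144 to 259 seconds. The sketch below only reproduces that arithmetic. The helper name `to_systemd_span` is hypothetical and not part of the article, and the formatting only approximates how systemd prints time spans shorter than one hour.

```bash
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not from the article): format a number
# of seconds as "<N>min <M>s", approximating how `systemctl show` prints time
# spans shorter than one hour.
to_systemd_span() {
  local total=$1
  local min=$(( total / 60 ))
  local sec=$(( total % 60 ))
  if (( min == 0 )); then
    echo "${sec}s"
  elif (( sec == 0 )); then
    echo "${min}min"
  else
    echo "${min}min ${sec}s"
  fi
}

to_systemd_span 144   # previous TimeoutSec value in this diff
to_systemd_span 259   # new TimeoutSec value in this diff
```

Running the helper on both values is a quick way to confirm that the updated comment lines match the new `TimeoutSec` setting.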
407409
408410
## SBD with an Azure shared disk
@@ -507,7 +509,7 @@ foreach ($vmName in $vmNames) {
507509
sudo vi /etc/sysconfig/sbd
508510
```
509511
510-
2. Change the property of the SBD device, enable the pacemaker integration, and change the start mode of SBD
512+
2. Change the property of the SBD device, enable the pacemaker integration, change the start mode of SBD, and adjust SBD_DELAY_START value.
511513
512514
```bash
513515
[...]
@@ -517,7 +519,8 @@ foreach ($vmName in $vmNames) {
517519
[...]
518520
SBD_STARTMODE=always
519521
[...]
520-
SBD_DELAY_START=yes
522+
# In some cases, a longer delay than the default "msgwait" seconds is needed. So, set a specific delay value, in seconds. See, `man sbd` for more information.
523+
SBD_DELAY_START=216
521524
[...]
522525
```
523526
@@ -538,12 +541,13 @@ foreach ($vmName in $vmNames) {
538541
539542
```bash
540543
sudo mkdir /etc/systemd/system/sbd.service.d
541-
echo -e "[Service]\nTimeoutSec=144" | sudo tee /etc/systemd/system/sbd.service.d/sbd_delay_start.conf
544+
echo -e "[Service]\nTimeoutSec=259" | sudo tee /etc/systemd/system/sbd.service.d/sbd_delay_start.conf
542545
sudo systemctl daemon-reload
543546
544547
systemctl show sbd | grep -i timeout
545-
# TimeoutStartUSec=2min 24s
546-
# TimeoutStopUSec=2min 24s
548+
# TimeoutStartUSec=4min 19s
549+
# TimeoutStopUSec=4min 19s
550+
# TimeoutAbortUSec=4min 19s
547551
```
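Review note: with `SBD_DELAY_START=216`, the `sbd` service is expected to wait that long at startup, so the systemd start timeout presumably needs to be larger than the delay. That relationship is an assumption drawn from the values in this diff (259 versus 216), not something the commit states. The sketch below checks it on throwaway copies of the two settings rather than on the real files under `/etc`. The temporary file names are illustrative.

```bash
#!/usr/bin/env bash
# Sketch only: operates on temporary copies that mirror the values in this
# diff, not on /etc/sysconfig/sbd or the real systemd drop-in.
workdir=$(mktemp -d)
printf 'SBD_STARTMODE=always\nSBD_DELAY_START=216\n' > "$workdir/sbd"
printf '[Service]\nTimeoutSec=259\n' > "$workdir/sbd_delay_start.conf"

# Extract the numeric values from each file.
delay=$(sed -n 's/^SBD_DELAY_START=//p' "$workdir/sbd")
timeout=$(sed -n 's/^TimeoutSec=//p' "$workdir/sbd_delay_start.conf")

# Assumed rule: the service timeout should exceed the start delay.
if (( timeout > delay )); then
  result="ok: TimeoutSec=$timeout exceeds SBD_DELAY_START=$delay"
else
  result="warning: TimeoutSec=$timeout does not exceed SBD_DELAY_START=$delay"
fi
echo "$result"

rm -rf "$workdir"
```

To run the same comparison on a cluster node, point the two `sed` commands at the real files instead of the temporary copies.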
548552
549553
## Azure fence agent configuration
@@ -574,7 +578,7 @@ The fencing device uses either a managed identity for Azure resource or a servic
574578
1. Make a note of the **Value**. It's used as the **password** for the service principal.
575579
1. Select **Overview**. Make a note of the **Application ID**. It's used as the username (**login ID** in the following steps) of the service principal.
576580

577-
---
581+
---
578582

579583
2. Create a custom role for the fence agent
580584

@@ -618,7 +622,7 @@ The fencing device uses either a managed identity for Azure resource or a servic
618622
619623
Make sure to assign the role for both cluster nodes.
620624
621-
---
625+
---
622626
623627
## Cluster installation
624628
@@ -799,7 +803,7 @@ Based on the selected fencing mechanism, follow only one section for relevant in
799803
2. **[1]** For the SBD device configured using iSCSI target servers or Azure shared disk, run the following commands.
800804
801805
```bash
802-
sudo pcs property set stonith-timeout=144
806+
sudo pcs property set stonith-timeout=210
803807
sudo pcs property set stonith-enabled=true
804808
805809
# Replace the device IDs with your device ID.
@@ -812,7 +816,7 @@ Based on the selected fencing mechanism, follow only one section for relevant in
812816
813817
```bash
814818
sudo pcs cluster stop --all
815-
819+
816820
# It would take time to start the cluster as "SBD_DELAY_START" is set to "yes"
817821
sudo pcs cluster start --all
818822
```
@@ -838,7 +842,7 @@ Based on the selected fencing mechanism, follow only one section for relevant in
838842
> When using Azure government cloud, you must specify `cloud=` option when configuring fence agent. For example, `cloud=usgov` for the Azure US government cloud. For details on RedHat support on Azure government cloud, see [Support Policies for RHEL High Availability Clusters - Microsoft Azure Virtual Machines as Cluster Members](https://access.redhat.com/articles/3131341).
839843
840844
> [!TIP]
841-
> The option `pcmk_host_map` is *only* required in the command if the RHEL hostnames and the Azure VM names are *not* identical. Specify the mapping in the format **hostname:vm-name**. For more information, see [What format should I use to specify node mappings to fencing devices in pcmk_host_map?](https://access.redhat.com/solutions/2619961).
845+
> The option `pcmk_host_map` is *only* required in the command if the RHEL hostnames and the Azure VM names aren't* identical. Specify the mapping in the format **hostname:vm-name**. For more information, see [What format should I use to specify node mappings to fencing devices in pcmk_host_map?](https://access.redhat.com/solutions/2619961).
842846

843847
#### [Managed identity](#tab/msi)
844848

@@ -859,7 +863,7 @@ Based on the selected fencing mechanism, follow only one section for relevant in
859863
subscriptionId="subscription id" pcmk_host_map="prod-cl1-0:prod-cl1-0-vm-name;prod-cl1-1:prod-cl1-1-vm-name" \
860864
power_timeout=240 pcmk_reboot_timeout=900 pcmk_monitor_timeout=120 pcmk_monitor_retries=4 pcmk_action_limit=3 \
861865
op monitor interval=3600
862-
866+
863867
# Run following command if you are setting up fence agent on (two-node cluster and pacemaker version less than 2.0.4-6.el8)
864868
sudo pcs stonith create rsc_st_azure fence_azure_arm msi=true resourceGroup="resource group" \
865869
subscriptionId="subscription id" pcmk_host_map="prod-cl1-0:prod-cl1-0-vm-name;prod-cl1-1:prod-cl1-1-vm-name" \
@@ -888,7 +892,7 @@ Based on the selected fencing mechanism, follow only one section for relevant in
888892
pcmk_host_map="prod-cl1-0:prod-cl1-0-vm-name;prod-cl1-1:prod-cl1-1-vm-name" \
889893
power_timeout=240 pcmk_reboot_timeout=900 pcmk_monitor_timeout=120 pcmk_monitor_retries=4 pcmk_action_limit=3 \
890894
op monitor interval=3600
891-
895+
892896
# Run following command if you are setting up fence agent on (two-node cluster and pacemaker version less than 2.0.4-6.el8)
893897
sudo pcs stonith create rsc_st_azure fence_azure_arm username="login ID" password="password" \
894898
resourceGroup="resource group" tenantId="tenant ID" subscriptionId="subscription id" \
@@ -897,7 +901,7 @@ Based on the selected fencing mechanism, follow only one section for relevant in
897901
op monitor interval=3600
898902
```
899903

900-
---
904+
---
901905

902906
If you're using a fencing device based on service principal configuration, read [Change from SPN to MSI for Pacemaker clusters by using Azure fencing](https://techcommunity.microsoft.com/t5/running-sap-applications-on-the/sap-on-azure-high-availability-change-from-spn-to-msi-for/ba-p/3609278) and learn how to convert to managed identity configuration.
903907
@@ -1008,6 +1012,7 @@ The following Red Hat KB articles contain important information about configurin
10081012
* For information on how to change the default timeout, see [How do I configure kdump for use with the RHEL 6, 7, 8 HA Add-On?](https://access.redhat.com/articles/67570).
10091013
* For information on how to reduce failover delay when you use `fence_kdump`, see [Can I reduce the expected delay of failover when adding fence_kdump configuration?](https://access.redhat.com/solutions/5512331).
10101014
1015+
10111016
Run the following optional steps to add `fence_kdump` as a first-level fencing configuration, in addition to the Azure fence agent configuration.
10121017
10131018
1. **[A]** Verify that `kdump` is active and configured.
