Commit c0def78

Merge pull request #233043 from rdeltcheva/rhel9-chg
Adjustments for RHEL 9.0
2 parents e0683d7 + 9be41cf commit c0def78

1 file changed: +42 -26 lines changed

articles/sap/workloads/high-availability-guide-rhel-pacemaker.md

Lines changed: 42 additions & 26 deletions
Original file line number | Diff line number | Diff line change
@@ -14,7 +14,7 @@ ms.topic: article
1414
ms.tgt_pltfrm: vm-windows
1515
ms.workload: infrastructure-services
1616
ms.custom: subject-rbac-steps
17-
ms.date: 09/22/2022
17+
ms.date: 04/03/2022
1818
ms.author: radeltch
1919

2020
---
@@ -33,10 +33,12 @@ ms.author: radeltch
3333
[2191498]:https://launchpad.support.sap.com/#/notes/2191498
3434
[2243692]:https://launchpad.support.sap.com/#/notes/2243692
3535
[1999351]:https://launchpad.support.sap.com/#/notes/1999351
36+
[3108316]:https://launchpad.support.sap.com/#/notes/3108316
37+
[3108302]:https://launchpad.support.sap.com/#/notes/3108302
3638

3739
[virtual-machines-linux-maintenance]:../../virtual-machines/maintenance-and-updates.md#maintenance-that-doesnt-require-a-reboot
3840

39-
The article describes how to configure basic Pacemaker cluster on Red Hat Enterprise Server(RHEL). The instructions cover both RHEL 7 and RHEL 8.
41+
The article describes how to configure basic Pacemaker cluster on Red Hat Enterprise Server(RHEL). The instructions cover RHEL 7, RHEL 8 and RHEL 9.
4042

4143
## Prerequisites
4244
Read the following SAP Notes and papers first:
@@ -47,8 +49,10 @@ Read the following SAP Notes and papers first:
4749
* The supported SAP software, and operating system (OS) and database combinations.
4850
* The required SAP kernel version for Windows and Linux on Microsoft Azure.
4951
* SAP Note [2015553] lists prerequisites for SAP-supported SAP software deployments in Azure.
50-
* SAP Note [2002167] has recommended OS settings for Red Hat Enterprise Linux
52+
* SAP Note [2002167] recommends OS settings for Red Hat Enterprise Linux
53+
* SAP Note [3108316] recommends OS settings for Red Hat Enterprise Linux 9.x
5154
* SAP Note [2009879] has SAP HANA Guidelines for Red Hat Enterprise Linux
55+
* SAP Note [3108302] has SAP HANA Guidelines for Red Hat Enterprise Linux 9.x
5256
* SAP Note [2178632] has detailed information about all monitoring metrics reported for SAP in Azure.
5357
* SAP Note [2191498] has the required SAP Host Agent version for Linux in Azure.
5458
* SAP Note [2243692] has information about SAP licensing on Linux in Azure.
@@ -78,11 +82,11 @@ Read the following SAP Notes and papers first:
7882
> Red Hat doesn't support software-emulated watchdog. Red Hat doesn't support SBD on cloud platforms. For details see [Support Policies for RHEL High Availability Clusters - sbd and fence_sbd](https://access.redhat.com/articles/2800691).
7983
> The only supported fencing mechanism for Pacemaker Red Hat Enterprise Linux clusters on Azure, is Azure fence agent.
8084
81-
The following items are prefixed with either **[A]** - applicable to all nodes, **[1]** - only applicable to node 1 or **[2]** - only applicable to node 2. Differences in the commands or the configuration between RHEL 7 and RHEL 8 are marked in the document.
85+
The following items are prefixed with either **[A]** - applicable to all nodes, **[1]** - only applicable to node 1 or **[2]** - only applicable to node 2. Differences in the commands or the configuration between RHEL 7 and RHEL 8/RHEL 9 are marked in the document.
8286

83-
1. **[A]** Register - optional step. This step is not required, if using RHEL SAP HA-enabled images.
87+
1. **[A]** Register - optional step. This step isn't required, if using RHEL SAP HA-enabled images.
8488

85-
Register your virtual machines and attach it to a pool that contains repositories for RHEL 7.
89+
For example, if deploying on RHEL 7, register your virtual machine and attach it to a pool that contains repositories for RHEL 7.
8690

8791
<pre><code>sudo subscription-manager register
8892
# List the available pools
@@ -92,7 +96,7 @@ The following items are prefixed with either **[A]** - applicable to all nodes,
9296

9397
By attaching a pool to an Azure Marketplace PAYG RHEL image, you will be effectively double-billed for your RHEL usage: once for the PAYG image, and once for the RHEL entitlement in the pool you attach. To mitigate this situation, Azure now provides BYOS RHEL images. For more information, see [Red Hat Enterprise Linux bring-your-own-subscription Azure images](../../virtual-machines/workloads/redhat/byos.md).
9498

95-
1. **[A]** Enable RHEL for SAP repos - optional step. This step is not required, if using RHEL SAP HA-enabled images.
99+
1. **[A]** Enable RHEL for SAP repos - optional step. This step isn't required, if using RHEL SAP HA-enabled images.
96100

97101
In order to install the required packages on RHEL 7, enable the following repositories.
98102

@@ -105,9 +109,9 @@ The following items are prefixed with either **[A]** - applicable to all nodes,
105109

106110
1. **[A]** Install RHEL HA Add-On
107111

108-
<pre><code>sudo yum install -y pcs pacemaker fence-agents-azure-arm nmap-ncat
109-
</code></pre>
110-
112+
```sudo yum install -y pcs pacemaker fence-agents-azure-arm nmap-ncat
113+
```
114+
111115
> [!IMPORTANT]
112116
> We recommend the following versions of Azure Fence agent (or later) for customers to benefit from a faster failover time, if a resource stop fails or the cluster nodes cannot communicate with each other anymore:
113117
> RHEL 7.7 or higher use the latest available version of fence-agents package
@@ -123,7 +127,13 @@ The following items are prefixed with either **[A]** - applicable to all nodes,
123127
> RHEL 8.1: fence-agents-4.2.1-30.el8_1.4
124128
> RHEL 7.9: fence-agents-4.2.1-41.el7_9.4.
125129
126-
Check the version of the Azure fence agent. If necessary, update it to a version equal to or later than the stated above.
130+
> [!IMPORTANT]
131+
> On RHEL 9, we recommend the following package versions (or later) to avoid issues with Azure Fence agent:
132+
> fence-agents-4.10.0-20.el9_0.7
133+
> fence-agents-common-4.10.0-20.el9_0.6
134+
> ha-cloud-support-4.10.0-20.el9_0.6.x86_64.rpm
135+
136+
Check the version of the Azure fence agent. If necessary, update it to the minimum required version or later.
127137

128138
<pre><code># Check the version of the Azure Fence Agent
129139
sudo yum info fence-agents-azure-arm
@@ -132,13 +142,19 @@ The following items are prefixed with either **[A]** - applicable to all nodes,
132142
> [!IMPORTANT]
133143
> If you need to update the Azure Fence agent, and if using custom role, make sure to update the custom role to include action **powerOff**. For details see [Create a custom role for the fence agent](#1-create-a-custom-role-for-the-fence-agent).
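The version check described in this step can be sketched as a small shell comparison. This is only an illustration, not part of the article: `version_ge` is a hypothetical helper, it relies on GNU `sort -V` (which approximates, but is not identical to, RPM's own version ordering), and it assumes that `fence-agents-azure-arm` carries the same version-release string as the `fence-agents` minimum quoted for RHEL 9.

```shell
# Hypothetical helper (not from the article): succeeds when version $1 >= version $2.
# Uses GNU sort -V, which only approximates RPM version ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Minimum for RHEL 9 as quoted in the note; adjust for other releases.
required="4.10.0-20.el9_0.7"

# On a cluster node the installed version-release could be read with rpm, for example:
#   installed="$(rpm -q --queryformat '%{VERSION}-%{RELEASE}' fence-agents-azure-arm)"
# The default below is a placeholder so the sketch runs on any machine.
installed="${installed:-4.10.0-20.el9_0.7}"

if version_ge "$installed" "$required"; then
    echo "fence agent version OK: $installed"
else
    echo "fence agent too old: $installed (need $required or later)"
fi
```

If the check reports an older version, updating the package with `yum` as the step describes remains the actual fix; the sketch only automates the comparison.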
134144
145+
1. If deploying on RHEL 9, install also the resource agents for cloud deployment:
146+
147+
```
148+
sudo yum install -y resource-agents-cloud
149+
```
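Because this step applies only to RHEL 9, a deployment script might guard it with a release check. The following is a minimal sketch under assumptions: `rhel_major` is a hypothetical helper, and the script assumes `/etc/os-release` defines `VERSION_ID` (for example `9.0`). It only prints what it would do rather than installing anything.

```shell
# Hypothetical helper: extracts the major version from a VERSION_ID such as "9.0".
rhel_major() {
    printf '%s\n' "${1%%.*}"
}

# Read VERSION_ID from /etc/os-release in a subshell, if the file is readable.
version_id=""
if [ -r /etc/os-release ]; then
    version_id="$( . /etc/os-release; printf '%s' "${VERSION_ID:-}" )"
fi

if [ "$(rhel_major "$version_id")" = "9" ]; then
    echo "RHEL 9 detected: run 'sudo yum install -y resource-agents-cloud'"
else
    echo "Not RHEL 9: the resource-agents-cloud step does not apply"
fi
```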
150+
135151
1. **[A]** Setup host name resolution
136152
137153
You can either use a DNS server or modify the /etc/hosts on all nodes. This example shows how to use the /etc/hosts file.
138154
Replace the IP address and the hostname in the following commands.
139155
140156
>[!IMPORTANT]
141-
> If using host names in the cluster configuration, it is vital to have reliable host name resolution. The cluster communication will fail, if the names are not available and that can lead to cluster failover delays.
157+
> If using host names in the cluster configuration, it's vital to have reliable host name resolution. The cluster communication will fail, if the names are not available and that can lead to cluster failover delays.
142158
> The benefit of using /etc/hosts is that your cluster becomes independent of DNS, which could be a single point of failure too.
143159
144160
<pre><code>sudo vi /etc/hosts
@@ -183,7 +199,7 @@ The following items are prefixed with either **[A]** - applicable to all nodes,
183199
sudo pcs cluster start --all
184200
</code></pre>
185201
186-
If building a cluster on **RHEL 8.x**, use the following commands:
202+
If building a cluster on **RHEL 8.x/RHEL 9.x**, use the following commands:
187203
<pre><code>sudo pcs host auth <b>prod-cl1-0</b> <b>prod-cl1-1</b> -u hacluster
188204
sudo pcs cluster setup <b>nw1-azr</b> <b>prod-cl1-0</b> <b>prod-cl1-1</b> totem token=30000
189205
sudo pcs cluster start --all
@@ -233,7 +249,7 @@ The following items are prefixed with either **[A]** - applicable to all nodes,
233249
The fencing device uses either a managed identity for Azure resource or service principal to authorize against Microsoft Azure.
234250
235251
### Using Managed Identity
236-
To create a managed identity (MSI), [create a system-assigned](../../active-directory/managed-identities-azure-resources/qs-configure-portal-windows-vm.md#system-assigned-managed-identity) managed identity for each VM in the cluster. Should a system-assigned managed identity already exist, it will be used. User assigned managed identities should not be used with Pacemaker at this time. Fence device, based on managed identity is supported on RHEL 7.9 and RHEL 8.x.
252+
To create a managed identity (MSI), [create a system-assigned](../../active-directory/managed-identities-azure-resources/qs-configure-portal-windows-vm.md#system-assigned-managed-identity) managed identity for each VM in the cluster. Should a system-assigned managed identity already exist, it will be used. User assigned managed identities should not be used with Pacemaker at this time. Fence device, based on managed identity is supported on RHEL 7.9 and RHEL 8.x/RHEL 9.x.
237253
238254
### Using Service Principal
239255
Follow these steps to create a service principal, if not using managed identity.
@@ -245,15 +261,15 @@ Follow these steps to create a service principal, if not using managed identity.
245261
1. Click New Registration
246262
1. Enter a Name, select "Accounts in this organization directory only"
247263
2. Select Application Type "Web", enter a sign-on URL (for example http:\//localhost) and click Add
248-
The sign-on URL is not used and can be any valid URL
264+
The sign-on URL isn't used and can be any valid URL
249265
1. Select Certificates and Secrets, then click New client secret
250266
1. Enter a description for a new key, select "Never expires" and click Add
251267
1. Make a note of the Value. It is used as the **password** for the service principal
252-
1. Select Overview. Make a note the Application ID. It is used as the username (**login ID** in the steps below) of the service principal
268+
1. Select Overview. Make a note the Application ID. It's used as the username (**login ID** in the steps below) of the service principal
253269
254270
### **[1]** Create a custom role for the fence agent
255271
256-
Neither managed identity nor service principal has permissions to access your Azure resources by default. You need to give the managed identity or service principal permissions to start and stop (power-off) all virtual machines of the cluster. If you did not already create the custom role, you can create it using [PowerShell](../../role-based-access-control/custom-roles-powershell.md) or [Azure CLI](../../role-based-access-control/custom-roles-cli.md)
272+
Neither managed identity nor service principal has permissions to access your Azure resources by default. You need to give the managed identity or service principal permissions to start and stop (power-off) all virtual machines of the cluster. If you didn't already create the custom role, you can create it using [PowerShell](../../role-based-access-control/custom-roles-powershell.md) or [Azure CLI](../../role-based-access-control/custom-roles-cli.md)
257273
258274
Use the following content for the input file. You need to adapt the content to your subscriptions that is, replace *xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx* and *yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy* with the Ids of your subscription. If you only have one subscription, remove the second entry in AssignableScopes.
259275
@@ -287,7 +303,7 @@ Assign the custom role "Linux Fence Agent Role" that was created in the last cha
287303
288304
#### Using Service Principal
289305

290-
Assign the custom role "Linux Fence Agent Role" that was created in the last chapter to the service principal. Do not use the Owner role anymore! For detailed steps, see [Assign Azure roles using the Azure portal](../../role-based-access-control/role-assignments-portal.md).
306+
Assign the custom role "Linux Fence Agent Role" that was created in the last chapter to the service principal. Don't use the Owner role anymore! For detailed steps, see [Assign Azure roles using the Azure portal](../../role-based-access-control/role-assignments-portal.md).
291307
Make sure to assign the role for both cluster nodes.
292308

293309
### **[1]** Create the fencing devices
@@ -305,14 +321,14 @@ sudo pcs property set stonith-timeout=900
305321

306322
#### [Managed Identity](#tab/msi)
307323

308-
For RHEL **7.X**, use the following command to configure the fence device:
324+
For RHEL **7.x**, use the following command to configure the fence device:
309325
<pre><code>sudo pcs stonith create rsc_st_azure fence_azure_arm <b>msi=true</b> resourceGroup="<b>resource group</b>" \
310326
subscriptionId="<b>subscription id</b>" <b>pcmk_host_map="prod-cl1-0:prod-cl1-0-vm-name;prod-cl1-1:prod-cl1-1-vm-name"</b> \
311327
power_timeout=240 pcmk_reboot_timeout=900 pcmk_monitor_timeout=120 pcmk_monitor_retries=4 pcmk_action_limit=3 pcmk_delay_max=15 \
312328
op monitor interval=3600
313329
</code></pre>
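The `pcmk_host_map` value in the command above is a `;`-separated list of `cluster-node-name:azure-vm-name` pairs. A minimal sketch of assembling that string before passing it to `pcs stonith create` follows; `build_host_map` is a hypothetical helper and the node and VM names are the article's placeholders.

```shell
# Hypothetical helper: joins "node:vm" pairs with ';' into a pcmk_host_map value.
# Runs in a subshell so the IFS change does not leak into the caller.
build_host_map() (
    IFS=';'
    printf '%s\n' "$*"
)

# Placeholder names from the article; replace with your node and VM names.
host_map="$(build_host_map \
    "prod-cl1-0:prod-cl1-0-vm-name" \
    "prod-cl1-1:prod-cl1-1-vm-name")"

echo "pcmk_host_map=\"$host_map\""
```

Building the map from explicit pairs makes it harder to mismatch a node name with the wrong VM name when the two differ.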
314330

315-
For RHEL **8.X**, use the following command to configure the fence device:
331+
For RHEL **8.x/9.x**, use the following command to configure the fence device:
316332
<pre><code>sudo pcs stonith create rsc_st_azure fence_azure_arm <b>msi=true</b> resourceGroup="<b>resource group</b>" \
317333
subscriptionId="<b>subscription id</b>" <b>pcmk_host_map="prod-cl1-0:prod-cl1-0-vm-name;prod-cl1-1:prod-cl1-1-vm-name"</b> \
318334
power_timeout=240 pcmk_reboot_timeout=900 pcmk_monitor_timeout=120 pcmk_monitor_retries=4 pcmk_action_limit=3 pcmk_delay_max=15 \
@@ -329,7 +345,7 @@ power_timeout=240 pcmk_reboot_timeout=900 pcmk_monitor_timeout=120 pcmk_monitor_
329345
op monitor interval=3600
330346
</code></pre>
331347

332-
For RHEL **8.x**, use the following command to configure the fence device:
348+
For RHEL **8.x/9.x**, use the following command to configure the fence device:
333349
<pre><code>sudo pcs stonith create rsc_st_azure fence_azure_arm username="<b>login ID</b>" password="<b>password</b>" \
334350
resourceGroup="<b>resource group</b>" tenantId="<b>tenant ID</b>" subscriptionId="<b>subscription id</b>" \
335351
<b>pcmk_host_map="prod-cl1-0:prod-cl1-0-vm-name;prod-cl1-1:prod-cl1-1-vm-name"</b> \
@@ -339,14 +355,14 @@ op monitor interval=3600
339355

340356
---
341357

342-
If you are using fencing device, based on service principal configuration, read [Change from SPN to MSI for Pacemaker clusters using Azure fencing](https://techcommunity.microsoft.com/t5/running-sap-applications-on-the/sap-on-azure-high-availability-change-from-spn-to-msi-for/ba-p/3609278) and learn how to convert to managed identity configuration.
358+
If you're using fencing device, based on service principal configuration, read [Change from SPN to MSI for Pacemaker clusters using Azure fencing](https://techcommunity.microsoft.com/t5/running-sap-applications-on-the/sap-on-azure-high-availability-change-from-spn-to-msi-for/ba-p/3609278) and learn how to convert to managed identity configuration.
343359

344360
> [!TIP]
345361
> Only configure the `pcmk_delay_max` attribute in two node Pacemaker clusters. For more information on preventing fence races in a two node Pacemaker cluster, see [Delaying fencing in a two node cluster to prevent fence races of "fence death" scenarios](https://access.redhat.com/solutions/54829).
346362
347363

348364
> [!IMPORTANT]
349-
> The monitoring and fencing operations are de-serialized. As a result, if there is a longer running monitoring operation and simultaneous fencing event, there is no delay to the cluster failover, due to the already running monitoring operation.
365+
> The monitoring and fencing operations are deserialized. As a result, if there is a longer running monitoring operation and simultaneous fencing event, there is no delay to the cluster failover, due to the already running monitoring operation.
350366
351367
### **[1]** Enable the use of a fencing device
352368

@@ -362,7 +378,7 @@ If you are using fencing device, based on service principal configuration, read
362378
> [!TIP]
363379
> This section is only applicable, if it is desired to configure special fencing device `fence_kdump`.
364380
365-
If there is a need to collect diagnostic information within the VM, it may be useful to configure additional fencing device, based on fence agent `fence_kdump`. The `fence_kdump` agent can detect that a node entered kdump crash recovery and can allow the crash recovery service to complete, before other fencing methods are invoked. Note that `fence_kdump` is not a replacement for traditional fence mechanisms, like Azure Fence Agent when using Azure VMs.
381+
If there is a need to collect diagnostic information within the VM, it may be useful to configure additional fencing device, based on fence agent `fence_kdump`. The `fence_kdump` agent can detect that a node entered kdump crash recovery and can allow the crash recovery service to complete, before other fencing methods are invoked. Note that `fence_kdump` isn't a replacement for traditional fence mechanisms, like Azure Fence Agent when using Azure VMs.
366382

367383
> [!IMPORTANT]
368384
> Be aware that when `fence_kdump` is configured as a first level fencing device, it will introduce delays in the fencing operations and respectively delays in the application resources failover.
@@ -376,8 +392,8 @@ The following Red Hat KBs contain important information about configuring `fence
376392

377393
* [How do I configure fence_kdump in a Red Hat Pacemaker cluster](https://access.redhat.com/solutions/2876971)
378394
* [How to configure/manage fencing levels in RHEL cluster with Pacemaker](https://access.redhat.com/solutions/891323)
379-
* [fence_kdump fails with "timeout after X seconds" in a RHEL 6 0r 7 HA cluster with kexec-tools older than 2.0.14](https://access.redhat.com/solutions/2388711)
380-
* For information how to change change the default timeout see [How do I configure kdump for use with the RHEL 6,7,8 HA Add-On](https://access.redhat.com/articles/67570)
395+
* [fence_kdump fails with "timeout after X seconds" in a RHEL 6 or 7 HA cluster with kexec-tools older than 2.0.14](https://access.redhat.com/solutions/2388711)
396+
* For information how to change the default timeout see [How do I configure kdump for use with the RHEL 6,7,8 HA Add-On](https://access.redhat.com/articles/67570)
381397
* For information on how to reduce failover delay, when using `fence_kdump` see [Can I reduce the expected delay of failover when adding fence_kdump configuration](https://access.redhat.com/solutions/5512331)
382398

383399
Execute the following optional steps to add `fence_kdump` as a first level fencing configuration, in addition to the Azure Fence Agent configuration.
