Commit d24575f

Merge pull request #225322 from msftrobiro/sap-hana-scaleout-suse-upd
HANA scale-out multi-target
2 parents bc420d6 + 2cd5d48

File tree

1 file changed: 60 additions & 48 deletions


articles/virtual-machines/workloads/sap/sap-hana-high-availability-scale-out-hsr-suse.md

Lines changed: 60 additions & 48 deletions
@@ -9,7 +9,7 @@ ms.service: virtual-machines-sap
 ms.topic: article
 ms.tgt_pltfrm: vm-windows
 ms.workload: infrastructure-services
-ms.date: 12/07/2022
+ms.date: 01/27/2023
 ms.author: radeltch
 
 ---
@@ -47,7 +47,7 @@ ms.author: radeltch
 
 This article describes how to deploy a highly available SAP HANA system in a scale-out configuration with HANA system replication (HSR) and Pacemaker on Azure SUSE Linux Enterprise Server virtual machines (VMs). The shared file systems in the presented architecture are NFS mounted and are provided by [Azure NetApp Files](../../../azure-netapp-files/azure-netapp-files-introduction.md) or [NFS share on Azure Files](../../../storage/files/files-nfs-protocol.md).
 
-In the example configurations, installation commands, and so on, the HANA instance is **03** and the HANA system ID is **HN1**. The examples are based on HANA 2.0 SP5 and SUSE Linux Enterprise Server 12 SP5.
+In the example configurations, installation commands, and so on, the HANA instance is **03** and the HANA system ID is **HN1**.
 
 Before you begin, refer to the following SAP notes and papers:
 
@@ -191,7 +191,6 @@ For the configuration presented in this document, deploy seven virtual machines:
 1. Select the virtual machines of the HANA cluster (the NICs for the `client` subnet).
 1. Select **Add**.
 2. Select **Save**.
-
 
 1. Next, create a health probe:
@@ -216,7 +215,6 @@ For the configuration presented in this document, deploy seven virtual machines:
 > [!Note]
 > When VMs without public IP addresses are placed in the backend pool of internal (no public IP address) Standard Azure load balancer, there will be no outbound internet connectivity, unless additional configuration is performed to allow routing to public end points. For details on how to achieve outbound connectivity see [Public endpoint connectivity for Virtual Machines using Azure Standard Load Balancer in SAP high-availability scenarios](./high-availability-guide-standard-load-balancer-outbound-connections.md).
 
-
 > [!IMPORTANT]
 > Do not enable TCP timestamps on Azure VMs placed behind Azure Load Balancer. Enabling TCP timestamps will cause the health probes to fail. Set parameter **net.ipv4.tcp_timestamps** to **0**. For details see [Load Balancer health probes](../../../load-balancer/load-balancer-custom-probe-overview.md).
 > See also SAP note [2382421](https://launchpad.support.sap.com/#/notes/2382421).
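The **net.ipv4.tcp_timestamps** requirement above is usually made persistent with a sysctl drop-in file. A minimal sketch; the drop-in file name is an assumption, and a temporary directory stands in for /etc/sysctl.d/ here:

```shell
# Sketch: persist net.ipv4.tcp_timestamps=0 as a sysctl drop-in.
# A temp dir stands in for /etc/sysctl.d/ here; on the VM, write there as root.
conf_dir=$(mktemp -d)
cat << 'EOF' > "$conf_dir/95-lb-health-probe.conf"
# Azure Load Balancer health probes fail when TCP timestamps are enabled
net.ipv4.tcp_timestamps = 0
EOF
# On the VM, apply all drop-ins with: sudo sysctl --system
grep -q 'net.ipv4.tcp_timestamps = 0' "$conf_dir/95-lb-health-probe.conf" && echo "setting present"
```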
@@ -230,7 +228,6 @@ The next sections describe the steps to deploy NFS - you'll need to select only
 > [!TIP]
 > You chose to deploy `/hana/shared` on [NFS share on Azure Files](../../../storage/files/files-nfs-protocol.md) or [NFS volume on Azure NetApp Files](../../../azure-netapp-files/azure-netapp-files-introduction.md).
 
-
 #### Deploy the Azure NetApp Files infrastructure
 
 Deploy ANF volumes for the `/hana/shared` file system. You will need a separate `/hana/shared` volume for each HANA system replication site. For more information, see [Set up the Azure NetApp Files infrastructure](./sap-hana-scale-out-standby-netapp-files-suse.md#set-up-the-azure-netapp-files-infrastructure).
@@ -240,7 +237,6 @@ In this example, the following Azure NetApp Files volumes were used:
 * volume **HN1**-shared-s1 (nfs://10.23.1.7/**HN1**-shared-s1)
 * volume **HN1**-shared-s2 (nfs://10.23.1.7/**HN1**-shared-s2)
 
-
 #### Deploy the NFS on Azure Files infrastructure
 
 Deploy Azure Files NFS shares for the `/hana/shared` file system. You will need a separate `/hana/shared` Azure Files NFS share for each HANA system replication site. For more information, see [How to create an NFS share](../../../storage/files/storage-files-how-to-create-nfs-shares.md?tabs=azure-portal).
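Whichever NFS service is chosen, each replication site ends up mounting its own volume at `/hana/shared`. A minimal sketch of the mount command shape, using the ANF example addresses from above; the NFS options shown are assumptions, and the article's own mount steps are authoritative:

```shell
# Sketch: one /hana/shared volume per HANA system replication site.
site1_vol="10.23.1.7:/HN1-shared-s1"  # SITE 1 volume from the ANF example
site2_vol="10.23.1.7:/HN1-shared-s2"  # SITE 2 volume from the ANF example
mount_opts="vers=4.1,sec=sys,hard"    # assumed typical NFSv4.1 options
# On the SITE 1 HANA VMs, the mount would look like:
echo "sudo mount -t nfs -o ${mount_opts} ${site1_vol} /hana/shared"
```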
@@ -253,9 +249,9 @@ In this example, the following Azure Files NFS shares were used:
 ## Operating system configuration and preparation
 
 The instructions in the next sections are prefixed with one of the following abbreviations:
-* **[A]**: Applicable to all nodes
+* **[A]**: Applicable to all nodes, including majority maker
 * **[AH]**: Applicable to all HANA DB nodes
-* **[M]**: Applicable to the majority maker node
+* **[M]**: Applicable to the majority maker node only
 * **[AH1]**: Applicable to all HANA DB nodes on SITE 1
 * **[AH2]**: Applicable to all HANA DB nodes on SITE 2
 * **[1]**: Applicable only to HANA DB node 1, SITE 1
@@ -307,6 +303,9 @@ Configure and prepare your OS by doing the following steps:
 
 3. **[A]** SUSE delivers special resource agents for SAP HANA; by default, agents for SAP HANA scale-up are installed. Uninstall the scale-up packages, if installed, and install the packages for the SAP HANA scale-out scenario. This step needs to be performed on all cluster VMs, including the majority maker.
 
+> [!NOTE]
+> SAPHanaSR-ScaleOut version 0.181 or higher must be installed.
+
 ```bash
 # Uninstall scale-up packages and patterns
 sudo zypper remove patterns-sap-hana
@@ -326,7 +325,7 @@ You chose to deploy the SAP shared directories on [NFS share on Azure Files](../
 
 In this example, the shared HANA file systems are deployed on Azure NetApp Files and mounted over NFSv4.1. Follow the steps in this section only if you are using NFS on Azure NetApp Files.
 
-1. **[A]** Prepare the OS for running SAP HANA on NetApp Systems with NFS, as described in SAP note [3024346 - Linux Kernel Settings for NetApp NFS](https://launchpad.support.sap.com/#/notes/3024346). Create configuration file */etc/sysctl.d/91-NetApp-HANA.conf* for the NetApp configuration settings.
+1. **[AH]** Prepare the OS for running SAP HANA on NetApp Systems with NFS, as described in SAP note [3024346 - Linux Kernel Settings for NetApp NFS](https://launchpad.support.sap.com/#/notes/3024346). Create configuration file */etc/sysctl.d/91-NetApp-HANA.conf* for the NetApp configuration settings.
 
 <pre><code>
 vi /etc/sysctl.d/91-NetApp-HANA.conf
@@ -343,7 +342,7 @@ In this example, the shared HANA file systems are deployed on Azure NetApp Files
 net.ipv4.tcp_sack = 1
 </code></pre>
 
-2. **[A]** Adjust the sunrpc settings, as recommended in SAP note [3024346 - Linux Kernel Settings for NetApp NFS](https://launchpad.support.sap.com/#/notes/3024346).
+2. **[AH]** Adjust the sunrpc settings, as recommended in SAP note [3024346 - Linux Kernel Settings for NetApp NFS](https://launchpad.support.sap.com/#/notes/3024346).
 
 <pre><code>
 vi /etc/modprobe.d/sunrpc.conf
@@ -818,30 +817,40 @@ Create a dummy file system cluster resource, which will monitor and report failu
 
 `on-fail=fence` attribute is also added to the monitor operation. With this option, if the monitor operation fails on a node, that node is immediately fenced.
 
-## Implement HANA hooks SAPHanaSR and susChkSrv
+## Implement HANA HA hooks SAPHanaSrMultiTarget and susChkSrv
 
-This important step is to optimize the integration with the cluster and detection when a cluster failover is possible. It is highly recommended to configure the SAPHanaSR Python hook. For HANA 2.0 SP5 and above, implementing both SAPHanaSR and susChkSrv hook is recommended.
+This important step optimizes the integration with the cluster and the detection of when a cluster failover is possible. It is highly recommended to configure the SAPHanaSrMultiTarget Python hook. For HANA 2.0 SP5 and higher, implementing both the SAPHanaSrMultiTarget and susChkSrv hooks is recommended.
 
-SusChkSrv extends the functionality of the main SAPHanaSR HA provider. It acts in the situation when HANA process hdbindexserver crashes. If a single process crashes typically HANA tries to restart it. Restarting the indexserver process can take a long time, during which the HANA database is not responsive.
+> [!NOTE]
+> The SAPHanaSrMultiTarget HA provider replaces SAPHanaSR for HANA scale-out. SAPHanaSR was described in an earlier version of this document.
+> See the [SUSE blog post](https://www.suse.com/c/sap-hana-scale-out-multi-target-upgrade/) about the changes that come with the new HANA HA hook.
 
-With susChkSrv implemented, an immediate and configurable action is executed, instead of waiting on hdbindexserver process to restart on the same node. In HANA scale-out susChkSrv acts for every HANA VM independently. The configured action will kill HANA or fence the affected VM, which triggers a failover by SAPHanaSR in the configured timeout period.
+The steps provided for the SAPHanaSrMultiTarget hook are for a new installation. Upgrading an existing environment from the SAPHanaSR to the SAPHanaSrMultiTarget provider requires several changes and is _NOT_ described in this document. If the existing environment uses no third site for disaster recovery and [HANA multi-target system replication](https://help.sap.com/docs/SAP_HANA_PLATFORM/4e9b18c116aa42fc84c7dbfd02111aba/ba457510958241889a459e606bbcf3d3.html) is not used, the SAPHanaSR HA provider can remain in use.
 
-> [!NOTE]
-> susChkSrv Python hook requires SAP HANA 2.0 SP5 and SAPHanaSR-ScaleOut version 0.184.1 or higher must be installed.
+SusChkSrv extends the functionality of the main SAPHanaSrMultiTarget HA provider. It acts in the situation when the HANA process hdbindexserver crashes. If a single process crashes, HANA typically tries to restart it. Restarting the indexserver process can take a long time, during which the HANA database isn't responsive. With susChkSrv implemented, an immediate and configurable action is executed instead of waiting for the hdbindexserver process to restart on the same node. In HANA scale-out, susChkSrv acts for every HANA VM independently. The configured action will kill HANA or fence the affected VM, which triggers a failover in the configured timeout period.
+
+SUSE SLES 15 SP1 or higher is required for operation of both HANA HA hooks. The following table shows other dependencies.
+
+| SAP HANA HA hook     | HANA version required   | SAPHanaSR-ScaleOut required |
+| -------------------- | ----------------------- | --------------------------- |
+| SAPHanaSrMultiTarget | HANA 2.0 SPS4 or higher | 0.180 or higher             |
+| susChkSrv            | HANA 2.0 SPS5 or higher | 0.184.1 or higher           |
+
+Steps to implement both hooks:
 
 1. **[1,2]** Stop HANA on both system replication sites. Execute as <sid\>adm:
 
 ```bash
 sapcontrol -nr 03 -function StopSystem
 ```
 
-2. **[1,2]** Adjust `global.ini` on each cluster site. If the requirements for susChkSrv hook are not met, remove the entire block `[ha_dr_provider_suschksrv]` from below section.
-You can adjust the behavior of susChkSrv with parameter action_on_lost. Valid values are [ ignore | stop | kill | fence ].
+2. **[1,2]** Adjust `global.ini` on each cluster site. If the prerequisites for the susChkSrv hook aren't met, the entire block `[ha_dr_provider_suschksrv]` shouldn't be configured.
+You can adjust the behavior of susChkSrv with the parameter action_on_lost. Valid values are `[ ignore | stop | kill | fence ]`.
 
 ```bash
-# add to global.ini
-[ha_dr_provider_SAPHanaSR]
-provider = SAPHanaSR
+# add to global.ini on both sites. Do not copy global.ini between sites.
+[ha_dr_provider_saphanasrmultitarget]
+provider = SAPHanaSrMultiTarget
 path = /usr/share/SAPHanaSR-ScaleOut
 execution_order = 1
@@ -852,20 +861,19 @@ You can adjust the behavior of susChkSrv with parameter action_on_lost. Valid va
 action_on_lost = kill
 
 [trace]
-ha_dr_saphanasr = info
+ha_dr_saphanasrmultitarget = info
 ```
 
-Configuration pointing to the standard location /usr/share/SAPHanaSR-ScaleOut brings a benefit, that the python hook code is automatically updated through OS or package updates and it gets used by HANA at next restart. With an optional, own path, such as /hana/shared/myHooks you can decouple OS updates from the used hook version.
+The default location of the HA hooks as delivered by SUSE is /usr/share/SAPHanaSR-ScaleOut. Using the standard location brings the benefit that the Python hook code is automatically updated through OS or package updates and gets used by HANA at the next restart. With an optional own path, such as /hana/shared/myHooks, you can decouple OS updates from the hook version in use.
 
-3. **[AH]** The cluster requires sudoers configuration on the cluster node for <sid\>adm. In this example that is achieved by creating a new file. Execute the commands as `root` adapt the values of hn1/HN1 with correct SID.
+3. **[AH]** The cluster requires sudoers configuration on the cluster nodes for <sid\>adm. In this example that is achieved by creating a new file. Execute the commands as `root`, adapting the values of hn1 to the correct lowercase SID.
 
 ```bash
 cat << EOF > /etc/sudoers.d/20-saphana
-# SAPHanaSR-ScaleOut needs for srHook
-Cmnd_Alias SOK = /usr/sbin/crm_attribute -n hana_hn1_glob_srHook -v SOK -t crm_config -s SAPHanaSR
-Cmnd_Alias SFAIL = /usr/sbin/crm_attribute -n hana_hn1_glob_srHook -v SFAIL -t crm_config -s SAPHanaSR
-hn1adm ALL=(ALL) NOPASSWD: SOK, SFAIL
-hn1adm ALL=(ALL) NOPASSWD: /usr/sbin/SAPHanaSR-hookHelper --sid=HN1 --case=fenceMe
+# SAPHanaSR-ScaleOut needs for HA/DR hook scripts
+hn1adm ALL=(ALL) NOPASSWD: /usr/sbin/crm_attribute -n hana_hn1_site_srHook_*
+hn1adm ALL=(ALL) NOPASSWD: /usr/sbin/crm_attribute -n hana_hn1_gsh *
+hn1adm ALL=(ALL) NOPASSWD: /usr/sbin/SAPHanaSR-hookHelper --sid=hn1 *
 EOF
 ```
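Because the SID appears in several places in the sudoers rules above, templating them reduces copy-paste errors when adapting hn1 to another system. A minimal sketch, where a temp file stands in for /etc/sudoers.d/20-saphana; on the nodes, write the real file as root and check it with `visudo -cf`:

```shell
# Sketch: generate the sudoers drop-in for a given lowercase SID.
sid=hn1
out=$(mktemp)  # stands in for /etc/sudoers.d/20-saphana
cat << EOF > "$out"
# SAPHanaSR-ScaleOut needs for HA/DR hook scripts
${sid}adm ALL=(ALL) NOPASSWD: /usr/sbin/crm_attribute -n hana_${sid}_site_srHook_*
${sid}adm ALL=(ALL) NOPASSWD: /usr/sbin/crm_attribute -n hana_${sid}_gsh *
${sid}adm ALL=(ALL) NOPASSWD: /usr/sbin/SAPHanaSR-hookHelper --sid=${sid} *
EOF
grep -c NOPASSWD "$out"  # prints 3
```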
@@ -875,22 +883,20 @@ Configuration pointing to the standard location /usr/share/SAPHanaSR-ScaleOut br
 sapcontrol -nr 03 -function StartSystem
 ```
 
-5. **[1]** Verify the hook installation. Execute as <sid\>adm on the active HANA system replication site.
+5. **[A]** Verify that the hook installation is active on all cluster nodes. Execute as <sid\>adm.
 
 ```bash
 cdtrace
-awk '/ha_dr_SAPHanaSR.*crm_attribute/ \
-{ printf "%s %s %s %s\n",$2,$3,$5,$16 }' nameserver_*
+grep 'HADR.*load.*SAPHanaSrMultiTarget' nameserver_*.trc | tail -3
 # Example output
-# 2021-03-31 01:02:42.695244 ha_dr_SAPHanaSR SFAIL
-# 2021-03-31 01:02:58.966856 ha_dr_SAPHanaSR SFAIL
-# 2021-03-31 01:03:04.453100 ha_dr_SAPHanaSR SFAIL
-# 2021-03-31 01:03:04.619768 ha_dr_SAPHanaSR SFAIL
-# 2021-03-31 01:03:04.743444 ha_dr_SAPHanaSR SFAIL
-# 2021-03-31 01:04:15.062181 ha_dr_SAPHanaSR SOK
+# nameserver_hana-s1-db1.31001.000.trc:[14162]{-1}[-1/-1] 2023-01-26 12:53:55.728027 i ha_dr_provider HADRProviderManager.cpp(00083) : loading HA/DR Provider 'SAPHanaSrMultiTarget' from /usr/share/SAPHanaSR-ScaleOut/
+grep 'SAPHanaSr.*init' nameserver_*.trc | tail -3
+# Example output
+# nameserver_hana-s1-db1.31001.000.trc:[17636]{-1}[-1/-1] 2023-01-26 16:30:19.256705 i ha_dr_SAPHanaSrM SAPHanaSrMultiTarget.py(00080) : SAPHanaSrMultiTarget.init() CALLING CRM: <sudo /usr/sbin/crm_attribute -n hana_hn1_gsh -v 2.2 -l reboot> rc=0
+# nameserver_hana-s1-db1.31001.000.trc:[17636]{-1}[-1/-1] 2023-01-26 16:30:19.256739 i ha_dr_SAPHanaSrM SAPHanaSrMultiTarget.py(00081) : SAPHanaSrMultiTarget.init() Running srHookGeneration 2.2, see attribute hana_hn1_gsh too
 ```
 
-Verify the susChkSrv hook installation. Execute as <sid\>adm on all HANA VMs
+Verify the susChkSrv hook installation. Execute as <sid\>adm on all HANA VMs.
 ```bash
 cdtrace
 egrep '(LOST:|STOP:|START:|DOWN:|init|load|fail)' nameserver_suschksrv.trc
@@ -970,26 +976,32 @@ Configuration pointing to the standard location /usr/share/SAPHanaSR-ScaleOut br
 sudo crm configure rsc_defaults resource-stickiness=1000
 sudo crm configure rsc_defaults migration-threshold=50
 ```
-3. **[1]** verify the communication between the HOOK and the cluster
-```bash
-crm_attribute -G -n hana_hn1_glob_srHook
-# Expected result
-# crm_attribute -G -n hana_hn1_glob_srHook
-# scope=crm_config name=hana_hn1_glob_srHook value=SOK
-```
 
-4. **[1]** Place the cluster out of maintenance mode. Make sure that the cluster status is ok and that all of the resources are started.
+3. **[1]** Place the cluster out of maintenance mode. Make sure that the cluster status is OK and that all of the resources are started.
 ```bash
 # Cleanup any failed resources - the following command is an example
 crm resource cleanup rsc_SAPHana_HN1_HDB03
 
 # Place the cluster out of maintenance mode
 sudo crm configure property maintenance-mode=false
 ```
+
+4. **[1]** Verify the communication between the HANA HA hook and the cluster. The output shows status SOK for the SID, and both replication sites with status P(rimary) or S(econdary).
+```bash
+sudo /usr/sbin/SAPHanaSR-showAttr
+# Expected result
+# Global  cib-time                 maintenance prim    sec sync_state upd
+# ---------------------------------------------------------------------
+# HN1     Fri Jan 27 10:38:46 2023 false       HANA_S1 -   SOK        ok
+#
+# Sites   lpt        lss mns         srHook srr
+# -----------------------------------------------
+# HANA_S1 1674815869 4   hana-s1-db1 PRIM   P
+# HANA_S2 30         4   hana-s2-db1 SWAIT  S
+```
 
 > [!NOTE]
 > The timeouts in the above configuration are just examples and may need to be adapted to the specific HANA setup. For instance, you may need to increase the start timeout, if it takes longer to start the SAP HANA database.
-
 
 ## Test SAP HANA failover
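The SOK check from step 4 can also be scripted, for example for monitoring. A minimal sketch that parses `SAPHanaSR-showAttr`-style output; it runs against sample text here because the real command needs a live cluster node:

```shell
# Sketch: extract the global sync_state from SAPHanaSR-showAttr output.
# On a cluster node you would pipe the real command instead:
#   sudo /usr/sbin/SAPHanaSR-showAttr | awk '$1 == "HN1" { print $(NF-1) }'
sample='Global  cib-time                 maintenance prim    sec sync_state upd
---------------------------------------------------------------------
HN1     Fri Jan 27 10:38:46 2023 false       HANA_S1 -   SOK        ok'
sync_state=$(printf '%s\n' "$sample" | awk '$1 == "HN1" { print $(NF-1) }')
echo "$sync_state"  # prints SOK
```

A check such as `[ "$sync_state" = "SOK" ]` could then gate an alert; anything other than SOK (for example SFAIL) indicates replication is not in sync.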
