Commit 6b21fbb

HANA scale-out multi-target
1 parent f59b438 commit 6b21fbb

File tree

2 files changed: +117 -46 lines changed


articles/virtual-machines/workloads/sap/sap-hana-high-availability-scale-out-hsr-suse.md

Lines changed: 117 additions & 46 deletions
@@ -9,7 +9,7 @@ ms.service: virtual-machines-sap
99
ms.topic: article
1010
ms.tgt_pltfrm: vm-windows
1111
ms.workload: infrastructure-services
12-
ms.date: 12/07/2022
12+
ms.date: 01/27/2023
1313
ms.author: radeltch
1414

1515
---
@@ -191,7 +191,6 @@ For the configuration presented in this document, deploy seven virtual machines:
191191
1. Select the virtual machines of the HANA cluster (the NICs for the `client` subnet).
192192
1. Select **Add**.
193193
2. Select **Save**.
194-
195194
196195
1. Next, create a health probe:
197196
@@ -216,7 +215,6 @@ For the configuration presented in this document, deploy seven virtual machines:
216215
> [!Note]
217216
> When VMs without public IP addresses are placed in the backend pool of internal (no public IP address) Standard Azure load balancer, there will be no outbound internet connectivity, unless additional configuration is performed to allow routing to public end points. For details on how to achieve outbound connectivity see [Public endpoint connectivity for Virtual Machines using Azure Standard Load Balancer in SAP high-availability scenarios](./high-availability-guide-standard-load-balancer-outbound-connections.md).
218217
219-
220218
> [!IMPORTANT]
221219
> Do not enable TCP timestamps on Azure VMs placed behind Azure Load Balancer. Enabling TCP timestamps will cause the health probes to fail. Set parameter **net.ipv4.tcp_timestamps** to **0**. For details see [Load Balancer health probes](../../../load-balancer/load-balancer-custom-probe-overview.md).
222220
> See also SAP note [2382421](https://launchpad.support.sap.com/#/notes/2382421).
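The guidance above can be made persistent with a sysctl drop-in file; a minimal sketch (the file name `55-azure-lb.conf` is an arbitrary choice, not from this article):

```bash
# Disable TCP timestamps so Azure Load Balancer health probes don't fail;
# a drop-in file keeps the setting across reboots
echo "net.ipv4.tcp_timestamps = 0" | sudo tee /etc/sysctl.d/55-azure-lb.conf

# Apply immediately and verify the running value
sudo sysctl -p /etc/sysctl.d/55-azure-lb.conf
sysctl net.ipv4.tcp_timestamps
```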
@@ -230,7 +228,6 @@ The next sections describe the steps to deploy NFS - you'll need to select only
230228
> [!TIP]
231229
> You chose to deploy `/hana/shared` on [NFS share on Azure Files](../../../storage/files/files-nfs-protocol.md) or [NFS volume on Azure NetApp Files](../../../azure-netapp-files/azure-netapp-files-introduction.md).
232230
233-
234231
#### Deploy the Azure NetApp Files infrastructure
235232
236233
Deploy ANF volumes for the `/hana/shared` file system. You will need a separate `/hana/shared` volume for each HANA system replication site. For more information, see [Set up the Azure NetApp Files infrastructure](./sap-hana-scale-out-standby-netapp-files-suse.md#set-up-the-azure-netapp-files-infrastructure).
@@ -240,7 +237,6 @@ In this example, the following Azure NetApp Files volumes were used:
240237
* volume **HN1**-shared-s1 (nfs://10.23.1.7/**HN1**-shared-s1)
241238
* volume **HN1**-shared-s2 (nfs://10.23.1.7/**HN1**-shared-s2)
242239
243-
244240
#### Deploy the NFS on Azure Files infrastructure
245241
246242
Deploy Azure Files NFS shares for the `/hana/shared` file system. You will need a separate `/hana/shared` Azure Files NFS share for each HANA system replication site. For more information, see [How to create an NFS share](../../../storage/files/storage-files-how-to-create-nfs-shares.md?tabs=azure-portal).
@@ -253,9 +249,9 @@ In this example, the following Azure Files NFS shares were used:
253249
## Operating system configuration and preparation
254250
255251
The instructions in the next sections are prefixed with one of the following abbreviations:
256-
* **[A]**: Applicable to all nodes
252+
* **[A]**: Applicable to all nodes, including majority maker
257253
* **[AH]**: Applicable to all HANA DB nodes
258-
* **[M]**: Applicable to the majority maker node
254+
* **[M]**: Applicable to the majority maker node only
259255
* **[AH1]**: Applicable to all HANA DB nodes on SITE 1
260256
* **[AH2]**: Applicable to all HANA DB nodes on SITE 2
261257
* **[1]**: Applicable only to HANA DB node 1, SITE 1
@@ -307,6 +303,9 @@ Configure and prepare your OS by doing the following steps:
307303
308304
3. **[A]** SUSE delivers special resource agents for SAP HANA, and by default the agents for SAP HANA scale-up are installed. Uninstall the scale-up packages, if installed, and install the packages for the SAP HANA scale-out scenario. This step needs to be performed on all cluster VMs, including the majority maker.
309305
306+
> [!NOTE]
307+
> SAPHanaSR-ScaleOut version 0.181 or higher must be installed.
308+
310309
```bash
311310
# Uninstall scale-up packages and patterns
312311
sudo zypper remove patterns-sap-hana
@@ -326,7 +325,7 @@ You chose to deploy the SAP shared directories on [NFS share on Azure Files](../
326325
327326
In this example, the shared HANA file systems are deployed on Azure NetApp Files and mounted over NFSv4.1. Follow the steps in this section, only if you are using NFS on Azure NetApp Files.
328327
329-
1. **[A]** Prepare the OS for running SAP HANA on NetApp Systems with NFS, as described in SAP note [3024346 - Linux Kernel Settings for NetApp NFS](https://launchpad.support.sap.com/#/notes/3024346). Create configuration file */etc/sysctl.d/91-NetApp-HANA.conf* for the NetApp configuration settings.
328+
1. **[AH]** Prepare the OS for running SAP HANA on NetApp Systems with NFS, as described in SAP note [3024346 - Linux Kernel Settings for NetApp NFS](https://launchpad.support.sap.com/#/notes/3024346). Create configuration file */etc/sysctl.d/91-NetApp-HANA.conf* for the NetApp configuration settings.
330329
331330
<pre><code>
332331
vi /etc/sysctl.d/91-NetApp-HANA.conf
@@ -343,7 +342,7 @@ In this example, the shared HANA file systems are deployed on Azure NetApp Files
343342
net.ipv4.tcp_sack = 1
344343
</code></pre>
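Once the file exists, the settings can be activated without a reboot; `sysctl --system` reloads all drop-in directories, including the new `91-NetApp-HANA.conf`:

```bash
# Reload every sysctl configuration file, picking up 91-NetApp-HANA.conf
sudo sysctl --system
```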
345344
346-
2. **[A]** Adjust the sunrpc settings, as recommended in SAP note [3024346 - Linux Kernel Settings for NetApp NFS](https://launchpad.support.sap.com/#/notes/3024346).
345+
2. **[AH]** Adjust the sunrpc settings, as recommended in SAP note [3024346 - Linux Kernel Settings for NetApp NFS](https://launchpad.support.sap.com/#/notes/3024346).
347346
348347
<pre><code>
349348
vi /etc/modprobe.d/sunrpc.conf
@@ -818,31 +817,38 @@ Create a dummy file system cluster resource, which will monitor and report failu
818817

819818
`on-fail=fence` attribute is also added to the monitor operation. With this option, if the monitor operation fails on a node, that node is immediately fenced.
820819

821-
## Implement HANA hooks SAPHanaSR and susChkSrv
820+
## Implement HANA HA hooks SAPHanaSrMultiTarget and susChkSrv
822821

823-
This important step is to optimize the integration with the cluster and detection when a cluster failover is possible. It is highly recommended to configure the SAPHanaSR Python hook. For HANA 2.0 SP5 and above, implementing both SAPHanaSR and susChkSrv hook is recommended.
822+
This important step optimizes the integration with the cluster and the detection of when a cluster failover is possible. It is highly recommended to configure the SAPHanaSrMultiTarget Python hook. For HANA 2.0 SP5 and above, implementing both the SAPHanaSrMultiTarget and susChkSrv hooks is recommended.
824823

825-
SusChkSrv extends the functionality of the main SAPHanaSR HA provider. It acts in the situation when HANA process hdbindexserver crashes. If a single process crashes typically HANA tries to restart it. Restarting the indexserver process can take a long time, during which the HANA database is not responsive.
824+
> [!NOTE]
825+
> The SAPHanaSrMultiTarget HA provider replaces SAPHanaSR for HANA scale-out. SAPHanaSR was described in an earlier version of this document.
826+
> See [SUSE blog post](https://www.suse.com/c/sap-hana-scale-out-multi-target-upgrade/) about changes with the new HANA HA hook.
827+
> This document provides steps for a new installation with the new provider. Upgrading an existing environment from the SAPHanaSR to the SAPHanaSrMultiTarget provider requires several changes and is _NOT_ described in this document. If the existing environment uses no third site for disaster recovery and [HANA multi-target system replication](https://help.sap.com/docs/SAP_HANA_PLATFORM/4e9b18c116aa42fc84c7dbfd02111aba/ba457510958241889a459e606bbcf3d3.html) is not used, the SAPHanaSR HA provider can remain in use.
826828

827-
With susChkSrv implemented, an immediate and configurable action is executed, instead of waiting on hdbindexserver process to restart on the same node. In HANA scale-out susChkSrv acts for every HANA VM independently. The configured action will kill HANA or fence the affected VM, which triggers a failover by SAPHanaSR in the configured timeout period.
829+
SusChkSrv extends the functionality of the main SAPHanaSrMultiTarget HA provider. It acts in the situation when the HANA process hdbindexserver crashes. If a single process crashes, HANA typically tries to restart it. Restarting the indexserver process can take a long time, during which the HANA database is not responsive. With susChkSrv implemented, an immediate and configurable action is executed, instead of waiting for the hdbindexserver process to restart on the same node. In HANA scale-out, susChkSrv acts for every HANA VM independently. The configured action will kill HANA or fence the affected VM, which triggers a failover in the configured timeout period.
828830

829-
> [!NOTE]
830-
> susChkSrv Python hook requires SAP HANA 2.0 SP5 and SAPHanaSR-ScaleOut version 0.184.1 or higher must be installed.
831+
SUSE SLES 15 SP1 or higher is required for the operation of both HANA HA hooks. The table below shows the other version dependencies.
832+
833+
|SAP HANA HA hook | HANA version required | SAPHanaSR-ScaleOut required |
834+
|----------------------| ----------------------- | --------------------------- |
835+
| SAPHanaSrMultiTarget | HANA 2.0 SPS4 or higher | 0.181 or higher |
836+
| susChkSrv | HANA 2.0 SPS5 or higher | 0.184.1 or higher |
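As a quick sanity check against the minimum versions in the table, a version comparison can be scripted with `sort -V`; a minimal sketch (the installed version string is hypothetical here; in practice it would come from `rpm -q --qf '%{VERSION}' SAPHanaSR-ScaleOut`):

```shell
#!/usr/bin/env bash
# True (exit 0) when version $1 >= version $2, using GNU sort's
# version ordering
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Hypothetical installed version; in practice query it with rpm
installed="0.185.3"
if version_ge "$installed" "0.184.1"; then
    echo "susChkSrv prerequisite met"
else
    echo "SAPHanaSR-ScaleOut too old for susChkSrv"
fi
```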
831837

832838
1. **[1,2]** Stop HANA on both system replication sites. Execute as <sid\>adm:
833839

834840
```bash
835841
sapcontrol -nr 03 -function StopSystem
836842
```
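`StopSystem` returns immediately; to block until all instances have actually stopped, the same `sapcontrol` wait function used later in this article can be applied here (600 s timeout, 10 s polling):

```bash
# Wait for a clean stop before editing global.ini
sapcontrol -nr 03 -function WaitforStopped 600 10
```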
837843

838-
2. **[1,2]** Adjust `global.ini` on each cluster site. If the requirements for susChkSrv hook are not met, remove the entire block `[ha_dr_provider_suschksrv]` from below section.
844+
2. **[1,2]** Adjust `global.ini` on each cluster site. If the prerequisites for the susChkSrv hook are not met, remove the entire block `[ha_dr_provider_suschksrv]` from the section below.
839845
You can adjust the behavior of susChkSrv with parameter action_on_lost. Valid values are [ ignore | stop | kill | fence ].
840846

841847
```bash
842848
# add to global.ini
843-
[ha_dr_provider_SAPHanaSR]
844-
provider = SAPHanaSR
845-
path = /usr/share/SAPHanaSR-ScaleOut
849+
[ha_dr_provider_saphanasrmultitarget]
850+
provider = SAPHanaSrMultiTarget
851+
path = /usr/share/SAPHanaSR-ScaleOut/
846852
execution_order = 1
847853
848854
[ha_dr_provider_suschksrv]
@@ -852,21 +858,21 @@ You can adjust the behavior of susChkSrv with parameter action_on_lost. Valid va
852858
action_on_lost = kill
853859
854860
[trace]
855-
ha_dr_saphanasr = info
861+
ha_dr_saphanasrmultitarget = info
856862
```
857863

858-
Configuration pointing to the standard location /usr/share/SAPHanaSR-ScaleOut brings a benefit, that the python hook code is automatically updated through OS or package updates and it gets used by HANA at next restart. With an optional, own path, such as /hana/shared/myHooks you can decouple OS updates from the used hook version.
864+
Configuration pointing to the standard location /usr/share/SAPHanaSR-ScaleOut has the benefit that the Python hook code is automatically updated through OS or package updates and is used by HANA at the next restart. With an optional custom path, such as /hana/shared/myHooks, you can decouple OS updates from the hook version in use.
859865

860-
3. **[AH]** The cluster requires sudoers configuration on the cluster node for <sid\>adm. In this example that is achieved by creating a new file. Execute the commands as `root` adapt the values of hn1/HN1 with correct SID.
866+
3. **[AH]** The cluster requires sudoers configuration on the cluster nodes for <sid\>adm. In this example, that is achieved by creating a new file. Execute the commands as `root` and adapt the values of hn1 to the correct lowercase SID.
861867

862868
```bash
863869
cat << EOF > /etc/sudoers.d/20-saphana
864-
# SAPHanaSR-ScaleOut needs for srHook
865-
Cmnd_Alias SOK = /usr/sbin/crm_attribute -n hana_hn1_glob_srHook -v SOK -t crm_config -s SAPHanaSR
866-
Cmnd_Alias SFAIL = /usr/sbin/crm_attribute -n hana_hn1_glob_srHook -v SFAIL -t crm_config -s SAPHanaSR
867-
hn1adm ALL=(ALL) NOPASSWD: SOK, SFAIL
868-
hn1adm ALL=(ALL) NOPASSWD: /usr/sbin/SAPHanaSR-hookHelper --sid=HN1 --case=fenceMe
870+
# SAPHanaSR-ScaleOut needs for HA/DR hook scripts
871+
so1adm ALL=(ALL) NOPASSWD: /usr/sbin/crm_attribute -n hana_hn1_site_srHook_*
872+
so1adm ALL=(ALL) NOPASSWD: /usr/sbin/crm_attribute -n hana_hn1_gsh *
873+
so1adm ALL=(ALL) NOPASSWD: /usr/sbin/SAPHanaSR-hookHelper --sid=hn1 *
869874
EOF
875+
870876
```
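Because a malformed sudoers drop-in can break `sudo` for the whole host, validating the new file in check-only mode before continuing is a sensible safeguard:

```bash
# Parse /etc/sudoers.d/20-saphana without installing changes;
# a non-zero exit code indicates a syntax error
sudo visudo -c -f /etc/sudoers.d/20-saphana
```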
871877
872878
4. **[1,2]** Start SAP HANA on both replication sites. Execute as <sid\>adm.
@@ -875,22 +881,20 @@ Configuration pointing to the standard location /usr/share/SAPHanaSR-ScaleOut br
875881
sapcontrol -nr 03 -function StartSystem
876882
```
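`StartSystem` also returns before HANA is fully up; waiting for the start to complete avoids false negatives in the hook verification that follows:

```bash
# Block until all instances report started (600 s timeout, 10 s polling)
sapcontrol -nr 03 -function WaitforStarted 600 10
```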
877883
878-
5. **[1]** Verify the hook installation. Execute as <sid\>adm on the active HANA system replication site.
884+
5. **[A]** Verify the hook installation is active on all cluster nodes. Execute as <sid\>adm.
879885
880886
```bash
881887
cdtrace
882-
awk '/ha_dr_SAPHanaSR.*crm_attribute/ \
883-
{ printf "%s %s %s %s\n",$2,$3,$5,$16 }' nameserver_*
888+
grep HADR.*load.*SAPHanaSrMultiTarget nameserver_*.trc | tail -3
884889
# Example output
885-
# 2021-03-31 01:02:42.695244 ha_dr_SAPHanaSR SFAIL
886-
# 2021-03-31 01:02:58.966856 ha_dr_SAPHanaSR SFAIL
887-
# 2021-03-31 01:03:04.453100 ha_dr_SAPHanaSR SFAIL
888-
# 2021-03-31 01:03:04.619768 ha_dr_SAPHanaSR SFAIL
889-
# 2021-03-31 01:03:04.743444 ha_dr_SAPHanaSR SFAIL
890-
# 2021-03-31 01:04:15.062181 ha_dr_SAPHanaSR SOK
890+
# nameserver_hana-s1-db1.31001.000.trc:[14162]{-1}[-1/-1] 2023-01-26 12:53:55.728027 i ha_dr_provider HADRProviderManager.cpp(00083) : loading HA/DR Provider 'SAPHanaSrMultiTarget' from /usr/share/SAPHanaSR-ScaleOut/
891+
grep SAPHanaSr.*init nameserver_*.trc | tail -3
892+
# Example output
893+
# nameserver_hana-s1-db1.31001.000.trc:[17636]{-1}[-1/-1] 2023-01-26 16:30:19.256705 i ha_dr_SAPHanaSrM SAPHanaSrMultiTarget.py(00080) : SAPHanaSrMultiTarget.init() CALLING CRM: <sudo /usr/sbin/crm_attribute -n hana_hn1_gsh -v 2.2 -l reboot> rc=0
894+
# nameserver_hana-s1-db1.31001.000.trc:[17636]{-1}[-1/-1] 2023-01-26 16:30:19.256739 i ha_dr_SAPHanaSrM SAPHanaSrMultiTarget.py(00081) : SAPHanaSrMultiTarget.init() Running srHookGeneration 2.2, see attribute hana_hn1_gsh too
891895
```
892896
893-
Verify the susChkSrv hook installation. Execute as <sid\>adm on all HANA VMs
897+
Verify the susChkSrv hook installation. Execute as <sid\>adm.
894898
```bash
895899
cdtrace
896900
egrep '(LOST:|STOP:|START:|DOWN:|init|load|fail)' nameserver_suschksrv.trc
@@ -970,26 +974,93 @@ Configuration pointing to the standard location /usr/share/SAPHanaSR-ScaleOut br
970974
sudo crm configure rsc_defaults resource-stickiness=1000
971975
sudo crm configure rsc_defaults migration-threshold=50
972976
```
973-
3. **[1]** verify the communication between the HOOK and the cluster
974-
```bash
975-
crm_attribute -G -n hana_hn1_glob_srHook
976-
# Expected result
977-
# crm_attribute -G -n hana_hn1_glob_srHook
978-
# scope=crm_config name=hana_hn1_glob_srHook value=SOK
979-
```
980977
981-
4. **[1]** Place the cluster out of maintenance mode. Make sure that the cluster status is ok and that all of the resources are started.
978+
3. **[1]** Place the cluster out of maintenance mode. Make sure that the cluster status is ok and that all of the resources are started.
982979
```bash
983980
# Cleanup any failed resources - the following command is example
984981
crm resource cleanup rsc_SAPHana_HN1_HDB03
985982
986983
# Place the cluster out of maintenance mode
987984
sudo crm configure property maintenance-mode=false
988985
```
986+
987+
4. **[1]** Verify the communication between the HANA HA hook and the cluster. The output should show status SOK for the SID, and both replication sites with status P(rimary) or S(econdary).
988+
```bash
989+
sudo /usr/sbin/SAPHanaSR-showAttr
990+
# Expected result
991+
# Global cib-time maintenance prim sec sync_state upd
992+
# ---------------------------------------------------------------------
993+
# HN1 Fri Jan 27 10:38:46 2023 false HANA_S1 - SOK ok
994+
#
995+
# Sites lpt lss mns srHook srr
996+
# -----------------------------------------------
997+
# HANA_S1 1674815869 4 hana-s1-db1 PRIM P
998+
# HANA_S2 30 4 hana-s2-db1 SWAIT S
999+
```
9891000
9901001
> [!NOTE]
9911002
> The timeouts in the above configuration are just examples and may need to be adapted to the specific HANA setup. For instance, you may need to increase the start timeout, if it takes longer to start the SAP HANA database.
992-
1003+
1004+
1005+
## (Optional) Enabling HANA multi-target system replication for DR purposes
1006+
1007+
<details>
1008+
<summary>Expand</summary>
1009+
1010+
With the new SAP HANA HA provider SAPHanaSrMultiTarget, a third system replication site for disaster recovery (DR) can be used with a HANA scale-out system. The cluster environment is aware of a multi-target DR setup. Failure of the third site will not trigger any cluster action. The cluster detects the replication status of connected sites, and the monitored attribute can change between SOK and SFAIL. A maximum of one system replication to a HANA database outside the Linux cluster is supported.
1011+
1012+
> [!NOTE]
1013+
> Example of a multi-target system replication system. See [SAP documentation](https://help.sap.com/docs/SAP_HANA_PLATFORM/4e9b18c116aa42fc84c7dbfd02111aba/2e6c71ab55f147e19b832565311a8e4e.html) for further details.
1014+
![HANA scale-out multi-target system replication](./media/sap-hana-high-availability/sap-hana-high-availability-scale-out-hsr-suse-multi-target.png)
1015+
1016+
1. Deploy Azure resources for the third site. Depending on your requirements, a different Azure region is used for disaster recovery purposes.
1017+
Steps required for HANA scale-out on the third site are the same as described in this document for SITE1 and SITE2, with the following exceptions:
1018+
- No load balancer for third site and no integration with cluster load balancer for VMs of third site.
1019+
- OS packages SAPHanaSR-ScaleOut, SAPHanaSR-ScaleOut-doc and OS package pattern ha_sles are _NOT_ installed on third site VMs.
1020+
- No majority maker VM for third site, since there is no cluster integration.
1021+
- An NFS volume /hana/shared must be created for the third site's exclusive use.
1022+
- No integration into the cluster for VMs or HANA resources of the third site.
1023+
- No SAP HANA HA hooks setup for third site.
1024+
1025+
2. With SAP HANA scale-out on third site operational, register the third site with the primary site.
1026+
In this example, the name SITE-DR is used for the third site.
1027+
```bash
1028+
# Execute on the third site
1029+
su - hn1adm
1030+
# Make sure HANA is not running on the third site. If it is started, stop HANA
1031+
sapcontrol -nr 03 -function StopSystem
1032+
sapcontrol -nr 03 -function WaitforStopped 600 10
1033+
# Register the HANA third site
1034+
hdbnsutil -sr_register --name=SITE-DR --remoteHost=hana-s1-db1 --remoteInstance=03 --replicationMode=async
1035+
```
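After registering, the local replication configuration can be confirmed on the third site, still as <sid\>adm:

```bash
# Show the site name, replication mode and upstream host as registered
hdbnsutil -sr_state
```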
1036+
1037+
3. Verify HANA system replication shows both secondary and third site.
1038+
```bash
1039+
# Verify HANA HSR is in sync, execute on primary
1040+
sudo su - hn1adm -c "python /usr/sap/HN1/HDB03/exe/python_support/systemReplicationStatus.py"
1041+
```
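For scripted checks, the script's exit code summarizes the overall state; return code 15 means all services are ACTIVE (in sync). A sketch:

```bash
sudo su - hn1adm -c "python /usr/sap/HN1/HDB03/exe/python_support/systemReplicationStatus.py"
rc=$?
# Return code 15 = overall replication status ACTIVE
if [ "$rc" -eq 15 ]; then
    echo "replication in sync"
else
    echo "replication not (yet) in sync, rc=$rc"
fi
```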
1042+
1043+
4. Check the SAPHanaSR attribute for third site. SITE-DR should show up with status SOK in the sites section.
1044+
```bash
1045+
# Check SAPHanaSR attribute
1046+
sudo SAPHanaSR-showAttr
1047+
# Expected result
1048+
# Global cib-time maintenance prim sec sync_state upd
1049+
# ---------------------------------------------------------------------
1050+
# HN1 Fri Jan 27 10:38:46 2023 false HANA_S1 - SOK ok
1051+
#
1052+
# Sites lpt lss mns srHook srr
1053+
# ------------------------------------------------
1054+
# SITE-DR SOK
1055+
# HANA_S1 1674815869 4 hana-s1-db1 PRIM P
1056+
# HANA_S2 30 4 hana-s2-db1 SWAIT S
1057+
```
1058+
1059+
Failure of the third site will not trigger any cluster action. The cluster detects the replication status of connected sites, and the monitored attribute can change between SOK and SFAIL.
1060+
1061+
If the cluster parameter AUTOMATED_REGISTER="true" is set after conclusion of testing, the HANA parameter `register_secondaries_on_takeover = true` can be configured in the `[system_replication]` block of global.ini on the two SAP HANA sites in the Linux cluster. Such a configuration re-registers the third site after a takeover between the first two sites, keeping the multi-target setup intact.
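The paragraph above translates into a small `global.ini` addition on the two clustered sites (not on the DR site); a sketch:

```bash
# add to global.ini on both SAP HANA sites in the Linux cluster
[system_replication]
register_secondaries_on_takeover = true
```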
1062+
1063+
</details>
9931064
9941065
## Test SAP HANA failover
9951066
