Commit eacb9e3

Merge pull request #224386 from msftrobiro/sap-scaleout-suschksrv
Add HANA scale-out susChkSrv
2 parents efa890a + 35f01c6 commit eacb9e3

1 file changed (+61 −39 lines):
articles/virtual-machines/workloads/sap/sap-hana-high-availability-scale-out-hsr-suse.md
@@ -305,17 +305,15 @@ Configure and prepare your OS by doing the following steps:
> [!TIP]
> Avoid setting net.ipv4.ip_local_port_range and net.ipv4.ip_local_reserved_ports explicitly in the sysctl configuration files to allow SAP Host Agent to manage the port ranges. For more details, see SAP note [2382421](https://launchpad.support.sap.com/#/notes/2382421).

3. **[A]** SUSE delivers special resource agents for SAP HANA; by default, the agents for SAP HANA scale-up are installed. Uninstall the scale-up packages, if installed, and install the packages for the SAP HANA scale-out scenario. This step needs to be performed on all cluster VMs, including the majority maker.

```bash
# Uninstall scale-up packages and patterns
sudo zypper remove patterns-sap-hana
sudo zypper remove SAPHanaSR SAPHanaSR-doc yast2-sap-ha
# Install the scale-out packages and patterns
sudo zypper in SAPHanaSR-ScaleOut SAPHanaSR-ScaleOut-doc
sudo zypper in -t pattern ha_sles
```

4. **[AH]** Prepare the VMs - apply the recommended settings per SAP note [2205917] for SUSE Linux Enterprise Server for SAP Applications.
@@ -820,67 +818,91 @@ Create a dummy file system cluster resource, which will monitor and report failu
`on-fail=fence` attribute is also added to the monitor operation. With this option, if the monitor operation fails on a node, that node is immediately fenced.

## Implement HANA hooks SAPHanaSR and susChkSrv

This important step optimizes the integration with the cluster and the detection of when a cluster failover is possible. It is highly recommended to configure the SAPHanaSR Python hook. For HANA 2.0 SP5 and above, implementing both the SAPHanaSR and susChkSrv hooks is recommended.

susChkSrv extends the functionality of the main SAPHanaSR HA provider. It acts in the situation when the HANA process hdbindexserver crashes. If a single process crashes, HANA typically tries to restart it. Restarting the indexserver process can take a long time, during which the HANA database is not responsive.

With susChkSrv implemented, an immediate and configurable action is executed instead of waiting for the hdbindexserver process to restart on the same node. In HANA scale-out, susChkSrv acts for every HANA VM independently. The configured action will kill HANA or fence the affected VM, which triggers a failover by SAPHanaSR within the configured timeout period.

> [!NOTE]
> The susChkSrv Python hook requires SAP HANA 2.0 SP5, and SAPHanaSR-ScaleOut version 0.184.1 or higher must be installed.

1. **[1,2]** Stop HANA on both system replication sites. Execute as <sid\>adm:

```bash
sapcontrol -nr 03 -function StopSystem
```

2. **[1,2]** Adjust `global.ini` on each cluster site. If the requirements for the susChkSrv hook are not met, remove the entire `[ha_dr_provider_suschksrv]` block from the section below.
You can adjust the behavior of susChkSrv with the parameter action_on_lost. Valid values are [ ignore | stop | kill | fence ].

```bash
# add to global.ini
[ha_dr_provider_SAPHanaSR]
provider = SAPHanaSR
path = /usr/share/SAPHanaSR-ScaleOut
execution_order = 1

[ha_dr_provider_suschksrv]
provider = susChkSrv
path = /usr/share/SAPHanaSR-ScaleOut
execution_order = 3
action_on_lost = kill

[trace]
ha_dr_saphanasr = info
```
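As a quick sanity check (illustrative only, not part of the SAP procedure; the sample fragment and its temporary path are hypothetical), the `action_on_lost` value can be validated against the allowed set:

```bash
# Illustrative sanity check, not part of the official procedure.
# Write a global.ini-style sample fragment and validate action_on_lost.
cat > /tmp/global.ini.fragment <<'EOF'
[ha_dr_provider_suschksrv]
provider = susChkSrv
path = /usr/share/SAPHanaSR-ScaleOut
execution_order = 3
action_on_lost = kill
EOF

# Extract the value, stripping surrounding whitespace
action=$(awk -F'=' '/^action_on_lost/ {gsub(/[[:space:]]/,"",$2); print $2}' /tmp/global.ini.fragment)
case "$action" in
  ignore|stop|kill|fence) echo "action_on_lost=$action: valid" ;;
  *)                      echo "action_on_lost='$action': INVALID" ;;
esac
```

The same one-liner can be pointed at the real `global.ini` of your installation to catch typos before restarting HANA.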

Configuration pointing to the standard location /usr/share/SAPHanaSR-ScaleOut brings the benefit that the Python hook code is automatically updated through OS or package updates, and HANA uses it at the next restart. With an optional own path, such as /hana/shared/myHooks, you can decouple OS updates from the hook version in use.

3. **[AH]** The cluster requires sudoers configuration on the cluster nodes for <sid\>adm. In this example, that is achieved by creating a new file. Execute the commands as `root`, adapting the values of hn1/HN1 to the correct SID.

```bash
cat << EOF > /etc/sudoers.d/20-saphana
# SAPHanaSR-ScaleOut needs for srHook
Cmnd_Alias SOK = /usr/sbin/crm_attribute -n hana_hn1_glob_srHook -v SOK -t crm_config -s SAPHanaSR
Cmnd_Alias SFAIL = /usr/sbin/crm_attribute -n hana_hn1_glob_srHook -v SFAIL -t crm_config -s SAPHanaSR
hn1adm ALL=(ALL) NOPASSWD: SOK, SFAIL
hn1adm ALL=(ALL) NOPASSWD: /usr/sbin/SAPHanaSR-hookHelper --sid=HN1 --case=fenceMe
EOF
```
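Because hn1/HN1 must be adapted per system, a small helper function can render the fragment for any SID (a hypothetical convenience, not an SAP tool; `render_sudoers` and the temporary path are our own names):

```bash
# Hypothetical helper, not an SAP tool: generate the sudoers fragment for any SID.
render_sudoers() {
  local sid_lower sid_upper
  sid_lower=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  sid_upper=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  cat <<EOF
# SAPHanaSR-ScaleOut needs for srHook
Cmnd_Alias SOK = /usr/sbin/crm_attribute -n hana_${sid_lower}_glob_srHook -v SOK -t crm_config -s SAPHanaSR
Cmnd_Alias SFAIL = /usr/sbin/crm_attribute -n hana_${sid_lower}_glob_srHook -v SFAIL -t crm_config -s SAPHanaSR
${sid_lower}adm ALL=(ALL) NOPASSWD: SOK, SFAIL
${sid_lower}adm ALL=(ALL) NOPASSWD: /usr/sbin/SAPHanaSR-hookHelper --sid=${sid_upper} --case=fenceMe
EOF
}

# Render to a temporary location for inspection before installing under /etc/sudoers.d
render_sudoers HN1 > /tmp/20-saphana
grep -c 'hn1adm' /tmp/20-saphana   # prints 2 (the two NOPASSWD lines)
```

Inspect the rendered file, then copy it to /etc/sudoers.d with mode 0440 as usual for sudoers drop-ins.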
4. **[1,2]** Start SAP HANA on both replication sites. Execute as <sid\>adm.

```bash
sapcontrol -nr 03 -function StartSystem
```

5. **[1]** Verify the hook installation. Execute as <sid\>adm on the active HANA system replication site.

```bash
cdtrace
awk '/ha_dr_SAPHanaSR.*crm_attribute/ \
{ printf "%s %s %s %s\n",$2,$3,$5,$16 }' nameserver_*
# Example output
# 2021-03-31 01:02:42.695244 ha_dr_SAPHanaSR SFAIL
# 2021-03-31 01:02:58.966856 ha_dr_SAPHanaSR SFAIL
# 2021-03-31 01:03:04.453100 ha_dr_SAPHanaSR SFAIL
# 2021-03-31 01:03:04.619768 ha_dr_SAPHanaSR SFAIL
# 2021-03-31 01:03:04.743444 ha_dr_SAPHanaSR SFAIL
# 2021-03-31 01:04:15.062181 ha_dr_SAPHanaSR SOK
```
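To spot the current state without scanning the list by eye, the last state in the extracted lines can be picked out (illustrative sketch; the sample file is hypothetical and mirrors the example output above):

```bash
# Illustrative sketch: report the most recent srHook state from extracted trace lines.
# The sample file mirrors the example awk output above.
cat > /tmp/srhook_states.txt <<'EOF'
2021-03-31 01:02:42.695244 ha_dr_SAPHanaSR SFAIL
2021-03-31 01:04:15.062181 ha_dr_SAPHanaSR SOK
EOF

# $NF in the END block is the last field of the last line read
latest_state=$(awk 'END {print $NF}' /tmp/srhook_states.txt)
echo "latest srHook state: $latest_state"   # prints: latest srHook state: SOK
```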
Verify the susChkSrv hook installation. Execute as <sid\>adm on all HANA VMs.

```bash
cdtrace
egrep '(LOST:|STOP:|START:|DOWN:|init|load|fail)' nameserver_suschksrv.trc
# Example output
# 2023-01-19 08:23:10.581529 [1674116590-10005] susChkSrv.init() version 0.7.7, parameter info: action_on_lost=fence stop_timeout=20 kill_signal=9
# 2023-01-19 08:23:31.553566 [1674116611-14022] START: indexserver event looks like graceful tenant start
# 2023-01-19 08:23:52.834813 [1674116632-15235] START: indexserver event looks like graceful tenant start (indexserver started)
```
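The init line records the active susChkSrv parameters, so they can be pulled out for a quick comparison against what was configured in `global.ini` (illustrative sketch; the sample trace file is hypothetical and mirrors the example output above):

```bash
# Illustrative sketch: extract the configured action_on_lost from the susChkSrv init line.
# The sample trace mirrors the example output above.
cat > /tmp/nameserver_suschksrv.trc <<'EOF'
2023-01-19 08:23:10.581529 [1674116590-10005] susChkSrv.init() version 0.7.7, parameter info: action_on_lost=fence stop_timeout=20 kill_signal=9
EOF

action=$(grep -o 'action_on_lost=[a-z]*' /tmp/nameserver_suschksrv.trc | cut -d= -f2)
echo "configured action_on_lost: $action"   # prints: configured action_on_lost: fence
```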
## Create SAP HANA cluster resources

1. **[1]** Create the HANA cluster resources. Execute the following commands as `root`.
   1. Make sure the cluster is already in maintenance mode.

   2. Next, create the HANA Topology resource.
@@ -943,20 +965,20 @@ Create a dummy file system cluster resource, which will monitor and report failu
sudo crm configure location loc_SAPHanaTop_not_on_majority_maker cln_SAPHanaTopology_HN1_HDB03 -inf: hana-s-mm
```

6. **[1]** Configure additional cluster properties
968+
2. **[1]** Configure additional cluster properties
947969
```bash
948970
sudo crm configure rsc_defaults resource-stickiness=1000
949971
sudo crm configure rsc_defaults migration-threshold=50
950972
```
951-
3. **[1]** Verify the communication between the hook and the cluster.

```bash
crm_attribute -G -n hana_hn1_glob_srHook
# Expected result
# crm_attribute -G -n hana_hn1_glob_srHook
# scope=crm_config name=hana_hn1_glob_srHook value=SOK
```

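For monitoring scripts, the `value=` field can also be extracted programmatically (illustrative sketch; the sample string mirrors the expected result shown above):

```bash
# Illustrative sketch: pull the value field out of the crm_attribute output.
# The sample string mirrors the expected result above.
output='scope=crm_config name=hana_hn1_glob_srHook value=SOK'

# Strip everything up to and including the last 'value='
value=${output##*value=}
echo "$value"   # prints: SOK
```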
4. **[1]** Place the cluster out of maintenance mode. Make sure that the cluster status is ok and that all of the resources are started.

```bash
# Clean up any failed resources - the following command is an example
crm resource cleanup rsc_SAPHana_HN1_HDB03
```

0 commit comments