
Commit 0573840

initial commit for ascs tests
1 parent d4a748d commit 0573840

31 files changed (+2386 / -94 lines)

docs/SCS_HIGH_AVAILABILITY.md

Lines changed: 9 additions & 1 deletion
```diff
@@ -6,4 +6,12 @@
 |------------------------------|---------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------|
 | HA Parameters Validation | Configuration | The HA parameter validation test validates HA configuration including Corosync settings, Pacemaker resources, SBD device configuration, and SCS system replication setup. | [ha-config.yml](../src/roles/ha_scs/tasks/ha-config.yml) |
 | Resource Migration | Failover | The Resource Migration test validates planned failover scenarios by controlling resource movement between SCS nodes, ensuring proper role changes and data synchronization. | [ascs-migration.yml](../src/roles/ha_scs/tasks/ascs-migration.yml) |
-| ASCS Node Crash | Network | The ASCS Node Crash test simulates cluster behavior when the ASCS node crashes. It simulates an ASCS node failure by forcefully terminating the process, then verifies automatic failover to the ERS node, monitors system replication status, and confirms service recovery without data loss. | [ascs-node-crash.yml](../src/roles/ha_scs/tasks/ascs-node-crash.yml) |
+| ASCS Node Crash | Failover | The ASCS Node Crash test simulates cluster behavior when the ASCS node crashes. It simulates an ASCS node failure by forcefully terminating the process, then verifies automatic failover to the ERS node, monitors system replication status, and confirms service recovery without data loss. | [ascs-node-crash.yml](../src/roles/ha_scs/tasks/ascs-node-crash.yml) |
+| Block Network Communication | Network | The Block Network test validates cluster behavior during network partition scenarios by implementing iptables rules to block communication between ASCS and ERS nodes. It verifies split-brain prevention mechanisms, validates proper failover execution when nodes become isolated, and ensures cluster stability and data consistency after network connectivity is restored. | [block-network.yml](../src/roles/ha_scs/tasks/block-network.yml) |
+| Kill Message Server Process | Process | The Message Server Process Kill test simulates failure of the message server process on the ASCS node by forcefully terminating it using the kill -9 signal. It verifies proper cluster reaction, automatic failover to the ERS node, and ensures service continuity after the process failure. | [kill-message-server.yml](../src/roles/ha_scs/tasks/kill-message-server.yml) |
+| Kill Enqueue Server Process | Process | The Enqueue Server Process Kill test simulates failure of the enqueue server process on the ASCS node by forcefully terminating it using the kill -9 signal. It validates proper cluster behavior, automatic failover execution, and confirms that the enqueue table is properly replicated to preserve the lock state. | [kill-enqueue-server.yml](../src/roles/ha_scs/tasks/kill-enqueue-server.yml) |
+| Kill Enqueue Replication Server Process | Process | The Enqueue Replication Server Process Kill test simulates failure of the replication server process on the ERS node by forcefully terminating it using the kill -9 signal. This test handles both ENSA1 and ENSA2 architectures. It validates the automatic restart of the process and ensures continued replication of the enqueue table. | [kill-enqueue-replication.yml](../src/roles/ha_scs/tasks/kill-enqueue-replication.yml) |
+| Kill sapstartsrv Process for ASCS | Process | The sapstartsrv Process Kill test simulates failure of the SAP Start Service for the ASCS instance by forcefully terminating it using the kill -9 signal. It validates proper cluster reaction, automatic failover to the ERS node, and verifies service restoration after the process failure. | [kill-sapstartsrv.yml](../src/roles/ha_scs/tasks/kill-sapstartsrv.yml) |
+| Manual Restart of ASCS Instance | Control | The Manual Restart test validates cluster behavior when the ASCS instance is manually stopped using sapcontrol. It verifies proper cluster reaction to a controlled instance shutdown, ensures automatic failover to the ERS node, and confirms service continuity throughout the operation. | [manual-restart.yml](../src/roles/ha_scs/tasks/manual-restart.yml) |
+| HAFailoverToNode Test | Control | The HAFailoverToNode test validates SAP's built-in high availability functionality by using the sapcontrol command to trigger a controlled failover. It executes 'HAFailoverToNode' as the SAP administrator user, which initiates a clean migration of the ASCS instance to another node. | [ha-failover-to-node.yml](../src/roles/ha_scs/tasks/ha-failover-to-node.yml) |
+| SAPControl Config Validation | Configuration | The SAPControl Config Validation test runs multiple sapcontrol commands to validate the SCS configuration. It executes commands like HAGetFailoverConfig, HACheckFailoverConfig, and HACheckConfig, capturing their outputs and statuses to ensure proper configuration and functionality. | [sapcontrol-config.yml](../src/roles/ha_scs/tasks/sapcontrol-config.yml) |
```
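The HAFailoverToNode and SAPControl Config Validation tests both drive SAP's own HA interface through `sapcontrol` run as the SAP administrator user. The sketch below only builds the command lines such a task might execute; it assumes the usual `sapcontrol -nr <NN> -function <Name>` syntax, the `<sid>adm` user naming convention and `su - <user> -c`, and the helper names are hypothetical, not taken from the role.

```python
import shlex
from typing import List


def sapcontrol_as_sidadm(sap_sid: str, instance_number: str, function: str, *args: str) -> List[str]:
    """Hypothetical helper: argv that runs one sapcontrol function as <sid>adm."""
    sidadm = f"{sap_sid.lower()}adm"
    inner = ["sapcontrol", "-nr", instance_number, "-function", function, *args]
    # `su -c` takes a single shell string, so quote every inner argument.
    return ["su", "-", sidadm, "-c", shlex.join(inner)]


def config_validation_commands(sap_sid: str, instance_number: str) -> List[List[str]]:
    """The three functions named in the SAPControl Config Validation row."""
    return [
        sapcontrol_as_sidadm(sap_sid, instance_number, name)
        for name in ("HAGetFailoverConfig", "HACheckFailoverConfig", "HACheckConfig")
    ]


if __name__ == "__main__":
    for argv in config_validation_commands("X00", "00"):
        print(argv)
    # Assumption: an empty target node lets the HA software pick the failover node.
    print(sapcontrol_as_sidadm("X00", "00", "HAFailoverToNode", ""))
```

Building an argv list rather than one string keeps the outer command free of shell parsing; only the `-c` payload is a shell string, and `shlex.join` quotes it.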

src/module_utils/get_cluster_status.py

Lines changed: 2 additions & 2 deletions
```diff
@@ -86,7 +86,7 @@ def _validate_cluster_basic_status(self, cluster_status_xml: ET.Element):
 self.result["message"] = f"Node {node.attrib['name']} is not online"
 self.log(logging.WARNING, self.result["message"])

-def _process_node_attributes(self, node_attributes: ET.Element) -> Dict[str, Any]:
+def _process_node_attributes(self, cluster_status_xml: ET.Element) -> Dict[str, Any]:
 """
 Abstract method to process node attributes.

@@ -115,7 +115,7 @@ def run(self) -> Dict[str, str]:
 self.log(logging.INFO, "Cluster status retrieved")

 self._validate_cluster_basic_status(cluster_status_xml)
-self._process_node_attributes(cluster_status_xml.find("node_attributes"))
+self._process_node_attributes(cluster_status_xml=cluster_status_xml)

 if not self._is_cluster_stable():
 self.result["message"] = "Pacemaker cluster isn't stable"
```
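This change moves the `find("node_attributes")` lookup out of the base class's `run()` and into each subclass, so a subclass can also read other sections of the same status document, such as `resources`. A minimal sketch of that pattern, assuming hypothetical class names and a hand-written XML fragment in place of real cluster output:

```python
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod
from typing import Any, Dict


class StatusCheckerBase(ABC):
    """Hypothetical base class: run() hands the whole status document to the subclass."""

    def __init__(self) -> None:
        self.result: Dict[str, Any] = {}

    @abstractmethod
    def _process_node_attributes(self, cluster_status_xml: ET.Element) -> Dict[str, Any]:
        """Subclasses extract whichever sections of the document they need."""

    def run(self, xml_text: str) -> Dict[str, Any]:
        cluster_status_xml = ET.fromstring(xml_text)
        # Pass the root element, not cluster_status_xml.find("node_attributes").
        self._process_node_attributes(cluster_status_xml=cluster_status_xml)
        return self.result


class NodeAndResourceChecker(StatusCheckerBase):
    """Hypothetical subclass that needs both node_attributes and resources."""

    def _process_node_attributes(self, cluster_status_xml: ET.Element) -> Dict[str, Any]:
        node_attributes = cluster_status_xml.find("node_attributes")
        resources = cluster_status_xml.find("resources")
        self.result["nodes"] = (
            [node.attrib["name"] for node in node_attributes]
            if node_attributes is not None
            else []
        )
        # iter() walks all descendants, so resources nested in groups are included.
        self.result["resource_ids"] = (
            [r.attrib["id"] for r in resources.iter("resource")]
            if resources is not None
            else []
        )
        return self.result


SAMPLE = """
<crm_mon>
  <resources><resource id="r1"/><group id="g"><resource id="r2"/></group></resources>
  <node_attributes><node name="node1"/><node name="node2"/></node_attributes>
</crm_mon>
"""

if __name__ == "__main__":
    print(NodeAndResourceChecker().run(SAMPLE))
```

Calling the hook with a keyword argument, as the commit does, also makes a mismatched parameter name in an override fail loudly instead of silently receiving the wrong element.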

src/modules/get_cluster_status_db.py

Lines changed: 4 additions & 5 deletions
```diff
@@ -155,12 +155,12 @@ def _get_automation_register(self) -> None:
 except Exception:
 self.result["AUTOMATED_REGISTER"] = "unknown"

-def _process_node_attributes(self, node_attributes: ET.Element) -> Dict[str, Any]:
+def _process_node_attributes(self, cluster_status_xml: ET.Element) -> Dict[str, Any]:
 """
 Processes node attributes and identifies primary and secondary nodes.

-:param node_attributes: XML element containing node attributes.
-:type node_attributes: ET.Element
+:param cluster_status_xml: XML element containing node attributes.
+:type cluster_status_xml: ET.Element
 :return: Dictionary with primary and secondary node information.
 :rtype: Dict[str, Any]
 """
@@ -172,7 +172,7 @@ def _process_node_attributes(self, node_attributes: ET.Element) -> Dict[str, Any
 "replication_mode": "",
 "primary_site_name": "",
 }
-
+node_attributes = cluster_status_xml.find("node_attributes")
 attribute_map = {
 f"hana_{self.database_sid}_op_mode": "operation_mode",
 f"hana_{self.database_sid}_srmode": "replication_mode",
@@ -212,7 +212,6 @@ def _process_node_attributes(self, node_attributes: ET.Element) -> Dict[str, Any
 result["secondary_node"] = node_name
 result["cluster_status"]["secondary"] = node_attributes_dict

-# Update instance attributes
 self.result.update(result)
 return result
```

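The DB checker now performs the `node_attributes` lookup itself and then translates raw Pacemaker attribute names through `attribute_map`. A small sketch of that translation step, using only the two mappings visible in the hunk; the XML fragment, the `hdb` SID and the function name are made-up sample values, and the primary/secondary decision that follows in the real method is not reproduced.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def map_node_attributes(cluster_status_xml: ET.Element, database_sid: str) -> Dict[str, Dict[str, str]]:
    """Return {node_name: {friendly_key: value}} for attributes listed in attribute_map."""
    attribute_map = {
        f"hana_{database_sid}_op_mode": "operation_mode",
        f"hana_{database_sid}_srmode": "replication_mode",
    }
    mapped: Dict[str, Dict[str, str]] = {}
    node_attributes = cluster_status_xml.find("node_attributes")
    if node_attributes is None:
        return mapped
    for node in node_attributes:
        values: Dict[str, str] = {}
        for attribute in node:
            key = attribute_map.get(attribute.attrib.get("name", ""))
            if key is not None:  # ignore attributes the map does not know about
                values[key] = attribute.attrib.get("value", "")
        mapped[node.attrib["name"]] = values
    return mapped


SAMPLE = """<crm_mon><node_attributes>
<node name="hdb-0">
  <attribute name="hana_hdb_op_mode" value="logreplay"/>
  <attribute name="hana_hdb_srmode" value="sync"/>
  <attribute name="unrelated" value="x"/>
</node>
</node_attributes></crm_mon>"""

if __name__ == "__main__":
    print(map_node_attributes(ET.fromstring(SAMPLE), "hdb"))
```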
src/modules/get_cluster_status_scs.py

Lines changed: 54 additions & 13 deletions
```diff
@@ -5,6 +5,7 @@
 Python script to get and validate the status of an SCS cluster.
 """

+import logging
 import xml.etree.ElementTree as ET
 from typing import Dict, Any
 from ansible.module_utils.basic import AnsibleModule
```
```diff
@@ -91,49 +92,85 @@ class SCSClusterStatusChecker(BaseClusterStatusChecker):
     Class to check the status of a pacemaker cluster in an SAP SCS environment.
     """

-    def __init__(self, sap_sid: str, ansible_os_family: str = ""):
+    def __init__(
+        self,
+        sap_sid: str,
+        ansible_os_family: str = "",
+        ascs_instance_number: str = "00",
+        ers_instance_number: str = "01",
+    ):
         super().__init__(ansible_os_family)
         self.sap_sid = sap_sid
+        self.ascs_instance_number = ascs_instance_number
+        self.ers_instance_number = ers_instance_number
         self.result.update(
             {
                 "ascs_node": "",
                 "ers_node": "",
             }
         )

-    def _process_node_attributes(self, node_attributes: ET.Element) -> Dict[str, Any]:
+    def _process_node_attributes(self, cluster_status_xml: ET.Element) -> Dict[str, Any]:
         """
         Processes node attributes and identifies ASCS and ERS nodes.

-        :param node_attributes: XML element containing node attributes.
-        :type node_attributes: ET.Element
+        :param cluster_status_xml: XML element containing node attributes.
+        :type cluster_status_xml: ET.Element
         :return: Dictionary with ASCS and ERS node information.
         :rtype: Dict[str, Any]
         """
-        all_nodes = [node.attrib.get("name") for node in node_attributes]
+        resources = cluster_status_xml.find("resources")
+        node_attributes = cluster_status_xml.find("node_attributes")
+        ascs_resource_id = f"rsc_sap_{self.sap_sid.upper()}_ASCS{self.ascs_instance_number}"
+        ers_resource_id = f"rsc_sap_{self.sap_sid.upper()}_ERS{self.ers_instance_number}"
+
         for node in node_attributes:
             node_name = node.attrib["name"]
             for attribute in node:
                 if attribute.attrib["name"] == f"runs_ers_{self.sap_sid.upper()}":
                     if attribute.attrib["value"] == "1":
                         self.result["ers_node"] = node_name
-                    else:
+                    elif attribute.attrib["value"] == "0":
                         self.result["ascs_node"] = node_name

-        if self.result["ascs_node"] == "" and self.result["ers_node"] != "":
-            self.result["ascs_node"] = next(
-                (n for n in all_nodes if n != self.result["ers_node"]), ""
-            )
+        if resources is not None:
+            ascs_resource = resources.find(f".//resource[@id='{ascs_resource_id}']")
+            ers_resource = resources.find(f".//resource[@id='{ers_resource_id}']")
+
+            if ascs_resource is not None:
+                failed = ascs_resource.attrib.get("failed", "false").lower() == "true"
+                active = ascs_resource.attrib.get("active", "false").lower() == "true"
+                if not failed and active:
+                    node_element = ascs_resource.find("node")
+                    if node_element is not None:
+                        self.result["ascs_node"] = node_element.attrib.get(
+                            "name", self.result["ascs_node"]
+                        )
+                else:
+                    self.result["ascs_node"] = ""
+
+            if ers_resource is not None:
+                failed = ers_resource.attrib.get("failed", "false").lower() == "true"
+                active = ers_resource.attrib.get("active", "false").lower() == "true"
+                if not failed and active:
+                    node_element = ers_resource.find("node")
+                    if node_element is not None:
+                        self.result["ers_node"] = node_element.attrib.get(
+                            "name", self.result["ers_node"]
+                        )
+                else:
+                    self.result["ers_node"] = ""
+
         return self.result

     def _is_cluster_ready(self) -> bool:
         """
-        Check if the cluster is ready by verifying the ASCS node.
+        Check if the cluster is ready by verifying at least one of ASCS or ERS nodes.

-        :return: True if the cluster is ready, False otherwise.
+        :return: True if either ASCS or ERS node is available, False otherwise.
         :rtype: bool
         """
-        return self.result["ascs_node"] != ""
+        return self.result["ascs_node"] != "" or self.result["ers_node"] != ""

     def _is_cluster_stable(self) -> bool:
         """
```
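The central change in `_process_node_attributes` is that the ASCS and ERS locations now come from the resource entries `rsc_sap_<SID>_ASCS<NN>` and `rsc_sap_<SID>_ERS<NN>` when they exist, with the `runs_ers_<SID>` node attribute only providing the initial value. The standalone sketch below restates that decision, reading each `else:` as the alternative to the active-and-not-failed check: an active, non-failed resource reports its node, and a stopped or failed one clears the role. The function name and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def _flag(element: ET.Element, name: str) -> bool:
    """Read a boolean attribute such as active="true", defaulting to false."""
    return element.attrib.get(name, "false").lower() == "true"


def locate_scs_roles(
    cluster_status_xml: ET.Element, sap_sid: str, ascs_nr: str = "00", ers_nr: str = "01"
) -> Dict[str, str]:
    """Return {"ascs_node": ..., "ers_node": ...}; "" means the role is not running."""
    sid = sap_sid.upper()
    result = {"ascs_node": "", "ers_node": ""}

    # Step 1: provisional answer from the runs_ers_<SID> node attribute.
    node_attributes = cluster_status_xml.find("node_attributes")
    for node in (node_attributes if node_attributes is not None else []):
        for attribute in node:
            if attribute.attrib.get("name") != f"runs_ers_{sid}":
                continue
            if attribute.attrib.get("value") == "1":
                result["ers_node"] = node.attrib["name"]
            elif attribute.attrib.get("value") == "0":
                result["ascs_node"] = node.attrib["name"]

    # Step 2: override from the resource entries when they are present.
    resources = cluster_status_xml.find("resources")
    if resources is not None:
        targets = {
            "ascs_node": f"rsc_sap_{sid}_ASCS{ascs_nr}",
            "ers_node": f"rsc_sap_{sid}_ERS{ers_nr}",
        }
        for key, resource_id in targets.items():
            resource = resources.find(f".//resource[@id='{resource_id}']")
            if resource is None:
                continue  # resource not defined: keep the attribute-based value
            if _flag(resource, "active") and not _flag(resource, "failed"):
                node_element = resource.find("node")
                if node_element is not None:
                    result[key] = node_element.attrib.get("name", result[key])
            else:
                result[key] = ""  # stopped or failed: report the role as not running
    return result


SAMPLE = """
<crm_mon>
  <resources>
    <group id="g-X00_ASCS">
      <resource id="rsc_sap_X00_ASCS00" active="true" failed="false"><node name="vm1"/></resource>
    </group>
    <group id="g-X00_ERS">
      <resource id="rsc_sap_X00_ERS01" active="false" failed="false"/>
    </group>
  </resources>
  <node_attributes>
    <node name="vm1"><attribute name="runs_ers_X00" value="0"/></node>
    <node name="vm2"><attribute name="runs_ers_X00" value="1"/></node>
  </node_attributes>
</crm_mon>
"""

if __name__ == "__main__":
    print(locate_scs_roles(ET.fromstring(SAMPLE), "x00"))
```

With this sample the ERS role comes back as `""` even though `runs_ers_X00` still points at `vm2`. That transient, one-role-only state is what the relaxed `_is_cluster_ready()`, which accepts either role, is meant to tolerate.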
```diff
@@ -151,6 +188,8 @@ def run_module() -> None:
     """
     module_args = dict(
         sap_sid=dict(type="str", required=True),
+        scs_instance_number=dict(type="str", required=False),
+        ers_instance_number=dict(type="str", required=False),
         ansible_os_family=dict(type="str", required=False),
     )

@@ -159,6 +198,8 @@ def run_module() -> None:
     checker = SCSClusterStatusChecker(
         sap_sid=module.params["sap_sid"],
         ansible_os_family=module.params["ansible_os_family"],
+        ascs_instance_number=module.params["scs_instance_number"],
+        ers_instance_number=module.params["ers_instance_number"],
     )
     checker.run()
```

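Because `scs_instance_number` and `ers_instance_number` are declared with `required=False` and no `default`, `AnsibleModule` sets them to `None` when a task omits them, and passing that `None` explicitly overrides the constructor defaults `"00"` and `"01"`, so the resource ID would end in `ASCSNone`. A hedged sketch of two ways to keep the defaults: declaring `default=` in the argument spec, or dropping unset values before calling the constructor. The helper `checker_kwargs` is hypothetical.

```python
from typing import Any, Dict, Optional

# Option 1: declare defaults in the argument spec (assumed to match the constructor).
module_args = dict(
    sap_sid=dict(type="str", required=True),
    scs_instance_number=dict(type="str", required=False, default="00"),
    ers_instance_number=dict(type="str", required=False, default="01"),
    ansible_os_family=dict(type="str", required=False),
)


def checker_kwargs(params: Dict[str, Optional[Any]]) -> Dict[str, Any]:
    """Option 2: map module params to constructor kwargs, omitting unset (None) values."""
    kwargs = {
        "sap_sid": params["sap_sid"],
        "ansible_os_family": params.get("ansible_os_family"),
        "ascs_instance_number": params.get("scs_instance_number"),
        "ers_instance_number": params.get("ers_instance_number"),
    }
    # Leaving a key out lets the constructor's own default apply.
    return {key: value for key, value in kwargs.items() if value is not None}


if __name__ == "__main__":
    print(checker_kwargs({"sap_sid": "x00", "ansible_os_family": "SUSE",
                          "scs_instance_number": None, "ers_instance_number": "11"}))
```

With option 2, the module body would call `SCSClusterStatusChecker(**checker_kwargs(module.params))`; option 1 keeps the defaults visible in the module's documented interface instead.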
src/roles/ha_scs/tasks/ascs-migration.yml

Lines changed: 2 additions & 1 deletion
```diff
@@ -43,10 +43,11 @@
 get_cluster_status_scs:
 sap_sid: "{{ sap_sid | lower }}"
 ansible_os_family: "{{ ansible_os_family | upper }}"
+scs_instance_number: "{{ scs_instance_number }}"
+ers_instance_number: "{{ ers_instance_number }}"
 register: cluster_status_test_execution
 retries: 50
 delay: 10
-failed_when: false
 until: |
 cluster_status_test_execution.ascs_node == cluster_status_pre.ers_node
 and cluster_status_test_execution.ers_node == cluster_status_pre.ascs_node
```
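The migration check polls the cluster until the two roles have swapped relative to the pre-test snapshot, waiting 10 seconds between attempts. With `failed_when: false` removed, the task now fails if the condition is still false when the retries run out, instead of always reporting success. A rough Python equivalent of that loop, with a hypothetical `get_status` callable standing in for the `get_cluster_status_scs` module and `attempts` only approximating Ansible's `retries` accounting:

```python
import time
from typing import Callable, Dict


def wait_for_role_swap(
    get_status: Callable[[], Dict[str, str]],
    pre: Dict[str, str],
    attempts: int = 50,
    delay: float = 10.0,
) -> Dict[str, str]:
    """Poll until ASCS and ERS have swapped nodes; raise if they never do."""
    status: Dict[str, str] = {}
    for attempt in range(attempts):
        status = get_status()
        swapped = (
            status.get("ascs_node") == pre["ers_node"]
            and status.get("ers_node") == pre["ascs_node"]
        )
        if swapped:
            return status
        if attempt < attempts - 1:
            time.sleep(delay)
    # Counterpart of the task failing once retries are exhausted.
    raise TimeoutError(f"ASCS/ERS roles did not swap; last status: {status}")
```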

src/roles/ha_scs/tasks/ascs-node-crash.yml

Lines changed: 41 additions & 5 deletions
```diff
@@ -28,6 +28,11 @@
 when: ansible_hostname == cluster_status_pre.ascs_node
 become: true
 block:
+- name: "Test Execution: Check for ENSA version"
+ansible.builtin.shell: pgrep -f 'enqr.sap{{ sap_sid | upper }}' | wc -l
+register: ensa2_check
+failed_when: false
+
 - name: "Test Execution: Start timer"
 ansible.builtin.set_fact:
 test_execution_start: "{{ now(utc=true, fmt='%Y-%m-%d %H:%M:%S') }}"
```
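The new first task records, on the ASCS node, how many processes have a command line matching `enqr.sap<SID>`, and the later `when:` conditions read `"0"` as ENSA1 and any other value as ENSA2. The sketch below restates that classification with hypothetical helper names. It also reflects one assumption worth verifying on a real host: `pgrep -f` matches full command lines and never reports itself, but the `/bin/sh -c` wrapper that `ansible.builtin.shell` runs has the pattern in its own command line and may be counted too. Bracketing the first character of the pattern is a common way to avoid that self-match.

```python
import re


def ensa2_probe_pattern(sap_sid: str) -> str:
    """Hypothetical pgrep -f pattern; "[e]" still matches "e" but not the literal "[e]"."""
    return f"[e]nqr.sap{sap_sid.upper()}"


def classify_ensa(count_stdout: str) -> str:
    """Map `pgrep -f ... | wc -l` output to the label implied by the playbook's when: tests."""
    count = int(count_stdout.strip() or "0")
    return "ENSA1" if count == 0 else "ENSA2"


if __name__ == "__main__":
    pattern = ensa2_probe_pattern("x00")
    shell_cmdline = f"/bin/sh -c pgrep -f '{pattern}' | wc -l"
    print(bool(re.search(pattern, "enqr.sapX00_ERS01")))  # the replicator command line matches
    print(bool(re.search(pattern, shell_cmdline)))        # the probing shell's command line does not
    print(classify_ensa("0\n"), classify_ensa("1\n"))
```

`classify_ensa` strips whitespace before comparing, which is slightly more tolerant than the playbook's exact `stdout == "0"` test.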
```diff
@@ -43,18 +48,46 @@
 when: ansible_hostname == cluster_status_pre.ers_node
 become: true
 block:
-- name: "Test Execution: Validate SCS cluster status"
+- name: "Test Execution: Validate ASCS node has stopped"
 get_cluster_status_scs:
 sap_sid: "{{ sap_sid | lower }}"
 ansible_os_family: "{{ ansible_os_family | upper }}"
-register: cluster_status_test_execution
+scs_instance_number: "{{ scs_instance_number }}"
+ers_instance_number: "{{ ers_instance_number }}"
+register: cluster_status_test_execution_pre
 retries: 50
 delay: 10
 failed_when: false
+until: cluster_status_test_execution_pre.ascs_node == ""
+
+- name: "Test Execution: Validate SCS cluster status ENSA1"
+when: hostvars[cluster_status_pre.ascs_node].ensa2_check.stdout == "0"
+get_cluster_status_scs:
+sap_sid: "{{ sap_sid | lower }}"
+ansible_os_family: "{{ ansible_os_family | upper }}"
+scs_instance_number: "{{ scs_instance_number }}"
+ers_instance_number: "{{ ers_instance_number }}"
+register: cluster_status_test_execution
+retries: 50
+delay: 10
 until: |
 cluster_status_test_execution.ascs_node == cluster_status_pre.ers_node
 and cluster_status_test_execution.ers_node == cluster_status_pre.ascs_node

+- name: "Test Execution: Validate SCS cluster status ENSA2"
+when: hostvars[cluster_status_pre.ascs_node].ensa2_check.stdout != "0"
+get_cluster_status_scs:
+sap_sid: "{{ sap_sid | lower }}"
+ansible_os_family: "{{ ansible_os_family | upper }}"
+scs_instance_number: "{{ scs_instance_number }}"
+ers_instance_number: "{{ ers_instance_number }}"
+register: cluster_status_test_execution
+retries: 50
+delay: 10
+until: |
+cluster_status_test_execution.ascs_node != ""
+and cluster_status_test_execution.ers_node != ""
+
 - name: "Test Execution: Simulate ASCS Node Crash"
 when: ansible_hostname == cluster_status_pre.ascs_node
 become: true
```
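After the crash, the ERS-side checks first wait for `ascs_node` to become empty and then apply a version-specific success condition. Under ENSA1 the ASCS must end up on the former ERS node and vice versa. Under ENSA2 the test only requires both roles to be running again, wherever they landed. Those two `until:` conditions, restated as a Python predicate with hypothetical names:

```python
from typing import Dict


def post_crash_ok(ensa: str, pre: Dict[str, str], current: Dict[str, str]) -> bool:
    """True when the cluster state after the crash satisfies the ENSA-specific check."""
    ascs = current.get("ascs_node", "")
    ers = current.get("ers_node", "")
    if ensa == "ENSA1":
        # ENSA1 branch: the roles must have swapped nodes.
        return ascs == pre["ers_node"] and ers == pre["ascs_node"]
    # ENSA2 branch: both roles just need to be placed on some node.
    return ascs != "" and ers != ""
```

Both version-specific tasks register `cluster_status_test_execution`. In Ansible, a task skipped by its `when:` condition still registers the variable with a `skipped` status, so on a host that evaluates both tasks, the later one in the file determines what subsequent `hostvars[...]` lookups see.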
```diff
@@ -72,15 +105,18 @@
 ansible.builtin.set_fact:
 test_case_message_from_test_case: |
 Old ASCS: {{ cluster_status_pre.ascs_node }}
-New ASCS: {{ hostvars[cluster_status_pre.ers_node].cluster_status_test_execution.ascs_node }}
+New ASCS: {{ hostvars[cluster_status_pre.ers_node].cluster_status_test_execution.ascs_node
+or cluster_status_test_execution.ascs_node }}
 Old ERS: {{ cluster_status_pre.ers_node }}
-New ERS: {{ hostvars[cluster_status_pre.ers_node].cluster_status_test_execution.ers_node }}
+New ERS: {{ hostvars[cluster_status_pre.ers_node].cluster_status_test_execution.ers_node
+or cluster_status_test_execution.ers_node }}
 test_case_details_from_test_case: {
 "Pre Validations: Validate HANA DB cluster status": "{{ cluster_status_pre }}",
 "Pre Validations: CleanUp any failed resource": "{{ cleanup_failed_resource_pre }}",
 "Test Execution: Crash ASCS resource": "{{ ascs_crash_result }}",
 "Test Execution: Cleanup resources": "{{ cleanup_failed_resource_test_execution }}",
-"Post Validations Result": "{{ hostvars[cluster_status_pre.ers_node].cluster_status_test_execution }}",
+"Post Validations Result": "{{ hostvars[cluster_status_pre.ers_node].cluster_status_test_execution
+or cluster_status_test_execution }}",
 }
 # /*---------------------------------------------------------------------------
 # | Post Validations |
```
