Commit a14d395

ddjain and cursoragent authored
feat(ci): add pytest-based CI test framework v2 with ephemeral namespace isolation (#1172) (#1171)
* feat: add pytest-based CI test framework v2 with ephemeral namespace isolation
  Signed-off-by: ddjain <darjain@redhat.com>
* feat(ci): add tests_v2 pytest functional test framework
  Signed-off-by: ddjain <darjain@redhat.com>
  Co-authored-by: Cursor <cursoragent@cursor.com>
* feat: improve naming convention
  Signed-off-by: ddjain <darjain@redhat.com>
* improve local setup script.
  Signed-off-by: ddjain <darjain@redhat.com>
* added CI job for v2 test
  Signed-off-by: ddjain <darjain@redhat.com>
* disabled broken test
  Signed-off-by: ddjain <darjain@redhat.com>
* improved CI pipeline execution time
  Signed-off-by: ddjain <darjain@redhat.com>
* chore: remove unwanted/generated files from PR
  Signed-off-by: ddjain <darjain@redhat.com>
* clean up gitignore file
  Signed-off-by: ddjain <darjain@redhat.com>
* fix copilot comments
  Signed-off-by: ddjain <darjain@redhat.com>
* fixed copilot suggestion
  Signed-off-by: ddjain <darjain@redhat.com>
* uncommented out test upload stage
  Signed-off-by: ddjain <darjain@redhat.com>
* exclude CI/tests_v2 from test coverage reporting
  Signed-off-by: ddjain <darjain@redhat.com>
* uploading style.css to fix broken report artifacts
  Signed-off-by: ddjain <darjain@redhat.com>
* added openshift supported labels in namespace creatation api
  Signed-off-by: ddjain <darjain@redhat.com>

---------

Signed-off-by: ddjain <darjain@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
1 parent f655ec1 commit a14d395

28 files changed: +2247 -0 lines changed

.coveragerc

Lines changed: 1 addition & 0 deletions
@@ -2,3 +2,4 @@
 omit =
     tests/*
     krkn/tests/**
+    CI/tests_v2/*

.github/workflows/tests_v2.yml

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
name: Tests v2 (pytest functional)
on:
  pull_request:
  push:
    branches:
      - main
jobs:
  tests-v2:
    name: Tests v2 (pytest functional)
    runs-on: ubuntu-latest
    steps:
      - name: Check out code
        uses: actions/checkout@v3

      - name: Create KinD cluster
        uses: redhat-chaos/actions/kind@main

      - name: Pre-load test images into KinD
        run: |
          docker pull nginx:alpine
          kind load docker-image nginx:alpine
          docker pull quay.io/krkn-chaos/krkn:tools
          kind load docker-image quay.io/krkn-chaos/krkn:tools

      - name: Install Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'
          architecture: 'x64'
          cache: 'pip'

      - name: Install dependencies
        run: |
          sudo apt-get install -y build-essential python3-dev
          pip install --upgrade pip
          pip install -r requirements.txt
          pip install -r CI/tests_v2/requirements.txt

      - name: Run tests_v2
        run: |
          KRKN_TEST_COVERAGE=1 python -m pytest CI/tests_v2/ -v --timeout=300 --reruns=1 --reruns-delay=5 \
            --html=CI/tests_v2/report.html -n auto --junitxml=CI/tests_v2/results.xml

      - name: Upload tests_v2 artifacts
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: tests-v2-results
          path: |
            CI/tests_v2/report.html
            CI/tests_v2/results.xml
            CI/tests_v2/assets/
          if-no-files-found: ignore

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -64,6 +64,10 @@ CI/out/*
 CI/ci_results
 CI/legacy/*node.yaml
 CI/results.markdown
+# CI tests_v2 (pytest-html / pytest outputs)
+CI/tests_v2/results.xml
+CI/tests_v2/report.html
+CI/tests_v2/assets/
 
 #env
 chaos/*

CI/tests_v2/CONTRIBUTING_TESTS.md

Lines changed: 175 additions & 0 deletions
@@ -0,0 +1,175 @@
# Adding a New Scenario Test (CI/tests_v2)

This guide explains how to add a new chaos scenario test to the v2 pytest framework. The layout is **folder-per-scenario**: each scenario has its own directory under `scenarios/<scenario_name>/` containing the test file, Kubernetes resources, and the Krkn scenario base YAML.

## Option 1: Scaffold script (recommended)

From the **repository root**:

```bash
python CI/tests_v2/scaffold.py --scenario service_hijacking
```

This creates:

- `CI/tests_v2/scenarios/service_hijacking/test_service_hijacking.py`: A test class extending `BaseScenarioTest` with a stub `test_happy_path` and `WORKLOAD_MANIFEST` pointing to the folder’s `resource.yaml`.
- `CI/tests_v2/scenarios/service_hijacking/resource.yaml`: A placeholder Deployment (namespace is patched at deploy time).
- `CI/tests_v2/scenarios/service_hijacking/scenario_base.yaml`: A placeholder Krkn scenario; edit this with the structure expected by your scenario type.

The script automatically registers the marker in `CI/tests_v2/pytest.ini`. For example, it adds:

```
service_hijacking: marks a test as a service_hijacking scenario test
```

**Next steps after scaffolding:**

1. Verify the marker was added to `pytest.ini` (the scaffold does this automatically).
2. Edit `scenario_base.yaml` with the structure your Krkn scenario type expects (see `scenarios/application_outage/scenario_base.yaml` and `scenarios/pod_disruption/scenario_base.yaml` for examples). The top-level key should match `SCENARIO_NAME`.
3. If your scenario uses a **list** structure (like pod_disruption) instead of a **dict** with a top-level key, set `NAMESPACE_KEY_PATH` (e.g. `[0, "config", "namespace_pattern"]`) and `NAMESPACE_IS_REGEX = True` if the namespace is a regex pattern.
4. The generated `test_happy_path` already uses `self.run_scenario(self.tmp_path, ns)` and assertions. Add more test methods (e.g. negative tests with `@pytest.mark.no_workload`) as needed.
5. Adjust `resource.yaml` if your scenario needs a different workload (e.g. specific image or labels).

If your Kraken scenario type string is not `<scenario>_scenarios`, pass it explicitly:

```bash
python CI/tests_v2/scaffold.py --scenario node_disruption --scenario-type node_scenarios
```
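The dict-style and list-style `NAMESPACE_KEY_PATH` values from step 3 of the next steps can be illustrated with a small sketch. This is not the framework's own patching code: `set_nested` and `patch_namespace` are hypothetical helpers, the scenario bodies are trimmed placeholders, and anchoring the pattern as `^<ns>$` is only an assumption about what `NAMESPACE_IS_REGEX = True` implies.

```python
import copy
from typing import Any, List, Union

KeyPath = List[Union[str, int]]


def set_nested(data: Any, key_path: KeyPath, value: Any) -> None:
    """Walk `data` along `key_path` (dict keys or list indexes) and overwrite the last element."""
    target = data
    for key in key_path[:-1]:
        target = target[key]
    target[key_path[-1]] = value


def patch_namespace(scenario: Any, key_path: KeyPath, namespace: str, is_regex: bool = False) -> Any:
    """Return a patched copy of `scenario`; the input is left untouched."""
    patched = copy.deepcopy(scenario)
    # Assumption: regex-style fields receive an anchored pattern matching one namespace.
    value = f"^{namespace}$" if is_regex else namespace
    set_nested(patched, key_path, value)
    return patched


# Dict-based scenario with a top-level key (placeholder content).
dict_scenario = {"application_outage": {"namespace": "placeholder"}}
patched_dict = patch_namespace(
    dict_scenario, ["application_outage", "namespace"], "krkn-test-abc123"
)

# List-based scenario whose namespace field is a regex (placeholder content).
list_scenario = [{"config": {"namespace_pattern": "placeholder"}}]
patched_list = patch_namespace(
    list_scenario, [0, "config", "namespace_pattern"], "krkn-test-abc123", is_regex=True
)
```

A key path that mixes integers and strings works because both `list` and `dict` support `target[key]`, so one attribute can describe either scenario shape.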

## Option 2: Manual setup

1. **Create the scenario folder**
   `CI/tests_v2/scenarios/<scenario_name>/`.

2. **Add resource.yaml**
   Kubernetes manifest(s) for the workload (Deployment or Pod). Use a distinct label (e.g. `app: <scenario>-target`). Omit `metadata.namespace` or leave it unset; the framework patches it at deploy time.

3. **Add scenario_base.yaml**
   The canonical Krkn scenario structure. Tests load this file, patch the namespace (and any overrides), write the result to `tmp_path`, and pass it to `build_config`. See existing scenarios for the format your scenario type expects.

4. **Add test_<scenario>.py**
   - Import `BaseScenarioTest` from `lib.base` and helpers from `lib.utils` (e.g. `assert_kraken_success`, `get_pods_list`, `scenario_dir` if needed).
   - Define a class extending `BaseScenarioTest` with:
     - `WORKLOAD_MANIFEST = "CI/tests_v2/scenarios/<scenario_name>/resource.yaml"`
     - `WORKLOAD_IS_PATH = True`
     - `LABEL_SELECTOR = "app=<label>"`
     - `SCENARIO_NAME = "<scenario_name>"`
     - `SCENARIO_TYPE = "<scenario_type>"` (e.g. `application_outages_scenarios`)
     - `NAMESPACE_KEY_PATH`: path to the namespace field (e.g. `["application_outage", "namespace"]` for dict-based, or `[0, "config", "namespace_pattern"]` for list-based)
     - `NAMESPACE_IS_REGEX = False` (or `True` for regex patterns like pod_disruption)
     - `OVERRIDES_KEY_PATH = ["<top-level key>"]` if the scenario supports overrides (e.g. duration, block).
   - Add `@pytest.mark.functional` and `@pytest.mark.<scenario>` on the class.
   - In at least one test, call `self.run_scenario(self.tmp_path, self.ns)` and assert with `assert_kraken_success`, `assert_pod_count_unchanged`, and `assert_all_pods_running_and_ready`. Use `self.k8s_core`, `self.tmp_path`, etc. (injected by the base class).

5. **Register the marker**
   In `CI/tests_v2/pytest.ini`, under `markers`:
   ```
   <scenario>: marks a test as a <scenario> scenario test
   ```

## Conventions

- **Folder-per-scenario**: One directory per scenario under `scenarios/`. All assets (test, resource.yaml, scenario_base.yaml, and any extra YAMLs) live there for easy tracking and onboarding.
- **Ephemeral namespace**: Every test gets a unique `krkn-test-<uuid>` namespace. The base class deploys the workload into it before the test; no manual deploy is required.
- **Negative tests**: For tests that don’t need a workload (e.g. invalid scenario, bad namespace), use `@pytest.mark.no_workload`. The test will still get a namespace but no workload will be deployed.
- **Scenario type**: `SCENARIO_TYPE` must match the key in Kraken’s config (e.g. `application_outages_scenarios`, `pod_disruption_scenarios`). See `CI/tests_v2/config/common_test_config.yaml` and the scenario plugin’s `get_scenario_types()`.
- **Assertions**: Use `assert_kraken_success(result, context=f"namespace={ns}", tmp_path=self.tmp_path)` so failures include stdout/stderr and optional log files.
- **Timeouts**: Use constants from `lib.base` (`READINESS_TIMEOUT`, `POLICY_WAIT_TIMEOUT`, etc.) instead of magic numbers.
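
The ephemeral-namespace convention above can be sketched as follows. Only the `krkn-test-` prefix comes from this guide; the helper names and the choice of eight hex characters from a UUID4 are assumptions, not the framework's actual naming logic.

```python
import re
import uuid

# Prefix taken from the "Ephemeral namespace" convention.
NAMESPACE_PREFIX = "krkn-test-"

# Assumption: the suffix is the first 8 hex characters of a random UUID4.
_TEST_NAMESPACE_RE = re.compile(r"krkn-test-[0-9a-f]{8}")


def make_namespace_name() -> str:
    """Build a unique, DNS-label-safe namespace name for one test."""
    return f"{NAMESPACE_PREFIX}{uuid.uuid4().hex[:8]}"


def is_test_namespace(name: str) -> bool:
    """Tell test namespaces apart from user namespaces, e.g. when cleaning up stale ones."""
    return _TEST_NAMESPACE_RE.fullmatch(name) is not None
```

A fixed, recognizable prefix keeps parallel tests isolated from each other while still letting cleanup code find leftovers without touching unrelated namespaces.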

## Exit Code Handling

Kraken uses the following exit codes: **0** = success; **1** = scenario failure (e.g. post scenarios still failing); **2** = critical alerts fired; **3+** = health check / KubeVirt check failures; **-1** = infrastructure error (bad config, no kubeconfig).

- **Happy-path tests**: Use `assert_kraken_success(result, ...)`. By default only exit code 0 is accepted.
- **Alert-aware tests**: If you enable `check_critical_alerts` and expect alerts, use `assert_kraken_success(result, allowed_codes=(0, 2), ...)` so exit code 2 is treated as acceptable.
- **Expected-failure tests**: Use `assert_kraken_failure(result, context=..., tmp_path=self.tmp_path)` for negative tests (invalid scenario, bad namespace, etc.). This gives the same diagnostic quality (log dump, tmp_path hint) as success assertions. Prefer this over a bare `assert result.returncode != 0`.
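
The `allowed_codes` behaviour described above can be sketched roughly as follows. This is not the implementation in `lib.utils`: `check_kraken_success` and `describe_exit_code` are hypothetical names, and the message format is an assumption. Only the exit-code meanings are taken from this section.

```python
import subprocess
from typing import Iterable

# Exit-code meanings as listed in this section.
EXIT_CODE_MEANINGS = {
    0: "success",
    1: "scenario failure",
    2: "critical alerts fired",
    -1: "infrastructure error",
}


def describe_exit_code(code: int) -> str:
    """Map an exit code to the meaning documented above."""
    if code >= 3:
        return "health check / KubeVirt check failure"
    return EXIT_CODE_MEANINGS.get(code, "unknown")


def check_kraken_success(
    result: subprocess.CompletedProcess,
    allowed_codes: Iterable[int] = (0,),
    context: str = "",
) -> None:
    """Raise AssertionError with diagnostics unless the exit code is allowed."""
    allowed = tuple(allowed_codes)
    if result.returncode not in allowed:
        raise AssertionError(
            f"Kraken exited with {result.returncode} "
            f"({describe_exit_code(result.returncode)}); allowed={allowed}; {context}\n"
            f"stdout:\n{result.stdout}\nstderr:\n{result.stderr}"
        )
```

Putting the context string and captured output into the assertion message is what makes a failed run diagnosable from the report alone, which a bare `assert result.returncode == 0` does not offer.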

## Running your new tests

```bash
pytest CI/tests_v2/ -v -m <scenario>
```

For debugging with logs and keeping failed namespaces:

```bash
pytest CI/tests_v2/ -v -m <scenario> --log-cli-level=DEBUG --keep-ns-on-fail
```

---

## Naming Conventions

Follow these conventions so the framework stays consistent as new scenarios are added.

### Quick Reference

| Element | Pattern | Example |
|---|---|---|
| Scenario folder | `scenarios/<snake_case>/` | `scenarios/node_disruption/` |
| Test file | `test_<scenario>.py` | `test_node_disruption.py` |
| Test class | `Test<CamelCase>(BaseScenarioTest)` | `TestNodeDisruption` |
| Pytest marker | `@pytest.mark.<scenario>` (matches folder) | `@pytest.mark.node_disruption` |
| Scenario YAML | `scenario_base.yaml` | |
| Workload YAML | `resource.yaml` | |
| Extra YAMLs | `<descriptive_name>.yaml` | `nginx_http.yaml` |
| Lib modules | `lib/<concern>.py` | `lib/deploy.py` |
| Public fixtures | `<verb>_<noun>` or `<noun>` | `run_kraken`, `test_namespace` |
| Private/autouse fixtures | `_<descriptive>` | `_cleanup_stale_namespaces` |
| Assertion helpers | `assert_<condition>` | `assert_pod_count_unchanged` |
| Query helpers | `get_<resource>_list` or `find_<resource>_by_<criteria>` | `get_pods_list`, `find_network_policy_by_prefix` |
| Env var overrides | `KRKN_TEST_<NAME>` | `KRKN_TEST_READINESS_TIMEOUT` |

### Folders

- One folder per scenario under `scenarios/`. The folder name is `snake_case` and must match the `SCENARIO_NAME` class attribute in the test.
- Shared framework code lives in `lib/`. Each module covers a single concern (`k8s`, `namespace`, `deploy`, `kraken`, `utils`, `base`, `preflight`).
- Do **not** add scenario-specific code to `lib/`; keep it in the scenario folder as module-level helpers.

### Files

- Test files: `test_<scenario>.py`. This is required for pytest discovery (`test_*.py`).
- Workload manifests: always `resource.yaml`. If a scenario needs additional K8s resources (e.g. a Service for traffic testing), use a descriptive name like `nginx_http.yaml`.
- Scenario config: always `scenario_base.yaml`. This is the template that `load_and_patch_scenario` loads and patches.

### Classes

- One test class per file: `Test<CamelCase>` extending `BaseScenarioTest`.
- The CamelCase name must be the PascalCase equivalent of the folder name (e.g. `pod_disruption` -> `TestPodDisruption`).
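
The folder-to-class mapping can be checked mechanically. The helper below is a hypothetical sketch, not part of the framework or of `scaffold.py`; the strict `snake_case` validation is an assumption.

```python
import re

# Assumption: folder names are lowercase words separated by single underscores.
_SNAKE_CASE_RE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def class_name_for(scenario_name: str) -> str:
    """Derive the expected test class name from a snake_case scenario folder name."""
    if not _SNAKE_CASE_RE.fullmatch(scenario_name):
        raise ValueError(f"scenario name must be snake_case: {scenario_name!r}")
    # Capitalize each underscore-separated word and prepend "Test".
    return "Test" + "".join(part.capitalize() for part in scenario_name.split("_"))
```

For example, `class_name_for("pod_disruption")` yields `TestPodDisruption`, matching the rule above.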

### Test Methods

- Prefix: `test_` (pytest requirement).
- Use descriptive names that convey **what is being verified**, not implementation details.
- Good: `test_pod_crash_and_recovery`, `test_traffic_blocked_during_outage`, `test_invalid_scenario_fails`.
- Avoid: `test_run_1`, `test_scenario`, `test_it_works`.

### Fixtures

- **Public fixtures** (intended for use in tests): use `<verb>_<noun>` or plain `<noun>`. Examples: `run_kraken`, `deploy_workload`, `test_namespace`, `kubectl`.
- **Private/autouse fixtures** (framework internals): prefix with `_`. Examples: `_kube_config_loaded`, `_preflight_checks`, `_inject_common_fixtures`.
- K8s client fixtures use the `k8s_` prefix: `k8s_core`, `k8s_apps`, `k8s_networking`, `k8s_client`.

### Helpers and Utilities

- **Assertions**: `assert_<what_is_expected>`. Always raise `AssertionError` with a message that includes the namespace.
- **K8s queries**: `get_<resource>_list` for direct API calls, `find_<resource>_by_<criteria>` for filtered lookups.
- **Private helpers**: prefix with `_` for module-internal functions (e.g. `_pods`, `_policies`, `_get_nested`).

### Constants and Environment Variables

- Timeout constants: `UPPER_CASE` in `lib/base.py`. Each is overridable via an env var prefixed `KRKN_TEST_`.
- Feature flags: `KRKN_TEST_DRY_RUN`, `KRKN_TEST_COVERAGE`. Always use the `KRKN_TEST_` prefix so all tunables are discoverable with `grep KRKN_TEST_`.
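
One way to wire up the `KRKN_TEST_` overrides is sketched below. The helpers `env_int` and `env_flag` are hypothetical, the default of 90 seconds is a placeholder rather than the value in `lib/base.py`, and the set of strings accepted as "true" is an assumption.

```python
import os

ENV_PREFIX = "KRKN_TEST_"


def env_int(name: str, default: int) -> int:
    """Read KRKN_TEST_<name> as an integer, falling back to `default` when unset or empty."""
    raw = os.environ.get(f"{ENV_PREFIX}{name}", "").strip()
    return int(raw) if raw else default


def env_flag(name: str) -> bool:
    """Treat KRKN_TEST_<name> as a boolean feature flag."""
    # Assumption: "1", "true", and "yes" (case-insensitive) enable the flag.
    return os.environ.get(f"{ENV_PREFIX}{name}", "").strip().lower() in ("1", "true", "yes")


# Placeholder default; the real constant lives in lib/base.py.
READINESS_TIMEOUT = env_int("READINESS_TIMEOUT", 90)
DRY_RUN = env_flag("DRY_RUN")
```

Building every variable name from the same prefix is what keeps the `grep KRKN_TEST_` discoverability promise: no tunable can be added without it.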

### Markers

- Every test class gets `@pytest.mark.functional` (framework-wide) and `@pytest.mark.<scenario>` (scenario-specific).
- The scenario marker name matches the folder name exactly.
- Behavioral modifiers use plain descriptive names: `no_workload`, `order`.
- Register all custom markers in `pytest.ini` to avoid warnings.

## Adding Dependencies

- **Runtime (Kraken needs it)**: Add to the **root** `requirements.txt`. Pin a version (e.g. `package==1.2.3` or `package>=1.2,<2`).
- **Test-only (only CI/tests_v2 needs it)**: Add to **`CI/tests_v2/requirements.txt`**. Pin a version there as well.
- After changing either file, run `make -f CI/tests_v2/Makefile setup` from the repo root (or `make setup` from `CI/tests_v2`) to verify both files install cleanly together.

CI/tests_v2/Makefile

Lines changed: 97 additions & 0 deletions
@@ -0,0 +1,97 @@
# CI/tests_v2 functional tests - single entry point.
# Run from repo root: make -f CI/tests_v2/Makefile <target>
# Or from CI/tests_v2: make <target> (REPO_ROOT is resolved automatically).

# Resolve repo root: go to Makefile dir then up two levels (CI/tests_v2 -> repo root)
REPO_ROOT := $(shell cd "$(dir $(firstword $(MAKEFILE_LIST)))" && cd ../.. && pwd)
VENV := $(REPO_ROOT)/venv
PYTHON := $(VENV)/bin/python
PIP := $(VENV)/bin/pip
CLUSTER_NAME ?= ci-krkn
TESTS_DIR := $(REPO_ROOT)/CI/tests_v2

.PHONY: setup preflight test test-fast test-debug test-scenario test-dry-run clean help

help:
	@echo "CI/tests_v2 functional tests - usage: make [target]"
	@echo ""
	@echo "Targets:"
	@echo "  setup          Create venv (if missing), install Python deps, create KinD cluster (kind-config-dev.yml)."
	@echo "                 Run once before first test. Override cluster config: KIND_CONFIG=path make setup"
	@echo ""
	@echo "  preflight      Check Python 3.9+, kind, kubectl, Docker, cluster reachability, test deps."
	@echo "                 Invoked automatically by test targets; run standalone to validate environment."
	@echo ""
	@echo "  test           Full run: retries (2), timeout 300s, HTML report, JUnit XML, coverage."
	@echo "                 Use for CI or final verification. Output: report.html, results.xml"
	@echo ""
	@echo "  test-fast      Quick run: no retries, 120s timeout, no report. For fast local iteration."
	@echo ""
	@echo "  test-debug     Debug run: verbose (-s), keep failed namespaces (--keep-ns-on-fail), DEBUG logging."
	@echo "                 Use when investigating failures; inspect kept namespaces with kubectl."
	@echo ""
	@echo "  test-scenario  Run only one scenario. Requires SCENARIO=<marker>."
	@echo "                 Example: make test-scenario SCENARIO=pod_disruption"
	@echo ""
	@echo "  test-dry-run   Validate scenario plumbing only (no Kraken execution). Sets KRKN_TEST_DRY_RUN=1."
	@echo ""
	@echo "  clean          Delete KinD cluster $(CLUSTER_NAME) and remove report.html, results.xml."
	@echo ""
	@echo "  help           Show this help."
	@echo ""
	@echo "Run from repo root: make -f CI/tests_v2/Makefile <target>"
	@echo "Or from CI/tests_v2: make <target>"

setup: $(VENV)/.installed
	@echo "Running cluster setup..."
	$(MAKE) -f $(TESTS_DIR)/Makefile preflight
	cd $(REPO_ROOT) && ./CI/tests_v2/setup_env.sh
	@echo "Setup complete. Run 'make test' or 'make -f CI/tests_v2/Makefile test' from repo root."

$(VENV)/.installed: $(REPO_ROOT)/requirements.txt $(TESTS_DIR)/requirements.txt
	@if [ ! -d "$(VENV)" ]; then python3 -m venv $(VENV); echo "Created venv at $(VENV)"; fi
	$(PYTHON) -m pip install -q --upgrade pip
	# Root = Kraken runtime; tests_v2 = test-only plugins; both required for functional tests.
	$(PIP) install -q -r $(REPO_ROOT)/requirements.txt
	$(PIP) install -q -r $(TESTS_DIR)/requirements.txt
	@touch $(VENV)/.installed
	@echo "Python deps installed."

preflight:
	@echo "Preflight: checking Python, tools, and cluster..."
	@command -v python3 >/dev/null 2>&1 || { echo "Error: python3 not found."; exit 1; }
	@python3 -c "import sys; exit(0 if sys.version_info >= (3, 9) else 1)" || { echo "Error: Python 3.9+ required."; exit 1; }
	@command -v kind >/dev/null 2>&1 || { echo "Error: kind not installed."; exit 1; }
	@command -v kubectl >/dev/null 2>&1 || { echo "Error: kubectl not installed."; exit 1; }
	@docker info >/dev/null 2>&1 || { echo "Error: Docker not running (required for KinD)."; exit 1; }
	@if kind get clusters 2>/dev/null | grep -qx "$(CLUSTER_NAME)"; then \
		kubectl cluster-info >/dev/null 2>&1 || { echo "Error: Cluster $(CLUSTER_NAME) exists but cluster-info failed."; exit 1; }; \
	else \
		echo "Note: Cluster $(CLUSTER_NAME) not found. Run 'make setup' to create it."; \
	fi
	@$(PYTHON) -c "import pytest_rerunfailures, pytest_html, pytest_timeout, pytest_order" 2>/dev/null || \
		{ echo "Error: Install test deps with 'make setup' or pip install -r CI/tests_v2/requirements.txt"; exit 1; }
	@echo "Preflight OK."

test: preflight
	cd $(REPO_ROOT) && KRKN_TEST_COVERAGE=1 $(PYTHON) -m pytest $(TESTS_DIR)/ -v --timeout=300 --reruns=2 --reruns-delay=10 \
		--html=$(TESTS_DIR)/report.html -n auto --junitxml=$(TESTS_DIR)/results.xml

test-fast: preflight
	cd $(REPO_ROOT) && $(PYTHON) -m pytest $(TESTS_DIR)/ -v -p no:rerunfailures -n auto --timeout=120

test-debug: preflight
	cd $(REPO_ROOT) && $(PYTHON) -m pytest $(TESTS_DIR)/ -v -s -p no:rerunfailures --timeout=300 \
		--keep-ns-on-fail --log-cli-level=DEBUG

test-scenario: preflight
	@if [ -z "$(SCENARIO)" ]; then echo "Error: set SCENARIO=pod_disruption (or application_outage, etc.)"; exit 1; fi
	cd $(REPO_ROOT) && $(PYTHON) -m pytest $(TESTS_DIR)/ -v -m "$(SCENARIO)" --timeout=300 --reruns=2 --reruns-delay=10

test-dry-run: preflight
	cd $(REPO_ROOT) && KRKN_TEST_DRY_RUN=1 $(PYTHON) -m pytest $(TESTS_DIR)/ -v

clean:
	@kind delete cluster --name $(CLUSTER_NAME) 2>/dev/null || true
	@rm -f $(TESTS_DIR)/report.html $(TESTS_DIR)/results.xml
	@echo "Cleaned cluster and report artifacts."
