Merge pull request #8136 from aojea/testing-strategy
kubernetes testing strategy
2 parents aa351ee + 463cbd6 commit 202e835

3 files changed, +120 -0 lines changed

Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
## Defining a Robust Testing Strategy

This document outlines a testing strategy for Kubernetes features. It draws on past experience and lessons learned, and takes the characteristics of the existing CI system into account.

### The Testing Pyramid

The [**testing pyramid**](https://martinfowler.com/articles/practical-test-pyramid.html) is nothing more than a metaphor that helps visualize how to structure software tests. It is not a rigid prescription, but a general guideline for creating a balanced and effective testing strategy.

![testing-pyramid](./sig_testing_kubecon_na_2022_pyramid.png)

Prioritize tests according to the testing pyramid (see the [Testing Guide](./testing.md)):

- **Unit Tests:** The foundation. Fast (a package *should* be able to run all its unit tests in seconds), isolated (they do not depend on the environment or on non-local network access), and focused on individual components.
- **Integration Tests:** Verify interactions between components within your subsystem. These are preferred for tests that require cluster components to run with test-specific configurations.
- **E2E Tests:** Test the entire system, including interactions with external dependencies. These are the most expensive and the most prone to flakiness. Every cluster component configuration variant requires a distinct e2e job.
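
To make the "isolated" property of unit tests concrete, here is a minimal Go sketch of logic that is tested against an in-memory fake instead of a real cluster. All names (`podLister`, `fakeLister`, `countPods`) are hypothetical and exist only for illustration; they are not part of any Kubernetes API.

```go
package main

import "fmt"

// podLister is the only capability the code under test needs from a cluster.
// Hypothetical interface, defined here purely for the example.
type podLister interface {
	ListPods(namespace string) ([]string, error)
}

// countPods is the logic under test: it depends on the interface,
// not on a concrete client, so it can run without any network access.
func countPods(l podLister, namespace string) (int, error) {
	pods, err := l.ListPods(namespace)
	if err != nil {
		return 0, fmt.Errorf("listing pods in %q: %w", namespace, err)
	}
	return len(pods), nil
}

// fakeLister is an in-memory stand-in keyed by namespace.
type fakeLister map[string][]string

func (f fakeLister) ListPods(namespace string) ([]string, error) {
	return f[namespace], nil
}

func main() {
	fake := fakeLister{"default": {"web-0", "web-1"}}
	n, err := countPods(fake, "default")
	fmt.Println(n, err)
}
```

Because the fake lives in memory, a check like this runs in well under a second and cannot fail because of the environment, which is what keeps the base of the pyramid cheap.
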

### CI Job Types

The Kubernetes project uses [prow](https://prow.k8s.io) to implement its CI system. Jobs fall into the following types:

- **Presubmit:** Runs before code is merged.
  - **Blocking:** Prevents merging if tests fail. Use cautiously due to the potential project-wide impact. We set a very high bar for these jobs and ask for proof of stability, reliability and performance.
  - **Non-Blocking/Informational:** Provides feedback without blocking merges.
- **Postsubmit:** Runs after code is merged. Useful for building artifacts.
- **Periodic:** Runs at scheduled intervals. Ideal for monitoring trends and catching regressions.

#### SIG Release Blocking and Informing jobs

SIG Release maintains two sets of jobs that decide whether the release is
healthy: Blocking and Informing.

If your feature or area is critical for the release, please follow the instructions in https://github.com/kubernetes/sig-release/blob/master/release-blocking-jobs.md to promote your periodic jobs to Blocking or Informing.

A job must already be a Release Blocking job before it can become a Presubmit Blocking job.
Presubmit Blocking jobs should be even faster than Release Blocking jobs (under one hour, preferably under 30 minutes).
Jobs should only be promoted to presubmit blocking if we are **frequently** identifying bugs only after they disrupt the release blocking jobs. A few times per release is not sufficient: in that case you can use testgrid to bisect the commits that merged between the last passing and the first failing run, and revert or fix the relevant PR.
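
The bisection mentioned above is a plain binary search over the commits that merged between a passing and a failing run. The sketch below shows only that search logic, under the assumption that the failure is deterministic (once a commit fails, every later commit fails too). The commit IDs and the `passes` callback are hypothetical; in practice the pass/fail signal comes from re-running the job or reading its results.

```go
package main

import (
	"fmt"
	"sort"
)

// firstBad returns the index of the first commit for which passes reports
// false, or len(commits) if every commit passes. Commits must be ordered
// oldest first, and the failure must persist once introduced.
func firstBad(commits []string, passes func(commit string) bool) int {
	return sort.Search(len(commits), func(i int) bool {
		return !passes(commits[i])
	})
}

func main() {
	// Hypothetical commits merged between a green run and a red run.
	commits := []string{"c1", "c2", "c3", "c4", "c5"}
	broken := map[string]bool{"c4": true, "c5": true}

	i := firstBad(commits, func(c string) bool { return !broken[c] })
	fmt.Println("first failing commit:", commits[i])
}
```

Note that a flaky test violates the assumption this search relies on, which is one more reason to fix flakiness before relying on a job as a signal.
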

Presubmit blocking jobs are much more expensive to run than release blocking jobs because they must run on every pushed PR, not just merged code, and issues with them disrupt ALL contributors, so we do not add presubmit blocking jobs lightly.

### Mitigating E2E Test Flakiness

E2E tests, especially in a complex project like Kubernetes, are susceptible to flakiness due to external dependencies. To mitigate this:

- **Identify Patterns:** Use periodic jobs and Testgrid to monitor test results and identify patterns in failures.
- **Isolate Failures:** Improve test isolation to minimize the impact of external factors.
- **Retry Mechanisms:** Implement retry mechanisms in CI jobs to handle transient failures.
  It is important to clearly distinguish which errors are retriable; otherwise you risk
  masking legitimate bugs that will be present in clusters running in production.
- **Robust Infrastructure:** Ensure the test infrastructure itself is reliable and stable.

### Testing Strategy for Specific Features/Areas

If your focus is on a specific feature or area within Kubernetes, it is your responsibility
to ensure that the tests for it a) run in the CI and b) remain healthy.
SIG Testing provides the *tooling* for running tests, but is not
responsible for *running* specific tests.

Consider this strategy:

1. **Periodic Jobs:**
   - Run expensive E2E tests periodically (e.g., every 6 hours).
   - Use Testgrid to monitor trends and receive alerts for failures. This helps identify patterns and troubleshoot issues effectively.
   - Subscribe to alerts: Testgrid provides early signals if changes elsewhere in Kubernetes break your feature.

2. **Non-Blocking Presubmit Jobs:**
   - Configure presubmit jobs to run only when specific files or folders are modified. This can be done using the `run_if_changed` [trigger in prow](https://docs.prow.k8s.io/docs/jobs/#triggering-jobs-based-on-changes).
   - Use OWNERS files to require approval from maintainers of the relevant codebase. This acts as a "soft block," ensuring review and accountability without the risk of halting the entire project.
   - This encourages maintainers to take ownership of their code's quality and stability.
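
Prow treats the `run_if_changed` value as a regular expression that is matched against the paths of the files changed in a PR, not as a shell glob. The sketch below reproduces that matching rule with Go's `regexp` package so a pattern can be checked locally. `shouldRun` is a hypothetical helper written for this example, not a Prow function, and it assumes the job triggers when at least one changed path matches.

```go
package main

import (
	"fmt"
	"regexp"
)

// shouldRun reports whether any changed file path matches the pattern.
// It returns an error if the pattern is not a valid Go regular expression.
func shouldRun(pattern string, changed []string) (bool, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false, err
	}
	for _, path := range changed {
		if re.MatchString(path) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	changed := []string{"docs/README.md", "my-feature/pkg/controller.go"}

	run, err := shouldRun(`^my-feature/`, changed)
	fmt.Println(run, err)

	// A glob-style pattern is not a valid regular expression.
	_, err = shouldRun("my-feature/**/*", changed)
	fmt.Println(err != nil)
}
```

Anchoring the pattern with `^` avoids accidental matches on paths that merely contain the directory name somewhere in the middle.
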

### Example: CI Configuration

CI job configurations vary widely and depend on multiple factors; here is a basic example. Remember to adapt it to your specific needs.

```yaml
presubmits:
  kubernetes/my-feature:
  - name: my-feature-e2e-tests
    always_run: false
    # run_if_changed is a regular expression matched against changed file paths.
    run_if_changed: '^my-feature/'
    optional: true
    decorate: true
    path_alias: 'kubernetes/my-feature'
    spec:
      containers:
      - image: gcr.io/k8s-testimages/kubekins-e2e:v20241104-master-5917669-master
        command:
        - runner.sh
        - ./test/e2e.sh
        args:
        # Run tests labeled with "MyFeature", and only with that.
        # The whole argument is quoted because it contains ": ".
        - "-ginkgo.label-filter=Feature: containsAny MyFeature && Feature: isSubsetOf MyFeature && !Flaky"
    annotations:
      testgrid-dashboards: sig-my-feature
      testgrid-tab-name: My Feature E2E Tests

periodics:
# Periodic jobs are a flat list (not keyed by repository); the repository
# to check out is declared with extra_refs.
- name: my-feature-periodic-e2e-tests
  interval: 6h
  decorate: true
  extra_refs:
  - org: kubernetes
    repo: my-feature
    base_ref: master  # assumed branch name; adjust to your repository
    path_alias: 'kubernetes/my-feature'
  spec:
    containers:
    - image: gcr.io/k8s-testimages/kubekins-e2e:v20241104-master-5917669-master
      command:
      - runner.sh
      - ./test/e2e.sh
      args:
      - "-ginkgo.label-filter=Feature: containsAny MyFeature && Feature: isSubsetOf MyFeature && !Flaky"
  annotations:
    testgrid-dashboards: sig-my-feature
    testgrid-tab-name: My Feature Periodic E2E Tests
```
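
The label filter in the example combines three conditions. The Go sketch below restates their intended meaning as plain set logic. It is a simplified illustration under the assumption that tests carry labels of the form `Feature:<name>`; it is not Ginkgo's implementation and ignores details such as case handling. `featureValues` and `selected` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strings"
)

// featureValues extracts the values of all "Feature:<value>" labels.
func featureValues(labels []string) []string {
	var values []string
	for _, l := range labels {
		if strings.HasPrefix(l, "Feature:") {
			values = append(values, strings.TrimSpace(strings.TrimPrefix(l, "Feature:")))
		}
	}
	return values
}

// selected mirrors the intent of the filter
// "Feature: containsAny X && Feature: isSubsetOf X && !Flaky":
// the test has the feature X, has no other feature, and is not flaky.
func selected(labels []string, feature string) bool {
	containsAny := false
	isSubset := true
	for _, v := range featureValues(labels) {
		if v == feature {
			containsAny = true
		} else {
			isSubset = false
		}
	}
	for _, l := range labels {
		if l == "Flaky" {
			return false
		}
	}
	return containsAny && isSubset
}

func main() {
	fmt.Println(selected([]string{"Feature:MyFeature", "sig-node"}, "MyFeature"))
	fmt.Println(selected([]string{"Feature:MyFeature", "Feature:Other"}, "MyFeature"))
}
```

The `isSubsetOf` condition is what excludes tests that also depend on another feature, which the job's cluster may not have enabled.
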

contributors/devel/sig-testing/testing.md

Lines changed: 5 additions & 0 deletions
@@ -198,6 +198,11 @@ Please refer to [Integration Testing in Kubernetes](integration-tests.md).

Please refer to [End-to-End Testing in Kubernetes](e2e-tests.md).

## Testing Strategy

Whether you are a feature owner or a subsystem or area maintainer, you have to define a
testing strategy for your area. Please refer to [Defining a Robust Testing Strategy in Kubernetes](testing-strategy.md).

## Running your contribution through Kubernetes CI
Once you open a PR, [`prow`][prow-url] runs pre-submit tests in CI. You can find more about `prow` in [kubernetes/test-infra][prow-git] and in [this blog post][prow-doc] on automation involved in testing PRs to Kubernetes.
