Commit df90461

kubernetes testing strategy
Since this is a recurrent topic and we are mainly operating on tribal knowledge, I think it is good to have it written down so we can use it as a reference for future discussions.

Change-Id: I300b06d82a1064bfc3b5bc904f02f8b786cd5480
1 parent cc28c5c commit df90461

2 files changed: 104 additions, 0 deletions

Lines changed: 99 additions & 0 deletions
@@ -0,0 +1,99 @@
## Defining a Robust Testing Strategy

This document outlines a testing strategy for Kubernetes features based on the **testing pyramid**, taking into account the characteristics of the existing CI system.

### The Testing Pyramid

Prioritize tests according to the testing pyramid (see the [Testing Guide](./testing.md)):

- **Unit Tests:** The foundation. Fast and isolated; they cover individual components.
- **Integration Tests:** Verify interactions between components within your subsystem.
- **E2E Tests:** Test the entire system, including interactions with external dependencies. These are the most expensive and the most prone to flakiness.

### CI Job Types

The Kubernetes project uses [prow](https://prow.k8s.io) to implement its CI system. We can distinguish between different types of jobs:

- **Presubmit:** Runs before code is merged.
  - **Blocking:** Prevents merging if tests fail. Use cautiously due to potential project-wide impact. We aim to set a very high bar for these jobs and ask for proof of stability, reliability and performance.
  - **Non-Blocking/Informational:** Provides feedback without blocking merges.
- **Periodic:** Runs at scheduled intervals. Ideal for monitoring trends and catching regressions.
- **Postsubmit:** Runs after code is merged. Useful for building artifacts.

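As a rough sketch of how these job types map onto Prow configuration: the repository `kubernetes/my-feature`, the job names, the `make` targets, and the `master` branch below are all hypothetical placeholders; only the field names are Prow's.

```yaml
presubmits:
  kubernetes/my-feature:              # hypothetical org/repo
  # Blocking presubmit: runs on every PR and is required to pass.
  - name: pull-my-feature-unit        # hypothetical job name
    always_run: true
    optional: false                   # the default, spelled out for contrast
    decorate: true
    spec:
      containers:
      - image: gcr.io/k8s-testimages/kubekins-e2e:v20241104-master-5917669-master
        command:
        - runner.sh
        - make
        - test                        # hypothetical make target
  # Non-blocking presubmit: reports a result but does not gate the merge.
  - name: pull-my-feature-e2e         # hypothetical job name
    always_run: true
    optional: true
    decorate: true
    spec:
      containers:
      - image: gcr.io/k8s-testimages/kubekins-e2e:v20241104-master-5917669-master
        command:
        - runner.sh
        - ./test/e2e.sh

postsubmits:
  kubernetes/my-feature:
  # Postsubmit: runs after a merge to a matching branch, e.g. to build artifacts.
  - name: post-my-feature-build       # hypothetical job name
    branches:
    - ^master$                        # assumed default branch
    decorate: true
    spec:
      containers:
      - image: gcr.io/k8s-testimages/kubekins-e2e:v20241104-master-5917669-master
        command:
        - runner.sh
        - make
        - release                     # hypothetical make target
```

Whether a non-optional presubmit actually prevents a merge also depends on how merge automation and branch protection are configured for the repository, so treat `optional` as one part of the picture.
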
#### SIG-Release Blocking and Informing jobs

SIG-release maintains two sets of jobs that decide whether the release is
healthy: Blocking and Informing.

If your feature or area is critical for the release, please follow the instructions provided in https://github.com/kubernetes/sig-release/blob/master/release-blocking-jobs.md to promote your periodic jobs to Blocking or Informing.

A necessary condition for a job to be a Presubmit Blocking job is that it is also a Release Blocking job.

### Mitigating E2E Test Flakiness

E2E tests, especially in a complex project like Kubernetes, are susceptible to flakiness due to external dependencies. To mitigate this:

- **Identify Patterns:** Use periodic jobs and Testgrid to monitor test results and identify patterns in failures.
- **Isolate Failures:** Improve test isolation to minimize the impact of external factors.
- **Retry Mechanisms:** Implement retry mechanisms in CI jobs to handle transient failures.
  It is important to distinguish clearly which errors are retriable; otherwise you risk
  masking legitimate bugs that will be present in clusters running in production.
- **Robust Infrastructure:** Ensure the test infrastructure itself is reliable and stable.

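The coarsest form of retry sits at the test-runner level. The fragment below is only a sketch: it assumes the job's test binary is built with Ginkgo v2, which accepts `--ginkgo.flake-attempts`, and the focus string is a placeholder. This option re-runs every failing spec regardless of cause, so it cannot tell retriable errors from real bugs. It is therefore best confined to non-blocking or periodic jobs, with finer-grained retries on known-transient errors implemented inside the tests themselves.

```yaml
# Fragment of a job's container spec (not a complete job definition).
args:
- --ginkgo.focus=MyFeature
# Make up to 2 attempts in total for each spec before reporting it as failed.
- --ginkgo.flake-attempts=2
```
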
### Testing Strategy for Specific Features/Areas

If your focus is on a specific feature or area within Kubernetes, consider this strategy:

1. **Periodic Jobs:**
   - Run expensive E2E tests periodically (e.g., every 6 hours).
   - Use Testgrid to monitor trends and receive alerts for failures. This helps identify patterns and troubleshoot issues effectively.
   - Subscribe to alerts: Testgrid provides early signals if changes elsewhere in Kubernetes break your feature.

2. **Non-Blocking Presubmit Jobs:**
   - Configure presubmit jobs to run only when specific files or folders are modified.
   - Use OWNERS files to require approval from maintainers of the relevant codebase. This acts as a "soft block," ensuring review and accountability without the risk of halting the entire project.
   - This encourages maintainers to take ownership of their code's quality and stability.

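Subscribing to alerts is typically done through Testgrid annotations on the periodic job. The fragment below is a minimal sketch: the email address and the failure threshold are hypothetical values chosen for illustration, and the dashboard and tab names are the placeholders used elsewhere in this document.

```yaml
# Fragment of a periodic job definition; only the annotations are shown.
annotations:
  testgrid-dashboards: sig-my-feature
  testgrid-tab-name: My Feature Periodic E2E Tests
  # Hypothetical address that receives the alert emails.
  testgrid-alert-email: my-feature-alerts@example.com
  # Alert only after this many consecutive failures, to reduce noise from one-off flakes.
  # Annotation values must be strings, hence the quotes.
  testgrid-num-failures-to-alert: '3'
```

A threshold above one trades a slower signal for fewer false alarms; with a 6-hour interval, a value of 3 means an alert arrives after roughly 18 hours of consecutive failures.
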
### Example: CI Configuration

```yaml
presubmits:
  kubernetes/my-feature:
  - name: my-feature-e2e-tests
    always_run: false
    # run_if_changed is a regular expression matched against changed file paths, not a glob.
    run_if_changed: '^my-feature/'
    optional: true  # non-blocking: a failure does not prevent merging
    decorate: true
    path_alias: 'kubernetes/my-feature'
    spec:
      containers:
      - image: gcr.io/k8s-testimages/kubekins-e2e:v20241104-master-5917669-master
        command:
        - runner.sh
        - ./test/e2e.sh
        args:
        - --ginkgo.focus=MyFeature
        - --ginkgo.skip=Slow
    annotations:
      testgrid-dashboards: sig-my-feature
      testgrid-tab-name: My Feature E2E Tests

# Periodics are a flat list, not keyed by repository; the repository to check out goes in extra_refs.
periodics:
- name: my-feature-periodic-e2e-tests
  interval: 6h
  decorate: true
  extra_refs:
  - org: kubernetes
    repo: my-feature
    base_ref: master  # assumed default branch
    path_alias: 'kubernetes/my-feature'
  spec:
    containers:
    - image: gcr.io/k8s-testimages/kubekins-e2e:v20241104-master-5917669-master
      command:
      - runner.sh
      - ./test/e2e.sh
      args:
      - --ginkgo.focus=MyFeature
  annotations:
    testgrid-dashboards: sig-my-feature
    testgrid-tab-name: My Feature Periodic E2E Tests
```

contributors/devel/sig-testing/testing.md

Lines changed: 5 additions & 0 deletions
@@ -198,6 +198,11 @@ Please refer to [Integration Testing in Kubernetes](integration-tests.md).
Please refer to [End-to-End Testing in Kubernetes](e2e-tests.md).

## Testing Strategy

Whether you are a feature owner or a subsystem or area maintainer, you have to define a
testing strategy for your area. Please refer to [Defining a Robust Testing Strategy in Kubernetes](testing-strategy.md).

## Running your contribution through Kubernetes CI
Once you open a PR, [`prow`][prow-url] runs pre-submit tests in CI. You can find more about `prow` in [kubernetes/test-infra][prow-git] and in [this blog post][prow-doc] on automation involved in testing PRs to Kubernetes.
