
Commit a12eaf2

Merge pull request kubernetes#1973 from andrewsykim/kubelet-exec-timeout
KEP-1972: kubelet exec probe timeouts
2 parents 09548f5 + 40670f6 commit a12eaf2


2 files changed: +145 additions, -0 deletions

Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
# KEP-1972: Kubelet Exec Probe Timeouts

<!-- toc -->
- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
  - [Test Plan](#test-plan)
  - [Graduation Criteria](#graduation-criteria)
  - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
  - [Version Skew Strategy](#version-skew-strategy)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
<!-- /toc -->

## Release Signoff Checklist

Items marked with (R) are required *prior to targeting to a milestone / release*.

- [X] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [X] (R) KEP approvers have approved the KEP status as `implementable`
- [X] (R) Design details are appropriately documented
- [X] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
- [X] (R) Graduation criteria is in place
- [ ] (R) Production readiness review completed
- [ ] Production readiness review approved
- [ ] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation (e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes)

[kubernetes.io]: https://kubernetes.io/
[kubernetes/enhancements]: https://git.k8s.io/enhancements
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
[kubernetes/website]: https://git.k8s.io/website

## Summary

Kubelet today does not respect exec probe timeouts. This is considered a bug that should be fixed,
since the timeout value (`timeoutSeconds`) is part of the container Probe API. Because exec probe
timeouts were never respected by kubelet, a new feature gate, `ExecProbeTimeouts`, will be introduced.
The gate is enabled by default so that exec probes honor their timeouts; nodes that need to preserve
the current behavior can disable it.

## Motivation

Kubelet ignoring the exec probe timeout is a bug and should be fixed.

### Goals

* Treat exec probe timeouts as probe failures in kubelet.

### Non-Goals

* Ensuring that exec processes that timed out have been killed by kubelet.
* Introducing CRI errors for handling scenarios such as timeouts.

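## Proposal

### Risks and Mitigations

* Existing workloads on Kubernetes that relied on this bug may unexpectedly see their exec probes time out and fail.
  Mitigation: the `ExecProbeTimeouts` feature gate can be disabled on affected nodes to preserve the previous behavior.
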
## Design Details

Changes to kubelet:

* Ensure kubelet handles timeout errors and registers them as probe failures.
* Add the feature gate `ExecProbeTimeouts`, which is GA and enabled by default.
* If `ExecProbeTimeouts` is disabled and an exec probe exceeds its timeout, log a warning so users know their exec probes are timing out.
* Re-enable the existing exec liveness probe e2e test.
* Add a new exec readiness probe e2e test.

### Test Plan

E2E tests:

* Re-enable the [existing exec liveness probe e2e test](https://github.com/kubernetes/kubernetes/blob/ea1458550077bdf3b26ac34551a3591d280fe1f5/test/e2e/common/container_probe.go#L210-L227), which is currently skipped.
* Add a new exec readiness probe e2e test.

### Graduation Criteria

This is a bug fix, so the feature gate will be GA and enabled by default from the start.

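Because the gate is GA but marked `disable-supported: true`, a node can still opt out, for example through kubelet's `--feature-gates=ExecProbeTimeouts=false` flag or the `featureGates` field of the kubelet configuration file. The hypothetical `resolveGates` helper below sketches how such a comma-separated `Name=bool` list could be layered over the defaults; it is an illustration only and does not reproduce the `k8s.io/component-base/featuregate` package.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// defaultGates records each known gate's default. Per this KEP,
// ExecProbeTimeouts is GA and enabled by default.
var defaultGates = map[string]bool{"ExecProbeTimeouts": true}

// resolveGates is a hypothetical helper that applies a --feature-gates style
// value ("Name=bool,Name=bool") on top of the defaults.
func resolveGates(flag string) (map[string]bool, error) {
	gates := make(map[string]bool, len(defaultGates))
	for name, enabled := range defaultGates {
		gates[name] = enabled
	}
	if strings.TrimSpace(flag) == "" {
		return gates, nil
	}
	for _, pair := range strings.Split(flag, ",") {
		name, value, ok := strings.Cut(strings.TrimSpace(pair), "=")
		if !ok {
			return nil, fmt.Errorf("malformed feature gate %q", pair)
		}
		if _, known := defaultGates[name]; !known {
			return nil, fmt.Errorf("unknown feature gate %q", name)
		}
		enabled, err := strconv.ParseBool(value)
		if err != nil {
			return nil, fmt.Errorf("invalid value for %s: %w", name, err)
		}
		gates[name] = enabled
	}
	return gates, nil
}

func main() {
	onByDefault, _ := resolveGates("")
	optedOut, _ := resolveGates("ExecProbeTimeouts=false")
	fmt.Println(onByDefault["ExecProbeTimeouts"]) // prints true
	fmt.Println(optedOut["ExecProbeTimeouts"])    // prints false
}
```
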
### Upgrade / Downgrade Strategy

N/A

### Version Skew Strategy

N/A

## Implementation History

* 2020-09-08: the KEP was merged as implementable for v1.20

## Drawbacks

* Existing workloads may depend on exec probe timeouts never having been enforced. Introducing
  the timeout now may cause unexpected probe failures for some of those workloads.

## Alternatives

Some alternatives that were considered:

1. Increasing the default timeout for exec probes
2. Continuing to ignore the exec probe timeout

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
title: Kubelet Exec Probe Timeouts
kep-number: 1972
authors:
  - "@andrewsykim"
  - "@SergeyKanzhelev"
owning-sig: sig-node
participating-sigs:
status: implementable
creation-date: 2020-09-08
reviewers:
  - "@dchen1107"
  - "@derekwaynecarr"
approvers:
  - "@dchen1107"
  - "@derekwaynecarr"

# The target maturity stage in the current dev cycle for this KEP.
stage: stable

# The most recent milestone for which work toward delivery of this KEP has been
# done. This can be the current (upcoming) milestone, if it is being actively
# worked on.
latest-milestone: "v1.20"

# The milestone at which this feature was, or is targeted to be, at each stage.
milestone:
  stable: "v1.20"

# The following PRR answers are required at alpha release
# List the feature gate name and the components for which it must be enabled
feature-gates:
  - name: ExecProbeTimeouts
    components:
      - kubelet
disable-supported: true
