Items marked with (R) are required *prior to targeting to a milestone / release*.

-- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
-- [ ] (R) KEP approvers have approved the KEP status as `implementable`
-- [ ] (R) Design details are appropriately documented
-- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
+- [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
+- [x] (R) KEP approvers have approved the KEP status as `implementable`
+- [x] (R) Design details are appropriately documented
+- [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
  - [ ] e2e Tests for all Beta API Operations (endpoints)
  - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
  - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
-- [ ] (R) Graduation criteria is in place
+- [x] (R) Graduation criteria is in place
  - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
-- [ ] (R) Production readiness review completed
-- [ ] (R) Production readiness review approved
-- [ ] "Implementation History" section is up-to-date for milestone
-- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
-- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
+- [x] (R) Production readiness review completed
+- [x] (R) Production readiness review approved
+- [x] "Implementation History" section is up-to-date for milestone
+- [x] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
+- [x] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
+In order to make sure that users are setting valid stop signals for the nodes the pods are being scheduled to, we cross validate the `ContainerSpec.Lifecycle.StopSignal` with `spec.os.name`. Here are the details of this validation:
+- We require `spec.os.name` to be set to a valid value (`linux` or `windows`) to use `ContainerSpec.Lifecycle.StopSignal`.
+- We have a list of valid stop signals for both linux and windows nodes (as shown below). If the Pod OS is set to `linux`, only the signals supported for `linux` would be allowed.
+- Similarly for Pods with OS set to `windows`, we only allow SIGTERM and SIGKILL as valid stop signals.
+
+The full list of valid signals for the two platforms are as follows:
+var supportedStopSignalsWindows = sets.New(core.SIGKILL, core.SIGTERM)
+```
+
+You can find the validation logic implemented in [this commit](https://github.com/kubernetes/kubernetes/pull/130556/commits/0380f2c41cdc4df992294603f7844709072628b1#diff-c713e8919642d873fdf48fe8fb6d43e5cb2f53fd601066ff53580ea655948f0d).
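
The cross validation rules described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the linked commit: the `Signal` and `OSName` types, the `validateStopSignal` function, and the map-based allow-lists are hypothetical stand-ins, and the Linux set shown is only a small subset chosen for the example. Only the Windows set (SIGTERM and SIGKILL) is taken from the text.

```go
package main

import "fmt"

// Signal and OSName are local stand-ins for the Kubernetes API types (assumption).
type Signal string
type OSName string

const (
	Linux   OSName = "linux"
	Windows OSName = "windows"
)

// Illustrative subset only; the real Linux list is longer.
var supportedStopSignalsLinux = map[Signal]bool{
	"SIGABRT": true, "SIGALRM": true, "SIGKILL": true, "SIGTERM": true, "SIGRTMAX": true,
}

// Matches the Windows rule stated in the KEP: only SIGTERM and SIGKILL.
var supportedStopSignalsWindows = map[Signal]bool{
	"SIGKILL": true, "SIGTERM": true,
}

// validateStopSignal is a hypothetical helper that cross validates a
// container's stop signal against the Pod's spec.os.name.
func validateStopSignal(osName OSName, sig *Signal) error {
	if sig == nil {
		return nil // no stop signal set, nothing to validate
	}
	switch osName {
	case Linux:
		if !supportedStopSignalsLinux[*sig] {
			return fmt.Errorf("stop signal %q is not valid for linux pods", *sig)
		}
	case Windows:
		if !supportedStopSignalsWindows[*sig] {
			return fmt.Errorf("stop signal %q is not valid for windows pods", *sig)
		}
	default:
		// spec.os.name must be linux or windows when a stop signal is used.
		return fmt.Errorf("spec.os.name must be set to linux or windows to use a stop signal")
	}
	return nil
}

func main() {
	sig := Signal("SIGABRT")
	for _, osName := range []OSName{Linux, Windows, ""} {
		fmt.Printf("os=%q signal=%s -> %v\n", osName, sig, validateStopSignal(osName, &sig))
	}
}
```

In this sketch a Pod without `spec.os.name` is rejected only when a stop signal is actually set, which mirrors the first rule in the list above.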
+
### CRI API

The CRI API would be updated so the stop signal in the container spec (if it is not nil or unset) is sent to the container runtime via ContainerConfig. This would be passed down to the container runtime's StopContainer method ultimately:
@@ -149,12 +186,16 @@ The CRI API would be updated so the stop signal in the container spec (if it is
// container.
message ContainerConfig {
    // ...
-+    Lifecycle lifecycle = 18;
++    Signal stop_signal = 18;
}

-+message Lifecycle {
-+    string stop_signal = 1;
-+}
++ enum Signal {
++    RUNTIME_DEFAULT = 0;
++    SIGABRT = 1;
++    SIGALRM = 2;
++    ...
++    SIGRTMAX = 65;
++ }
```

We can pass the container's stop signal to the container runtime with this new field to ContainerConfig.
Since the new stop lifecycle is optional, the default stop signal for a container can be unset or nil. In this case, the container runtime will fall back to the existing behaviour.
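
As a rough illustration of the optional field, the sketch below maps a possibly-unset stop signal from the container spec onto an enum shaped like the proposed CRI `Signal`. Everything here is an assumption made for the example: `CRISignal`, `signalByName`, and `toCRISignal` are hypothetical names, not generated CRI code, and only the enum values visible in the excerpt above are listed (the elided ones are left out rather than guessed).

```go
package main

import "fmt"

// CRISignal mirrors the shape of the proposed CRI Signal enum (assumption).
// Only the values shown in the KEP excerpt are included.
type CRISignal int32

const (
	RuntimeDefault CRISignal = 0
	SIGABRT        CRISignal = 1
	SIGALRM        CRISignal = 2
	SIGRTMAX       CRISignal = 65
)

// signalByName is a hypothetical lookup table from spec strings to enum values.
var signalByName = map[string]CRISignal{
	"SIGABRT":  SIGABRT,
	"SIGALRM":  SIGALRM,
	"SIGRTMAX": SIGRTMAX,
}

// toCRISignal converts the optional stop signal from the container spec.
// A nil or empty value maps to RuntimeDefault, so the runtime keeps its
// existing behaviour.
func toCRISignal(stopSignal *string) (CRISignal, error) {
	if stopSignal == nil || *stopSignal == "" {
		return RuntimeDefault, nil
	}
	sig, ok := signalByName[*stopSignal]
	if !ok {
		return RuntimeDefault, fmt.Errorf("unknown stop signal %q", *stopSignal)
	}
	return sig, nil
}

func main() {
	name := "SIGALRM"
	for _, in := range []*string{nil, &name} {
		sig, err := toCRISignal(in)
		fmt.Println(sig, err)
	}
}
```

Using a zero-valued `RUNTIME_DEFAULT` means an unset protobuf field and an explicitly unset stop signal look the same to the runtime, which is what lets older kubelets and runtimes interoperate without special handling.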

-Additionally, the stop signal would also be added to `ContainerStatus` (as `containerStatus[].Lifecycle.StopSignal`) so that we can pass the stop signal extracted from the image/container runtime back to the container status at the Kubernetes API level.
+Additionally, the stop signal would also be added to `ContainerStatus` (as `containerStatus[].StopSignal`) so that we can pass the stop signal extracted from the image/container runtime back to the container status at the Kubernetes API level.

### Container runtime changes
@@ -205,7 +249,7 @@ Once the stop signal from `containerSpec.Lifecycle.StopSignal` is passed down to
func (c *criService) StopContainer(ctx context.Context, r *runtime.StopContainerRequest) (*runtime.StopContainerResponse, error) {
@@ -233,7 +277,7 @@ Find the entire diff for containerd which was done for the POC [here](https://gi
Currently, hcsshim is the only way to run containers on Windows nodes. hcsshim [supports SIGTERM and SIGKILL and a few Windows specific CTRL events](https://github.com/microsoft/hcsshim/blob/e5c83a121b980b1b85f4df0813cfba2d83572bac/internal/signals/signal.go#L74-L126). After discussing with SIG Windows, we decided that Windows Pods will only support SIGTERM and SIGKILL as valid stop signals. The kubelet handles stop signals the same way in Linux and Windows environments, and the CRI API works in both cases.

-We will have additional validation for Windows Pods to restrict the set of valid stop signals to SIGTERM and SIGKILL. There will be an admission check that validates that the stop signal is only set to either SIGTERM or SIGKILL if spec.Os.Name == windows.
+We will have additional validation for Windows Pods to restrict the set of valid stop signals to SIGTERM and SIGKILL. There will be an admission check that validates that the stop signal is only set to either SIGTERM or SIGKILL if `spec.os.name` == windows. This OS specific cross validation is further described in [Cross validation with Pod spec.os.name](#cross-validation-with-pod-specosname).

### User Stories (Optional)
@@ -491,6 +535,10 @@ Disable the ContainerStopSignal feature gate, and restart the kube-apiserver and
## Implementation History

+- 2025-02-13: Alpha [KEP PR](https://github.com/kubernetes/enhancements/pull/5122) approved and merged for v1.33
+- 2025-03-25: Alpha [code changes to k/k](https://github.com/kubernetes/kubernetes/pull/130556) merged with API changes, validation and CRI API implementation
One drawback of introducing a stop signal to the container spec is that users can misconfigure it, leading to unexpected behaviour such as hanging pods, as mentioned in the [Risks and Mitigations](#risks-and-mitigations) section.