keps/sig-node/4960-container-stop-signals/README.md
+25 -41 (25 additions & 41 deletions)
@@ -68,7 +68,7 @@ Items marked with (R) are required *prior to targeting to a milestone / release*

 ## Summary

-Container runtimes let you define a [STOPSIGNAL](https://docs.docker.com/reference/dockerfile/#stopsignal) to let your container images change which signal is delivered to kill the container. Currently you can only configure this by defining STOPSIGNAL in the container image definition file before you build the image. This becomes difficult to change when you’re using prebuilt images. This KEP proposes to add support to configure custom stop signals for containers from the ContainerSpec. Kubernetes has no equivalent for STOPSIGNAL as part of Pod or Container APIs. This KEP proposes to add support to configure custom stop signals for containers from the ContainerSpec.
+Container runtimes let you define a [STOPSIGNAL](https://docs.docker.com/reference/dockerfile/#stopsignal) to let your container images change which signal is delivered to kill the container. Currently you can only configure this by defining STOPSIGNAL in the container image definition file before you build the image. This becomes difficult to change when you’re using prebuilt images. Kubernetes has no equivalent for STOPSIGNAL as part of Pod or Container APIs. This KEP proposes to add support to configure custom stop signals for containers from the ContainerSpec.

 ## Motivation

@@ -80,9 +80,9 @@ Having stop signal as a first class citizen in the Pod's container specification

 ### Goals

-- Add a new Stop lifecycle handler to container lifecycle which can be configured with a Signal option
+- Add a new Stop lifecycle handler to container lifecycle which can be configured with a Signal option, which takes a string value
 - Update the CRI API to pass down the stop signal to the container runtime via ContainerConfig
-- Update the implementation of the StopContainer method in container runtimes to use the container’s stop signal (if defined) to kill containers
+- Update the implementation of the StopContainer method in container runtimes to use the container’s stop signal defined in the container spec (if present) to kill containers
 - Add support to show the effective stop signal of containers in the container status field in the pod status

 ### Non-Goals
@@ -93,22 +93,16 @@ Having stop signal as a first class citizen in the Pod's container specification

 ### API

-A new Stop lifecycle handler will be added to container lifecycle. The Stop lifecycle event can be configured with a Signal option, which is of type `Signal`. This new `Signal` type can take a string value, and can be used to define a stop signal for containers when creating Pods. `Signal` will hold string values which can be mapped to Go's syscall.Signal. For example, [list of signals supported in Linux environments by moby](https://github.com/containerd/containerd/blob/main/vendor/github.com/moby/sys/signal/signal_linux.go). If the user doesn't define a particular stop signal, the behaviour would default to what it is today and fall back to the stop signal defined in the container image or use the default stop signal of the container runtime (SIGTERM in case of containerd, CRI-O).
+A new StopSignal lifecycle hook will be added to container lifecycle. The StopSignal lifecycle hook can be configured with a signal, which is of type `Signal`. This new `Signal` type can take a string value, and can be used to define a stop signal for containers when creating Pods. `Signal` will hold string values which can be mapped to Go's syscall.Signal. For example, see the [list of signals supported in Linux environments by moby](https://github.com/containerd/containerd/blob/main/vendor/github.com/moby/sys/signal/signal_linux.go). If the user doesn't define a particular stop signal, the behaviour would default to what it is today and fall back to the stop signal defined in the container image or use the default stop signal of the container runtime (SIGTERM in case of containerd, CRI-O).

 ```go
 // pkg/apis/core/types.go
 type Signal string // parseable into Go's syscall.Signal

-type LifecycleHandler struct {
-	// ...
-	// +optional
-	Signal *Signal
-}
-
 type Lifecycle struct {
 	// ...
-	// +optional
-	Stop *LifecycleHandler
+	// +optional
+	StopSignal *Signal
 }
 ```

@@ -124,8 +118,7 @@ spec:
   - name: nginx
     image: nginx:1.14.2
     lifecycle:
-      stop:
-        signal: SIGUSR1
+      stopSignal: SIGUSR1
 ```

 The stop signal would also be shown in the containers' status. The value of the stop signal shown in the status can be from the spec, if a stop lifecycle is defined in the spec, else it will be the effective stop signal which is used by the container runtime to kill your container. This can either be read from the container image or will be the default stop signal of the container runtime. Users will be able to see a container's stop signal in its status even if they're not using a custom stop signal from the spec.

 Since the new stop lifecycle is optional, the default stop signal for a container can be unset or nil. In this case, the container runtime will fall back to the existing behaviour.

-Additionally, the stop signal would also be added to `ContainerStatus` (as `containerStatus[].Lifecycle.Stop.Signal`) so that we can pass the stop signal extracted from the image/container runtime back to the container status at the Kubernetes API level.
+Additionally, the stop signal would also be added to `ContainerStatus` (as `containerStatus[].Lifecycle.StopSignal`) so that we can pass the stop signal extracted from the image/container runtime back to the container status at the Kubernetes API level.

 ### Container runtime changes

-Once the stop signal from `containerSpec.Lifecycle.Stop.Signal` is passed down to the container runtime via `ContainerConfig` during creation or update of the container, we can use the value of the stop signal in the container runtime's implementation of the `stopContainer` method. In the case of containerd, it would look like this:
+Once the stop signal from `containerSpec.Lifecycle.StopSignal` is passed down to the container runtime via `ContainerConfig` during creation or update of the container, we can use the value of the stop signal in the container runtime's implementation of the `stopContainer` method. In the case of containerd, it would look like this:

 ```diff
 //internal/cri/server/container_stop.go

 func (c *criService) StopContainer(ctx context.Context, r *runtime.StopContainerRequest) (*runtime.StopContainerResponse, error) {

-The signal that we get from `ContainerConfig` can be validated with [ParseSignal](https://github.com/containerd/containerd/blob/main/vendor/github.com/moby/sys/signal/signal.go#L38) to confirm that we've received a valid stop signal. Also `container.StopSignal` is reading the stop signal from the image. We can add another condition before that to use the stop signal defined in spec if there is one. If nothing is defined in the spec ("" or unset), containerd behaves as it does today. Also note that `SIGTERM` is hardcoded in containerd's stopContainer method as the default stop signal to fall back to, in case the image doesn't define a stop signal. Similar logic is also present in CRI-O [here](https://github.com/cri-o/cri-o/blob/main/internal/oci/container.go#L259-L272).
+The signal that we get from `ContainerConfig` can be validated with [ParseSignal](https://github.com/containerd/containerd/blob/main/vendor/github.com/moby/sys/signal/signal.go#L38) to confirm that we've received a valid stop signal. Also here `container.StopSignal` is reading the stop signal from the image. We can add another condition before that to use the stop signal defined in spec if there is one. If nothing is defined in the spec ("" or unset), containerd behaves as it does today. Also note that `SIGTERM` is hardcoded in containerd's stopContainer method as the default stop signal to fall back to, in case the image doesn't define a stop signal. Similar logic is also present in CRI-O [here](https://github.com/cri-o/cri-o/blob/main/internal/oci/container.go#L259-L272).

-Find the entire diff for containerd [here](https://github.com/containerd/containerd/compare/main...sreeram-venkitesh:containerd:added-custom-stop-signal?expand=1).
+Find the entire diff for containerd which was done for the POC [here](https://github.com/containerd/containerd/compare/main...sreeram-venkitesh:containerd:added-custom-stop-signal?expand=1).

 ### User Stories (Optional)

@@ -256,8 +242,8 @@ Kubernetes by default sends a SIGTERM to all containers while killing them. When
 ## Design Details

 On top of the details described in the [Proposal](#proposal), these are some details on how exactly the new field will work.
-- `ContainerSpec.Lifecycle.Stop.Signal` is totally optional and can be a nil value. In this case, the stop signal defined in the container image or the container runtime's default stop signal (SIGTERM for containerd and CRI-O) would be used.
-- If set, `ContainerSpec.Lifecycle.Stop.Signal` will override the stop signal set from the container image definition.
+- `ContainerSpec.Lifecycle.StopSignal` is totally optional and can be a nil value. In this case, the stop signal defined in the container image or the container runtime's default stop signal (SIGTERM for containerd and CRI-O) would be used.
+- If set, `ContainerSpec.Lifecycle.StopSignal` will override the stop signal set from the container image definition.
 - The order of priority for the different stop signals would look like this
   `Stop signal from Container Spec > STOPSIGNAL from container image > Default stop signal of container runtime`

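The priority order listed above can be sketched as a small selection function. This is only an illustration under stated assumptions: `effectiveStopSignal` and `runtimeDefaultStopSignal` are hypothetical names, signals are kept as plain strings, and an empty string stands for "not set" in both the spec and the image.

```go
package main

import "fmt"

// runtimeDefaultStopSignal is the fallback used when neither the spec nor
// the image defines a stop signal (SIGTERM for containerd and CRI-O).
const runtimeDefaultStopSignal = "SIGTERM"

// effectiveStopSignal (hypothetical helper) applies the priority order:
// container spec > STOPSIGNAL from the image > runtime default.
// An empty string means the value is not set.
func effectiveStopSignal(specSignal, imageSignal string) string {
	if specSignal != "" {
		return specSignal
	}
	if imageSignal != "" {
		return imageSignal
	}
	return runtimeDefaultStopSignal
}

func main() {
	fmt.Println(effectiveStopSignal("SIGUSR1", "SIGQUIT")) // spec wins: SIGUSR1
	fmt.Println(effectiveStopSignal("", "SIGQUIT"))        // image wins: SIGQUIT
	fmt.Println(effectiveStopSignal("", ""))               // default: SIGTERM
}
```

The same value is what the container status would be expected to report as the effective stop signal.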
@@ -272,14 +258,12 @@ to implement this enhancement.
 ##### Unit tests

 Alpha:
-- Test that the validation fails when given a non-string value for container lifecycle stop hook's signal field
+- Test that the validation fails when given a non-string value for container lifecycle StopSignal hook
 - Test that the validation passes when given a proper string value representing a standard stop signal
 - Test that the validation fails when we configure a custom stop signal with the feature gate disabled
 - Test that the validation returns the appropriate error message when an invalid string value is given for the stop signal
 - Tests for verifying behavior when feature gate is disabled after being used to create Pods where the stop signal field is used
 - Tests for verifying behavior when feature gate is reenabled after being disabled after creating Pods with stop signal
-
-##### Integration tests

 ##### e2e tests

@@ -318,27 +302,27 @@ Alpha:

 #### Upgrade

-When upgrading to a new Kubernetes version which supports Container Stop Signals, users can enable the feature gate and start using the feature. If the user is running an older version of the container runtime, the feature will be gracefully degraded as mentioned [here](https://www.kubernetes.dev/docs/code/cri-api-version-skew-policy/#version-skew-policy-for-cri-api) in the CRI API version skew doc. In this case the user will be able to set a Stop lifecycle hook in the Container spec, but the kubelet will not pass this value to the container runtime when calling the `runtimeService.stopContainer` method.
+When upgrading to a new Kubernetes version which supports Container Stop Signals, users can enable the feature gate and start using the feature. If the user is running an older version of the container runtime, the feature will be gracefully degraded as mentioned [here](https://www.kubernetes.dev/docs/code/cri-api-version-skew-policy/#version-skew-policy-for-cri-api) in the CRI API version skew doc. In this case the user will be able to set a StopSignal lifecycle hook in the Container spec, but the kubelet will not pass this value to the container runtime when calling the `runtimeService.stopContainer` method. The container status would also not have the stop signal since the container runtime is not updated to return the effective stop signal extracted from the image.

 #### Downgrade

-If the kube-apiserver or the kubelet's version is downgraded, you will no longer be able to create or update container specs to include the Stop lifecycle hook. Existing containers which have the field set would not be cleared. If you're running a version of the kubelet which doesn't support ContainerStopSignals, the CRI API wouldn't pass the stop signal to the runtime as part of ContainerConfig. Even if the runtime is on a newer version supporting stop signal, it would handle this and default to the stop signal defined in the image or to SIGTERM.
+If the kube-apiserver or the kubelet's version is downgraded, you will no longer be able to create or update container specs to include the StopSignal lifecycle hook. Existing containers which have the field set would not be cleared. If you're running a version of the kubelet which doesn't support ContainerStopSignals, the CRI API wouldn't pass the stop signal to the runtime as part of ContainerConfig. Even if the container runtime is on a newer version supporting stop signal, it would handle this and default to the stop signal defined in the image or to SIGTERM.

 ### Version Skew Strategy

 Both kubelet and kube-apiserver will need to enable the feature gate for the full feature set to be present and working. If both components disable the feature gate, this feature will be completely unavailable.

-If only the kube-apiserver enables this feature, validation will pass, but kubelet won't understand the new lifecycle hook and will ignore it when creating the ContainerConfig.
+If only the kube-apiserver enables this feature, validation will pass, but kubelet won't understand the new lifecycle hook and will not add the stop signal when creating the ContainerConfig.

-If only the kubelet has enabled this feature, you won't be able to create a Pod which has a Stop lifecycle hook via the apiserver and hence the feature won't be usable even if the kubelet supports it. `containerSpec.Lifecycle.Stop.Signal` can be an empty value and kubelet functions as if no custom stop signal has been set for any container.
+If only the kubelet has enabled this feature, you won't be able to create a Pod which has a StopSignal lifecycle hook via the apiserver and hence the feature won't be usable even if the kubelet supports it. `containerSpec.Lifecycle.StopSignal` can be an empty value and kubelet functions as if no custom stop signal has been set for any container.

 #### Version skew with CRI API and container runtime

 As described above in the upgrade/downgrade strategies,

 - **If the container runtime is in an older version than kubelet**, the feature will be gracefully degraded. In this case the user will be able to set the stop signal in the Container spec, but the kubelet will not pass this value to the container runtime via ContainerConfig and the container runtime will use the stop signal defined in the image or use the default SIGTERM.

-- **If you're running an older version of the kubelet with a newer version of the container runtime**, the CRI API call from the kubelet would be made with the older version of ContainerConfig which doesn't include the stop signal. The container runtime code, even if it is running the newer version supporting stop signal, would handle this and use the stop signal defined in the container image or default to SIGTERM.
+- **If you're running an older version of the kubelet with a newer version of the container runtime**, the CRI API call from the kubelet would be made with the older version of ContainerConfig which doesn't include the stop signal. The container runtime doesn't receive any custom stop signal from the container spec in this case. The container runtime code, even if it is running the newer version supporting stop signal, would fall back to the current behaviour and use the stop signal defined in the container image or default to SIGTERM since it doesn't receive any stop signal from ContainerSpec.

 ## Production Readiness Review Questionnaire

@@ -366,7 +350,7 @@ Yes, the feature gate can be turned off to disable the feature once it has been

 ###### What happens if we reenable the feature if it was previously rolled back?

-If you reenable the feature, you'll be able to create Pods with Stop lifecycle hooks for their containers. Without the feature gate enabled, this would make your workloads invalid.
+If you reenable the feature, you'll be able to create Pods with StopSignal lifecycle hooks for their containers. Without the feature gate enabled, this would make your workloads invalid.

 ###### Are there any tests for feature enablement/disablement?

@@ -445,7 +429,7 @@ No.

 ###### Will enabling / using this feature result in increasing size or count of the existing API objects?

-We are adding a new lifecycle hook called Stop, and a new lifecycle handler called Signal which can be used in the Container spec. These are optional values however and can increase the size of the API object.
+We are adding a new lifecycle hook called StopSignal, which takes a string value. These are optional values however and can increase the size of the API object.

 ###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?