# KEP-2413: Enable seccomp by default

<!-- toc -->
-
- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
- [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
- [Test Plan](#test-plan)
+ - [Prerequisite testing updates](#prerequisite-testing-updates)
+ - [Unit tests](#unit-tests)
+ - [Integration tests](#integration-tests)
+ - [e2e tests](#e2e-tests)
- [Graduation Criteria](#graduation-criteria)
- [Alpha](#alpha)
- [Alpha to Beta Graduation](#alpha-to-beta-graduation)
- [Alternatives](#alternatives)
- [Alternative 1: Define a new <code>KubernetesDefault</code> profile](#alternative-1-define-a-new--profile)
- [Alternative 2: Allow admins to pick one of <code>KubernetesDefault</code>, <code>RuntimeDefault</code> or a custom profile](#alternative-2-allow-admins-to-pick-one-of---or-a-custom-profile)
- <!-- /toc -->
+ <!-- /toc -->

## Release Signoff Checklist

Items marked with (R) are required _prior to targeting to a milestone / release_.

- [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- - [ ] (R) KEP approvers have approved the KEP status as `implementable`
+ - [x] (R) KEP approvers have approved the KEP status as `implementable`
- [x] (R) Design details are appropriately documented
- [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
- [x] (R) Graduation criteria is in place
- - [ ] (R) Production readiness review completed
- - [ ] (R) Production readiness review approved
- - [ ] "Implementation History" section is up-to-date for milestone
- - [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- - [ ] Supporting documentation, e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
+ - [x] (R) Production readiness review completed
+ - [x] (R) Production readiness review approved
+ - [x] "Implementation History" section is up-to-date for milestone
+ - [x] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
+ - [x] Supporting documentation, e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

[kubernetes.io]: https://kubernetes.io/
[kubernetes/enhancements]: https://git.k8s.io/enhancements
@@ -136,21 +139,49 @@ section](https://kubernetes.io/docs/tutorials/clusters/seccomp).
### Test Plan

+ [x] I/we understand the owners of the involved components may require updates to
+ existing tests to make this code solid enough prior to committing the changes necessary
+ to implement this enhancement.
+
+ ##### Prerequisite testing updates
+
+ There are no prerequisites required.
+
+ ##### Unit tests
+
There will be unit tests for the feature, and the existing seccomp tests can
be extended to cover the new behavior if enabled.

+ - `pkg/kubelet/kuberuntime`: `2022-06-15` - `66.3%`
+
+ ##### Integration tests
+
+ No integration tests have been added for the alpha implementation because the
+ feature is off by default.
+
+ For the beta graduation we will defer this section to the e2e tests.
+
+ ##### e2e tests
+
+ No e2e tests have been added for the alpha implementation because the feature is
+ off by default.
+
+ For the beta graduation, we will add a serial e2e test which covers the kubelet
+ configuration.
+

### Graduation Criteria

#### Alpha

- - [ ] Implement the new feature gate and kubelet configuration
- - [ ] Ensure proper tests are in place
- - [ ] Update documentation to make the feature visible
+ - [x] Implement the new feature gate and kubelet configuration
+ - [x] Ensure proper tests are in place
+ - [x] Update documentation to make the feature visible

#### Alpha to Beta Graduation

- - [ ] Enable the feature per default
- - [ ] No major bugs reported in the previous cycle
+ - [x] Enable the feature gate by default
+ (the kubelet configuration value still defaults to `false`)
+ - [x] No major bugs reported in the previous cycle

#### Beta to GA Graduation

@@ -171,19 +202,19 @@ risks and mitigations are available for each one.
that the application code does not trigger syscalls blocked by the
`RuntimeDefault` profile (for [CRI-O][default-crio] or
[containerd][default-containerd]). This can be done by:
- - *Recommended*: Analyzing the code for any executed syscalls which may be
+ - _Recommended_: Analyzing the code for any executed syscalls which may be
blocked by the default profiles. If that's the case, either craft a custom
seccomp profile based on the default or change the application deployment
to `Unconfined`.
- - *Recommended*: Run the application against an e2e test suite to trigger
+ - _Recommended_: Run the application against an e2e test suite to trigger
relevant code paths. Monitor the application host's audit logs (via auditd
or `/var/log/audit/audit.log`) for blocked syscalls via `type=SECCOMP`. If
that's the case, use the same mitigation as mentioned above.
- - *Optional*: Create a custom seccomp profile based on the default and change
+ - _Optional_: Create a custom seccomp profile based on the default and change
its default action from `SCMP_ACT_ERRNO` to `SCMP_ACT_LOG`. This means
that the seccomp filter will have no effect on the application at all, but
the audit logs will now indicate which syscalls may be blocked.
- - *Optional*: Use cluster additions like the [Security Profiles
+ - _Optional_: Use cluster additions like the [Security Profiles
Operator][spo] for profiling the application via its log enrichment feature
or recording a profile by using its recording feature.
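The audit-log check described in the steps above can be sketched in Python. This is a minimal sketch, assuming the records follow the usual Linux audit `key=value` layout and that each `type=SECCOMP` record carries `comm` and `syscall` fields. The helper names `parse_seccomp_record` and `count_syscalls` are hypothetical and not part of this KEP.

```python
import re
from collections import Counter

# Matches key=value pairs; a value may be quoted (for example comm="app").
_FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_seccomp_record(line):
    """Return the fields of a type=SECCOMP audit record, or None for other lines."""
    if "type=SECCOMP" not in line:
        return None
    return {key: value.strip('"') for key, value in _FIELD.findall(line)}


def count_syscalls(lines):
    """Count SECCOMP records per (command name, syscall number) pair."""
    counts = Counter()
    for line in lines:
        record = parse_seccomp_record(line)
        if record is not None:
            counts[(record.get("comm"), record.get("syscall"))] += 1
    return counts
```

Feeding the lines of `/var/log/audit/audit.log` into `count_syscalls` gives a rough list of syscall numbers that the profile blocked or logged per process. The numbers are architecture specific and still have to be mapped to syscall names before they can be added to a custom profile.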
3. **Deploying the modified application**:
@@ -280,53 +311,50 @@ _This section must be completed when targeting beta graduation to a release._
_This section must be completed when targeting beta graduation to a release._

- **How can an operator determine if the feature is in use by workloads?**
- Ideally, this should be a metric. Operations against the Kubernetes API (e.g.,
- checking if there are objects with field X set) may be a last resort. Avoid
- logs or events for this purpose.
+
+ Operators have to check the kubelet configuration value on the node where the
+ workload runs. They can also run `crictl inspect` to examine the OCI runtime
+ spec of a container and find out which profile is in use.
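The `crictl inspect` check can be sketched as a small Python helper. This is a sketch under stated assumptions: it assumes the JSON printed by `crictl inspect` exposes the OCI runtime spec under `info.runtimeSpec`, which may differ between container runtimes and versions. The `linux.seccomp` section with `defaultAction` and `syscalls` follows the OCI runtime spec. The function name `seccomp_summary` is hypothetical.

```python
import json


def seccomp_summary(inspect_output):
    """Summarize the seccomp section of a container's runtime spec.

    `inspect_output` is the JSON text printed by `crictl inspect`.
    Returns None when no seccomp section is present (unconfined container).
    """
    doc = json.loads(inspect_output)
    # Assumed location of the OCI runtime spec inside the inspect output.
    spec = (doc.get("info") or {}).get("runtimeSpec") or {}
    seccomp = (spec.get("linux") or {}).get("seccomp")
    if not seccomp:
        return None
    allowed = sum(
        len(rule.get("names", []))
        for rule in seccomp.get("syscalls", [])
        if rule.get("action") == "SCMP_ACT_ALLOW"
    )
    return {"defaultAction": seccomp.get("defaultAction"), "allowedSyscalls": allowed}
```

A result of `None` suggests that the container runs without a seccomp filter, while a `defaultAction` of `SCMP_ACT_ERRNO` together with a non-zero allow count is what an allowlist profile such as `RuntimeDefault` would typically look like.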

- **What are the SLIs (Service Level Indicators) an operator can use to determine
the health of the service?**

- - [ ] Metrics
- - Metric name:
- - [Optional] Aggregation method:
- - Components exposing the metric:
- - [ ] Other (treat as last resort)
- - Details:
+ - A workload is exiting unexpectedly after the feature has been enabled.
+
+ - The termination reason is a "permission denied" error.
+ - The termination is reproducible.
+ - Replacing `SCMP_ACT_ERRNO` with `SCMP_ACT_LOG` in the default profile will
+ show seccomp error messages in auditd or syslog.
+ - There are no other reasons for container termination (like eviction or
+ exhausting resources).
+
+ - A workload is not fully functional, for example some features
+ are misbehaving but the application does not exit.
+
+ - There are permission denied errors in the workload logs.
+ - The behavior is reproducible.
+ - Replacing `SCMP_ACT_ERRNO` with `SCMP_ACT_LOG` in the default profile will
+ show seccomp error messages in auditd or syslog.
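Replacing `SCMP_ACT_ERRNO` with `SCMP_ACT_LOG` for diagnosis can be sketched as follows. This is a minimal sketch, assuming the profile is available as a JSON seccomp profile with a top-level `defaultAction` field. The function name `to_log_only` and the tiny `EXAMPLE_PROFILE` are hypothetical illustrations, and the example is not the real `RuntimeDefault` profile.

```python
import copy


def to_log_only(profile):
    """Return a copy of a seccomp profile whose default action logs instead of denying.

    Only the top-level default action is changed; per-syscall rules are kept as they are.
    """
    patched = copy.deepcopy(profile)
    if patched.get("defaultAction") == "SCMP_ACT_ERRNO":
        patched["defaultAction"] = "SCMP_ACT_LOG"
    return patched


# Tiny illustrative profile (assumption), far smaller than any runtime default.
EXAMPLE_PROFILE = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [
        {"names": ["read", "write", "exit_group"], "action": "SCMP_ACT_ALLOW"},
    ],
}

LOG_ONLY_PROFILE = to_log_only(EXAMPLE_PROFILE)
```

The patched profile can be serialized with `json.dumps` and deployed as a custom profile for the affected workload. Because it no longer blocks anything, it is meant for diagnosis only and not as a long-term replacement.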

- **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**
- At a high level, this usually will be in the form of "high percentile of SLI
- per day <= X". It's impossible to provide comprehensive guidance, but at the very
- high level (needs more precise definitions) those may be things like:

- - per-day percentage of API calls finishing with 5XX errors <= 1%
- - 99% percentile over day of absolute value from (job creation time minus expected
- job creation time) for cron job <= 10%
- - 99,9% of /health requests per day finish with 200 code
+ The workload availability, functionality and health are exactly the same with
+ the feature enabled. This can be verified by tracking the
+ `kube_pod_container_status_restarts_total` metric in
+ [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics/blob/379b60abd97be5914c0b4e292b14e75c5d3cf694/docs/pod-metrics.md#pod-metrics).
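One way to track the restart counter is to compare two snapshots of `kube_pod_container_status_restarts_total`, taken before and after the feature is enabled. The following is a minimal sketch, assuming each snapshot has already been collected into a dictionary keyed by `(namespace, pod, container)`. The function name `restart_increase` and the snapshot layout are hypothetical, and the sketch ignores counter resets caused by recreated pods.

```python
def restart_increase(before, after):
    """Return containers whose restart counter grew between two snapshots.

    Both arguments map (namespace, pod, container) to the counter value.
    Containers missing from `before` are treated as starting at zero.
    """
    increases = {}
    for key, value in after.items():
        delta = value - before.get(key, 0)
        if delta > 0:
            increases[key] = delta
    return increases
```

An empty result for the observation window is consistent with the SLO above. Any entry is only a hint and should be cross-checked against the SLI criteria, since restarts can have causes unrelated to seccomp.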

- **Are there any missing metrics that would be useful to have to improve observability
of this feature?**
- Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
- implementation difficulties, etc.).
+
+ None

### Dependencies

_This section must be completed when targeting beta graduation to a release._

- **Does this feature depend on any specific services running in the cluster?**
- Think about both cluster-level services (e.g. metrics-server) as well
- as node-level agents (e.g. specific version of CRI). Focus on external or
- optional services that are needed. For example, if this feature depends on
- a cloud provider API, or upon an external software-defined storage or network
- control plane.
-
- For each of these, fill in the following, thinking about running existing user workloads
- and creating new ones, as well as about cluster-level services (e.g. DNS):

- - [Dependency name]
- - Usage description:
- - Impact of its outage on the feature:
- - Impact of its degraded performance or high-error rates on the feature:
+ None

### Scalability

@@ -378,26 +406,22 @@ _This section must be completed when targeting beta graduation to a release._
- **How does this feature react if the API server and/or etcd is unavailable?**

+ It will still work as intended since it's a kubelet internal feature.
+
- **What are other known failure modes?**
- For each of them, fill in the following information by copying the below template:
-
- - [Failure mode brief description]
- - Detection: How can it be detected via metrics? Stated another way:
- how can an operator troubleshoot without logging into a master or worker node?
- - Mitigations: What can be done to stop the bleeding, especially for already
- running user workloads?
- - Diagnostics: What are the useful log messages and their required logging
- levels that could help debug the issue?
- Not required until feature graduated to beta.
- - Testing: Are there any tests for failure mode? If not, describe why.
+
+ None

- **What steps should be taken if SLOs are not being met to determine the problem?**

+ None
+
[supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md
[existing slis/slos]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos

## Implementation History

+ - 2022-03-15: Updated KEP to beta
- 2021-05-05: KEP promoted to implementable

## Alternatives