
Commit b324c1f

Merge pull request kubernetes#3240 from saschagrunert/seccomp-default-beta
KEP-2413: Graduate SeccompDefault feature to beta
2 parents 668d00b + 9a124fd commit b324c1f


3 files changed, 88 additions and 62 deletions

Lines changed: 2 additions & 0 deletions
```diff
@@ -1,3 +1,5 @@
 kep-number: 2413
 alpha:
   approver: "@deads2k"
+beta:
+  approver: "@deads2k"
```

keps/sig-node/2413-seccomp-by-default/README.md

Lines changed: 82 additions & 58 deletions
```diff
@@ -1,7 +1,6 @@
 # KEP-2413: Enable seccomp by default
 
 <!-- toc -->
-
 - [Release Signoff Checklist](#release-signoff-checklist)
 - [Summary](#summary)
 - [Motivation](#motivation)
```
```diff
@@ -12,6 +11,10 @@
 - [Risks and Mitigations](#risks-and-mitigations)
 - [Design Details](#design-details)
 - [Test Plan](#test-plan)
+- [Prerequisite testing updates](#prerequisite-testing-updates)
+- [Unit tests](#unit-tests)
+- [Integration tests](#integration-tests)
+- [e2e tests](#e2e-tests)
 - [Graduation Criteria](#graduation-criteria)
 - [Alpha](#alpha)
 - [Alpha to Beta Graduation](#alpha-to-beta-graduation)
```
```diff
@@ -29,22 +32,22 @@
 - [Alternatives](#alternatives)
 - [Alternative 1: Define a new <code>KubernetesDefault</code> profile](#alternative-1-define-a-new--profile)
 - [Alternative 2: Allow admins to pick one of <code>KubernetesDefault</code>, <code>RuntimeDefault</code> or a custom profile](#alternative-2-allow-admins-to-pick-one-of---or-a-custom-profile)
-<!-- /toc -->
+<!-- /toc -->
 
 ## Release Signoff Checklist
 
 Items marked with (R) are required _prior to targeting to a milestone / release_.
 
 - [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
-- [ ] (R) KEP approvers have approved the KEP status as `implementable`
+- [x] (R) KEP approvers have approved the KEP status as `implementable`
 - [x] (R) Design details are appropriately documented
 - [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
 - [x] (R) Graduation criteria is in place
-- [ ] (R) Production readiness review completed
-- [ ] (R) Production readiness review approved
-- [ ] "Implementation History" section is up-to-date for milestone
-- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
-- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
+- [x] (R) Production readiness review completed
+- [x] (R) Production readiness review approved
+- [x] "Implementation History" section is up-to-date for milestone
+- [x] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
+- [x] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
 
 [kubernetes.io]: https://kubernetes.io/
 [kubernetes/enhancements]: https://git.k8s.io/enhancements
```
```diff
@@ -136,21 +139,49 @@ section](https://kubernetes.io/docs/tutorials/clusters/seccomp).
 
 ### Test Plan
 
+[x] I/we understand the owners of the involved components may require updates to
+existing tests to make this code solid enough prior to committing the changes necessary
+to implement this enhancement.
+
+##### Prerequisite testing updates
+
+There are no prerequisites required.
+
+##### Unit tests
+
 There will be unit tests for the feature, whereas the existing seccomp tests can
 be extended to cover the new behavior if enabled.
 
+- `pkg/kubelet/kuberuntime`: `2022-06-15` - `66.3%`
+
+##### Integration tests
+
+No integration tests have been added for the alpha implementation because the
+feature is off by default.
+
+For the beta graduation we will defer this section to the e2e tests.
+
+##### e2e tests
+
+No e2e tests have been added for the alpha implementation because the feature is
+off by default.
+
+For the beta graduation, we will add a serial e2e test which covers the kubelet
+configuration.
+
 ### Graduation Criteria
 
 #### Alpha
 
-- [ ] Implement the new feature gate and kubelet configuration
-- [ ] Ensure proper tests are in place
-- [ ] Update documentation to make the feature visible
+- [x] Implement the new feature gate and kubelet configuration
+- [x] Ensure proper tests are in place
+- [x] Update documentation to make the feature visible
 
 #### Alpha to Beta Graduation
 
-- [ ] Enable the feature per default
-- [ ] No major bugs reported in the previous cycle
+- [x] Enable the feature gate per default
+(the kubelet configuration value still default to `false`)
+- [x] No major bugs reported in the previous cycle
 
 #### Beta to GA Graduation
 
```
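The beta criteria separate two switches: the `SeccompDefault` feature gate, which is now on by default, and the kubelet configuration value, which still defaults to `false`. The Go sketch below is a simplified model of how the two might combine for a container. It assumes that an explicitly requested profile is always honored and that `RuntimeDefault` replaces `Unconfined` only when both switches are on and the workload sets no profile. The names `nodeSettings` and `effectiveProfile` are hypothetical and not taken from the kubelet source.

```go
package main

import "fmt"

// seccompProfile is a hypothetical, simplified stand-in for a container's
// seccomp profile type.
type seccompProfile string

const (
	profileUnconfined     seccompProfile = "Unconfined"
	profileRuntimeDefault seccompProfile = "RuntimeDefault"
)

// nodeSettings models the two kubelet switches from the graduation criteria.
type nodeSettings struct {
	FeatureGateEnabled bool // SeccompDefault feature gate (on by default in beta)
	SeccompDefault     bool // kubelet configuration value (still false by default)
}

// effectiveProfile returns the profile a container ends up with in this
// simplified model. An empty requested value means the workload did not set
// a profile. Kubelet configuration validation is not modeled here.
func effectiveProfile(node nodeSettings, requested seccompProfile) seccompProfile {
	if requested != "" {
		// Assumption: an explicitly requested profile is always honored.
		return requested
	}
	if node.FeatureGateEnabled && node.SeccompDefault {
		// Both switches are on: use the container runtime's default profile.
		return profileRuntimeDefault
	}
	return profileUnconfined
}

func main() {
	betaDefaults := nodeSettings{FeatureGateEnabled: true}
	optedIn := nodeSettings{FeatureGateEnabled: true, SeccompDefault: true}

	fmt.Println(effectiveProfile(betaDefaults, ""))           // Unconfined
	fmt.Println(effectiveProfile(optedIn, ""))                // RuntimeDefault
	fmt.Println(effectiveProfile(optedIn, profileUnconfined)) // Unconfined
}
```

In this model, enabling the gate alone changes nothing for existing workloads; a node operator still has to opt in through the kubelet configuration.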
```diff
@@ -171,19 +202,19 @@ risks and mitigations are available for each one.
 that the application code does not trigger syscalls blocked by the
 `RuntimeDefault` profile (for [CRI-O][default-crio] or
 [containerd][default-containerd]). This can be done by:
-- *Recommended*: Analyzing the code for any executed syscalls which may be
+- _Recommended_: Analyzing the code for any executed syscalls which may be
 blocked by the default profiles. If that's the case, either craft a custom
 seccomp profile based on the default or change the application deployment
 to `Unconfined`.
-- *Recommended*: Run the application against an e2e test suite to trigger
+- _Recommended_: Run the application against an e2e test suite to trigger
 relevant code paths. Monitor the application hosts audit logs (via auditd
 or `/var/log/audit/audit.log`) for blocking syscalls via `type=SECCOMP`. If
 that's the case, use the same mitigation as mentioned above.
-- *Optional*: Create a custom seccomp profile based on the default and change
+- _Optional_: Create a custom seccomp profile based on the default and change
 their default action from `SCMP_ACT_ERRNO` to `SCMP_ACT_LOG`. This means
 that the seccomp filter will have no effect on the application at all, but
 the audit logs will now indicate which syscalls may be blocked.
-- *Optional*: Use cluster additions like the [Security Profiles
+- _Optional_: Use cluster additions like the [Security Profiles
 Operator][spo] for profiling the application via its log enrichment feature
 or recording a profile by using its recording feature.
 3. **Deploying the modified application**:
```
```diff
@@ -280,53 +311,50 @@ _This section must be completed when targeting beta graduation to a release._
 _This section must be completed when targeting beta graduation to a release._
 
 - **How can an operator determine if the feature is in use by workloads?**
-Ideally, this should be a metric. Operations against the Kubernetes API (e.g.,
-checking if there are objects with field X set) may be a last resort. Avoid
-logs or events for this purpose.
+
+Operators have to check the kubelet config value for the node where the
+workload runs on. They can also run `crictl inspect` to examine the used OCI
+runtime spec and find out which profile is in use.
 
 - **What are the SLIs (Service Level Indicators) an operator can use to determine
 the health of the service?**
 
-- [ ] Metrics
-- Metric name:
-- [Optional] Aggregation method:
-- Components exposing the metric:
-- [ ] Other (treat as last resort)
-- Details:
+- A workload is exiting unexpectedly after the feature has been enabled.
+
+- The termination reason is a "permission denied" error.
+- The termination is reproducible.
+- Replacing `SCMP_ACT_ERRNO` to `SCMP_ACT_LOG` in the default profile will
+show seccomp error messages in auditd or syslog.
+- There are no other reasons for container termination (like eviction or
+exhausting resources)
+
+- A workload is not behaving completely functional, for example some features
+are misbehaving but the appliction does not exit.
+
+- There are permission denied errors in the workload logs.
+- The behavior is reproducible.
+- Replacing `SCMP_ACT_ERRNO` to `SCMP_ACT_LOG` in the default profile will
+show seccomp error messages in auditd or syslog.
 
 - **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**
-At a high level, this usually will be in the form of "high percentile of SLI
-per day <= X". It's impossible to provide comprehensive guidance, but at the very
-high level (needs more precise definitions) those may be things like:
 
-- per-day percentage of API calls finishing with 5XX errors <= 1%
-- 99% percentile over day of absolute value from (job creation time minus expected
-job creation time) for cron job <= 10%
-- 99,9% of /health requests per day finish with 200 code
+The workload availability, functionality and health is exactly the same with
+the feature enabled. This can be done by tracking the
+`kube_pod_container_status_restarts_total` in
+[kube-state-metrics](https://github.com/kubernetes/kube-state-metrics/blob/379b60abd97be5914c0b4e292b14e75c5d3cf694/docs/pod-metrics.md#pod-metrics).
 
 - **Are there any missing metrics that would be useful to have to improve observability
 of this feature?**
-Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
-implementation difficulties, etc.).
+
+None
 
 ### Dependencies
 
 _This section must be completed when targeting beta graduation to a release._
 
 - **Does this feature depend on any specific services running in the cluster?**
-Think about both cluster-level services (e.g. metrics-server) as well
-as node-level agents (e.g. specific version of CRI). Focus on external or
-optional services that are needed. For example, if this feature depends on
-a cloud provider API, or upon an external software-defined storage or network
-control plane.
-
-For each of these, fill in the following—thinking about running existing user workloads
-and creating new ones, as well as about cluster-level services (e.g. DNS):
 
-- [Dependency name]
-- Usage description:
-- Impact of its outage on the feature:
-- Impact of its degraded performance or high-error rates on the feature:
+None
 
 ### Scalability
 
```
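The `crictl inspect` route described above can be automated by decoding its JSON output. This Go sketch assumes the runtime reports the OCI runtime spec under `info.runtimeSpec` in its verbose container status, which is an assumption about the runtime rather than something the KEP specifies. It also only distinguishes "seccomp filter present" from "no filter": the spec carries the resolved filter, not the name `RuntimeDefault`, so telling the runtime default apart from a custom profile would need a comparison against the runtime's own default. The type and function names are hypothetical, and the two JSON samples in `main` are trimmed, made-up examples.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// inspectOutput models only the fields of `crictl inspect` JSON that this
// sketch reads. Assumption: the OCI spec is reported under info.runtimeSpec.
type inspectOutput struct {
	Info struct {
		RuntimeSpec struct {
			Linux struct {
				// Seccomp stays nil when the spec has no seccomp section.
				Seccomp *struct {
					DefaultAction string `json:"defaultAction"`
				} `json:"seccomp"`
			} `json:"linux"`
		} `json:"runtimeSpec"`
	} `json:"info"`
}

// describeSeccomp reports whether the container's OCI spec carries a
// seccomp filter, and if so, which default action that filter uses.
func describeSeccomp(data []byte) (string, error) {
	var out inspectOutput
	if err := json.Unmarshal(data, &out); err != nil {
		return "", fmt.Errorf("parse crictl output: %w", err)
	}
	sc := out.Info.RuntimeSpec.Linux.Seccomp
	if sc == nil {
		return "no seccomp filter (Unconfined)", nil
	}
	return "seccomp filter applied, defaultAction=" + sc.DefaultAction, nil
}

func main() {
	samples := []string{
		`{"info":{"runtimeSpec":{"linux":{}}}}`,
		`{"info":{"runtimeSpec":{"linux":{"seccomp":{"defaultAction":"SCMP_ACT_ERRNO"}}}}}`,
	}
	for _, s := range samples {
		desc, err := describeSeccomp([]byte(s))
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(desc)
	}
}
```

To check a real container, the same function could be fed the output of `crictl inspect <container-id>` read from standard input instead of the inline samples.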
```diff
@@ -378,26 +406,22 @@ _This section must be completed when targeting beta graduation to a release._
 
 - **How does this feature react if the API server and/or etcd is unavailable?**
 
+It will still work as intended since it's a kubelet internal feature.
+
 - **What are other known failure modes?**
-For each of them, fill in the following information by copying the below template:
-
-- [Failure mode brief description]
-- Detection: How can it be detected via metrics? Stated another way:
-how can an operator troubleshoot without logging into a master or worker node?
-- Mitigations: What can be done to stop the bleeding, especially for already
-running user workloads?
-- Diagnostics: What are the useful log messages and their required logging
-levels that could help debug the issue?
-Not required until feature graduated to beta.
-- Testing: Are there any tests for failure mode? If not, describe why.
+
+None
 
 - **What steps should be taken if SLOs are not being met to determine the problem?**
 
+None
+
 [supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md
 [existing slis/slos]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos
 
 ## Implementation History
 
+- 2022-03-15: Updated KEP to beta
 - 2021-05-05: KEP promoted to implementable
 
 ## Alternatives
```

keps/sig-node/2413-seccomp-by-default/kep.yaml

Lines changed: 4 additions & 4 deletions
```diff
@@ -17,18 +17,18 @@ prr-approvers:
 - "@deads2k"
 
 # The target maturity stage in the current dev cycle for this KEP.
-stage: alpha
+stage: beta
 
 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
 # worked on.
-latest-milestone: "v1.22"
+latest-milestone: "v1.25"
 
 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
   alpha: "v1.22"
-  beta: "v1.23"
-  stable: "v1.26"
+  beta: "v1.25"
+  stable: "v1.28"
 
 # The following PRR answers are required at alpha release
 # List the feature gate name and the components for which it must be enabled
```
