Commit 5a02adf

Author: Mayank Kumar
Promote RunAsGroup to GA
1 parent 639b1d3 commit 5a02adf

3 files changed

+149
-5
lines changed

keps/prod-readiness/sig-node/213.yaml

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
kep-number: 213
2+
stable:
3+
approver: "@johnbelamaric"

keps/sig-node/213-run-as-group/README.md

Lines changed: 134 additions & 4 deletions
@@ -18,7 +18,15 @@
1818
- [Behavior](#behavior)
1919
- [Note About RunAsNonRoot field](#note-about-runasnonroot-field)
2020
- [Summary of Changes needed](#summary-of-changes-needed)
21+
- [Test Plan](#test-plan)
2122
- [Graduation Criteria](#graduation-criteria)
23+
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
24+
- [Feature Enablement and Rollback](#feature-enablement-and-rollback)
25+
- [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
26+
- [Monitoring Requirements](#monitoring-requirements)
27+
- [Dependencies](#dependencies)
28+
- [Scalability](#scalability)
29+
- [Troubleshooting](#troubleshooting)
2230
- [Implementation History](#implementation-history)
2331
<!-- /toc -->
2432

@@ -223,17 +231,139 @@ There are other potentially unresolved discussions in that PR which need a follo
223231
- https://github.com/kubernetes/website/pull/12297
224232
- https://github.com/kubernetes/kubernetes/pull/73007
225233

234+
## Test Plan
235+
For `Alpha`, unit tests and e2e tests were added to test functionality at both
236+
container and pod level for dockershim.
237+
238+
For `Beta`, tests were added for other container runtimes such as CRI-O, containerd, and Docker.
239+
240+
For `GA`, the introduced e2e tests will be promoted to conformance. It was also
241+
verified that e2e coverage was adequate and that the CRI implementations had tests in their respective
242+
repos testing this feature.
226243

227244
## Graduation Criteria
228245

229-
- Publish Test Results from Master Branch of Cri-o To http://prow.k8s.io [#72253](https://github.com/kubernetes/kubernetes/issues/72253)
230-
- Containerd and CRI-O tests included in k/k CI [#72287](https://github.com/kubernetes/kubernetes/issues/72287)
231-
- Make CRI tests failures as release informing
246+
Beta
247+
- RunAsGroup is tested for containerd and CRI-O in cri-tools repo using critest
248+
-- [Tests](https://github.com/kubernetes-sigs/cri-tools/blob/16911795a3c33833fa0ec83dac1ade3172f6989e/pkg/validate/security_context_linux.go#L357)
249+
- critests are executed in cri-tools for all merges as GitHub Action
250+
-- [CRI-O](https://github.com/kubernetes-sigs/cri-tools/actions?query=workflow%3A%22critest+CRI-O%22)
251+
-- [containerd](https://github.com/kubernetes-sigs/cri-tools/actions?query=workflow%3A%22critest+containerd%22)
252+
253+
GA
254+
- assuming no negative user feedback, promote after 1 release at beta.
255+
- verify test coverage for CRI implementations
256+
257+
## Production Readiness Review Questionnaire
258+
259+
### Feature Enablement and Rollback
260+
This feature is enabled in alpha releases using the feature flag `RunAsGroup`.
261+
262+
263+
### Rollout, Upgrade and Rollback Planning
264+
265+
266+
* **How can a rollout fail? Can it impact already running workloads?**
267+
It's possible with an incorrect configuration. For example, suppose the init container writes some
268+
data using a runAsGroup of 234, but the main container comes up as 436 and tries to read the
269+
data written by the init container. If that read fails, the pod will not become ready and the deployment
270+
won't proceed. This should not impact already running workloads. One way this can affect
271+
already running workloads is when data is shared between all pods and the access to the files
272+
is changed by the init container due to a misconfigured runAsGroup.
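
The mismatch described above can be spotted in a pod spec before rollout. The following is a minimal sketch, not part of the KEP: it assumes the pod spec is available as a plain dictionary (for example, parsed from YAML or JSON), and the helper names `effective_run_as_group` and `find_group_mismatches` are hypothetical. It relies on the fact that a container-level `runAsGroup` overrides the pod-level value. Differing GIDs are only a heuristic signal, since shared data may still be readable through other means such as file modes.

```python
def effective_run_as_group(pod_spec, container):
    """Return the container-level runAsGroup if set, else the pod-level value, else None."""
    container_sc = container.get("securityContext") or {}
    pod_sc = pod_spec.get("securityContext") or {}
    gid = container_sc.get("runAsGroup")
    return gid if gid is not None else pod_sc.get("runAsGroup")


def find_group_mismatches(pod_spec):
    """List (init_name, init_gid, container_name, gid) tuples whose explicit GIDs differ."""
    mismatches = []
    for init in pod_spec.get("initContainers") or []:
        init_gid = effective_run_as_group(pod_spec, init)
        for main in pod_spec.get("containers") or []:
            gid = effective_run_as_group(pod_spec, main)
            # Only flag pairs where both GIDs are set explicitly and differ.
            if init_gid is not None and gid is not None and init_gid != gid:
                mismatches.append((init["name"], init_gid, main["name"], gid))
    return mismatches


# Hypothetical pod spec reproducing the 234 vs 436 example from the text.
example_spec = {
    "initContainers": [{"name": "init-data", "securityContext": {"runAsGroup": 234}}],
    "containers": [{"name": "app", "securityContext": {"runAsGroup": 436}}],
}

if __name__ == "__main__":
    for init_name, init_gid, name, gid in find_group_mismatches(example_spec):
        print(f"{init_name} runs as GID {init_gid}, but {name} runs as GID {gid}")
```

Containers that leave `runAsGroup` unset at both levels are skipped, because their group then depends on the image and runtime rather than on the pod spec.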
273+
274+
275+
* **What specific metrics should inform a rollback?**
276+
Metrics will be specific to the application. Generic signals, such as pods not being healthy and running,
277+
should generally inform a rollback in this case. More specific checks involve intrusive testing,
278+
such as exec'ing into a pod to determine the gid.
279+
280+
281+
* **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**
282+
Yes, manually
283+
284+
* **Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?**
285+
Moving from Beta to GA is accompanied by the removal of the feature flag `RunAsGroup`. No other deprecations or removals
286+
are in scope or part of this process.
287+
288+
### Monitoring Requirements
289+
290+
* **How can an operator determine if the feature is in use by workloads?**
291+
By inspecting the pod spec of any workload using kubectl or client-go libraries. If the pod spec
292+
has RunAsGroup present either at the container or pod level, then the feature is in use.
293+
```
294+
kubectl get pods --all-namespaces -o json | jq -r '.items[] | select(.spec.securityContext.runAsGroup != null or any(.spec.containers[]; .securityContext.runAsGroup != null)) | [.metadata.name, .metadata.namespace]'
295+
```
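
The same inspection can be scripted against the JSON that `kubectl get pods --all-namespaces -o json` produces. The sketch below is illustrative only and assumes that output has already been parsed into a dictionary; the function names `uses_run_as_group` and `pods_using_run_as_group` and the sample data are hypothetical. Unlike a plain field lookup, it also checks init containers and reports each pod once.

```python
def uses_run_as_group(pod):
    """Return True if runAsGroup is set at the pod level or on any (init) container."""
    spec = pod.get("spec") or {}
    if (spec.get("securityContext") or {}).get("runAsGroup") is not None:
        return True
    containers = (spec.get("containers") or []) + (spec.get("initContainers") or [])
    return any(
        (c.get("securityContext") or {}).get("runAsGroup") is not None
        for c in containers
    )


def pods_using_run_as_group(pod_list):
    """Return (namespace, name) for every pod in a PodList dict that uses runAsGroup."""
    return [
        (p["metadata"]["namespace"], p["metadata"]["name"])
        for p in pod_list.get("items") or []
        if uses_run_as_group(p)
    ]


# Hypothetical PodList shaped like the kubectl JSON output.
sample_pod_list = {
    "items": [
        {
            "metadata": {"namespace": "default", "name": "with-gid"},
            "spec": {"containers": [{"name": "app", "securityContext": {"runAsGroup": 3000}}]},
        },
        {
            "metadata": {"namespace": "default", "name": "without-gid"},
            "spec": {"containers": [{"name": "app"}]},
        },
    ]
}

if __name__ == "__main__":
    for namespace, name in pods_using_run_as_group(sample_pod_list):
        print(f"{namespace}/{name}")
```

To run this against a live cluster, one option is to replace `sample_pod_list` with `json.load(sys.stdin)` and pipe the `kubectl` output into the script.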
296+
297+
* **What are the SLIs (Service Level Indicators) an operator can use to determine
298+
the health of the service?**
299+
If a pod has this feature enabled and the pod is running, it is healthy.
300+
If the pod doesn't have the expected runAsGroup id, as determined by the command below,
301+
the feature is not supported in that container runtime. It is unclear whether this is caught
302+
earlier somewhere.
303+
304+
```
305+
id -g
306+
```
307+
308+
* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**
309+
N/A
310+
311+
* **Are there any missing metrics that would be useful to have to improve observability
312+
of this feature?**
313+
N/A
314+
315+
316+
### Dependencies
317+
318+
* **Does this feature depend on any specific services running in the cluster?**
319+
This feature only depends on the container runtime (CRI) supporting this feature.
320+
321+
### Scalability
322+
323+
* **Will enabling / using this feature result in any new API calls?**
324+
No
325+
* **Will enabling / using this feature result in introducing new API types?**
326+
No
327+
328+
* **Will enabling / using this feature result in any new calls to the cloud
329+
provider?**
330+
No
331+
332+
* **Will enabling / using this feature result in increasing size or count of
333+
the existing API objects?**
334+
This feature adds two new fields, one at the pod level and one in every container in which the field is used.
335+
336+
337+
* **Will enabling / using this feature result in increasing time taken by any
338+
operations covered by [existing SLIs/SLOs]?**
339+
No
340+
341+
* **Will enabling / using this feature result in non-negligible increase of
342+
resource usage (CPU, RAM, disk, IO, ...) in any components?**
343+
No
344+
345+
346+
### Troubleshooting
347+
348+
* **How does this feature react if the API server and/or etcd is unavailable?**
349+
After a pod is deployed, this feature will continue to work even if etcd or the API server is unavailable.
350+
The functionality that is lost when the API server or etcd is unavailable is not specific to this feature.
351+
352+
353+
* **What are other known failure modes?**
354+
N/A
355+
356+
* **What steps should be taken if SLOs are not being met to determine the problem?**
357+
N/A
358+
359+
360+
232361

233362
## Implementation History
234363
- Proposal merged on 9-18-2017
235364
- Implementation merged as Alpha on 3-1-2018 and released in 1.10
236365
- Implementation for Containerd merged on 3-30-2018
237366
- Implementation for CRI-O merged on 6-8-2018
238367
- Implemented RunAsGroup PodSecurityPolicy Strategy on 10-12-2018
239-
- Planned Beta in v1.14
368+
- Beta in 1.14
369+
- GA in 1.21

keps/sig-node/213-run-as-group/kep.yaml

Lines changed: 12 additions & 1 deletion
@@ -11,7 +11,18 @@ reviewers:
1111
approvers:
1212
- "@liggitt"
1313
- "@derekwaynecarr"
14+
prr-approvers:
15+
- "@johnbelamaric"
1416
editor: TBD
1517
creation-date: 2017-06-21
16-
last-updated: 2019-02-14
18+
last-updated: 2021-02-04
1719
status: implementable
20+
stage: "stable"
21+
22+
# The milestone at which this feature was, or is targeted to be, at each stage.
23+
milestone:
24+
alpha: "v1.10"
25+
beta: "v1.14"
26+
stable: "v1.21"
27+
28+
latest-milestone: "v1.21"
