|
18 | 18 | - [Behavior](#behavior)
|
19 | 19 | - [Note About RunAsNonRoot field](#note-about-runasnonroot-field)
|
20 | 20 | - [Summary of Changes needed](#summary-of-changes-needed)
|
| 21 | +- [Test Plan](#test-plan) |
21 | 22 | - [Graduation Criteria](#graduation-criteria)
|
| 23 | +- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) |
| 24 | + - [Feature Enablement and Rollback](#feature-enablement-and-rollback) |
| 25 | + - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning) |
| 26 | + - [Monitoring Requirements](#monitoring-requirements) |
| 27 | + - [Dependencies](#dependencies) |
| 28 | + - [Scalability](#scalability) |
| 29 | + - [Troubleshooting](#troubleshooting) |
22 | 30 | - [Implementation History](#implementation-history)
|
23 | 31 | <!-- /toc -->
|
24 | 32 |
|
@@ -223,17 +231,139 @@ There are other potentially unresolved discussions in that PR which need a follo
|
223 | 231 | - https://github.com/kubernetes/website/pull/12297
|
224 | 232 | - https://github.com/kubernetes/kubernetes/pull/73007
|
225 | 233 |
|
| 234 | +## Test Plan |
| 235 | +For `Alpha`, unit tests and e2e tests were added to test functionality at both |
| 236 | +the container and pod level for dockershim. |
| 237 | + |
| 238 | +For `Beta`, tests were added for other CRI implementations such as CRI-O, containerd and Docker. |
| 239 | + |
| 240 | +For `GA`, the introduced e2e tests will be promoted to conformance. It was also |
| 241 | +verified that e2e coverage is adequate and that the CRI implementations have tests for |
| 242 | +this feature in their respective repos. |
226 | 243 |
|
227 | 244 | ## Graduation Criteria
|
228 | 245 |
|
229 |
| -- Publish Test Results from Master Branch of Cri-o To http://prow.k8s.io [#72253](https://github.com/kubernetes/kubernetes/issues/72253) |
230 |
| -- Containerd and CRI-O tests included in k/k CI [#72287](https://github.com/kubernetes/kubernetes/issues/72287) |
231 |
| -- Make CRI tests failures as release informing |
| 246 | +Beta |
| 247 | +- RunAsGroup is tested for containerd and CRI-O in the cri-tools repo using critest |
| 248 | + -- [Tests](https://github.com/kubernetes-sigs/cri-tools/blob/16911795a3c33833fa0ec83dac1ade3172f6989e/pkg/validate/security_context_linux.go#L357) |
| 249 | +- critest runs in cri-tools on every merge as a GitHub Action |
| 250 | + -- [CRI-O](https://github.com/kubernetes-sigs/cri-tools/actions?query=workflow%3A%22critest+CRI-O%22) |
| 251 | + -- [containerd](https://github.com/kubernetes-sigs/cri-tools/actions?query=workflow%3A%22critest+containerd%22) |
| 252 | + |
| 253 | +GA |
| 254 | +- Assuming no negative user feedback, promote after one release at beta. |
| 255 | +- Verify test coverage for the CRI implementations. |
| 256 | + |
| 257 | +## Production Readiness Review Questionnaire |
| 258 | + |
| 259 | +### Feature Enablement and Rollback |
| 260 | +This feature is enabled in alpha releases using the feature flag `RunAsGroup`. |
| 261 | + |
| 262 | + |
| 263 | +### Rollout, Upgrade and Rollback Planning |
| 264 | + |
| 265 | + |
| 266 | +* **How can a rollout fail? Can it impact already running workloads?** |
| 267 | +It's possible with an incorrect configuration. For example, suppose the init container writes some |
| 268 | +data using a runAsGroup of 234, but the main container comes up as 436 and tries to read the |
| 269 | +data written by the init container. If that fails, the pod will not become ready and the deployment |
| 270 | +won't proceed. This should not impact already running workloads. One way this can affect |
| 271 | +already running workloads is when data is shared between all pods and the access to the files |
| 272 | +is changed by the init container due to a misconfigured runAsGroup. |
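
The failure described above comes down to the ordinary POSIX group permission check. The following is a minimal sketch of that check, not kubelet or runtime code. The UIDs (1000 and 2000) and the `0o640` file mode are assumptions chosen for illustration; only the GIDs 234 and 436 come from the text. It also assumes the two containers run as different users, since a matching owner UID would grant access regardless of group.

```python
import stat


def can_read(file_uid, file_gid, mode, uid, gid, supplemental_gids=()):
    """Simplified POSIX read check: owner bits, then group bits, then other bits."""
    if uid == 0:
        # Root normally bypasses file permission checks.
        return True
    if uid == file_uid:
        return bool(mode & stat.S_IRUSR)
    if gid == file_gid or file_gid in supplemental_gids:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)


# Hypothetical file written by the init container: uid 1000, gid 234, mode rw-r-----.
FILE_UID, FILE_GID, FILE_MODE = 1000, 234, 0o640

# Main container with a matching runAsGroup can read the file.
same_group = can_read(FILE_UID, FILE_GID, FILE_MODE, uid=2000, gid=234)

# Main container with runAsGroup 436 (and a different uid) cannot.
other_group = can_read(FILE_UID, FILE_GID, FILE_MODE, uid=2000, gid=436)
```

Under these assumptions `same_group` is true and `other_group` is false, which is the situation where the pod never becomes ready.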
| 273 | + |
| 274 | + |
| 275 | +* **What specific metrics should inform a rollback?** |
| 276 | +Metrics will be specific to the application. Generic signals, such as pods not being healthy and running, |
| 277 | +should generally inform a rollback in this case. More specific checks involve intrusive testing, |
| 278 | +such as exec'ing into a pod to determine the GID. |
| 279 | + |
| 280 | + |
| 281 | +* **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?** |
| 282 | +Yes, manually. |
| 283 | + |
| 284 | +* **Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?** |
| 285 | +Moving from Beta to GA is accompanied by the removal of the feature flag `RunAsGroup`. No other deprecations or removals |
| 286 | +are in scope or part of this process. |
| 287 | + |
| 288 | +### Monitoring Requirements |
| 289 | + |
| 290 | +* **How can an operator determine if the feature is in use by workloads?** |
| 291 | +Inspect the pod spec of any workload using kubectl or the client-go libraries. If the pod spec |
| 292 | +sets runAsGroup at either the container or pod level, the feature is in use. |
| 293 | +``` |
| 294 | +kubectl get pods --all-namespaces -o json | jq -r '.items[] | select(.spec.securityContext.runAsGroup != null or any(.spec.containers[]; .securityContext.runAsGroup != null)) | [.metadata.name, .metadata.namespace]' |
| 295 | +``` |
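
The same check can be scripted when jq is not available. The sketch below is one possible approach, assuming the input is the parsed JSON from `kubectl get pods --all-namespaces -o json`. The function name `pods_using_run_as_group` is hypothetical, and unlike the command above it also looks at `initContainers`.

```python
def pods_using_run_as_group(pod_list):
    """Return sorted (namespace, name) pairs for pods that set runAsGroup anywhere."""
    found = set()
    for pod in pod_list.get("items", []):
        spec = pod.get("spec") or {}
        # Pod-level securityContext.runAsGroup.
        pod_level = (spec.get("securityContext") or {}).get("runAsGroup")
        # Container-level securityContext.runAsGroup, including init containers.
        containers = (spec.get("containers") or []) + (spec.get("initContainers") or [])
        container_level = any(
            (c.get("securityContext") or {}).get("runAsGroup") is not None
            for c in containers
        )
        if pod_level is not None or container_level:
            meta = pod["metadata"]
            found.add((meta.get("namespace", "default"), meta["name"]))
    return sorted(found)
```

To use it, pipe the kubectl output into a script that calls `pods_using_run_as_group(json.load(sys.stdin))`. Each pod is reported once, however many of its containers set the field.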
| 296 | + |
| 297 | +* **What are the SLIs (Service Level Indicators) an operator can use to determine |
| 298 | +the health of the service?** |
| 299 | +If a pod uses this feature and the pod is running, it's healthy. |
| 300 | +If the container's process does not have the expected runAsGroup GID, as reported by the command below, |
| 301 | +the feature is not supported by that container runtime. It is unclear whether this is caught |
| 302 | +earlier elsewhere. |
| 303 | + |
| 304 | +``` |
| 305 | +id -g |
| 306 | +``` |
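
The check with `id -g` can be automated from outside the pod. This is a rough sketch, assuming `kubectl` is on the PATH, the container image ships the `id` binary, and the caller knows the expected GID from the pod spec. The helper names `gid_matches` and `container_gid_output` are hypothetical.

```python
import subprocess


def gid_matches(id_output, expected_gid):
    """Compare the output of `id -g` with the runAsGroup value from the pod spec."""
    return int(id_output.strip()) == expected_gid


def container_gid_output(pod, namespace, container):
    """Run `id -g` inside the given container via kubectl exec and return its stdout."""
    result = subprocess.run(
        ["kubectl", "exec", pod, "-n", namespace, "-c", container, "--", "id", "-g"],
        capture_output=True,
        text=True,
        check=True,  # raise if kubectl or the command inside the container fails
    )
    return result.stdout
```

For a container expected to run with runAsGroup 234, `gid_matches(container_gid_output(pod, namespace, container), 234)` returning false suggests the runtime did not apply the setting.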
| 307 | + |
| 308 | +* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?** |
| 309 | +N/A |
| 310 | + |
| 311 | +* **Are there any missing metrics that would be useful to have to improve observability |
| 312 | +of this feature?** |
| 313 | +N/A |
| 314 | + |
| 315 | + |
| 316 | +### Dependencies |
| 317 | + |
| 318 | +* **Does this feature depend on any specific services running in the cluster?** |
| 319 | +This feature only depends on the container runtime (CRI) supporting it. |
| 320 | + |
| 321 | +### Scalability |
| 322 | + |
| 323 | +* **Will enabling / using this feature result in any new API calls?** |
| 324 | + No |
| 325 | +* **Will enabling / using this feature result in introducing new API types?** |
| 326 | + No |
| 327 | + |
| 328 | +* **Will enabling / using this feature result in any new calls to the cloud |
| 329 | +provider?** |
| 330 | + No |
| 331 | + |
| 332 | +* **Will enabling / using this feature result in increasing size or count of |
| 333 | +the existing API objects?** |
| 334 | +This feature adds two new fields at the pod level and one in each container where the field is used. |
| 335 | + |
| 336 | + |
| 337 | +* **Will enabling / using this feature result in increasing time taken by any |
| 338 | +operations covered by [existing SLIs/SLOs]?** |
| 339 | + No |
| 340 | + |
| 341 | +* **Will enabling / using this feature result in non-negligible increase of |
| 342 | +resource usage (CPU, RAM, disk, IO, ...) in any components?** |
| 343 | + No |
| 344 | + |
| 345 | + |
| 346 | +### Troubleshooting |
| 347 | + |
| 348 | +* **How does this feature react if the API server and/or etcd is unavailable?** |
| 349 | +After a pod is deployed, this feature continues to work even if etcd or the API server is unavailable. |
| 350 | +The functionality that is lost when the API server or etcd is unavailable is not specific to this feature. |
| 351 | + |
| 352 | + |
| 353 | +* **What are other known failure modes?** |
| 354 | +N/A |
| 355 | + |
| 356 | +* **What steps should be taken if SLOs are not being met to determine the problem?** |
| 357 | + N/A |
| 358 | + |
| 359 | + |
| 360 | + |
232 | 361 |
|
233 | 362 | ## Implementation History
|
234 | 363 | - Proposal merged on 9-18-2017
|
235 | 364 | - Implementation merged as Alpha on 3-1-2018 and Release in 1.10
|
236 | 365 | - Implementation for Containerd merged on 3-30-2018
|
237 | 366 | - Implementation for CRI-O merged on 6-8-2018
|
238 | 367 | - Implemented RunAsGroup PodSecurityPolicy Strategy on 10-12-2018
|
239 |
| -- Planned Beta in v1.14 |
| 368 | +- Beta in 1.14 |
| 369 | +- GA in 1.21 |