|
9 | 9 | - [Non-Goals](#non-goals)
|
10 | 10 | - [User Stories](#user-stories)
|
11 | 11 | - [Implementation Details](#implementation-details)
|
| 12 | +- [Design](#design) |
| 13 | + - [Test Plan](#test-plan) |
| 14 | + - [Needed Tests](#needed-tests) |
| 15 | + - [Graduation Criteria](#graduation-criteria) |
| 16 | + - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) |
| 17 | +- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) |
| 18 | + - [Feature Enablement and Rollback](#feature-enablement-and-rollback) |
| 19 | + - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning) |
| 20 | + - [Monitoring Requirements](#monitoring-requirements) |
| 21 | + - [Dependencies](#dependencies) |
| 22 | + - [Scalability](#scalability) |
| 23 | + - [Troubleshooting](#troubleshooting) |
12 | 24 | - [Proposal](#proposal)
|
13 | 25 | - [Dependencies on OCI and container runtimes](#dependencies-on-oci-and-container-runtimes)
|
14 | 26 | - [Current status of dependencies](#current-status-of-dependencies)
|
|
17 | 29 | - [Phase 1: Convert from cgroups v1 settings to v2](#phase-1-convert-from-cgroups-v1-settings-to-v2)
|
18 | 30 | - [Phase 2: Use cgroups v2 throughout the stack](#phase-2-use-cgroups-v2-throughout-the-stack)
|
19 | 31 | - [Risk and Mitigations](#risk-and-mitigations)
|
20 |
| -- [Graduation Criteria](#graduation-criteria) |
21 | 32 | <!-- /toc -->
|
22 | 33 |
|
23 | 34 | ## Summary
|
@@ -52,6 +63,186 @@ This proposal aims to:
|
52 | 63 |
|
53 | 64 | ## Implementation Details
|
54 | 65 |
|
| 66 | +## Design |
| 67 | + |
| 68 | +### Test Plan |
| 69 | + |
| 70 | +#### Needed Tests |
| 71 | + |
| 72 | +- Run E2E tests on a cgroup v2 enabled host. |
| 73 | + |
| 74 | +### Graduation Criteria |
| 75 | + |
| 76 | +- Alpha: Phase 1 completed and basic support for running Kubernetes on |
| 77 | + a cgroups v2 host, with e2e test coverage or a plan for the |
| 78 | + failing tests. |
| 79 | + A good candidate for running cgroup v2 tests is Fedora 31, which |
| 80 | + already defaults to cgroup v2. |
| 81 | + |
| 82 | +- Beta: e2e test coverage and performance testing. Verify that both |
| 83 | + the CPU Manager and the Memory Manager work. |
| 84 | + |
| 85 | +- GA: Assuming no negative user feedback based on production |
| 86 | + experience, promote after 2 releases in beta. |
| 87 | + *TBD* whether phase 2 must be implemented for GA. |
| 88 | + |
| 89 | +### Upgrade / Downgrade Strategy |
| 90 | + |
| 91 | +<!-- |
| 92 | +If applicable, how will the component be upgraded and downgraded? Make sure |
| 93 | +this is in the test plan. |
| 94 | +
|
| 95 | +Consider the following in developing an upgrade/downgrade strategy for this |
| 96 | +enhancement: |
| 97 | +- What changes (in invocations, configurations, API use, etc.) is an existing |
| 98 | + cluster required to make on upgrade, in order to maintain previous behavior? |
| 99 | +- What changes (in invocations, configurations, API use, etc.) is an existing |
| 100 | + cluster required to make on upgrade, in order to make use of the enhancement? |
| 101 | +--> |
| 102 | + |
| 103 | +N/A. Not relevant to upgrades. If the host is running with cgroup v2 then |
| 104 | +it will be automatically detected and used. |
| 105 | + |
| 106 | +## Production Readiness Review Questionnaire |
| 107 | + |
| 108 | +### Feature Enablement and Rollback |
| 109 | + |
| 110 | +###### How can this feature be enabled / disabled in a live cluster? |
| 111 | + |
| 112 | +- [ ] Feature gate (also fill in values in `kep.yaml`) |
| 113 | + - Feature gate name: |
| 114 | + - Components depending on the feature gate: |
| 115 | +- [X] Other |
| 116 | + - Describe the mechanism: |
| 117 | + configure the hosts to use cgroup v2 |
| 118 | + - Will enabling / disabling the feature require downtime of the control |
| 119 | + plane? |
| 120 | + No. Each host can be rebooted into cgroup v2 independently. |
| 121 | + - Will enabling / disabling the feature require downtime or reprovisioning |
| 122 | + of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled). |
| 123 | + It requires node downtime, because the node must be rebooted. |
| 124 | + |
| 125 | +###### Does enabling the feature change any default behavior? |
| 126 | + |
| 127 | +N/A. It must work in the same way as on cgroup v1 |
| 128 | + |
| 129 | +###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)? |
| 130 | + |
| 131 | +Yes. Rebooting the node into cgroup v1 is sufficient. |
| 132 | + |
| 133 | +###### What happens if we reenable the feature if it was previously rolled back? |
| 134 | + |
| 135 | +It should work seamlessly, with no difference in behavior. |
| 136 | + |
| 137 | +###### Are there any tests for feature enablement/disablement? |
| 138 | + |
| 139 | +The same E2E tests that work on cgroup v1 should work on cgroup v2 |
| 140 | + |
| 141 | +### Rollout, Upgrade and Rollback Planning |
| 142 | + |
| 143 | +N/A. Each node can be configured separately. |
| 144 | + |
| 145 | +###### How can a rollout or rollback fail? Can it impact already running workloads? |
| 146 | + |
| 147 | +N/A. It requires a reboot to be enabled. If the workload accesses the |
| 148 | +cgroup file system directly, then the workload must also support cgroup v2. |
| 149 | + |
| 150 | +###### What specific metrics should inform a rollback? |
| 151 | + |
| 152 | +Pods not being healthy. One could inspect whether the pods' cgroups are |
| 153 | +set correctly by checking them against the conversion table in this KEP. |
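
As an illustration of such a check, here is a minimal sketch of one conversion, from cgroup v1 `cpu.shares` to cgroup v2 `cpu.weight`. The linear mapping from the range [2, 262144] to [1, 10000] is an assumption here, taken from the mapping commonly used by OCI runtimes; verify it against the conversion table in this KEP before relying on it. The function name `cpu_shares_to_weight` is hypothetical.

```python
def cpu_shares_to_weight(shares: int) -> int:
    """Map a cgroup v1 cpu.shares value to a cgroup v2 cpu.weight value.

    Assumed mapping (check against the KEP's conversion table):
    cpu.shares in [2, 262144] maps linearly to cpu.weight in [1, 10000].
    """
    if not 2 <= shares <= 262144:
        raise ValueError(f"cpu.shares out of range: {shares}")
    # Integer division keeps the result an integer, as cpu.weight requires.
    return 1 + ((shares - 2) * 9999) // 262142
```

An operator could compare the result for a pod's expected `cpu.shares` against the value read from the `cpu.weight` file in that pod's cgroup directory on the node.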
| 154 | + |
| 155 | +###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested? |
| 156 | + |
| 157 | +N/A. It depends on the node configuration and it is stateless. |
| 158 | + |
| 159 | +###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.? |
| 160 | + |
| 161 | +The cgroup file system inside of the containers will use cgroup v2 instead of cgroup v1. |
| 162 | + |
| 163 | +### Monitoring Requirements |
| 164 | + |
| 165 | +###### How can an operator determine if the feature is in use by workloads? |
| 166 | + |
| 167 | +An operator could run `cat /proc/self/cgroup` on a node to check whether it is running in cgroup v2 mode. |
| 168 | +If the node is using cgroup v2, then the pods running on that node are using it as well. |
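
The output of that command can be interpreted as in the following minimal sketch. It relies on the kernel's documented format for `/proc/<pid>/cgroup`, where a cgroup v2 entry has hierarchy ID 0 and an empty controller list (`0::/path`). The function name `cgroup_mode` and the labels it returns are hypothetical, chosen only for this example.

```python
def cgroup_mode(proc_cgroup_text: str) -> str:
    """Classify the contents of /proc/self/cgroup.

    Returns "v2" (unified), "v1" (legacy), "hybrid" (both present),
    or "unknown" when there are no entries to inspect.
    """
    lines = [ln for ln in proc_cgroup_text.splitlines() if ln.strip()]
    if not lines:
        return "unknown"
    # A cgroup v2 entry looks like "0::/some/path".
    v2_entries = [ln for ln in lines if ln.startswith("0::")]
    v1_entries = [ln for ln in lines if not ln.startswith("0::")]
    if v2_entries and not v1_entries:
        return "v2"
    if v2_entries and v1_entries:
        return "hybrid"
    return "v1"
```

On a node, this could be called as `cgroup_mode(open("/proc/self/cgroup").read())`; a result of `"v2"` indicates that the host is running in unified cgroup v2 mode.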
| 169 | + |
| 170 | +###### How can someone using this feature know that it is working for their instance? |
| 171 | + |
| 172 | + |
| 173 | +- [ ] Events |
| 174 | + - Event Reason: |
| 175 | +- [ ] API .status |
| 176 | + - Condition name: |
| 177 | + - Other field: |
| 178 | +- [X] Other (treat as last resort) |
| 179 | + - Details: pods are healthy. |
| 180 | + |
| 181 | +###### What are the reasonable SLOs (Service Level Objectives) for the enhancement? |
| 182 | + |
| 183 | +N/A. Same as when running on cgroup v1. |
| 184 | + |
| 185 | +###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service? |
| 186 | + |
| 187 | +- [ ] Metrics |
| 188 | + - Metric name: |
| 189 | + - [Optional] Aggregation method: |
| 190 | + - Components exposing the metric: |
| 191 | +- [X] Other (treat as last resort) |
| 192 | + - Details: not a service |
| 193 | + |
| 194 | +###### Are there any missing metrics that would be useful to have to improve observability of this feature? |
| 195 | + |
| 196 | +No |
| 197 | + |
| 198 | +### Dependencies |
| 199 | + |
| 200 | +The container runtime must also support cgroup v2 |
| 201 | + |
| 202 | +###### Does this feature depend on any specific services running in the cluster? |
| 203 | + |
| 204 | +No |
| 205 | + |
| 206 | +### Scalability |
| 207 | + |
| 208 | +###### Will enabling / using this feature result in any new API calls? |
| 209 | + |
| 210 | +No |
| 211 | + |
| 212 | +###### Will enabling / using this feature result in introducing new API types? |
| 213 | + |
| 214 | +No |
| 215 | + |
| 216 | +###### Will enabling / using this feature result in any new calls to the cloud provider? |
| 217 | + |
| 218 | +No |
| 219 | + |
| 220 | +###### Will enabling / using this feature result in increasing size or count of the existing API objects? |
| 221 | + |
| 222 | +No |
| 223 | + |
| 224 | +###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs? |
| 225 | + |
| 226 | +No |
| 227 | + |
| 228 | +###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components? |
| 229 | + |
| 230 | +No |
| 231 | + |
| 232 | +### Troubleshooting |
| 233 | + |
| 234 | +###### How does this feature react if the API server and/or etcd is unavailable? |
| 235 | + |
| 236 | +N/A |
| 237 | + |
| 238 | +###### What are other known failure modes? |
| 239 | + |
| 240 | +N/A |
| 241 | + |
| 242 | +###### What steps should be taken if SLOs are not being met to determine the problem? |
| 243 | + |
| 244 | +If SLOs are not being met, reboot the node into cgroup v1 to disable this feature. |
| 245 | + |
55 | 246 | ## Proposal
|
56 | 247 |
|
57 | 248 | The proposal is to implement cgroups v2 in two different phases.
|
@@ -201,17 +392,3 @@ Some cgroups v1 features are not available with cgroups v2:
|
201 | 392 | Some cgroups v1 controllers, such as _device_, _net_cls_, and
|
202 | 393 | _net_prio_, are not available with the new version. The alternative to
|
203 | 394 | these controllers is to use eBPF.
|
204 |
| - |
205 |
| -## Graduation Criteria |
206 |
| - |
207 |
| -- Alpha: Phase 1 completed and basic support for running Kubernetes on |
208 |
| - a cgroups v2 host, e2e tests coverage or have a plan for the |
209 |
| - failing tests. |
210 |
| - A good candidate for running cgroup v2 test is Fedora 31 that has |
211 |
| - already switched to default to cgroup v2. |
212 |
| - |
213 |
| -- Beta: e2e tests coverage and performance testing. |
214 |
| - |
215 |
| -- GA: Assuming no negative user feedback based on production |
216 |
| - experience, promote after 2 releases in beta. |
217 |
| - *TBD* whether phase 2 must be implemented for GA. |
|