|
15 | 15 | - [Graduation Criteria](#graduation-criteria)
|
16 | 16 | - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
|
17 | 17 | - [Version Skew Strategy](#version-skew-strategy)
|
| 18 | +- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) |
| 19 | + - [Feature Enablement and Rollback](#feature-enablement-and-rollback) |
| 20 | + - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning) |
| 21 | + - [Monitoring Requirements](#monitoring-requirements) |
| 22 | + - [Dependencies](#dependencies) |
| 23 | + - [Scalability](#scalability) |
| 24 | + - [Troubleshooting](#troubleshooting) |
18 | 25 | - [Implementation History](#implementation-history)
|
19 | 26 | <!-- /toc -->
|
20 | 27 |
|
@@ -178,7 +185,130 @@ No specific strategy is required.
|
178 | 185 | All references to `SelfLink` should be removed early enough (at least 2 releases before
|
179 | 186 | the field itself is removed).
|
180 | 187 |
|
| 188 | +## Production Readiness Review Questionnaire |
| 189 | + |
| 190 | +### Feature Enablement and Rollback |
| 191 | + |
| 192 | +_This section must be completed when targeting alpha to a release._ |
| 193 | + |
| 194 | +* **How can this feature be enabled / disabled in a live cluster?** |
| 195 | + - [x] Feature gate (also fill in values in `kep.yaml`) |
| 196 | + - Feature gate name: RemoveSelfLink |
| 197 | + - Components depending on the feature gate: |
| 198 | + - kube-apiserver |
| 199 | + |
| 200 | +* **Does enabling the feature change any default behavior?** |
| 201 | + Yes. SelfLink field is no longer propagated by kube-apiserver. |
| 202 | + |
| 203 | +* **Can the feature be disabled once it has been enabled (i.e. can we roll back |
| 204 | + the enablement)?** |
| 205 | + Yes. SelfLink is set purely in-memory in kube-apiserver, so the feature can be |
| 206 | + switched on and off. |
| 207 | + |
| 208 | +* **What happens if we reenable the feature if it was previously rolled back?** |
| 209 | + SelfLink will stop being propagated again. |
| 210 | + |
| 211 | +* **Are there any tests for feature enablement/disablement?** |
| 212 | + No. |
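
The answers above describe SelfLink as a value that is set purely in-memory and can therefore be toggled by the `RemoveSelfLink` feature gate (for example by passing `--feature-gates=RemoveSelfLink=false` to kube-apiserver). The following is a minimal sketch of that idea, not the actual kube-apiserver code: the `objectMeta` type, the `prepareResponse` function, and the hard-coded pod path are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// objectMeta is a hypothetical, stripped-down stand-in for object metadata.
type objectMeta struct {
	Namespace string
	Name      string
	SelfLink  string
}

// prepareResponse illustrates the gate's effect: SelfLink is computed per
// response and never persisted, so flipping the gate only changes what
// future responses contain. The pod path is an assumption for this sketch.
func prepareResponse(stored objectMeta, removeSelfLink bool) objectMeta {
	out := stored // work on a copy; the stored object is left untouched
	if removeSelfLink {
		out.SelfLink = ""
		return out
	}
	out.SelfLink = fmt.Sprintf("/api/v1/namespaces/%s/pods/%s", out.Namespace, out.Name)
	return out
}

func main() {
	stored := objectMeta{Namespace: "default", Name: "web-0"}
	// Simulate disable -> enable -> disable of the gate.
	for _, gate := range []bool{false, true, false} {
		resp := prepareResponse(stored, gate)
		fmt.Printf("RemoveSelfLink=%t selfLink=%q\n", gate, resp.SelfLink)
	}
}
```

Because nothing is written back to the stored object in this sketch, switching the gate in either direction leaves no state behind, which mirrors the rollback answer above.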
| 213 | + |
| 214 | +### Rollout, Upgrade and Rollback Planning |
| 215 | + |
| 216 | +_This section must be completed when targeting beta graduation to a release._ |
| 217 | + |
| 218 | +* **How can a rollout fail? Can it impact already running workloads?** |
| 219 | + If any component relies on the SelfLink field being set, |
| 220 | + it may stop working as expected. |
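
A component that relied on SelfLink can usually rebuild the same path from information it already has. The sketch below assumes the standard Kubernetes REST layout (`/api/<version>` for the core group, `/apis/<group>/<version>` for named groups); the `resourcePath` helper is a hypothetical name and is not part of any Kubernetes client library.

```go
package main

import (
	"fmt"
	"strings"
)

// resourcePath builds the REST path of an object from its API group,
// version, plural resource name, namespace, and name. An empty group means
// the core group; an empty namespace means a cluster-scoped resource.
func resourcePath(group, version, resource, namespace, name string) string {
	var parts []string
	if group == "" {
		parts = append(parts, "api", version)
	} else {
		parts = append(parts, "apis", group, version)
	}
	if namespace != "" {
		parts = append(parts, "namespaces", namespace)
	}
	parts = append(parts, resource, name)
	return "/" + strings.Join(parts, "/")
}

func main() {
	fmt.Println(resourcePath("", "v1", "pods", "default", "web-0"))
	fmt.Println(resourcePath("apps", "v1", "deployments", "default", "web"))
	fmt.Println(resourcePath("", "v1", "nodes", "", "node-1"))
}
```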
| 221 | + |
| 222 | +* **What specific metrics should inform a rollback?** |
| 223 | + There are no generic metrics. The health of individual components should be watched. |
| 224 | + Generic Kubernetes components have been updated to not rely on SelfLink. |
| 225 | + |
| 226 | +* **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?** |
| 227 | + Manual tests were done; SelfLink was or was not set, as expected. |
| 228 | + |
| 229 | +* **Is the rollout accompanied by any deprecations and/or removals of features, APIs, |
| 230 | +fields of API types, flags, etc.?** |
| 231 | + Yes - the SelfLink field in `ObjectMeta` and `ListMeta` is being deprecated and removed. |
| 232 | + |
| 233 | +### Monitoring Requirements |
| 234 | + |
| 235 | +_This section must be completed when targeting beta graduation to a release._ |
| 236 | + |
| 237 | +* **How can an operator determine if the feature is in use by workloads?** |
| 238 | + SelfLink is not a runtime feature - it's a read-only object identifier. |
| 239 | + |
| 240 | +* **What are the SLIs (Service Level Indicators) an operator can use to determine |
| 241 | +the health of the service?** |
| 242 | + The existing metrics already used to determine component health should be used. |
| 243 | + |
| 244 | +* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?** |
| 245 | + n/a |
| 246 | + |
| 247 | +* **Are there any missing metrics that would be useful to have to improve observability |
| 248 | +of this feature?** |
| 249 | + No. |
| 250 | + |
| 251 | +### Dependencies |
| 252 | + |
| 253 | +_This section must be completed when targeting beta graduation to a release._ |
| 254 | + |
| 255 | +* **Does this feature depend on any specific services running in the cluster?** |
| 256 | + No. |
| 257 | + |
| 258 | +### Scalability |
| 259 | + |
| 260 | +_For alpha, this section is encouraged: reviewers should consider these questions |
| 261 | +and attempt to answer them._ |
| 262 | + |
| 263 | +_For beta, this section is required: reviewers must answer these questions._ |
| 264 | + |
| 265 | +_For GA, this section is required: approvers should be able to confirm the |
| 266 | +previous answers based on experience in the field._ |
| 267 | + |
| 268 | +* **Will enabling / using this feature result in any new API calls?** |
| 269 | + No. |
| 270 | + |
| 271 | +* **Will enabling / using this feature result in introducing new API types?** |
| 272 | + No. |
| 273 | + |
| 274 | +* **Will enabling / using this feature result in any new calls to the cloud |
| 275 | +provider?** |
| 276 | + No. |
| 277 | + |
| 278 | +* **Will enabling / using this feature result in increasing size or count of |
| 279 | +the existing API objects?** |
| 280 | + No (in fact, returned objects will be smaller as they won't contain SelfLink). |
| 281 | + |
| 282 | +* **Will enabling / using this feature result in increasing time taken by any |
| 283 | +operations covered by [existing SLIs/SLOs]?** |
| 284 | + No. |
| 285 | + |
| 286 | +* **Will enabling / using this feature result in non-negligible increase of |
| 287 | +resource usage (CPU, RAM, disk, IO, ...) in any components?** |
| 288 | + No. |
| 289 | + |
| 290 | +### Troubleshooting |
| 291 | + |
| 292 | +The Troubleshooting section currently serves the `Playbook` role. We may consider |
| 293 | +splitting it into a dedicated `Playbook` document (potentially with some monitoring |
| 294 | +details). For now, we leave it here. |
| 295 | + |
| 296 | +_This section must be completed when targeting beta graduation to a release._ |
| 297 | + |
| 298 | +* **How does this feature react if the API server and/or etcd is unavailable?** |
| 299 | + n/a |
| 300 | + |
| 301 | +* **What are other known failure modes?** |
| 302 | + n/a |
| 303 | + |
| 304 | +* **What steps should be taken if SLOs are not being met to determine the problem?** |
| 305 | + n/a |
| 306 | + |
| 307 | +[supported limits]: https://git.k8s.io/community/sig-scalability/configs-and-limits/thresholds.md |
| 308 | +[existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos |
| 309 | + |
181 | 310 | ## Implementation History
|
182 | 311 |
|
183 | 312 | 2019-07-23: KEP merged.
|
184 | 313 | 2019-07-24: KEP moved to implementable.
|
| 314 | +v1.16: Released in Alpha |