|
15 | 15 | - [Alpha -> Beta](#alpha---beta)
|
16 | 16 | - [Beta -> GA](#beta---ga)
|
17 | 17 | - [Test plan](#test-plan)
|
| 18 | + - [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) |
| 19 | + - [Feature Enablement and Rollback](#feature-enablement-and-rollback) |
| 20 | + - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning) |
| 21 | + - [Monitoring Requirements](#monitoring-requirements) |
| 22 | + - [Dependencies](#dependencies) |
| 23 | + - [Scalability](#scalability) |
| 24 | + - [Troubleshooting](#troubleshooting) |
18 | 25 | <!-- /toc -->
|
19 | 26 |
|
20 | 27 | ## Summary
|
@@ -111,3 +118,108 @@ already exists on EndpointSlice. Additionally, it will add tests that ensure
|
111 | 118 | that both the Endpoints and EndpointSlice controllers appropriately set the
|
112 | 119 | AppProtocol field on Endpoints and EndpointSlices when it is set on the
|
113 | 120 | corresponding Service.
|
| 121 | + |
| 122 | +## Production Readiness Review Questionnaire |
| 123 | + |
| 124 | +### Feature Enablement and Rollback |
| 125 | + |
| 126 | +* **How can this feature be enabled / disabled in a live cluster?** |
| 127 | + This was previously enabled with the `ServiceAppProtocol` feature gate. That |
| 128 | + gate will be removed in Kubernetes 1.21. |
| 129 | + |
| 130 | +* **Does enabling the feature change any default behavior?** |
| 131 | + No. |
| 132 | + |
| 133 | +* **Can the feature be disabled once it has been enabled (i.e. can we roll back |
| 134 | + the enablement)?** |
| 135 | + Not anymore. |
| 136 | + |
| 137 | +* **What happens if we reenable the feature if it was previously rolled back?** |
| 138 | + N/A. |
| 139 | + |
| 140 | +* **Are there any tests for feature enablement/disablement?** |
| 141 | + N/A. |
| 142 | + |
| 143 | +### Rollout, Upgrade and Rollback Planning |
| 144 | + |
| 145 | +* **How can a rollout fail? Can it impact already running workloads?** |
| 146 | + If the `ServiceAppProtocol` gate is manually enabled on Kubernetes components, |
| 147 | + the gate will no longer be recognized as of Kubernetes 1.21. Users should stop |
| 148 | + setting this feature gate. |
| 149 | + |
| 150 | +* **What specific metrics should inform a rollback?** |
| 151 | + N/A. |
| 152 | + |
| 153 | +* **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?** |
| 154 | + N/A. |
| 155 | + |
| 156 | +* **Is the rollout accompanied by any deprecations and/or removals of features, |
| 157 | + APIs, fields of API types, flags, etc.?** |
| 158 | + The v1.21 rollout will include the removal of the `ServiceAppProtocol` feature |
| 159 | + gate. |
| 160 | + |
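The gate removal described above means an explicit `ServiceAppProtocol` entry in a component's `--feature-gates` flag should be dropped before moving to 1.21. A minimal sketch of such a check, assuming the component's command-line arguments are available as a list of strings; the helper name `find_feature_gate` and the sample arguments are hypothetical and not part of this KEP:

```python
def find_feature_gate(args, gate="ServiceAppProtocol"):
    """Return the value set for `gate` in a --feature-gates flag, or None.

    Handles both `--feature-gates=A=true,B=false` and the two-argument
    form `--feature-gates A=true,B=false`.
    """
    values = []
    for i, arg in enumerate(args):
        if arg.startswith("--feature-gates="):
            values.append(arg.split("=", 1)[1])
        elif arg == "--feature-gates" and i + 1 < len(args):
            values.append(args[i + 1])

    found = None
    for value in values:
        for pair in value.split(","):
            name, sep, setting = pair.strip().partition("=")
            if sep and name == gate:
                found = setting  # a later occurrence overrides an earlier one
    return found


# Hypothetical component arguments, for illustration only.
example_args = ["kube-apiserver", "--feature-gates=ServiceAppProtocol=true,Other=false"]
if find_feature_gate(example_args) is not None:
    print("ServiceAppProtocol is set explicitly; remove it before upgrading")
```

Any non-`None` result, whether `true` or `false`, indicates the gate is still referenced and should be removed from the flag.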
| 161 | +### Monitoring Requirements |
| 162 | + |
| 163 | +* **How can an operator determine if the feature is in use by workloads?** |
| 164 | + If this field is set on any Services, it may be used by applications that |
| 165 | + consume those Services. No core Kubernetes components consume this field. |
| 166 | + |
| 167 | +* **What are the SLIs (Service Level Indicators) an operator can use to |
| 168 | + determine the health of the service?** |
| 169 | + N/A. |
| 170 | + |
| 171 | +* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?** |
| 172 | + N/A. |
| 173 | + |
| 174 | +* **Are there any missing metrics that would be useful to have to improve |
| 175 | + observability of this feature?** |
| 176 | + No. |
| 177 | + |
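One way to check whether the field is in use, as asked in the first question above, is to list the Service ports that set `appProtocol`. A minimal sketch, assuming the input is the parsed JSON of a Service list (for example, the output of `kubectl get services --all-namespaces -o json`); the function name `ports_with_app_protocol` and the sample data are hypothetical:

```python
def ports_with_app_protocol(service_list):
    """Return (namespace, name, port, appProtocol) for each Service port that sets appProtocol."""
    results = []
    for svc in service_list.get("items", []):
        meta = svc.get("metadata", {})
        # `ports` may be absent or null (e.g. ExternalName Services).
        for port in svc.get("spec", {}).get("ports") or []:
            app_protocol = port.get("appProtocol")
            if app_protocol:
                results.append(
                    (meta.get("namespace"), meta.get("name"), port.get("port"), app_protocol)
                )
    return results


# Hypothetical Service list, for illustration only.
sample = {
    "items": [
        {
            "metadata": {"namespace": "default", "name": "web"},
            "spec": {"ports": [{"port": 80, "appProtocol": "http"}, {"port": 9090}]},
        },
        {
            "metadata": {"namespace": "default", "name": "db"},
            "spec": {"ports": [{"port": 5432}]},
        },
    ]
}

for entry in ports_with_app_protocol(sample):
    print(entry)
```

An empty result suggests no Service in the listing sets the field. It does not show whether any consumer actually reads it.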
| 178 | +### Dependencies |
| 179 | + |
| 180 | +* **Does this feature depend on any specific services running in the cluster?** |
| 181 | + No. |
| 182 | + |
| 183 | + |
| 184 | +### Scalability |
| 185 | + |
| 186 | +* **Will enabling / using this feature result in any new API calls?** |
| 187 | + No. |
| 188 | + |
| 189 | +* **Will enabling / using this feature result in introducing new API types?** |
| 190 | + No. |
| 191 | + |
| 192 | +* **Will enabling / using this feature result in any new calls to the cloud |
| 193 | + provider?** |
| 194 | + No. |
| 195 | + |
| 196 | +* **Will enabling / using this feature result in increasing size or count of the |
| 197 | + existing API objects?** |
| 198 | + Yes: |
| 199 | + - API type(s): Service |
| 200 | + - Estimated increase in size: 10B |
| 201 | + - Estimated amount of new objects: This field could be specified on each port |
| 202 | + in each Service in a cluster, although that is unlikely. |
| 203 | + |
| 204 | +* **Will enabling / using this feature result in increasing time taken by any |
| 205 | + operations covered by existing SLIs/SLOs?** |
| 206 | + No. |
| 207 | + |
| 208 | +* **Will enabling / using this feature result in non-negligible increase of |
| 209 | + resource usage (CPU, RAM, disk, IO, ...) in any components?** |
| 210 | + No. |
| 211 | + |
| 212 | +### Troubleshooting |
| 213 | + |
| 214 | +The Troubleshooting section currently serves the `Playbook` role. We may consider |
| 215 | +splitting it into a dedicated `Playbook` document (potentially with some monitoring |
| 216 | +details). For now, we leave it here. |
| 217 | + |
| 218 | +* **How does this feature react if the API server and/or etcd is unavailable?** |
| 219 | + N/A. |
| 220 | + |
| 221 | +* **What are other known failure modes?** |
| 222 | + N/A. |
| 223 | + |
| 224 | +* **What steps should be taken if SLOs are not being met to determine the problem?** |
| 225 | + N/A. |