 - [Feature Enablement and Rollback](#feature-enablement-and-rollback)
@@ -203,7 +203,7 @@ validation should be done by CNIs.
 with generally positive feedback on its usage.
 - Feature Gate is enabled by Default.

-#### GA Graduation
+#### GA

 - At least **four** NetworkPolicy providers (or CNI providers) support the `EndPort` field
 - `EndPort` has been enabled by default for at least 1 minor release
@@ -221,16 +221,16 @@ start working incorrectly. This is a fail-closed failure, so it is acceptable.
 ### Feature Enablement and Rollback


-* **How can this feature be enabled / disabled in a live cluster?**
+###### How can this feature be enabled / disabled in a live cluster?
 - [X] Feature gate (also fill in values in `kep.yaml`)
 - Feature gate name: NetworkPolicyEndPort
 - Components depending on the feature gate: Kubernetes API Server

-* **Does enabling the feature change any default behavior?**
+###### Does enabling the feature change any default behavior?
 No

-* **Can the feature be disabled once it has been enabled (i.e. can we roll back
-the enablement)?**
+###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
+

 Yes. One caveat here is that NetworkPolicies created with EndPort field set
 when the feature was enabled will continue to have that field set when the
@@ -247,40 +247,45 @@ start working incorrectly. This is a fail-closed failure, so it is acceptable.
 port range, which may break users, which is inevitable but satisfies the
 fail-closed requirement.
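
The fail-closed behavior described above can be illustrated with a small sketch. It assumes that an implementation which does not honor `EndPort` falls back to matching only the single `port` value; the function name `port_allowed`, the `honor_end_port` flag, and the example port numbers are hypothetical and exist only for illustration.

```python
from typing import Optional


def port_allowed(dst_port: int, port: int, end_port: Optional[int],
                 honor_end_port: bool) -> bool:
    """Return True if dst_port matches one NetworkPolicy port entry.

    honor_end_port=False models an implementation that ignores EndPort
    (assumption: it then matches only the single `port` value).
    """
    if honor_end_port and end_port is not None:
        return port <= dst_port <= end_port
    return dst_port == port


# Hypothetical allow rule: port 32000 with EndPort 32768.
PORT, END_PORT = 32000, 32768
candidates = range(31990, 32780)

with_range = {p for p in candidates if port_allowed(p, PORT, END_PORT, True)}
without_range = {p for p in candidates if port_allowed(p, PORT, END_PORT, False)}

# Ignoring EndPort can only shrink the set of allowed ports, never grow it.
print(without_range <= with_range)
```

Because NetworkPolicy rules are allow-lists, dropping the range only removes allowed ports, which is why the failure is fail-closed rather than fail-open.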

-* **What happens if we reenable the feature if it was previously rolled back?**
+###### What happens if we reenable the feature if it was previously rolled back?
+
 Nothing.

-* **Are there any tests for feature enablement/disablement?**
+###### Are there any tests for feature enablement/disablement?

 Yes and they can be found [here](https://github.com/kubernetes/kubernetes/blob/release-1.21/pkg/registry/networking/networkpolicy/strategy_test.go#L284)

 ### Rollout, Upgrade and Rollback Planning

 _This section must be completed when targeting beta graduation to a release._
-* **How can a rollout fail? Can it impact already running workloads?**
+###### How can a rollout or rollback fail? Can it impact already running workloads?
 Unlikely, but there is still a risk that a bug causes validation to fail
 or the conversion function to crash.

-* **What specific metrics should inform a rollback?**
+###### What specific metrics should inform a rollback?
 An increase in the 5xx HTTP error count on the NetworkPolicy endpoint.

-* **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**
+###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
+
 Yes, with unit tests.
 Manual tests are still needed and will be done in a follow-up.

-* **Is the rollout accompanied by any deprecations and/or removals of features, APIs,
+###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
+
 None

 ### Monitoring Requirements

 _This section must be completed when targeting beta graduation to a release._
-* **How can an operator determine if the feature is in use by workloads?**
+###### How can an operator determine if the feature is in use by workloads?
+

 Operators can determine whether NetworkPolicies make use of EndPort by creating
 an object that specifies a port range and validating that traffic is allowed within
 the specified range.

-* **How can someone using this feature know that it is working for their instance?
+###### How can someone using this feature know that it is working for their instance?
+
 - [x] Other
 - Details:
 The API Field must be present when a NetworkPolicy is created with that field.
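
One way to check that the field is present is to inspect NetworkPolicy objects as read back from the API server. The sketch below is a minimal illustration that works on a plain dictionary; the helper name `end_port_rules` and the example policy are hypothetical, and it assumes the serialized field name is `endPort` under `spec.ingress[].ports[]` and `spec.egress[].ports[]`.

```python
from typing import Any, Dict, List, Tuple


def end_port_rules(policy: Dict[str, Any]) -> List[Tuple[str, Any, int]]:
    """List (direction, port, endPort) for every port entry that sets endPort."""
    found = []
    spec = policy.get("spec", {})
    for direction in ("ingress", "egress"):
        for rule in spec.get(direction) or []:
            for entry in rule.get("ports") or []:
                if entry.get("endPort") is not None:
                    found.append((direction, entry.get("port"), entry["endPort"]))
    return found


# Hypothetical policy, shaped like an object returned by the API server.
policy = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "NetworkPolicy",
    "metadata": {"name": "multi-port-egress"},
    "spec": {
        "podSelector": {},
        "policyTypes": ["Egress"],
        "egress": [
            {"ports": [{"protocol": "TCP", "port": 32000, "endPort": 32768}]}
        ],
    },
}

print(end_port_rules(policy))  # [('egress', 32000, 32768)]
```

An empty result for a policy that was created with a range would suggest the field was dropped, for example because the feature gate was disabled at creation time.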
@@ -291,13 +296,14 @@ _This section must be completed when targeting beta graduation to a release._
 In the future, we might need to add a Status field that allows CNI providers to provide
 feedback about the functionality.

-* **What are the SLIs (Service Level Indicators) an operator can use to determine
-the health of the service?**
+###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
+
 Operators can use metrics provided by the CNI as SLIs, such as
 `felix_iptables_restore_errors` from Calico, to verify whether the error rate
 has risen.

-* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**
+###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
+
 - per-day percentage of API calls finishing with 5XX errors <= 1% is a reasonable SLO
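
The SLO above is simple arithmetic over daily request counts. The following sketch shows one way to evaluate it; the function names and the sample counts are hypothetical, and how the counts are collected (for example from API server request metrics) is left out as an assumption.

```python
def error_percentage(total_calls: int, errors_5xx: int) -> float:
    """Percentage of API calls in one day that finished with a 5XX error."""
    if total_calls == 0:
        return 0.0  # no traffic, so nothing violated the objective
    return 100.0 * errors_5xx / total_calls


def meets_slo(total_calls: int, errors_5xx: int, threshold_pct: float = 1.0) -> bool:
    """True if the per-day 5XX percentage is at or below the threshold."""
    return error_percentage(total_calls, errors_5xx) <= threshold_pct


# Hypothetical daily counts for the NetworkPolicy endpoint.
print(meets_slo(total_calls=200_000, errors_5xx=1_500))  # True  (0.75%)
print(meets_slo(total_calls=200_000, errors_5xx=2_500))  # False (1.25%)
```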

 * **Are there any missing metrics that would be useful to have to improve observability
@@ -307,52 +313,56 @@ of this feature?**

 ### Dependencies

-* **Does this feature depend on any specific services running in the cluster?**
+###### Does this feature depend on any specific services running in the cluster?
+
 Yes, a CNI supporting the new feature


 ### Scalability

-* **Will enabling / using this feature result in any new API calls?**
+###### Will enabling / using this feature result in any new API calls?
 No

-* **Will enabling / using this feature result in introducing new API types?**
+###### Will enabling / using this feature result in introducing new API types?
+
 No

-* **Will enabling / using this feature result in any new calls to the cloud
-provider?**
+###### Will enabling / using this feature result in any new calls to the cloud provider?
+
 No

-* **Will enabling / using this feature result in increasing size or count of
-the existing API objects?**
+###### Will enabling / using this feature result in increasing size or count of the existing API objects?
+

 - API type(s): NetworkPolicyPorts
 - Estimated increase in size: 2 bytes for each new `EndPort` value specified + the field name/number in its serialized format
 - Estimated amount of new objects: N/A

-* **Will enabling / using this feature result in increasing time taken by any
-operations covered by [existing SLIs/SLOs]?**
+###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
+
 N/A

-* **Will enabling / using this feature result in non-negligible increase of
-resource usage (CPU, RAM, disk, IO, ...) in any components?**
+###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
 The CNI might see some increase in resource usage while parsing the
 new field.

 ### Troubleshooting

-* **How does this feature react if the API server and/or etcd is unavailable?**
+###### How does this feature react if the API server and/or etcd is unavailable?
+
 As this feature is mainly used by CNI providers, the behavior when the API server
 and/or etcd is unavailable will be the same as before.

-* **What are other known failure modes?**
+###### What are other known failure modes?
 N/A

-* **What steps should be taken if SLOs are not being met to determine the problem?**
+###### What steps should be taken if SLOs are not being met to determine the problem?
+
 Remove the EndPort field and check whether the number of errors decreases, although this might
 lead to an undesired NetworkPolicy that blocks previously working rules.

 ## Implementation History
+- 2022-01-31 Propose GA graduation
 - 2021-05-11 Propose Beta graduation and add more Performance Review data