The flakes shown here are not related to this feature, per the tests logs
 ### Graduation Criteria
@@ -203,11 +222,17 @@ validation should be done by CNIs.
 with generally positive feedback on its usage.
 - Feature Gate is enabled by Default.

-#### GA Graduation
+#### GA

 - At least **four** NetworkPolicy providers (or CNI providers) support the `EndPort` field
 - `EndPort` has been enabled by default for at least 1 minor release

+The following are the CNIs that implement this feature:
+- Calico
+- Antrea
+- OpenShift SDN
+- kube-router
+
 ### Upgrade / Downgrade Strategy

 Upgrading should have no impact, as this is a new field.
@@ -221,17 +246,16 @@ start working incorrectly. This is a fail-closed failure, so it is acceptable.
 ### Feature Enablement and Rollback


-* **How can this feature be enabled / disabled in a live cluster?**
+###### How can this feature be enabled / disabled in a live cluster?
 - [X] Feature gate (also fill in values in `kep.yaml`)
   - Feature gate name: NetworkPolicyEndPort
   - Components depending on the feature gate: Kubernetes API Server

-* **Does enabling the feature change any default behavior?**
+###### Does enabling the feature change any default behavior?
 No

-* **Can the feature be disabled once it has been enabled (i.e. can we roll back
-the enablement)?**
-
+###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
+
 Yes. One caveat here is that NetworkPolicies created with the EndPort field set
 while the feature was enabled will continue to have that field set when the
 feature is disabled, unless the user removes it from the object.
@@ -247,57 +271,82 @@ start working incorrectly. This is a fail-closed failure, so it is acceptable.
 port range, which may break users, which is inevitable but satisfies the
 fail-closed requirement.
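The rollback caveat above (an existing `EndPort` survives while the gate is off, but a new one cannot be added) can be illustrated with a minimal Python sketch. It is modelled on the general "drop disabled fields unless already in use" pattern and is not the API server's actual code; the function names and the dictionary layout are assumptions made for illustration.

```python
import copy


def end_port_in_use(policy):
    """Return True if any ingress or egress port entry in the policy sets endPort."""
    if policy is None:
        return False
    spec = policy.get("spec", {})
    for direction in ("ingress", "egress"):
        for rule in spec.get(direction, []):
            for port in rule.get("ports", []):
                if port.get("endPort") is not None:
                    return True
    return False


def drop_disabled_end_port(new_policy, old_policy, gate_enabled):
    """Hypothetical helper: strip endPort from new_policy when the gate is
    disabled and the stored object (old_policy) does not already use it."""
    if gate_enabled or end_port_in_use(old_policy):
        # Gate on, or the field is already persisted: keep the new object as sent.
        return new_policy
    cleaned = copy.deepcopy(new_policy)  # do not mutate the caller's object
    spec = cleaned.get("spec", {})
    for direction in ("ingress", "egress"):
        for rule in spec.get(direction, []):
            for port in rule.get("ports", []):
                port.pop("endPort", None)
    return cleaned
```

Under these assumptions, passing `old_policy=None` models object creation: with the gate disabled, the field is always dropped on create.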

-* **What happens if we reenable the feature if it was previously rolled back?**
+###### What happens if we reenable the feature if it was previously rolled back?
+
 Nothing.

-* **Are there any tests for feature enablement/disablement?**
+###### Are there any tests for feature enablement/disablement?

 Yes, they can be found [here](https://github.com/kubernetes/kubernetes/blob/release-1.21/pkg/registry/networking/networkpolicy/strategy_test.go#L284)

 ### Rollout, Upgrade and Rollback Planning

-_This section must be completed when targeting beta graduation to a release._
-* **How can a rollout fail? Can it impact already running workloads?**
+###### How can a rollout or rollback fail? Can it impact already running workloads?
 Unlikely, but there is still the risk of a bug that fails validation
 or of a conversion function that crashes.

-* **What specific metrics should inform a rollback?**
+###### What specific metrics should inform a rollback?
 An increase in the count of 5xx HTTP errors on the NetworkPolicy endpoint.

-* **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**
+###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
+
 Yes, with unit tests.
-There's still some need to make manual tests, that will be done in a follow up.
+Manual tests were also executed as follows:
+* Created a KinD cluster on v1.24 with Calico as the CNI
+* Created a Network Policy with the `endPort` field to allow Pod egress to ports 70 through 90
+* Tested against a target on port 80: worked
+* Disabled the feature gate
+* The Network Policy still worked fine
+* Changed the Network Policy so the range is 70 to 79; the update was accepted
+* Traffic to port 80 started to be blocked, but port 78 could still be reached as it is within the range
+* Removed the `endPort` field, and was not able to re-add it as the feature gate was disabled
+* Re-enabled the feature gate
+* Re-added the `endPort` field with a value of 90
+* Traffic started to flow/be accepted again
+
+Per the manual tests, all worked as desired.
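The port ranges exercised in the manual test can be modelled with a short Python sketch. The manifest fields (`port`, `endPort`, `policyTypes`, `egress`) follow the `networking.k8s.io/v1` NetworkPolicy API; the policy name, the empty pod selector, the TCP protocol, and the `port_allowed` helper are assumptions added for illustration, and the matching logic is a simplification of what a CNI actually enforces.

```python
def build_egress_policy(start_port, end_port):
    """Build a NetworkPolicy manifest (as a dict) allowing egress to a port range."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-egress-range"},  # hypothetical name
        "spec": {
            "podSelector": {},  # assumption: applies to every Pod in the namespace
            "policyTypes": ["Egress"],
            "egress": [
                {"ports": [{"protocol": "TCP", "port": start_port, "endPort": end_port}]}
            ],
        },
    }


def port_allowed(policy, dest_port, protocol="TCP"):
    """Simplified check: True if dest_port falls in any [port, endPort] range.

    Only rules that list ports are considered; peers and rules without
    ports are ignored in this sketch.
    """
    for rule in policy["spec"].get("egress", []):
        for entry in rule.get("ports", []):
            if entry.get("protocol", "TCP") != protocol:
                continue
            first = entry["port"]
            last = entry.get("endPort", first)  # without endPort, a single port
            if first <= dest_port <= last:
                return True
    return False


wide = build_egress_policy(70, 90)    # initial policy from the manual test
narrow = build_egress_policy(70, 79)  # policy after narrowing the range
```

With these definitions, `port_allowed(wide, 80)` is true, while `port_allowed(narrow, 80)` is false and `port_allowed(narrow, 78)` is still true, matching the behavior observed in the steps above.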
+
+###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

-* **Is the rollout accompanied by any deprecations and/or removals of features, APIs,
 None

 ### Monitoring Requirements

-_This section must be completed when targeting beta graduation to a release._
-* **How can an operator determine if the feature is in use by workloads?**
+###### How can an operator determine if the feature is in use by workloads?
+

 Operators can determine if NetworkPolicies are making use of EndPort by creating
 an object specifying the range and validating if the traffic is allowed within
-the specified range
+the specified range.
+
+Also, the Network Policy object now supports (as alpha) status/condition fields, so
+Network Policy providers (NPPs) can give feedback to the user on whether the policy was processed
+correctly. Providing this feedback is optional and depends on the implementation
+by each NPP.
+
+###### How can someone using this feature know that it is working for their instance?

-* **How can someone using this feature know that it is working for their instance?
 - [x] Other
   - Details:
 The API Field must be present when a NetworkPolicy is created with that field.
 Whether the feature works correctly depends on the CNI implementation, so the operator can
 look into CNI metrics to check if the rules are being applied correctly. Calico, for example,
 provides metrics such as `felix_iptables_restore_errors` that can be used to
 verify whether the number of restore errors rose after the feature was applied.
-We might need in a future to add some Status field that allows CNI providers to provide
-feedback about the functionality
+For NetworkPolicy providers that don't support this feature, a new status field was added
+to the Network Policy object, allowing the providers to give feedback to users using conditions.
+Any NPP that does not support this feature should add a condition on the Network Policy
+object.

-* **What are the SLIs (Service Level Indicators) an operator can use to determine
-the health of the service?**
+
+###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
+
 Operators can use metrics provided by the CNI as SLIs, such as
 `felix_iptables_restore_errors` from Calico, to verify whether the error rate
 has risen.

-* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**
+###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
+
 - per-day percentage of API calls finishing with 5XX errors <= 1% is a reasonable SLO
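The SLO above reduces to a simple ratio check. The following Python sketch assumes the operator already has per-day totals of API calls and of 5XX responses from their own monitoring; the function names and the sample counts are hypothetical.

```python
def error_percentage(total_calls, server_errors):
    """Percentage of API calls in the window that finished with a 5XX status."""
    if total_calls == 0:
        return 0.0  # no traffic means no errors to report
    return 100.0 * server_errors / total_calls


def meets_slo(total_calls, server_errors, threshold_pct=1.0):
    """True when the per-day 5XX percentage is at or below the threshold."""
    return error_percentage(total_calls, server_errors) <= threshold_pct
```

For example, 100 errors in 10,000 calls sits exactly at the 1% limit and still meets the SLO, while 101 errors would not.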

 * **Are there any missing metrics that would be useful to have to improve observability
@@ -307,52 +356,55 @@ of this feature?**

 ### Dependencies

-* **Does this feature depend on any specific services running in the cluster?**
-Yes, a CNI supporting the new feature
+###### Does this feature depend on any specific services running in the cluster?

+Yes, a CNI supporting the new feature

 ### Scalability

-* **Will enabling / using this feature result in any new API calls?**
+###### Will enabling / using this feature result in any new API calls?
 No

-* **Will enabling / using this feature result in introducing new API types?**
+###### Will enabling / using this feature result in introducing new API types?
+
 No

-* **Will enabling / using this feature result in any new calls to the cloud
-provider?**
+###### Will enabling / using this feature result in any new calls to the cloud provider?
+
 No

-* **Will enabling / using this feature result in increasing size or count of
-the existing API objects?**
+###### Will enabling / using this feature result in increasing size or count of the existing API objects?
+

 - API type(s): NetworkPolicyPorts
 - Estimated increase in size: 2 bytes for each new `EndPort` value specified + the field name/number in its serialized format
 - Estimated amount of new objects: N/A
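As a rough cross-check of the size estimate, the sketch below computes the overhead of one `EndPort` value under protobuf serialization. It assumes the field is an `int32` encoded as a varint and that its field number is 3; both are assumptions for illustration rather than figures taken from this KEP, and JSON serialization would give different numbers.

```python
def varint_len(value):
    """Number of bytes protobuf needs to varint-encode a non-negative integer."""
    length = 1
    while value >= 0x80:  # each byte carries 7 bits of payload
        value >>= 7
        length += 1
    return length


def end_port_overhead(end_port, field_number=3):
    """Bytes added by one endPort value: field tag plus varint-encoded value.

    field_number=3 is an assumed value used only for this illustration.
    """
    tag = (field_number << 3) | 0  # wire type 0 means varint
    return varint_len(tag) + varint_len(end_port)
```

Under these assumptions the overhead ranges from 2 bytes (for example `endPort: 90`) to 4 bytes (for `endPort: 65535`), which is the same order of magnitude as the estimate above.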

-* **Will enabling / using this feature result in increasing time taken by any
-operations covered by [existing SLIs/SLOs]?**
+###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
+
 N/A

-* **Will enabling / using this feature result in non-negligible increase of
-resource usage (CPU, RAM, disk, IO, ...) in any components?**
+###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
 The CNI might use somewhat more resources while parsing the
 new field.

 ### Troubleshooting

-* **How does this feature react if the API server and/or etcd is unavailable?**
+###### How does this feature react if the API server and/or etcd is unavailable?
+
 As this feature is mainly used by CNI providers, the behavior when the API server
 and/or etcd is unavailable will be the same as before.

-* **What are other known failure modes?**
+###### What are other known failure modes?
 N/A

-* **What steps should be taken if SLOs are not being met to determine the problem?**
+###### What steps should be taken if SLOs are not being met to determine the problem?
+
 Remove the EndPort field and check whether the number of errors decreases, although this might
 lead to an undesired Network Policy that blocks previously working rules.

 ## Implementation History
+- 2022-06-14 Propose GA graduation
 - 2021-05-11 Propose Beta graduation and add more Performance Review data