
Commit e159865

Mark KEP as implementable following PRR review.
1 parent 9904180 commit e159865

3 files changed: +81 -51 lines changed
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+kep-number: 2594
+alpha:
+  approver: "@wojtek-t"

keps/sig-network/2594-multiple-cluster-cidrs/README.md

Lines changed: 70 additions & 49 deletions
@@ -23,7 +23,7 @@
   - [Dual-Stack Support](#dual-stack-support)
   - [Startup Options](#startup-options)
   - [Startup](#startup)
-  - [Reconciliation Loop](#reconciliation-loop)
+  - [Processing Queue](#processing-queue)
   - [Event Watching Loops](#event-watching-loops)
     - [Node Added](#node-added)
     - [Node Updated](#node-updated)
@@ -77,15 +77,15 @@ checklist items _must_ be updated for the enhancement to be released.
 Items marked with (R) are required *prior to targeting to a milestone /
 release*.
 
-- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in
+- [X] (R) Enhancement issue in release milestone, which links to KEP dir in
   [kubernetes/enhancements](not the initial KEP PR)
-- [ ] (R) KEP approvers have approved the KEP status as `implementable`
-- [ ] (R) Design details are appropriately documented
-- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and
+- [X] (R) KEP approvers have approved the KEP status as `implementable`
+- [X] (R) Design details are appropriately documented
+- [X] (R) Test plan is in place, giving consideration to SIG Architecture and
   SIG Testing input (including test refactors)
-- [ ] (R) Graduation criteria is in place
-- [ ] (R) Production readiness review completed
-- [ ] (R) Production readiness review approved
+- [X] (R) Graduation criteria is in place
+- [X] (R) Production readiness review completed
+- [X] (R) Production readiness review approved
 - [ ] "Implementation History" section is up-to-date for milestone
 - [ ] User-facing documentation has been created in [kubernetes/website], for
   publication to [kubernetes.io]
@@ -225,6 +225,8 @@ do not assume Kubernetes has a single continguous Pod CIDR.
 
 ### New Resource
 
+This KEP proposes adding a new built-in API called `ClusterCIDRConfig`.
+
 ```go
 type ClusterCIDRConfig struct {
     metav1.TypeMeta
@@ -238,7 +240,7 @@ type ClusterCIDRConfigSpec struct {
     // This defines which nodes the config is applicable to. A nil selector can
     // be applied to any node.
     // +optional
-    NodeSelector *v1.LabelSelector
+    NodeSelector *v1.NodeSelector
 
     // This defines the IPv4 CIDR assignable to nodes selected by this config.
     // +optional
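
For illustration only: a minimal sketch of how the `NodeSelector` field changed in the hunk above might be populated once it is a core `v1.NodeSelector` (match expressions over node labels/fields) rather than a `v1.LabelSelector`. The `ClusterCIDRConfigSpec` and `CIDRConfig` stand-ins below merely mirror the fields visible in this diff; the real proposed types may differ.

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// Local stand-ins for the proposed API, mirroring only the fields visible in
// the hunks of this diff; names and shapes are placeholders.
type CIDRConfig struct {
	CIDR            string
	PerNodeMaskSize int32
}

type ClusterCIDRConfigSpec struct {
	NodeSelector *v1.NodeSelector // core/v1 NodeSelector, not a LabelSelector
	IPv4         *CIDRConfig
	IPv6         *CIDRConfig
}

func main() {
	// A dual-stack spec scoped to nodes in rack1, expressed with the
	// MatchExpressions form that a v1.NodeSelector requires.
	spec := ClusterCIDRConfigSpec{
		NodeSelector: &v1.NodeSelector{
			NodeSelectorTerms: []v1.NodeSelectorTerm{{
				MatchExpressions: []v1.NodeSelectorRequirement{{
					Key:      "rack",
					Operator: v1.NodeSelectorOpIn,
					Values:   []string{"rack1"},
				}},
			}},
		},
		// 32-26 == 128-122, so each node gets equally sized v4/v6 blocks.
		IPv4: &CIDRConfig{CIDR: "10.5.0.0/16", PerNodeMaskSize: 26},
		IPv6: &CIDRConfig{CIDR: "fd00:1::/64", PerNodeMaskSize: 122},
	}
	fmt.Printf("%+v\n", spec)
}
```

The per-node mask sizes are chosen so that `32 - IPv4.PerNodeMaskSize == 128 - IPv6.PerNodeMaskSize`, matching the constraint stated later in this file.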
@@ -275,9 +277,10 @@ type ClusterCIDRConfigStatus struct {
 
 ```32 - IPv4.PerNodeMaskSize == 128 - IPv6.PerNodeMaskSize```
 
-- Each node will be assigned all Pod CIDRs from a matching config.
-  Consider the following example:
-
+- Each node will be assigned all Pod CIDRs from a matching config. That is to
+  say, you cannot assign only IPv4 addresses from a `ClusterCIDRConfig` which
+  specifies both IPv4 and IPv6. Consider the following example:
+
 ```go
 {
   IPv4: {
@@ -294,12 +297,22 @@ type ClusterCIDRConfigStatus struct {
   Pod CIDRs can be partitioned from the IPv4 CIDR. The remaining IPv6 Pod
   CIDRs may be used if referenced in another `ClusterCIDRConfig`.
 
-- In case of multiple matching ranges, attempt to break ties with the
+- When there are multiple `ClusterCIDRConfig` resources in the cluster, first
+  collect the list of applicable `ClusterCIDRConfig`. A `ClusterCIDRConfig` is
+  applicable if its `NodeSelector` matches the `Node` being allocated, and if
+  it has free CIDRs to allocate.
+
+  A nil `NodeSelector` functions as a default that applies to all nodes. This
+  should be the fall-back and not take precedence if any other range matches.
+  If there are multiple default ranges, ties are broken using the scheme
+  outlined below.
+
+  In the case of multiple matching ranges, attempt to break ties with the
   following rules:
   1. Pick the `ClusterCIDRConfig` whose `NodeSelector` matches the most
-     labels on the `Node`. For example, `{'node.kubernetes.io/instance-type':
-     'medium', 'rack': 'rack1'}` before `{'node.kubernetes.io/instance-type':
-     'medium'}`.
+     labels/fields on the `Node`. For example,
+     `{'node.kubernetes.io/instance-type': 'medium', 'rack': 'rack1'}` before
+     `{'node.kubernetes.io/instance-type': 'medium'}`.
   1. Pick the `ClusterCIDRConfig` with the fewest Pod CIDRs allocatable. For
      example, `{CIDR: "10.0.0.0/16", PerNodeMaskSize: "16"}` (1 possible Pod
      CIDR) is picked before `{CIDR: "192.168.0.0/20", PerNodeMaskSize: "22"}`
@@ -308,11 +321,6 @@ type ClusterCIDRConfigStatus struct {
      For example, `27` (32 IPs) picked before `25` (128 IPs).
   1. Break ties arbitrarily.
 
-- An empty `NodeSelector` functions as a default that applies to all nodes.
-  This should be the fall-back and not take precedence if any other range
-  matches. If there are multiple default ranges, ties are broken using the
-  scheme outlined above.
-
 - When breaking ties between matching `ClusterCIDRConfig`, if the most
   applicable (as defined by the tie-break rules) has no more free allocations,
   attempt to allocate from the next highest matching `ClusterCIDRConfig`. For
@@ -329,21 +337,21 @@ type ClusterCIDRConfigStatus struct {
   to the tie-break rules.
   ```go
   {
-    NodeSelector: { MatchLabels: { "node": "n1", "rack": "rack1" } },
+    NodeSelector: { MatchExpressions: { "node": "n1", "rack": "rack1" } },
     IPv4: {
       CIDR: "10.5.0.0/16",
       PerNodeMaskSize: 26,
     }
   },
   {
-    NodeSelector: { MatchLabels: { "node": "n1" } },
+    NodeSelector: { MatchExpressions: { "node": "n1" } },
     IPv4: {
       CIDR: "192.168.128.0/17",
       PerNodeMaskSize: 28,
     }
   },
   {
-    NodeSelector: { MatchLabels: { "node": "n1" } },
+    NodeSelector: { MatchExpressions: { "node": "n1" } },
     IPv4: {
       CIDR: "192.168.64.0/20",
       PerNodeMaskSize: 28,
@@ -363,7 +371,7 @@ type ClusterCIDRConfigStatus struct {
 
 - On deletion of the `ClusterCIDRConfig`, the controller checks to see if any
   Nodes are using `PodCIDRs` from this range -- if so it keeps the finalizer
-  in place and periodically polls Nodes. When all Nodes using this
+  in place and waits for the Nodes to be deleted. When all Nodes using this
   `ClusterCIDRConfig` are deleted, the finalizer is removed.
 
 #### Example: Allocations
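
As a rough sketch of the deletion flow changed in the hunk above (keep the finalizer until every Node using the ranges is gone), the snippet below shows the check a controller might run on each sync. `ClusterCIDRConfig` and `nodeUsesConfig` are placeholders, not the actual implementation; only the general finalizer pattern (wait for `DeletionTimestamp`, then confirm nothing still references the object) is standard.

```go
package clustercidr

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// ClusterCIDRConfig is a local stand-in for the proposed type; only the
// metadata needed by this sketch is included.
type ClusterCIDRConfig struct {
	metav1.ObjectMeta
}

// nodeUsesConfig is a hypothetical helper: it would do real CIDR math to
// decide whether any of the Node's PodCIDRs were carved out of cfg's ranges.
func nodeUsesConfig(node *v1.Node, cfg *ClusterCIDRConfig) bool {
	return len(node.Spec.PodCIDRs) > 0 // placeholder check only
}

// canRemoveFinalizer reports whether the finalizer may be dropped: the object
// must be marked for deletion and no Node may still use its ranges.
func canRemoveFinalizer(cfg *ClusterCIDRConfig, nodes []v1.Node) bool {
	if cfg.DeletionTimestamp == nil {
		return false // not being deleted yet
	}
	for i := range nodes {
		if nodeUsesConfig(&nodes[i], cfg) {
			return false // wait for this Node to be deleted
		}
	}
	return true
}
```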
@@ -451,11 +459,11 @@ nodes we expect.
 
 #### Dual-Stack Support
 
-To assign both IPv4 and IPv6 Pod CIDRs to a Node, the `IPv4` and `IPv6` fields
-must be both set on the object. The controller does not have an in-built notion
-of single-stack or dual-stack clusters. It uses the tie-break rules specified
-[above](#expected-behavior) to pick a `ClusterCIDRConfig` from which to allocate
-Pod CIDRs for each Node.
+The decision of whether to assign only IPv4, only IPv6, or both depends on the
+CIDRs configured in a `ClusterCIDRConfig` object. As described
+[above](#expected-behavior), the controller creates an ordered list of
+`ClusterCIDRConfig` resources which apply to a given `Node` and allocates from
+the first matching `ClusterCIDRConfig` with CIDRs available.
 
 The controller makes no guarantees that all Nodes are single-stack or that all
 Nodes are dual-stack. This is to specifically allow users to upgrade existing
@@ -497,12 +505,20 @@ from the existing NodeIPAM controller:
     necessary.
   - The "created-from-flags-\<hash\>" object will always be created as long
     as the flags are set. The hash is arbitrarily assigned.
-  - If an object with the name "created-from-flags-\<hash>" already exists,
-    but it does not match the flag values, the controller will delete it and
-    create a new object. The controller will ensure (on startup) that there
-    is only one non-deleted `ClusterCIDRConfig` with the name
-    "create-from-flags\<hash>". This will allow users to change the flag
-    values and stop using the old values.
+  - If an un-deleted object with the name "created-from-flags-*" already
+    exists, but it does not match the flag values, the controller will
+    delete it and create a new object. The controller will ensure (on
+    startup) that there is only one non-deleted `ClusterCIDRConfig` with the
+    name "created-from-flags-\<hash>". The "\<hash>" at the end of the name
+    allows the controller to have multiple "created-from-flags" objects
+    present (e.g. blocked on deletion because of the finalizer), without
+    blocking startup.
+  - If some `Node`s were allocated Pod CIDRs from the old
+    "created-from-flags-\<hash>" object, they will follow the standard
+    lifecycle for deleting a `ClusterCIDRConfig` object. The
+    "created-from-flags-\<hash>" object the `Nodes` are allocated from will
+    remain pending deletion (waiting for its finalizer to be cleared) until
+    all `Nodes` using those ranges are re-created.
 - Fetch list of `Node`s. Check each node for `PodCIDRs`
   - If `PodCIDR` is set, mark the allocation in the internal data structure
     and store this association with the node.
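
A sketch of the startup pass over existing Nodes described in the list above, assuming a hypothetical in-memory allocator interface: Pod CIDRs already assigned to Nodes are recorded as occupied, and Nodes still missing Pod CIDRs are returned for normal allocation. This is illustrative only, not the controller's actual data structures.

```go
package clustercidr

import v1 "k8s.io/api/core/v1"

// cidrAllocator is a placeholder for the controller's internal data structure;
// MarkAllocated records ranges that are already in use by a Node.
type cidrAllocator interface {
	MarkAllocated(nodeName string, podCIDRs []string)
}

// syncExistingNodes is a sketch of the startup pass: Nodes that already carry
// PodCIDRs have those ranges marked as occupied, and Nodes without PodCIDRs
// are returned so they can be handled like a Node Added event.
func syncExistingNodes(nodes []v1.Node, alloc cidrAllocator) (needsAllocation []v1.Node) {
	for _, n := range nodes {
		if len(n.Spec.PodCIDRs) > 0 {
			alloc.MarkAllocated(n.Name, n.Spec.PodCIDRs)
			continue
		}
		needsAllocation = append(needsAllocation, n)
	}
	return needsAllocation
}
```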
@@ -512,13 +528,12 @@ from the existing NodeIPAM controller:
   After processing all nodes, allocate ranges to any nodes without Pod
   CIDR(s) [Same logic as Node Added event]
 
-#### Reconciliation Loop
-
-This go-routine will watch for cleanup operations and failed allocations and
-continue to try them in the background.
+#### Processing Queue
 
-For example if a Node can't be allocated a PodCIDR, it will be periodically
-retried until it can be allocated a range or it is deleted.
+The controller will maintain a queue of events that it is processing. `Node`
+additions and `ClusterCIDRConfig` additions will be appended to the queue.
+Similarly, Node allocations which failed due to insufficient CIDRs can be
+retried by adding them back on to the queue (with exponential backoff).
 
 #### Event Watching Loops
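
A minimal sketch of the processing queue introduced above, built on client-go's rate-limited workqueue, which supplies the per-item exponential backoff. The key format and the `process` stub are assumptions for illustration; the KEP does not prescribe a specific implementation.

```go
package main

import (
	"fmt"

	"k8s.io/client-go/util/workqueue"
)

// process is a stand-in for the real allocation logic; it would return an
// error when a Node cannot be given a Pod CIDR yet (e.g. no free ranges).
func process(key string) error {
	fmt.Println("processing", key)
	return nil
}

// runWorker drains the queue, requeueing failed items with backoff.
func runWorker(queue workqueue.RateLimitingInterface) {
	for {
		item, shutdown := queue.Get()
		if shutdown {
			return
		}
		key := item.(string)
		if err := process(key); err != nil {
			// Failed (e.g. insufficient CIDRs): retry with exponential backoff.
			queue.AddRateLimited(key)
		} else {
			// Success: clear any backoff history for this key.
			queue.Forget(key)
		}
		queue.Done(item)
	}
}

func main() {
	// Per-item exponential backoff comes from the default rate limiter.
	queue := workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())
	queue.Add("node/worker-1")          // a Node addition event
	queue.Add("clustercidrconfig/cfg1") // a ClusterCIDRConfig addition event
	queue.ShutDown() // lets runWorker drain the queued items and return
	runWorker(queue)
}
```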
@@ -729,6 +744,12 @@ This section must be completed when targeting alpha to a release.
 Pick one of these and delete the rest.
 -->
 
+- [X] Feature Gate
+  - Feature gate name: ClusterCIDRConfig
+  - Components depending on the feature gate: kube-controller-manager
+  - The feature gate will control whether the new controller can even be
+    used, while the kube-controller-manager flag below will pick the
+    active controller.
 - [X] Other
   - Describe the mechanism:
     - The feature is enabled by setting the kube-controller-manager flag
@@ -755,8 +776,8 @@ too only for nodes created after that point).
 Yes, users can switch back to the old controller and delete the
 `ClusterCIDRConfig` objects. However, if any Nodes were allocated `PodCIDR` by
 the new controller, those allocations will persist for the lifetime of the Node.
-Users will have to restart their Nodes to trigger another `PodCIDR` allocation
-(this time performed by the old controller.)
+Users will have to recreate their Nodes to trigger another `PodCIDR` allocation
+(this time performed by the old controller).
 
 There should not be any effect on running workloads. The nodes will continue to
 use their allocated `PodCIDR` even if the underlying `ClusterCidrConfig` object
@@ -765,15 +786,15 @@ is forceably deleted.
 ###### What happens if we reenable the feature if it was previously rolled back?
 
 The controller is expected to read the existing set of `ClusterCIDRConfig` as
-well as the existing Node `PodCIDR` allocations and allocate new PorCIDRs
+well as the existing Node `PodCIDR` allocations and allocate new PodCIDRs
 appropriately.
 
 ###### Are there any tests for feature enablement/disablement?
 
-Yes, some integraiotn tets will be added to test this case. They will test the
-scenario where some Nodes already have PodCIDRs allocated to them (potentially
-from CIDRs not tracked by any `ClusterCIDRConfig`). THis should be sufficient to
-cover the enablement/disablment scenarios.
+Not yet, they will be added as part of the graduation to alpha. They will test
+the scenario where some Nodes already have PodCIDRs allocated to them
+(potentially from CIDRs not tracked by any `ClusterCIDRConfig`). This should be
+sufficient to cover the enablement/disablement scenarios.
 
 ### Rollout, Upgrade and Rollback Planning
 
keps/sig-network/2594-multiple-cluster-cidrs/kep.yaml

Lines changed: 8 additions & 2 deletions
@@ -4,15 +4,15 @@ authors:
   - "@rahulkjoshi"
   - "@sdmodi"
 owning-sig: sig-network
-status: provisional
+status: implementable
 creation-date: 2021-03-22
 reviewers:
   - "@mskrocki"
 approvers:
   - "@thockin"
   - "@aojea"
 prr-approvers:
-  - TBD
+  - "@wojtek-t"
 
 # The target maturity stage in the current dev cycle for this KEP.
 stage: alpha
@@ -27,3 +27,9 @@ milestone:
   alpha: "v1.23"
   beta: "v1.24"
   stable: "v1.26"
+
+feature-gates:
+  - name: ClusterCIDRConfig
+    components:
+      - kube-controller-manager
+disable-supported: true
