
Commit e0f7bba

PRR for 2593-multiple-cluster-cidrs
1 parent 335eaa9 commit e0f7bba

File tree

3 files changed: +153 -13 lines changed
Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
 kep-number: 2593
 alpha:
   approver: "@wojtek-t"
+beta:
+  approver: "@wojtek-t"

keps/sig-network/2593-multiple-cluster-cidrs/README.md

Lines changed: 149 additions & 11 deletions
@@ -595,10 +595,11 @@ N/A

 ##### Integration tests

-- Verify finalizers and statuses are persisted appropriately
-- Test watchers
-- Ensure that the controller handles the feature being disabled and re-enabled:
-  - Test with some Nodes already having `PodCIDR` allocations
+- `TestIPAMMultiCIDRRangeAllocatorCIDRAllocate`: https://github.com/kubernetes/kubernetes/blob/b7ad17978eaba508c560f81bab2fb9d0b256e469/test/integration/clustercidr/ipam_test.go#L44
+- `TestIPAMMultiCIDRRangeAllocatorCIDRRelease`: https://github.com/kubernetes/kubernetes/blob/b7ad17978eaba508c560f81bab2fb9d0b256e469/test/integration/clustercidr/ipam_test.go#L131
+- `TestIPAMMultiCIDRRangeAllocatorClusterCIDRDelete`: https://github.com/kubernetes/kubernetes/blob/b7ad17978eaba508c560f81bab2fb9d0b256e469/test/integration/clustercidr/ipam_test.go#L208
+- `TestIPAMMultiCIDRRangeAllocatorClusterCIDRTerminate`: https://github.com/kubernetes/kubernetes/blob/b7ad17978eaba508c560f81bab2fb9d0b256e469/test/integration/clustercidr/ipam_test.go#L304
+- `TestIPAMMultiCIDRRangeAllocatorClusterCIDRTieBreak`: https://github.com/kubernetes/kubernetes/blob/b7ad17978eaba508c560f81bab2fb9d0b256e469/test/integration/clustercidr/ipam_test.go#L389

 ##### e2e tests
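For reviewers who want to reproduce these, a sketch of a local invocation from a kubernetes/kubernetes checkout, assuming the repo's standard `make test-integration` tooling (which also provisions the etcd instance integration tests need):

```
$ make test-integration WHAT=./test/integration/clustercidr \
    KUBE_TEST_ARGS="-run TestIPAMMultiCIDRRangeAllocatorCIDRAllocate"
```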
@@ -774,10 +775,10 @@ appropriately.

 ###### Are there any tests for feature enablement/disablement?

-Not yet, they will be added as part of the graduation to alpha. They will test
+Not yet, they will be added as part of the graduation to beta. They will test
 the scenario where some Nodes already have PodCIDRs allocated to them
 (potentially from CIDRs not tracked by any `ClusterCIDR`). This should be
-sufficient to cover the enablement/disablment scenarios.
+sufficient to cover the enablement/disablement scenarios.

 ### Rollout, Upgrade and Rollback Planning
@@ -791,13 +792,25 @@ This section must be completed when targeting beta to a release.
 Try to be as paranoid as possible - e.g., what if some components will restart
 mid-rollout?
 -->
+kube-controller-manager needs to be restarted; the CIDR allocator is switched
+to the new range allocator via the `--cidr-allocator-type=MultiCIDRRangeAllocator`
+flag. The rollout can fail if the `--cluster-cidr` flag is updated while
+kube-controller-manager restarts: kube-controller-manager will crashloop, and new
+nodes will not have any PodCIDRs assigned and thus will not become Ready. This
+behavior is consistent with the existing range allocator behavior. The mitigation
+is to set `--cluster-cidr` back to the original value.
+
+Already running workloads will not be impacted.

 ###### What specific metrics should inform a rollback?

 <!--
 What signals should users be paying attention to when the feature is young
 that might indicate a serious problem?
 -->
+The `multicidrset_allocation_tries_per_request` metric must be monitored: a high
+count (>25) in the buckets above 125 tries indicates a large number of failures
+while allocating the CIDRs.
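To make that signal concrete, a Prometheus query along these lines could back a rollback alert; this is a sketch that assumes Prometheus scrapes kube-controller-manager and that 125 is one of the histogram's bucket boundaries, as the text above implies:

```
# Allocation requests per second that needed more than 125 tries (10m window);
# a sustained non-zero value is the rollback signal described above.
  sum(rate(multicidrset_allocation_tries_per_request_bucket{le="+Inf"}[10m]))
- sum(rate(multicidrset_allocation_tries_per_request_bucket{le="125"}[10m]))
```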

 ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

@@ -807,11 +820,113 @@ Longer term, we may want to require automated upgrade/rollback tests, but we
 are missing a bunch of machinery and tooling and can't do that now.
 -->

+Upgrade->downgrade->upgrade testing was done manually using the following steps:
+
+Build and run the latest version of Kubernetes using kind:
+```
+$ kind build node-image
+$ kind create cluster --image kindest/node:latest --config ~/k8sexample/kind/cluster-config-without-multipod.yaml
+...
+...
+sarveshr@sarveshr:~$ kubectl --context kind-multi-pod-cidr get node
+NAME                           STATUS   ROLES           AGE   VERSION
+multi-pod-cidr-control-plane   Ready    control-plane   21m   v1.27.0-alpha.1.226+1ded677b2a77a7
+```
+
+Initially, the feature gate `MultiCIDRRangeAllocator` is set to `false`:
+```
+$ vi cluster-config-without-multipod.yaml
+
+kind: Cluster
+apiVersion: kind.x-k8s.io/v1alpha4
+name: multi-pod-cidr
+featureGates:
+  "MultiCIDRRangeAllocator": false
+runtimeConfig:
+  "api/alpha": "true"
+networking:
+  apiServerAddress: "127.0.0.1"
+  apiServerPort: 6443
+  podSubnet: "10.244.0.0/16"
+  serviceSubnet: "10.96.0.0/12"
+```
+
+Make sure no ClusterCIDR objects exist:
+```
+$ kubectl --context kind-multi-pod-cidr get cc -A
+No resources found
+```
+
+Upgrade to the MultiCIDRRangeAllocator by logging into the control plane node
+and updating the kubeadm manifests as follows (a command sketch follows below):
+
+1. Update `/etc/kubernetes/manifests/kube-apiserver.yaml` by changing
+   `--feature-gates=MultiCIDRRangeAllocator=false` to `--feature-gates=MultiCIDRRangeAllocator=true`
+2. Update `/etc/kubernetes/manifests/kube-controller-manager.yaml` by changing
+   `--cidr-allocator-type=RangeAllocator` to `--cidr-allocator-type=MultiCIDRRangeAllocator` and
+   `--feature-gates=MultiCIDRRangeAllocator=false` to `--feature-gates=MultiCIDRRangeAllocator=true`
+
+kubelet will restart the kube-apiserver and kube-controller-manager pods as these are static manifests.
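A minimal sketch of those two edits as commands, assuming a kind control plane node reachable with `docker exec` and named as in the output above; editing the files by hand works equally well:

```
$ docker exec -it multi-pod-cidr-control-plane bash
# Flip the feature gate in the kube-apiserver static manifest.
$ sed -i 's/MultiCIDRRangeAllocator=false/MultiCIDRRangeAllocator=true/' \
    /etc/kubernetes/manifests/kube-apiserver.yaml
# Flip the gate and swap the allocator type in the kube-controller-manager manifest.
$ sed -i -e 's/MultiCIDRRangeAllocator=false/MultiCIDRRangeAllocator=true/' \
    -e 's/cidr-allocator-type=RangeAllocator/cidr-allocator-type=MultiCIDRRangeAllocator/' \
    /etc/kubernetes/manifests/kube-controller-manager.yaml
```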
+
+Validate that all pods are Running:
+```
+$ kubectl --context kind-multi-pod-cidr get po -A
+NAMESPACE            NAME                                                   READY   STATUS    RESTARTS      AGE
+kube-system          coredns-56f4c55bf9-b7bds                               1/1     Running   0             3m26s
+kube-system          coredns-56f4c55bf9-bgcxk                               1/1     Running   0             3m26s
+kube-system          etcd-multi-pod-cidr-control-plane                      1/1     Running   0             3m38s
+kube-system          kindnet-cj5sg                                          1/1     Running   2 (71s ago)   3m26s
+kube-system          kube-apiserver-multi-pod-cidr-control-plane            1/1     Running   0             72s
+kube-system          kube-controller-manager-multi-pod-cidr-control-plane   1/1     Running   0             19s
+kube-system          kube-proxy-gwh45                                       1/1     Running   0             3m26s
+kube-system          kube-scheduler-multi-pod-cidr-control-plane            1/1     Running   1 (95s ago)   3m38s
+local-path-storage   local-path-provisioner-c8855d4bb-qxw7g                 1/1     Running   0             3m26s
+```
+
+Validate that the default ClusterCIDR object is created:
+```
+$ kubectl --context kind-multi-pod-cidr get cc -A
+NAME                   PERNODEHOSTBITS   IPV4            IPV6     AGE
+default-cluster-cidr   8                 10.244.0.0/16   <none>   21s
+```
+
+Roll back to the RangeAllocator by logging into the control plane node
+and updating the kubeadm manifests as follows:
+
+1. Update `/etc/kubernetes/manifests/kube-apiserver.yaml` by changing
+   `--feature-gates=MultiCIDRRangeAllocator=true` to `--feature-gates=MultiCIDRRangeAllocator=false`
+2. Update `/etc/kubernetes/manifests/kube-controller-manager.yaml` by changing
+   `--cidr-allocator-type=MultiCIDRRangeAllocator` to `--cidr-allocator-type=RangeAllocator` and
+   `--feature-gates=MultiCIDRRangeAllocator=true` to `--feature-gates=MultiCIDRRangeAllocator=false`
+
+kubelet will restart the kube-apiserver and kube-controller-manager pods as these are static manifests.
+
+Validate that all pods are Running:
+```
+$ kubectl --context kind-multi-pod-cidr get po -A
+NAMESPACE            NAME                                                   READY   STATUS    RESTARTS      AGE
+kube-system          coredns-56f4c55bf9-b7bds                               1/1     Running   0             16m
+kube-system          coredns-56f4c55bf9-bgcxk                               1/1     Running   0             16m
+kube-system          etcd-multi-pod-cidr-control-plane                      1/1     Running   0             16m
+kube-system          kindnet-cj5sg                                          1/1     Running   4 (24s ago)   16m
+kube-system          kube-apiserver-multi-pod-cidr-control-plane            1/1     Running   0             24s
+kube-system          kube-controller-manager-multi-pod-cidr-control-plane   1/1     Running   0             11s
+kube-system          kube-proxy-gwh45                                       1/1     Running   0             16m
+kube-system          kube-scheduler-multi-pod-cidr-control-plane            1/1     Running   2 (49s ago)   16m
+local-path-storage   local-path-provisioner-c8855d4bb-qxw7g                 1/1     Running   0             16m
+```
+

 ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

 <!--
 Even if applying deprecation policies, they may still surprise some users.
 -->
+No

 ### Monitoring Requirements
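As background for the validation step above: the `default-cluster-cidr` object is created by the controller from the configured cluster CIDR, and operators add further ranges as ClusterCIDR objects. A sketch of such an object, following the `networking.k8s.io/v1alpha1` schema this KEP describes (the name, CIDR, and node selector values here are illustrative):

```
apiVersion: networking.k8s.io/v1alpha1
kind: ClusterCIDR
metadata:
  name: extra-pod-cidr               # illustrative name
spec:
  perNodeHostBits: 8                 # 8 host bits => each node gets a /24 from the IPv4 range
  ipv4: "10.248.0.0/16"              # illustrative range; must not overlap existing CIDRs
  nodeSelector:                      # only nodes matching this selector use this range
    nodeSelectorTerms:
    - matchExpressions:
      - key: kubernetes.io/hostname
        operator: In
        values:
        - multi-pod-cidr-worker      # illustrative node name
```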
@@ -827,15 +942,28 @@ checking if there are objects with field X set) may be a last resort. Avoid
 logs or events for this purpose.
 -->

+A `multicidrset_cidrs_allocations_total` metric value > 0 will indicate enablement.
+The operator can also check whether ClusterCIDR objects are created; this can be done
+using `kubectl get cc -A`.
+
 ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

-We will carry-over existing metrics to the new controller: https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/nodeipam/ipam/cidrset/metrics.go#L26-L68
+We have carried over the existing metrics to the new controller: https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/nodeipam/ipam/cidrset/metrics.go#L26-L68

 They are:
-- cidrset_cidrs_allocations_total - Count of total number of CIDR allcoations
-- cidrset_cidrs_releases_total - Count of total number of CIDR releases
-- cidrset_usage_cidrs - Gauge messuring the percentage of the provided CIDRs
-  that have been allocated
+- multicidrset_cidrs_allocations_total - Count of the total number of CIDR allocations
+- multicidrset_cidrs_releases_total - Count of the total number of CIDR releases
+- multicidrset_usage_cidrs - Gauge measuring the percentage of the provided CIDRs
+  that have been allocated.
+- multicidrset_allocation_tries_per_request - Histogram measuring CIDR allocation
+  tries per request.
+
+The service can be considered healthy if the *_total metrics increase with
+the number of nodes and the multicidrset_usage_cidrs metric is greater than or
+equal to the number of nodes.
+
+Note that this service is optional and allocates CIDRs only when
+`--allocate-node-cidrs` is set to true in kube-controller-manager.

 ###### What are the reasonable SLOs (Service Level Objectives) for the above SLIs?

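A quick way to spot-check these metrics (a sketch: it assumes the kube-controller-manager secure metrics endpoint on its default port 10257, and a bearer token authorized to read `/metrics`):

```
$ # Run on the control plane node; TOKEN must be authorized for /metrics.
$ curl -sk -H "Authorization: Bearer ${TOKEN}" https://localhost:10257/metrics \
    | grep '^multicidrset_'
```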
@@ -848,13 +976,15 @@ high level (needs more precise definitions) those may be things like:
 job creation time) for cron job <= 10%
 - 99,9% of /health requests per day finish with 200 code
 -->
+N/A

 ###### Are there any missing metrics that would be useful to have to improve observability of this feature?

 <!--
 Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
 implementation difficulties, etc.).
 -->
+TBD

 ### Dependencies

@@ -878,6 +1008,7 @@ and creating new ones, as well as about cluster-level services (e.g. DNS):
 - Impact of its outage on the feature:
 - Impact of its degraded performance or high-error rates on the feature:
 -->
+No

 ### Scalability

@@ -955,6 +1086,13 @@ details). For now, we leave it here.
 -->

 ###### How does this feature react if the API server and/or etcd is unavailable?
+The MultiCIDRRangeAllocator is part of kube-controller-manager; if
+kube-controller-manager is not able to connect to the apiserver, the cluster
+likely has bigger problems.
+
+The MultiCIDRRangeAllocator will not be able to fetch Node objects, so new
+nodes will remain in the NotReady state, as they will have no PodCIDRs assigned.
+This is in line with the current RangeAllocator behavior.

 ###### What are other known failure modes?


keps/sig-network/2593-multiple-cluster-cidrs/kep.yaml

Lines changed: 2 additions & 2 deletions
@@ -13,12 +13,12 @@ approvers:
   - "@aojea"

 # The target maturity stage in the current dev cycle for this KEP.
-stage: alpha
+stage: beta

 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
 # worked on.
-latest-milestone: "v1.25"
+latest-milestone: "v1.27"

 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
