Skip to content

Commit bf9dd1f

Browse files
committed
address PRR reviews
1 parent 59691dc commit bf9dd1f

File tree

1 file changed

+66
-14
lines changed
  • keps/sig-network/1880-multiple-service-cidrs

1 file changed

+66
-14
lines changed

keps/sig-network/1880-multiple-service-cidrs/README.md

Lines changed: 66 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -742,18 +742,38 @@ A rollout failure can impact the new apiserver to not be able to start or the cl
742742

743743
###### What specific metrics should inform a rollback?
744744

745-
If the apiserver is not able to start, can be detected during the rollout phase, not metrics needed.
746-
Users not be able to create Services can be monitored via two sets of metrics:
747-
1. IP allocator pkg/registry/core/service/ipallocator/metrics.go
748-
2. IP repair loop pkg/registry/core/service/ipallocator/controller/metrics.go
745+
The feature impact the apiserver bootstrap process, specially about the kubernetes.default IP address assignment,
746+
in case the apiserver is not able to start after enabling the feature, it is a strong indicated that a rollback is required.
747+
748+
Another metrics that can indicate a rollback are `clusterip_allocator.allocation_errors_total`, `clusterip_repair.ip_errors_total` or `clusterip_repair.reconcile_errors_total`, definitions can be found on
749+
- IP allocator pkg/registry/core/service/ipallocator/metrics.go
750+
- IP repair loop pkg/registry/core/service/ipallocator/controller/metrics.go
749751

750752
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
751753

752-
<!--
753-
Describe manual testing that was done and the outcomes.
754-
Longer term, we may want to require automated upgrade/rollback tests, but we
755-
are missing a bunch of machinery and tooling and can't do that now.
756-
-->
754+
There are integration tests these behaviors:
755+
756+
test/integration/servicecidr/allocator_test.go TestSkewedAllocators
757+
1. start one apiserver with the feature gate enabled and other with the feature gate disabled
758+
2. create 5 Services against each apiserver
759+
3. the Services created against the apiserver with the bitmap will not allocate the IPAddresses automatically,
760+
but the repair loop of the apiserver with the feature enable must reconcile these new Services and create the IPAddresses
761+
762+
test/integration/servicecidr/allocator_test.go TestMigrateService
763+
1. store a Service with ClusterIP directly in etcd
764+
2. start an apiserver with the new feature enable
765+
3. the reconciler must detect this stored service and create the corresponding IPAddress
766+
767+
To be added in Beta https://github.com/kubernetes/kubernetes/pull/122047
768+
test/integration/servicecidr/feature_enable_disable.go TestEnableDisableServiceCIDR
769+
1. start apiserver with the feature disabled
770+
2. create new services and assert are correct
771+
3. shutdown apiserver
772+
4. start new apiserver with feature enabled
773+
5. create new services and assert previous and new ones are correct
774+
6. shutdown apiserver
775+
7. start new apiserver with feature disabled
776+
8. create new services and assert previous and new ones are correct
757777

758778
###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
759779

@@ -764,12 +784,38 @@ No
764784

765785
###### How can an operator determine if the feature is in use by workloads?
766786

767-
A new group of metrics are added for the new Cluster IP allocators
768-
1. IP allocator pkg/registry/core/service/ipallocator/metrics.go
769-
2. IP repair loop pkg/registry/core/service/ipallocator/controller/metrics.go
787+
A group of metrics are added to each new Cluster IP allocators, labeled with the
788+
correspoding ServiceCIDR associateed to the allocator:
789+
790+
`clusterip_allocator.allocated_ips`
791+
`clusterip_allocator.available_ips`
792+
`clusterip_allocator.allocation_total`
793+
794+
See IP allocator pkg/registry/core/service/ipallocator/metrics.go for definitions.
770795

771796
Users can also obtain the `ServiceCIDR` and `IPAddress` objects that are only available
772-
if the feature is enabled.
797+
if the feature is enabled, per each Service with a ClusterIP associated there must be a
798+
corresponding IPAddress object. Per example:
799+
800+
```
801+
$ kubectl get service kubernetes
802+
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
803+
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 17d
804+
```
805+
806+
```
807+
$ kubectl get ipaddress 10.96.0.1
808+
NAME PARENTREF
809+
10.96.0.1 services/default/kubernetes
810+
```
811+
812+
All the ServiceCIDRs ranges configured must be present, included those ones created from the
813+
apiserver flags to initialize the cluster, with the special name `kubernetes`:
814+
```
815+
$ kubectl get servicecidr
816+
NAME CIDRS AGE
817+
kubernetes 10.96.0.0/28 17d
818+
```
773819

774820
###### How can someone using this feature know that it is working for their instance?
775821

@@ -800,6 +846,8 @@ Recall that end users cannot usually observe component logs or access metrics.
800846

801847
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
802848

849+
- 99.9% of ClusterIP allocations per day take less than 500 ms.
850+
- 100% of ClusterIP allocations succeed.
803851

804852
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
805853

@@ -810,6 +858,10 @@ Recall that end users cannot usually observe component logs or access metrics.
810858

811859
###### Are there any missing metrics that would be useful to have to improve observability of this feature?
812860

861+
Idially we should have metrics to detect overlaps or IP conflicts with the Pods and Nodes network, but this was
862+
[heavily discussed on the SIG](https://docs.google.com/document/d/1Dx7Qu5rHGaqoWue-JmlwYO9g_kgOaQzwaeggUsLooKo/edit#heading=h.rkh0f6t1c3vc) and we concluded that is not possible to get the Pod and Nodes network information reliably,
863+
so any metrics of this kind will be misleading.
864+
813865
### Dependencies
814866

815867
###### Does this feature depend on any specific services running in the cluster?
@@ -825,7 +877,7 @@ See Drawbacks section
825877

826878
When creating a Service this will require to create an IPAddress object,
827879
previously we updated a bitmap on etcd, so we keep the number of request
828-
but the size of the objects stored is reduced considerable.
880+
but the size of the objects stored is reduced considerably.
829881

830882
###### Will enabling / using this feature result in introducing new API types?
831883

0 commit comments

Comments
 (0)