Commit 7aa56ea

Merge pull request #3 from bridgetkromhout/updates-to-563
Additional updates to PRR docs
2 parents 8b6188e + 6ebafe2 commit 7aa56ea


keps/sig-network/563-dual-stack/README.md

Lines changed: 105 additions & 74 deletions
@@ -1392,26 +1392,35 @@ This capability will move to stable when the following criteria have been met.
 
 * **Does enabling the feature change any default behavior?**
   Pods and Services will remain single-stack until cli flags have been modified
-  as described in this KEP. Once modified, existing and new services will remain
-  single-stack until user requests otherwise. Pods will become dual-stack
-  however will maintain the same ipfamily used before enabling feature flag.
+  as described in this KEP. Existing and new services will remain single-stack
+  until user requests otherwise. Pods will become dual-stack once CNI is
+  configured for dual-stack.
 
 * **Can the feature be disabled once it has been enabled (i.e. can we roll back
   the enablement)?**
-  Yes. If you decide to turn off dual-stack after turning on, ensure all
-  services are converted to single-stack first (switch ipfamily to single-stack
-  on all services) and then disable the feature. Remove the CLI parameters, as
-  an older client won't see or be able to use the new fields. When the user
-  disables dual-stack from the controller manager, new endpoints will no longer
-  carry two sets, while existing endpoints may not be updated.
+  Yes. If you decide to turn off dual-stack after turning on:
+  1. Ensure all services are converted to single-stack first (downgraded to
+     single-stack as described in this KEP)
+  2. Remove the CLI parameters.
+  3. Disable the feature.
+
+  Notes:
+  1. When the user disables dual-stack from the controller manager,
+     endpointSlices will no longer be created for the alternative IP family.
+  2. Existing endpointSlices for the alternative family will not be
+     automatically removed; this is left to the operator.
+  3. Existing dual-stack service configurations will remain in place when
+     the feature is disabled, but no routing will happen and no
+     endpointSlices will be created while the feature is disabled.
 
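As an editor's sketch of rollback step 1 above: one way to downgrade a single Service back to single-stack before disabling the feature. The service name and namespace are placeholders, and the exact downgrade semantics are the ones defined in this KEP.

```
# Placeholders: "my-svc" / "my-ns". Set the Service back to a single IP family.
kubectl patch service my-svc -n my-ns --type merge \
  -p '{"spec":{"ipFamilyPolicy":"SingleStack"}}'
```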
 * **What happens if we reenable the feature if it was previously rolled back?**
-  Whatever the user has defined will not change without intervention. If you
-  disable dual-stack from the controller manager, the service will be given
-  single-stack endpoints. If you enable dual-stack again, it's as if you're
-  enabling it for the first time on a cluster. We don't load balance across IP
-  families, and with no selectors we don't get endpoints. If you use the
-  feature flag to turn off dual-stack, we do not edit user input.
+
+  If the system has no existing dual-stack services, then it will be treated
+  as a new enablement. However, if dual-stack services exist in the cluster,
+  the controller manager will automatically update endpoints and endpointSlices
+  to match the service IP families. When the feature is reenabled, kube-proxy
+  will automatically start updating iptables/ipvs rules for the alternative
+  ipfamily, for existing and new dual-stack services.
 
 * **Are there any tests for feature enablement/disablement?**
   The feature is being tested using integration tests with gate on/off. The
@@ -1431,20 +1440,34 @@ This capability will move to stable when the following criteria have been met.
   cluster networking was configured.
 
   Existing workloads are not expected to be impacted during rollout. When you
-  disable dual-stack, existing routes aren't deleted. A component restart
-  during rollout might delay generating endpoint and endpoint slices for
-  alternative IP families. If there are *new* workloads that depend on them,
-  they will fail.
-
-  Imbalance is possible if a replica set scales up or ipv6 gets turned off and
-  the endpoint controller has not yet updated, but iptables and the service
-  controller manager won't look at endpoints that are flagged off. (Endpoints
-  can exist and not be used.) For services, the user will get an error
-  immediately. If the existing rules are removed but new ones don't resolve
-  correctly yet, then it has a chance to resolve on the next loop.
+  disable dual-stack, existing services aren't deleted, but routes for
+  alternative families are disabled. A component restart during rollout might
+  delay generating endpoints and endpointSlices for alternative IP families.
+  If there are *new* workloads that depend on the endpointSlices, these
+  workloads will fail until the endpoint slices are created.
+
+  Because of the nature of the gradual rollout (node by node) of the dual-stack
+  feature, endpoints for the alternative IP family will not be created for
+  nodes where the feature is not yet enabled. That will cause unequal
+  distribution of alternative IP traffic. To prevent that, we advise the
+  following steps:
+
+  1. (preferred) Do not create dual-stack services until the rollout of the
+     dual-stack feature across the cluster is complete.
+  or
+  2. Cordon and drain the node(s) where the feature is not enabled
 
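A minimal sketch of step 2 above; the node name is a placeholder, and additional drain flags may be required depending on the workloads running on the node.

```
# Cordon, then drain, a node that does not yet have dual-stack enabled.
kubectl cordon node-without-dualstack
kubectl drain node-without-dualstack --ignore-daemonsets
```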
 * **What specific metrics should inform a rollback?**
-  N/A
+
+  Failures that could exist include an imbalance or a failure in the
+  deployment. For imbalance, operators are advised to count the number of
+  alternative endpoints inside the endpointSlices, and ensure that count
+  equals the number of pods. (If the number is not equal, take steps to
+  correct as described above.)
+
+  Failure in the deployment usually indicates misconfiguration and is
+  characterized by components being unavailable (such as kube-apiserver).
+
 
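One possible way to make that comparison, as a sketch; the service name, namespace, pod selector, and the assumption that IPv6 is the alternative family are all placeholders.

```
# Count endpoints in the IPv6 endpointSlices backing the service.
kubectl get endpointslices -n my-ns -l kubernetes.io/service-name=my-svc \
  -o go-template='{{range .items}}{{if eq .addressType "IPv6"}}{{range .endpoints}}{{range .addresses}}{{.}}{{"\n"}}{{end}}{{end}}{{end}}{{end}}' \
  | wc -l
# Compare with the number of pods selected by the service.
kubectl get pods -n my-ns -l app=my-app --no-headers | wc -l
```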
 * **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**
   We did manual testing of a cluster turning it off and on to explore
@@ -1453,8 +1476,7 @@ This capability will move to stable when the following criteria have been met.
 
 * **Is the rollout accompanied by any deprecations and/or removals of features, APIs,
   fields of API types, flags, etc.?**
-  Enabling this without configuring the CLI options will not change any default
-  behavior.
+  No; we're not deprecating or removing any fields.
 
 ### Monitoring Requirements
 
@@ -1467,6 +1489,9 @@ fields of API types, flags, etc.?**
   kubectl get services --all-namespaces -ogo-template='{{range .items}}{{.spec.ipFamilyPolicy}}{{"\n"}}{{end}}' | grep -v SingleStack
   ```
 
+  Using this check, one can determine how many services have been created with
+  dual-stack preferred or required.
+
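A counting variant of the check above, as a sketch; it matches the two non-single-stack `ipFamilyPolicy` values explicitly.

```
kubectl get services --all-namespaces \
  -o go-template='{{range .items}}{{.spec.ipFamilyPolicy}}{{"\n"}}{{end}}' \
  | grep -c -E 'PreferDualStack|RequireDualStack'
```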
 * **What are the SLIs (Service Level Indicators) an operator can use to determine
   the health of the service?**
   Dual-stack networking is a functional addition, not a service with SLIs. Use
@@ -1475,38 +1500,24 @@ the health of the service?**
   IPv4/IPv6 dual-stack](https://kubernetes.io/docs/tasks/network/validate-dual-stack/)
   to ensure that node addressing, pod addressing, and services are configured
   correctly. If dual-stack services are created, they have passed validation.
+  Metrics to check could include pods stuck in pending; look in the event logs
+  to determine if it's a CNI issue which may cause a delay of IP address
+  allocation.
 
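A minimal sketch of that check; the pod and namespace names are placeholders.

```
# Pods stuck in Pending, cluster-wide.
kubectl get pods --all-namespaces --field-selector=status.phase=Pending
# Inspect a suspect pod's events for CNI / IP-allocation errors.
kubectl describe pod my-pod -n my-ns
```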
 * **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**
   Existing kubelet pod creation and service creation SLOs are what is needed.
 
 * **Are there any missing metrics that would be useful to have to improve observability
   of this feature?**
 
-  1. For services:
-
-     Whether a cluster is converted to dual-stack or converted back to
-     single-stack, services will remain the same because the dual-stack
-     conversation does not change user data.
-
-     Services/Endpoint selection is not in path of pod creation. It runs in
-     kube-controller-manager, so any malfunction will not affect pods.
-
-  2. For pods:
-
-     Dual-stack components are not in path of pod creation. It is in the path
-     of reporting pod ips. So pod creation will not be affected; if it is
-     affected, then it is a CNI issue (entirely separate from dual-stack).
-
-     Dual-stack components are in the path of PodIPs reporting which affects
-     kubelet. If there is a problem (or if there are persistent problems)
-     then it is migitated by disabling the feature gate, which turns it off
-     for kube-apiserver, kube-controller-manager, kube-proxy, and kubelet.
+  Useful metrics include those that report on pods using multiple IP addresses
+  and likewise services that are using multiple IP addresses.
 
 ### Dependencies
 
 * **Does this feature depend on any specific services running in the cluster?**
-  This feature does not have dependency beyond kube-apiserver and standard controllers
-  shipped with Kubernetes releases.
+  This feature does not have dependency beyond kube-apiserver and standard
+  controllers shipped with Kubernetes releases.
 
 ### Scalability
 
@@ -1518,7 +1529,9 @@ of this feature?**
 
 * **Will enabling / using this feature result in any new calls to the cloud
   provider?**
-  No. IP allocation for services only involves the API server.
+  No. Because of the backwards-compatibility of the modified services API, the
+  cloud provider will work as-is with the primary service cluster IP. The cloud
+  providers can optionally work with alternative ipfamily.
 
 * **Will enabling / using this feature result in increasing size or count of
   the existing API objects?**
@@ -1540,15 +1553,25 @@ resource usage (CPU, RAM, disk, IO, ...) in any components?**
   This feature will not be operable if either kube-apiserver or etcd is unavailable.
 
 * **What are other known failure modes?**
-
+
+  * Missing prerequisites. Operator must verify the following conditions:
+    1. Ensure correct support in the node infrastructure provider.
+       a. supports routing both IPv4 and IPv6 interfaces.
+       b. makes both IPv4 and IPv6 interfaces available to Kubernetes.
+    2. CNI needs to be correctly configured for dual-stack service.
+       a. Kubernetes must be able to assign IPv4 and IPv6 addresses from the
+          CNI provider.
+    3. Service CIDRs need to be sufficiently large to allow for creation of
+       new services.
+    4. Dual-stack CLI flags must be configured on the cluster as defined in the [dual-stack docs](https://kubernetes.io/docs/concepts/services-networking/dual-stack/#enable-ipv4-ipv6-dual-stack)
+
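For item 4 above, a sketch of the flag shape the linked dual-stack documentation describes; the CIDR values are illustrative examples only, and the linked docs remain authoritative.

```
# kube-apiserver
--feature-gates=IPv6DualStack=true
--service-cluster-ip-range=10.96.0.0/16,fd00:10:96::/112
# kube-controller-manager
--feature-gates=IPv6DualStack=true
--cluster-cidr=10.244.0.0/16,fd00:10:244::/56
--service-cluster-ip-range=10.96.0.0/16,fd00:10:96::/112
--node-cidr-mask-size-ipv4=24
--node-cidr-mask-size-ipv6=64
# kube-proxy
--feature-gates=IPv6DualStack=true
--cluster-cidr=10.244.0.0/16,fd00:10:244::/56
# kubelet
--feature-gates=IPv6DualStack=true
```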
   * Failure to create dual-stack services. Operator must perform the following steps:
     1. Ensure that the cluster has `IPv6DualStack` feature enabled.
     2. Ensure that api-server is correctly configured with multi (dual-stack) service
        CIDRs using `--services-cluster-ip-range` flag.
 
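One hedged way to check step 2, assuming a kubeadm-style control plane where kube-apiserver runs as a static pod; adjust for how your control plane is deployed.

```
# Assumption: kubeadm-style static pod labeled component=kube-apiserver.
kubectl -n kube-system get pod -l component=kube-apiserver -o yaml \
  | grep -E 'feature-gates|cluster-ip-range'
```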
-  * Failure to route traffic to pod backing a dual-stack service. Operator must perform the
-    following steps:
-    1. Ensure that nodes (where the pod is running) is configured for dual-stack
+  * Failure to route traffic to pod backing a dual-stack service. Operator must perform the following steps:
+    1. Ensure that nodes (where the pod is running) are configured for dual-stack
        a. Node is using dual-stack enabled CNI.
        b. kubelet is configured with dual-stack feature flag.
        c. kube-proxy is configured with dual-stack feature flag.
@@ -1560,25 +1583,33 @@ resource usage (CPU, RAM, disk, IO, ...) in any components?**
        where applicable.
     4. Operator can ensure that `endpoints` and `endpointSlices` are correctly
        created for the service in question by using kubectl.
-    5. If the pod is using host network then operator must ensure that the node is correctly
-       reporting dual-stack addresses.
-
-  * Possible workload imbalances
-
-    Once a service is saved (with the appropriate feature flag and one or more
-    cidrs), the controller manager scans the number of pods (seeing dual or
-    single) and creates endpoint objects pointing to these pods.Each node has
-    a proxy that watches for services, endpoints, endpoint slices, per family.
-    If the existing rules are removed but new ones don't resolve correctly, then
-    it will resolve on the next loop.
-
-    If allocation fails, that would translate to the pod not running if there is
-    no IP allocated from the CIDR. If the pod is created but routing is not
-    working correctly, there will be an error in the kube-proxy logs, so
-    debugging would take place by looking at iptables, similar to how it is
-    already done today. The cluster IP allocation is allocated in a synchronous
-    process; if this fails the service creation will fail and the service object
-    will not be persisted.
+    5. If the pod is using host network then operator must ensure that the node
+       is correctly reporting dual-stack addresses.
+    6. Due to the amount of time needed for control loops to function, when
+       scaling with dual-stack it may take time to attach all ready endpoints.
+
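For step 4 above, a minimal sketch of the kubectl checks; the service name and namespace are placeholders.

```
# Endpoints object for the service.
kubectl get endpoints my-svc -n my-ns
# EndpointSlices for the service, one set per address family.
kubectl get endpointslices -n my-ns -l kubernetes.io/service-name=my-svc -o wide
```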
+  * CNI changes may affect legacy workloads.
+    1. When dual-stack is configured and enabled, DNS queries will start returning
+       IPv4(A) and IPv6(AAAA).
+    2. If a workload doesn't account for being offered both IP families, it
+       may fail in unexpected ways. For example, firewall rules may need to be
+       updated to allow IPv6 addresses.
+    3. Recommended to independently verify legacy workloads to ensure fidelity.
+
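A quick way to observe item 1 from inside the cluster, as a sketch; the image tag and the queried name are placeholders, and any in-cluster Service name works.

```
# Run a throwaway pod and query cluster DNS; expect both A and AAAA answers.
kubectl run dns-check --rm -it --restart=Never --image=busybox:1.36 -- \
  nslookup kubernetes.default.svc.cluster.local
```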
+  * IP-related error conditions to consider.
+    1. pod IP allocation fails (this is due to CNI)
+       a. Will result in the pod not running if there is no IP allocated from
+          the CIDR.
+    2. Service to pod routing fails
+       a. kube-proxy can't configure iptables
+       b. if the pod is created but routing is not working correctly, there
+          will be an error in the kube-proxy event logs
+       c. debugging by looking at iptables, similar to with single-stack.
+    3. cluster IP allocation fails
+       a. cluster IPs are allocated on save in a synchronous process
+       b. if this fails, the service creation will fail and the service object
+          will not be persisted.
+
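Related to item 3, a sketch of a Service that explicitly requires dual-stack allocation so that an allocation problem surfaces at creation time; the name, namespace, selector, and ports are placeholders.

```
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: my-dual-stack-svc
  namespace: my-ns
spec:
  ipFamilyPolicy: RequireDualStack
  ipFamilies:
    - IPv4
    - IPv6
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
EOF
```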
 
 * **What steps should be taken if SLOs are not being met to determine the problem?**
   N/A
