to ensure that node addressing, pod addressing, and services are configured
1477
1502
correctly. If dual-stack services are created, they have passed validation.
1503
+
Metrics to check could include pods stuck in pending; look in the event logs
1504
+
to determine whether a CNI issue is causing a delay in IP address
1505
+
allocation.
1478
1506
1479
1507
* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**
1480
1508
Existing kubelet pod creation and service creation SLOs are what is needed.
1481
1509
1482
1510
* **Are there any missing metrics that would be useful to have to improve observability
1483
1511
of this feature?**
1484
1512
1485
-
1. For services:
1486
-
1487
-
Whether a cluster is converted to dual-stack or converted back to
1488
-
single-stack, services will remain the same because the dual-stack
1489
-
conversion does not change user data.
1490
-
1491
-
Services/Endpoint selection is not in path of pod creation. It runs in
1492
-
kube-controller-manager, so any malfunction will not affect pods.
1493
-
1494
-
2. For pods:
1495
-
1496
-
Dual-stack components are not in path of pod creation. It is in the path
1497
-
of reporting pod ips. So pod creation will not be affected; if it is
1498
-
affected, then it is a CNI issue (entirely separate from dual-stack).
1499
-
1500
-
Dual-stack components are in the path of PodIPs reporting which affects
1501
-
kubelet. If there is a problem (or if there are persistent problems)
1502
-
then it is mitigated by disabling the feature gate, which turns it off
1503
-
for kube-apiserver, kube-controller-manager, kube-proxy, and kubelet.
1513
+
Useful metrics include those that report on pods using multiple IP addresses
1514
+
and likewise services that are using multiple IP addresses.
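As a rough illustration of such a metric, the sketch below counts pods that report both an IPv4 and an IPv6 address in `status.podIPs`. It assumes pod objects shaped like the items returned by `kubectl get pods -o json`; the sample data and the helper names (`ip_families`, `count_dual_stack_pods`) are hypothetical and not part of this proposal.

```python
import ipaddress


def ip_families(ips):
    """Return the set of IP versions (4 and/or 6) found in a list of address strings."""
    return {ipaddress.ip_address(ip).version for ip in ips}


def count_dual_stack_pods(pods):
    """Count pods whose status.podIPs holds both an IPv4 and an IPv6 address."""
    count = 0
    for pod in pods:
        entries = pod.get("status", {}).get("podIPs", [])
        pod_ips = [entry["ip"] for entry in entries]
        if ip_families(pod_ips) == {4, 6}:
            count += 1
    return count


# Hypothetical sample data; addresses are made up for illustration.
sample_pods = [
    {"status": {"podIPs": [{"ip": "10.244.1.5"}, {"ip": "fd00:10:244:1::5"}]}},
    {"status": {"podIPs": [{"ip": "10.244.1.6"}]}},
    {"status": {}},  # pod that has not been assigned an IP yet
]

print(count_dual_stack_pods(sample_pods))
```

The same family check could be applied to a service's cluster IPs to report on services that use multiple IP addresses.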
1504
1515
1505
1516
### Dependencies
1506
1517
1507
1518
* **Does this feature depend on any specific services running in the cluster?**
1508
-
This feature does not have dependency beyond kube-apiserver and standard controllers
1509
-
shipped with Kubernetes releases.
1519
+
This feature does not have any dependency beyond kube-apiserver and standard
1520
+
controllers shipped with Kubernetes releases.
1510
1521
1511
1522
### Scalability
1512
1523
@@ -1518,7 +1529,9 @@ of this feature?**
1518
1529
1519
1530
* **Will enabling / using this feature result in any new calls to the cloud
1520
1531
provider?**
1521
-
No. IP allocation for services only involves the API server.
1532
+
No. Because of the backwards-compatibility of the modified services API, the
1533
+
cloud provider will work as-is with the primary service cluster IP. The cloud
1534
+
providers can optionally work with the alternative IP family.
1522
1535
1523
1536
* **Will enabling / using this feature result in increasing size or count of
1524
1537
the existing API objects?**
@@ -1540,15 +1553,25 @@ resource usage (CPU, RAM, disk, IO, ...) in any components?**
1540
1553
This feature will not be operable if either kube-apiserver or etcd is unavailable.
1541
1554
1542
1555
* **What are other known failure modes?**
1543
-
1556
+
1557
+
* Missing prerequisites. Operator must verify the following conditions:
1558
+
1. Ensure correct support in the node infrastructure provider.
1559
+
a. supports routing both IPv4 and IPv6 interfaces.
1560
+
b. makes both IPv4 and IPv6 interfaces available to Kubernetes.
1561
+
2. CNI needs to be correctly configured for dual-stack service.
1562
+
a. Kubernetes must be able to assign IPv4 and IPv6 addresses from the
1563
+
CNI provider.
1564
+
3. Service CIDRs need to be sufficiently large to allow for creation of
1565
+
new services.
1566
+
4. Dual-stack CLI flags must be configured on the cluster as defined in the [dual-stack docs](https://kubernetes.io/docs/concepts/services-networking/dual-stack/#enable-ipv4-ipv6-dual-stack).
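
For the service CIDR size prerequisite above, a quick estimate of remaining room might look like the following sketch. It treats the raw address count of the CIDR as an upper bound (some addresses may be reserved in practice), and the function names and example CIDRs are assumptions chosen for illustration.

```python
import ipaddress


def service_cidr_capacity(cidr):
    """Upper bound on the number of addresses available in a service CIDR."""
    return ipaddress.ip_network(cidr, strict=True).num_addresses


def remaining_capacity(cidr, services_in_use):
    """Estimate how many more cluster IPs could be allocated from the CIDR."""
    return max(service_cidr_capacity(cidr) - services_in_use, 0)


# Example CIDRs are illustrative, not recommendations.
print(service_cidr_capacity("10.96.0.0/12"))
print(service_cidr_capacity("fd00:10:96::/112"))
print(remaining_capacity("10.96.0.0/24", 200))
```

An operator could compare the estimate for each IP family against the expected number of new services before enabling dual-stack.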
1567
+
1544
1568
* Failure to create dual-stack services. Operator must perform the following steps:
1545
1569
1. Ensure that the cluster has `IPv6DualStack` feature enabled.
1546
1570
2. Ensure that api-server is correctly configured with multi (dual-stack) service
1547
1571
CIDRs using the `--service-cluster-ip-range` flag.
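
The service CIDR check above can be sketched as a small validation of the flag value. This is a minimal example that assumes the value is a comma-separated list containing exactly one IPv4 and one IPv6 CIDR; the function name `is_dual_stack_cidr_list` and the sample values are hypothetical.

```python
import ipaddress


def is_dual_stack_cidr_list(value):
    """Return True if value holds exactly one IPv4 and one IPv6 CIDR, comma-separated."""
    cidrs = [part.strip() for part in value.split(",") if part.strip()]
    if len(cidrs) != 2:
        return False
    try:
        versions = {ipaddress.ip_network(c, strict=False).version for c in cidrs}
    except ValueError:
        # At least one entry is not a parseable CIDR.
        return False
    return versions == {4, 6}


# Sample flag values, made up for illustration.
print(is_dual_stack_cidr_list("10.96.0.0/12,fd00:10:96::/112"))
print(is_dual_stack_cidr_list("10.96.0.0/12"))
```

A single-family value or two CIDRs of the same family would be reported as not dual-stack, which matches the failure mode described in this item.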
1548
1572
1549
-
* Failure to route traffic to pod backing a dual-stack service. Operator must perform the
1550
-
following steps:
1551
-
1. Ensure that nodes (where the pod is running) is configured for dual-stack
1573
+
* Failure to route traffic to pod backing a dual-stack service. Operator must perform the following steps:
1574
+
1. Ensure that nodes (where the pod is running) are configured for dual-stack
1552
1575
a. Node is using dual-stack enabled CNI.
1553
1576
b. kubelet is configured with dual-stack feature flag.
1554
1577
c. kube-proxy is configured with dual-stack feature flag.
@@ -1560,25 +1583,33 @@ resource usage (CPU, RAM, disk, IO, ...) in any components?**
1560
1583
where applicable.
1561
1584
4. Operator can ensure that `endpoints` and `endpointSlices` are correctly
1562
1585
created for the service in question by using kubectl.
1563
-
5. If the pod is using host network then operator must ensure that the node is correctly
1564
-
reporting dual-stack addresses.
1565
-
1566
-
* Possible workload imbalances
1567
-
1568
-
Once a service is saved (with the appropriate feature flag and one or more
1569
-
cidrs), the controller manager scans the number of pods (seeing dual or
1570
-
single) and creates endpoint objects pointing to these pods. Each node has
1571
-
a proxy that watches for services, endpoints, endpoint slices, per family.
1572
-
If the existing rules are removed but new ones don't resolve correctly, then
1573
-
it will resolve on the next loop.
1574
-
1575
-
If allocation fails, that would translate to the pod not running if there is
1576
-
no IP allocated from the CIDR. If the pod is created but routing is not
1577
-
working correctly, there will be an error in the kube-proxy logs, so
1578
-
debugging would take place by looking at iptables, similar to how it is
1579
-
already done today. The cluster IP allocation is allocated in a synchronous
1580
-
process; if this fails the service creation will fail and the service object
1581
-
will not be persisted.
1586
+
5. If the pod is using host network then operator must ensure that the node
1587
+
is correctly reporting dual-stack addresses.
1588
+
6. Due to the amount of time needed for control loops to function, when
1589
+
scaling with dual-stack it may take time to attach all ready endpoints.
1590
+
1591
+
* CNI changes may affect legacy workloads.
1592
+
1. When dual-stack is configured and enabled, DNS queries will start returning
1593
+
both IPv4 (A) and IPv6 (AAAA) records.
1594
+
2. If a workload doesn't account for being offered both IP families, it
1595
+
may fail in unexpected ways. For example, firewall rules may need to be
1596
+
updated to allow IPv6 addresses.
1597
+
3. It is recommended to independently verify legacy workloads to ensure fidelity.
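
To illustrate what a workload sees once both record types are returned, the sketch below groups name-resolution results by IP family. It operates on tuples in the shape returned by Python's `socket.getaddrinfo`; the sample results and the service name in the comment are hypothetical.

```python
import socket


def group_by_family(addrinfo):
    """Group getaddrinfo-style results into IPv4 and IPv6 address lists."""
    groups = {"IPv4": [], "IPv6": []}
    for family, _socktype, _proto, _canonname, sockaddr in addrinfo:
        if family == socket.AF_INET:
            label = "IPv4"
        elif family == socket.AF_INET6:
            label = "IPv6"
        else:
            continue  # ignore non-IP address families
        address = sockaddr[0]
        if address not in groups[label]:
            groups[label].append(address)
    return groups


# Synthetic results shaped like socket.getaddrinfo output. In a cluster, a call
# such as socket.getaddrinfo("my-svc.default.svc.cluster.local", 80) (hypothetical
# name) would supply real data.
sample = [
    (socket.AF_INET, socket.SOCK_STREAM, 6, "", ("10.96.0.10", 80)),
    (socket.AF_INET6, socket.SOCK_STREAM, 6, "", ("fd00:10:96::a", 80, 0, 0)),
]

print(group_by_family(sample))
```

A legacy workload that assumes a single IPv4 result could use a check like this during verification to confirm how it behaves when the IPv6 list is non-empty.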
1598
+
1599
+
* IP-related error conditions to consider.
1600
+
1. pod IP allocation fails (this is due to CNI)
1601
+
a. Will result in the pod not running if there is no IP allocated from
1602
+
the CIDR.
1603
+
2. Service to pod routing fails
1604
+
a. kube-proxy cannot configure iptables
1605
+
b. if the pod is created but routing is not working correctly, there
1606
+
will be an error in the kube-proxy event logs
1607
+
c. debug by inspecting iptables, as with single-stack.
1608
+
3. cluster IP allocation fails
1609
+
a. cluster IPs are allocated on save in a synchronous process
1610
+
b. if this fails, the service creation will fail and the service object
1611
+
will not be persisted.
1612
+
1582
1613
1583
1614
* **What steps should be taken if SLOs are not being met to determine the problem?**