keps/sig-windows/5100-windows-dsr-and-overlay-support/README.md
Lines changed: 43 additions & 7 deletions
@@ -141,10 +141,10 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
141
141
-[ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
142
142
-[ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
143
143
-[X] (R) Graduation criteria is in place
144
-
-[X] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
145
-
-[] (R) Production readiness review completed
144
+
-[] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
145
+
-[X] (R) Production readiness review completed
146
146
-[ ] (R) Production readiness review approved
147
-
-[] "Implementation History" section is up-to-date for milestone
147
+
-[X] "Implementation History" section is up-to-date for milestone
148
148
-[ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
149
149
-[ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
150
150
@@ -232,7 +232,7 @@ DSR and Overlay networking mode support is already implemented in Windows kube-p
232
232
This proposal is to promote the existing implementations to GA.
233
233
234
234
Additionally, DSR on Windows is supported on both EKS and AKS.
235
-
Both DSR and overlay networking support have been used in the Windows CI pipelines running release-informing
235
+
Both DSR and overlay networking support have been used in the Windows CI pipelines running release-informing jobs since K8s v1.20.
236
236
237
237
### User Stories (Optional)
238
238
@@ -281,7 +281,7 @@ Consider including folks who also work outside the SIG or subproject.
281
281
Enabling DSR and enabling overlay networking mode support in Windows kube-proxy each carry very little risk.
282
282
283
283
For DSR, the Windows Host Network Service handles all of the logic for managing network traffic; kube-proxy only needs to specify whether DSR should be used when creating/syncing load balancer rules.
284
-
Additionally, DSR must be enabled with a kube-proxy command switch switch (--enable-dsr=true) disabling DSR is can be performed by redeploying kube-proxy on Windows nodes.
284
+
Additionally, DSR must be enabled with a kube-proxy command switch (--enable-dsr=true); DSR can be disabled by redeploying kube-proxy on Windows nodes.
285
285
286
286
Overlay networking support in Windows has been used in the Windows CI pipelines running release-informing [capz-windows-master](https://testgrid.k8s.io/sig-windows-signal#capz-windows-master) jobs since K8s v1.20.
287
287
@@ -374,7 +374,10 @@ This can inform certain test coverage improvements that we want to do before
374
374
extending the production code to implement this enhancement.
375
375
-->
376
376
377
-
Unit tests validating overlay networking behavior exist at https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/winkernel/proxier_test.go but must run on Windows machines so coverage is not reported in ci-kubernetes-coverage-unit.
377
+
Unit tests for kube-proxy on Windows must run on Windows machines, so coverage is not reported in ci-kubernetes-coverage-unit.
378
+
This coverage data was collected manually on a Windows Server 2022 machine:
379
+
380
+
- k8s.io/kubernetes/pkg/proxy/winkernel: 2025-02-11 - 58.8% of statements
378
381
379
382
380
383
##### Integration tests
@@ -450,6 +453,7 @@ N/A - This feature is already implemented.
450
453
451
454
- Test passes on testgrid with WinDSR and WinOverlay enabled on Windows nodes are running regularly.
452
455
- Unit tests validating expected behavior for both DSR and overlay networking mode are added.
456
+
- For DSR, unit tests validating that the feature gate is set correctly and that the correct flags are passed to HNS calls will also be added.
For overlay, no, because the feature requires the cluster to be configured for overlay networking mode and cannot be enabled on a per-node basis.
629
633
630
-
For DSR, no, but they can be added.
634
+
For DSR, unit tests will be added to validate that DSR is enabled and disabled correctly and that the correct flags are passed to HNS calls for each case.
635
+
These will be required for the feature to move to beta.
631
636
632
637
### Rollout, Upgrade and Rollback Planning
633
638
@@ -793,6 +798,9 @@ and creating new ones, as well as about cluster-level services (e.g. DNS):
793
798
794
799
DNS and CNI solutions must be deployed in the cluster.
795
800
801
+
Both DSR and overlay networking modes are supported for all patch versions of Windows Server 2022 and Windows Server 2025.
802
+
DSR requires Windows Server 2019 with May 2020 updates (or later).
803
+
796
804
### Scalability
797
805
798
806
<!--
@@ -908,6 +916,34 @@ splitting it into a dedicated `Playbook` document (potentially with some monitor
908
916
details). For now, we leave it here.
909
917
-->
910
918
919
+
A troubleshooting guide for general Windows networking issues can be found at https://learn.microsoft.com/en-us/troubleshoot/windows-server/software-defined-networking/troubleshoot-windows-server-software-defined-networking-stack
920
+
921
+
https://github.com/microsoft/SDN/ contains additional scripts that collect detailed information and can help with troubleshooting:
922
+
- https://github.com/microsoft/SDN/blob/master/Kubernetes/windows/hns.v2.psm1 is a PowerShell module with cmdlets for inspecting HNS policies and endpoints
923
+
- https://github.com/microsoft/SDN/blob/master/Kubernetes/windows/helper.psm1 contains useful helper functions for troubleshooting
924
+
- https://github.com/microsoft/SDN/tree/master/Kubernetes/windows/debug contains various PowerShell scripts for enabling tracing, collecting stats and perf counters, starting packet captures, etc.
925
+
926
+
Troubleshooting issues with Direct Server Return (DSR) on Windows:
927
+
928
+
- Ensure that the kube-proxy command line switch `--enable-dsr=true` is set and `--feature-gates=WinDSR=true` is set.
929
+
- Inspect kube-proxy logs for any warnings or errors
930
+
- If everything looks correct, log onto the node and inspect the HNS rules to ensure DSR is enabled for the load balancer rules.
931
+
- Log onto the node and use `hnsdiag.exe list loadbalancers -d` to list all the load balancers and details about their rules.
932
+
You should see `IsDSR:true` for load balancer policies proxied by kube-proxy.
933
+
- You can use `hnsdiag.exe` to get detailed information about local networks and endpoints in addition to load balancers.
934
+
- If you are still having issues, create an issue at https://github.com/microsoft/windows-containers
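
The first DSR check above (both switches present on the kube-proxy command line) could be scripted roughly as follows. This is only a sketch: the function name `missing_dsr_settings` is hypothetical, the argument list is assumed to have been collected from the node by some other means, and treating a bare `--enable-dsr` as true is an assumption about how boolean flags are parsed.

```python
def missing_dsr_settings(args):
    """Return the DSR-related settings absent from a kube-proxy argument list.

    `args` is a list of command-line arguments such as
    ["--enable-dsr=true", "--feature-gates=WinDSR=true"].
    """
    enable_dsr = False
    win_dsr_gate = False
    for arg in args:
        name, _, value = arg.partition("=")
        if name == "--enable-dsr":
            # Assumption: a bare "--enable-dsr" (no value) counts as true.
            enable_dsr = value.lower() in ("", "true")
        elif name == "--feature-gates":
            # The value is a comma-separated list, e.g. "WinDSR=true,Other=false".
            for gate in value.split(","):
                key, _, state = gate.partition("=")
                if key.strip() == "WinDSR":
                    win_dsr_gate = state.strip().lower() == "true"

    missing = []
    if not enable_dsr:
        missing.append("--enable-dsr=true")
    if not win_dsr_gate:
        missing.append("--feature-gates=WinDSR=true")
    return missing


if __name__ == "__main__":
    # Example: the feature gate is set but the switch is not.
    print(missing_dsr_settings(["--feature-gates=WinDSR=true"]))
```

An empty result only means the command line looks right; the HNS load balancer rules still need to be inspected on the node as described above.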
935
+
936
+
Troubleshooting issues with overlay networking mode on Windows:
937
+
938
+
- Ensure that the CNI solution has either created an HNS network of type `Overlay` or that instructions provided by the CNI solution have been followed to create the network.
939
+
- Ensure that the name of the network created above is passed to kube-proxy with the `$Env:KUBE_NETWORK` environment variable.
940
+
- Check kube-proxy logs for any warnings or errors.
941
+
- If everything looks correct, log onto the node and inspect the HNS rules to ensure that the source VIP is being used correctly.
942
+
- Log onto the node and use `hnsdiag.exe list loadbalancers -d` to list all the load balancers and details about their rules.
943
+
You should see the source VIP being used for load balancer policies proxied by kube-proxy.
944
+
- You can use `hnsdiag.exe` to get detailed information about local networks and endpoints in addition to load balancers.
945
+
- If you are still having issues, create an issue at https://github.com/microsoft/windows-containers
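
The first two overlay checks above (an `Overlay` network exists and its name is passed through `KUBE_NETWORK`) might be combined into one small helper. This is a sketch under stated assumptions: `check_kube_network` is a hypothetical name, the `(name, type)` pairs are assumed to have been gathered from the node separately, and the exact, case-sensitive name comparison is an assumption rather than documented HNS behavior.

```python
def check_kube_network(env, hns_networks):
    """Return a problem description, or None if the overlay setup looks consistent.

    env          -- mapping of environment variables seen by kube-proxy
    hns_networks -- iterable of (name, type) pairs for the node's HNS networks
                    (hypothetical input, collected by other tooling)
    """
    name = env.get("KUBE_NETWORK", "")
    if not name:
        return "KUBE_NETWORK is not set"
    for net_name, net_type in hns_networks:
        # Assumption: network names are compared exactly.
        if net_name == name:
            if net_type.lower() != "overlay":
                return f"network {name!r} has type {net_type!r}, expected 'Overlay'"
            return None
    return f"no HNS network named {name!r} was found"


if __name__ == "__main__":
    # Example with made-up values: the variable names an existing overlay network.
    print(check_kube_network({"KUBE_NETWORK": "vxlan0"}, [("vxlan0", "Overlay")]))
```

A `None` result does not replace the remaining steps; the kube-proxy logs and the source VIP in the load balancer policies still need to be checked.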
946
+
911
947
###### How does this feature react if the API server and/or etcd is unavailable?
912
948
913
949
This feature does not change the functionality of kube-proxy or other Kubernetes components if the API server or etcd is unavailable. Kube-proxy would retain the existing behavior if the API server or etcd is unavailable, which would result in new Pod and Service endpoints not routing correctly on the nodes.