Commit 01723ab

Updates
1 parent 7650020 commit 01723ab

1 file changed, +43 -7 lines changed: keps/sig-windows/5100-windows-dsr-and-overlay-support

keps/sig-windows/5100-windows-dsr-and-overlay-support/README.md

Lines changed: 43 additions & 7 deletions
@@ -141,10 +141,10 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
141141
- [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
142142
- [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
143143
- [X] (R) Graduation criteria is in place
144-
- [X] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
145-
- [ ] (R) Production readiness review completed
144+
- [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
145+
- [X] (R) Production readiness review completed
146146
- [ ] (R) Production readiness review approved
147-
- [ ] "Implementation History" section is up-to-date for milestone
147+
- [X] "Implementation History" section is up-to-date for milestone
148148
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
149149
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
150150

@@ -232,7 +232,7 @@ DSR and Overlay networking mode support is already implemented in Windows kube-p
232232
This proposal is to promote the existing implementations to GA.
233233

234234
Additionally, DSR on Windows is supported on both EKS and AKS.
235-
Both DSR and overlay networking support have been used in the Windows CI pipelines running release-informing
235+
Both DSR and overlay networking support have been used in the Windows CI pipelines running release-informing jobs since K8s v1.20.
236236

237237
### User Stories (Optional)
238238

@@ -281,7 +281,7 @@ Consider including folks who also work outside the SIG or subproject.
281281
Enabling DSR and enabling overlay networking mode support in Windows kube-proxy both carry very little risk.
282282

283283
For DSR, the Windows Host Networking Service (HNS) handles all of the logic for managing network traffic; kube-proxy only needs to specify whether DSR should be used when creating/syncing load balancer rules.
284-
Additionally, DSR must be enabled with a kube-proxy command switch switch (--enable-dsr=true) disabling DSR is can be performed by redeploying kube-proxy on Windows nodes.
284+
Additionally, DSR must be enabled with a kube-proxy command switch (--enable-dsr=true); DSR can be disabled by redeploying kube-proxy on Windows nodes.
285285

286286
Overlay networking support in Windows has been used in the Windows CI pipelines running release-informing [capz-windows-master](https://testgrid.k8s.io/sig-windows-signal#capz-windows-master) jobs since K8s v1.20.
287287

@@ -374,7 +374,10 @@ This can inform certain test coverage improvements that we want to do before
374374
extending the production code to implement this enhancement.
375375
-->
376376

377-
Unit tests validating overlay networking behavior exist at https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/winkernel/proxier_test.go but must run on Windows machines so coverage is not reported in ci-kubernetes-coverage-unit.
377+
Unit tests for kube-proxy on Windows must run on Windows machines, so coverage is not reported in ci-kubernetes-coverage-unit.
378+
This coverage data was collected manually on a Windows Server 2022 machine:
379+
380+
- k8s.io/kubernetes/pkg/proxy/winkernel: 2025-02-11 - 58.8% of statements
378381

379382

380383
##### Integration tests
@@ -450,6 +453,7 @@ N/A - This feature is already implemented.
450453

451454
- Test passes with WinDSR and WinOverlay enabled on Windows nodes run regularly on testgrid.
452455
- Unit tests validating expected behavior for both DSR and overlay networking mode are added.
456+
- For DSR, unit tests validating that the feature gate is set correctly and that the correct flags are passed to HNS calls will also be added.
453457

454458
#### GA
455459

@@ -627,7 +631,8 @@ https://github.com/kubernetes/kubernetes/pull/97058/files#diff-7826f7adbc1996a05
627631

628632
For overlay, no, because the feature requires the cluster to be configured for overlay networking mode and cannot be enabled on a per-node basis.
629633

630-
For DSR, no, but they can be added.
634+
For DSR, unit tests will be added to validate that DSR is enabled and disabled correctly and that the correct flags are passed to HNS calls for each case.
635+
These will be required for the feature to move to beta.
631636

632637
### Rollout, Upgrade and Rollback Planning
633638

@@ -793,6 +798,9 @@ and creating new ones, as well as about cluster-level services (e.g. DNS):
793798

794799
DNS and CNI solutions must be deployed in the cluster.
795800

801+
Both DSR and overlay networking modes are supported for all patch versions of Windows Server 2022 and Windows Server 2025.
802+
DSR requires Windows Server 2019 with May 2020 updates (or later).
803+
796804
### Scalability
797805

798806
<!--
@@ -908,6 +916,34 @@ splitting it into a dedicated `Playbook` document (potentially with some monitor
908916
details). For now, we leave it here.
909917
-->
910918

919+
A troubleshooting guide for general Windows networking issues can be found at https://learn.microsoft.com/en-us/troubleshoot/windows-server/software-defined-networking/troubleshoot-windows-server-software-defined-networking-stack
920+
921+
https://github.com/microsoft/SDN/ contains additional troubleshooting scripts that collect detailed diagnostic information:
922+
- https://github.com/microsoft/SDN/blob/master/Kubernetes/windows/hns.v2.psm1 is a PowerShell module with cmdlets for inspecting HNS policies and endpoints
923+
- https://github.com/microsoft/SDN/blob/master/Kubernetes/windows/helper.psm1 contains useful helper functions for troubleshooting
924+
- https://github.com/microsoft/SDN/tree/master/Kubernetes/windows/debug contains various PowerShell scripts for enabling tracing, collecting stats and perf counters, starting packet captures, etc.
925+
926+
Troubleshooting issues with Direct Server Return (DSR) on Windows:
927+
928+
- Ensure that both the kube-proxy command line switch `--enable-dsr=true` and `--feature-gates=WinDSR=true` are set.
929+
- Inspect kube-proxy logs for any warnings or errors.
930+
- If everything looks correct, log onto the node and inspect the HNS rules to ensure DSR is enabled for the load balancer rules.
931+
- Log onto the node and use `hnsdiag.exe list loadbalancers -d` to list all the load balancers and details about their rules.
932+
You should see `IsDSR:true` for load balancer policies proxied by kube-proxy.
933+
- You can use `hnsdiag.exe` to get detailed information about local networks and endpoints in addition to load balancers.
934+
- If you are still having issues, create an issue at https://github.com/microsoft/windows-containers
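
The first and fourth checks in the list above can be partly automated. The following is a minimal sketch, not part of kube-proxy or the KEP: the function names `dsr_flags_enabled` and `count_dsr_policies` are hypothetical, and the second helper assumes the `hnsdiag.exe list loadbalancers -d` output contains literal `IsDSR:true` / `IsDSR:false` tokens as described above (the exact layout of that output may differ). It also only looks at command line arguments, so settings supplied through a kube-proxy configuration file would not be seen.

```python
import re


def dsr_flags_enabled(args):
    """Return True if the argument list enables both DSR switches.

    `args` is the kube-proxy command line split into a list of strings.
    Hypothetical helper for illustration only.
    """
    enable_dsr = False
    win_dsr_gate = False
    for arg in args:
        if arg == "--enable-dsr":
            # A bare boolean flag is treated as true.
            enable_dsr = True
        elif arg.startswith("--enable-dsr="):
            enable_dsr = arg.split("=", 1)[1].strip().lower() == "true"
        elif arg.startswith("--feature-gates="):
            # Feature gates are a comma-separated list of Name=bool pairs.
            for gate in arg.split("=", 1)[1].split(","):
                name, _, value = gate.partition("=")
                if name.strip() == "WinDSR":
                    win_dsr_gate = value.strip().lower() == "true"
    return enable_dsr and win_dsr_gate


# Assumption: the hnsdiag output contains tokens such as "IsDSR:true".
_IS_DSR = re.compile(r"IsDSR\s*:\s*(true|false)", re.IGNORECASE)


def count_dsr_policies(hnsdiag_output):
    """Return (dsr_enabled_count, dsr_disabled_count) found in the text."""
    values = [m.group(1).lower() for m in _IS_DSR.finditer(hnsdiag_output)]
    return values.count("true"), values.count("false")
```

A non-zero second count from `count_dsr_policies` would point at load balancer policies that were created without DSR and are worth inspecting by hand.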
935+
936+
Troubleshooting issues with overlay networking mode on Windows:
937+
938+
- Ensure that either the CNI solution has created an HNS network of type `Overlay` or the instructions provided by the CNI solution have been followed to create the network.
939+
- Ensure that the name of the network created above is passed to kube-proxy through the `KUBE_NETWORK` environment variable (`$Env:KUBE_NETWORK` in PowerShell).
940+
- Check kube-proxy logs for any warnings or errors.
941+
- If everything looks correct, log onto the node and inspect the HNS rules to ensure that the source VIP is being used correctly.
942+
- Log onto the node and use `hnsdiag.exe list loadbalancers -d` to list all the load balancers and details about their rules.
943+
You should see the source VIP being used for load balancer policies proxied by kube-proxy.
944+
- You can use `hnsdiag.exe` to get detailed information about local networks and endpoints in addition to load balancers.
945+
- If you are still having issues, create an issue at https://github.com/microsoft/windows-containers
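
The first two overlay checks above come down to confirming that `KUBE_NETWORK` names a network that actually exists on the node. A minimal sketch of that comparison follows. The function names are hypothetical, the list of HNS network names is assumed to be gathered separately (for example by reading it off the `hnsdiag.exe` output), and the exact-match comparison is an assumption, since the name matching rules are not stated here.

```python
import os


def kube_network_name(environ=None):
    """Return the network name from KUBE_NETWORK, or None if unset or blank.

    `environ` defaults to the process environment; a plain dict can be
    passed instead, which is convenient for testing.
    """
    env = os.environ if environ is None else environ
    name = env.get("KUBE_NETWORK", "").strip()
    return name or None


def overlay_network_matches(environ, hns_network_names):
    """Return True if KUBE_NETWORK names one of the given HNS networks.

    Assumption: names are compared exactly, including case.
    """
    name = kube_network_name(environ)
    return name is not None and name in hns_network_names
```

The check must run in the same environment that launches kube-proxy; a variable set only in an interactive shell would not be visible to a kube-proxy process started elsewhere.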
946+
911947
###### How does this feature react if the API server and/or etcd is unavailable?
912948

913949
This feature does not change the functionality of kube-proxy or other Kubernetes components if the API server or etcd is unavailable. Kube-proxy would retain the existing behavior if the API server or etcd is unavailable, which would result in new Pod and Service endpoints not routing correctly on the nodes.
