---
layout: blog
title: "Introducing the Windows Operational Readiness Specification"
date: 2024-04-03
slug: intro-windows-ops-readiness
---

**Authors:** Jay Vyas (Tesla), Amim Knabben (Broadcom), and Tatenda Zifudzi (AWS)

Since Windows support [graduated to stable](/blog/2019/03/25/kubernetes-1-14-release-announcement/)
with Kubernetes 1.14 in 2019, the capability to run Windows workloads has been much
appreciated by the end user community. The level and availability of Windows workload
support has consistently been a major differentiator for Kubernetes distributions used by
large enterprises. However, with more Windows workloads being migrated to Kubernetes
and new Windows features being continuously released, it became challenging to test
Windows worker nodes in an effective and standardized way.

The Kubernetes project values the ability to certify conformance without requiring a
closed-source license for a certified distribution or service that has no intention
of offering Windows.

Some notable examples brought to the attention of SIG Windows were:

- An issue with the load balancer source address ranges functionality not operating correctly on
  Windows nodes, detailed in a GitHub issue:
  [kubernetes/kubernetes#120033](https://github.com/kubernetes/kubernetes/issues/120033).
- Reports of functionality issues with Windows features, such as
  [GMSA](https://learn.microsoft.com/en-us/windows-server/security/group-managed-service-accounts/group-managed-service-accounts-overview) not working with containerd,
  discussed in [microsoft/Windows-Containers#44](https://github.com/microsoft/Windows-Containers/issues/44).
- Challenges developing network policy tests that could objectively evaluate
  Container Network Interface (CNI) plugins across different operating system configurations,
  as discussed in [kubernetes/kubernetes#97751](https://github.com/kubernetes/kubernetes/issues/97751).

SIG Windows therefore recognized the need for a tailored solution to ensure Windows
nodes' operational readiness *before* their deployment into production environments.
Thus, the idea to develop a [Windows Operational Readiness Specification](https://kep.k8s.io/2578)
was born.

## Can’t we just run the official Conformance tests?

The Kubernetes project contains a set of [conformance tests](https://www.cncf.io/training/certification/software-conformance/#how),
which are standardized tests designed to ensure that a Kubernetes cluster meets
the required Kubernetes specifications.

However, these tests were originally defined at a time when Linux was the *only*
operating system compatible with Kubernetes, and thus, they were not easily
extendable for use with Windows. Given that Windows workloads, despite their
importance, account for a smaller portion of the Kubernetes community, it was
important to ensure that the primary conformance suite, relied upon by many
Kubernetes distributions to certify Linux conformance, didn't become encumbered
with Windows-specific features or enhancements such as GMSA or multi-operating-system
kube-proxy behavior.

Therefore, since there was a specialized need for Windows conformance testing,
SIG Windows chose to offer Windows-specific conformance tests
through the Windows Operational Readiness Specification.

## Can’t we just run the Kubernetes end-to-end test suite?

In the Linux world, tools such as [Sonobuoy](https://sonobuoy.io/) simplify execution of the
conformance suite, relieving users from needing to be aware of Kubernetes'
compilation paths or the semantics of [Ginkgo](https://onsi.github.io/ginkgo) tags.

Regarding compilation, we realized that Windows users might find compiling and
running the Kubernetes e2e suite from scratch similarly undesirable; hence, there
was a clear need to provide a user-friendly, "push-button" solution that is ready
to go. Moreover, applying conformance tests to Windows nodes through a set of
[Ginkgo](https://onsi.github.io/ginkgo/) tags would be burdensome for any user,
Linux enthusiasts and experienced Windows system administrators alike.

To bridge the gap and give users a straightforward way to confirm that their clusters
support a variety of features, SIG Windows created the Windows Operational
Readiness application. Written in Go, this application simplifies running the
necessary Windows-specific tests while delivering results in a clear, accessible format.

This initiative has been a collaborative effort, with contributions from different
cloud providers and platforms, including Amazon, Microsoft, SUSE, and Broadcom.

## A closer look at the Windows Operational Readiness Specification {#specification}

The Windows Operational Readiness specification targets and executes tests found
within the Kubernetes repository in a more user-friendly way than simply targeting
[Ginkgo](https://onsi.github.io/ginkgo/) tags. It introduces a structured test suite
that is split into sets of core and extended tests. Each set contains categories,
and each category tests a specific area, such as networking. Core tests target
fundamental and critical functionality that Windows nodes should support as defined
by the Kubernetes specification. Extended tests, on the other hand, cover more
complex features and dive deeper into Windows-specific capabilities, such as
integration with Active Directory. The goal of these tests is to be extensive,
covering a wide array of Windows-specific capabilities to ensure compatibility
with a diverse set of workloads and configurations, beyond basic requirements.
Below is the current list of categories.

| Category Name            | Category Description |
|--------------------------|----------------------|
| `Core.Network`           | Tests minimal networking functionality (ability to access a pod by its pod IP). |
| `Core.Storage`           | Tests minimal storage functionality (ability to mount a hostPath storage volume). |
| `Core.Scheduling`        | Tests minimal scheduling functionality (ability to schedule a pod with CPU limits). |
| `Core.Concurrent`        | Tests minimal concurrent functionality (ability of a node to handle traffic to multiple pods concurrently). |
| `Extend.HostProcess`     | Tests features related to Windows HostProcess pod functionality. |
| `Extend.ActiveDirectory` | Tests features related to Active Directory functionality. |
| `Extend.NetworkPolicy`   | Tests features related to Network Policy functionality. |
| `Extend.Network`         | Tests advanced networking functionality (ability to support IPv6). |
| `Extend.Worker`          | Tests features related to Windows worker node functionality (ability for nodes to access TCP and UDP services in the same cluster). |
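
Core categories exercise baseline behavior. Extended categories may depend on extra infrastructure, for example an Active Directory domain for `Extend.ActiveDirectory`. Given that split, one reasonable approach is to run the core tier first, with one category per invocation. That way a failure in one category does not hide the results of the others.

The sketch below relies only on the `--kubeconfig` and `--category` flags used in the sample command in the next section. The `OP_READINESS` override and the `run_categories` helper are hypothetical conveniences and are not part of the tool.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Category names taken from the table above.
CORE_CATEGORIES=(Core.Network Core.Storage Core.Scheduling Core.Concurrent)
EXTEND_CATEGORIES=(Extend.HostProcess Extend.ActiveDirectory Extend.NetworkPolicy Extend.Network Extend.Worker)

# Hypothetical helper: run each category in its own op-readiness invocation,
# print PASS/FAIL per category, and return non-zero if any category failed.
# OP_READINESS is a hypothetical override for the binary path.
run_categories() {
  local kubeconfig="$1"
  shift
  local bin="${OP_READINESS:-./op-readiness}"
  local failed=0 category
  for category in "$@"; do
    if "$bin" --kubeconfig "$kubeconfig" --category "$category"; then
      echo "PASS $category"
    else
      echo "FAIL $category"
      failed=1
    fi
  done
  return "$failed"
}
```

For example, `run_categories "$KUBE_CONFIG" "${CORE_CATEGORIES[@]}"` runs only the core tier. You could then run `EXTEND_CATEGORIES` once the core tier passes and the required infrastructure is in place.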

## How to conduct operational readiness tests for Windows nodes

To run the Windows Operational Readiness test suite, refer to the test suite's
[`README`](https://github.com/kubernetes-sigs/windows-operational-readiness/blob/main/README.md),
which explains how to set it up and run it. The test suite offers flexibility in
how you execute tests: either using a compiled binary or a Sonobuoy plugin. You can
also run either the entire test suite or only a specified list of categories. Cloud
providers have the choice of uploading their conformance results, enhancing
transparency and reliability.
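
If you take the Sonobuoy route, the general Sonobuoy workflow has four steps: run a plugin and wait, retrieve the results archive, summarize it, and clean up. The sketch below shows that workflow using standard `sonobuoy` subcommands.

The plugin manifest path is a placeholder. Take the actual manifest, and any configuration it needs, from the test suite's README. `run_via_sonobuoy` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: run a Sonobuoy plugin end to end and print a summary.
# $1: path or URL of the plugin manifest (placeholder; see the suite's README).
run_via_sonobuoy() {
  local manifest="$1" tarball
  # Start the plugin and block until it completes.
  sonobuoy run --plugin "$manifest" --wait
  # Download the results archive; `sonobuoy retrieve` prints its file name.
  tarball="$(sonobuoy retrieve)"
  # Summarize passed/failed tests from the archive.
  sonobuoy results "$tarball"
  # Remove the Sonobuoy namespace and its resources from the cluster.
  sonobuoy delete --wait
}
```

Sonobuoy talks to whichever cluster your kubeconfig points at, so set `KUBECONFIG` accordingly before calling the helper.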

Once you have checked out the code and built the binary as the README describes,
you can run a test. For example, this sample command runs the tests from the
`Core.Concurrent` category:

```shell
./op-readiness --kubeconfig "$KUBE_CONFIG" --category Core.Concurrent
```
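
Because the suite exercises Windows nodes, it can help to confirm before a run that the cluster actually has Windows nodes in the `Ready` state. The following sketch relies only on standard `kubectl` JSONPath output and the well-known `kubernetes.io/os` node label. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print "<node-name><TAB><Ready status>" for every node
# labeled kubernetes.io/os=windows.
windows_node_readiness() {
  kubectl --kubeconfig "$1" get nodes -l kubernetes.io/os=windows \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}'
}

# Hypothetical helper: succeed only if at least one Windows node is Ready=True.
require_ready_windows_node() {
  local ready
  ready="$(windows_node_readiness "$1" | awk -F'\t' '$2 == "True" { n++ } END { print n + 0 }')"
  if [ "$ready" -eq 0 ]; then
    echo "no Ready Windows nodes found" >&2
    return 1
  fi
  echo "found $ready Ready Windows node(s)"
}
```

For example, `require_ready_windows_node "$KUBE_CONFIG" && ./op-readiness --kubeconfig "$KUBE_CONFIG" --category Core.Concurrent` skips the run when no Windows node is ready.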

If you are a Kubernetes contributor and want to test the changes in a specific pull
request against the Windows Operational Readiness Specification, comment on that
pull request with the following bot command:

```shell
/test operational-tests-capz-windows-2019
```

## Looking ahead

We’re looking to improve our curated list of Windows-specific tests by adding
new tests to the Kubernetes repository and also identifying existing test cases
that can be targeted. The long-term goal for the specification is to continually
enhance test coverage for Windows worker nodes and improve the robustness of
Windows support, facilitating a seamless experience across diverse cloud
environments. We also have plans to integrate the Windows Operational Readiness
tests into the official Kubernetes conformance suite.

If you are interested in helping us out, please reach out to us! We welcome help
in any form, from giving once-off feedback to making a code contribution,
to having long-term owners to help us drive changes. The Windows Operational
Readiness specification is owned by the SIG Windows team. You can reach out
to the team on the [Kubernetes Slack workspace](https://slack.k8s.io/) **#sig-windows**
channel. You can also explore the [Windows Operational Readiness test suite](https://github.com/kubernetes-sigs/windows-operational-readiness/#readme)
and make contributions directly to the GitHub repository.

Special thanks to Kulwant Singh (AWS), Pramita Gautam Rana (VMware), and Xinqi Li
(Google) for their notable contributions to the specification. Additionally,
appreciation goes to James Sturtevant (Microsoft), Mark Rossetti (Microsoft),
Claudiu Belu (Cloudbase Solutions), and Aravindh Puthiyaparambil
(Softdrive Technologies Group Inc.) from the SIG Windows team for their guidance and support.