| 1 | +--- |
| 2 | +layout: blog |
| 3 | +title: "Spotlight on SIG Scheduling" |
| 4 | +slug: sig-scheduling-spotlight-2024 |
| 5 | +canonicalUrl: https://www.kubernetes.dev/blog/2024/09/24/sig-scheduling-spotlight-2024 |
| 6 | +date: 2024-09-24 |
| 7 | +author: "Arvind Parekh" |
| 8 | +--- |
| 9 | + |
| 10 | +In this SIG Scheduling spotlight we talked with [Kensei Nakada](https://github.com/sanposhiho/), an |
| 11 | +approver in SIG Scheduling. |
| 12 | + |
| 13 | +## Introductions |
| 14 | + |
| 15 | +**Arvind:** **Hello, thank you for the opportunity to learn more about SIG Scheduling! Would you |
| 16 | +like to introduce yourself and tell us a bit about your role, and how you got involved with |
| 17 | +Kubernetes?** |
| 18 | + |
| 19 | +**Kensei**: Hi, thanks for the opportunity! I’m Kensei Nakada |
| 20 | +([@sanposhiho](https://github.com/sanposhiho/)), a software engineer at |
| 21 | +[Tetrate.io](https://tetrate.io/). I have been contributing to Kubernetes in my free time for more |
| 22 | +than 3 years, and now I’m an approver of SIG-Scheduling in Kubernetes. Also, I’m a founder/owner of |
| 23 | +two SIG subprojects, |
| 24 | +[kube-scheduler-simulator](https://github.com/kubernetes-sigs/kube-scheduler-simulator) and |
| 25 | +[kube-scheduler-wasm-extension](https://github.com/kubernetes-sigs/kube-scheduler-wasm-extension). |
| 26 | + |
| 27 | +## About SIG Scheduling |
| 28 | + |
| 29 | +**AP: That's awesome! You've been involved with the project for a long time. Can you provide a |
| 30 | +brief overview of SIG Scheduling and explain its role within the Kubernetes ecosystem?** |
| 31 | + |
| 32 | +**KN**: As the name implies, our responsibility is to enhance scheduling within |
| 33 | +Kubernetes. Specifically, we develop the components that determine which Node is the best place for |
| 34 | +each Pod. In Kubernetes, our main focus is on maintaining the |
| 35 | +[kube-scheduler](https://kubernetes.io/docs/concepts/scheduling-eviction/kube-scheduler/), along |
| 36 | +with other scheduling-related components as part of our SIG subprojects. |
| 37 | + |
| 38 | +**AP: I see, got it! That makes me curious--what recent innovations or developments has SIG |
| 39 | +Scheduling introduced to Kubernetes scheduling?** |
| 40 | + |
| 41 | +**KN**: From a feature perspective, there have been [several |
| 42 | +enhancements](https://kubernetes.io/blog/2023/04/17/fine-grained-pod-topology-spread-features-beta/) |
| 43 | +to `PodTopologySpread` recently. `PodTopologySpread` is a relatively new feature in the scheduler, |
| 44 | +and we are still in the process of gathering feedback and making improvements. |
| 45 | + |
| 46 | +Most recently, we have been focusing on a new internal enhancement called |
| 47 | +[QueueingHint](https://github.com/kubernetes/enhancements/blob/master/keps/sig-scheduling/4247-queueinghint/README.md) |
| 48 | +which aims to enhance scheduling throughput. Throughput is one of our crucial metrics in |
| 49 | +scheduling. Traditionally, we have primarily focused on optimizing the latency of each scheduling |
| 50 | +cycle. QueueingHint takes a different approach, optimizing when to retry scheduling, thereby |
| 51 | +reducing the likelihood of wasting scheduling cycles. |
| 52 | + |
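To make the retry idea concrete, here is a minimal Python sketch. It is not the kube-scheduler implementation (which is written in Go); the names `Hint`, `Event`, `cpu_fit_hint`, and `requeue` are hypothetical and exist only for illustration. It assumes a Pod was rejected for lack of CPU and shows how a per-event hint can decide whether a retry is worth a scheduling cycle.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Hint(Enum):
    QUEUE = "queue"  # the event may make the Pod schedulable: retry it
    SKIP = "skip"    # the event cannot help this Pod: keep it waiting


@dataclass(frozen=True)
class Pod:
    name: str
    cpu_request: int  # millicores the Pod needs


@dataclass(frozen=True)
class Event:
    kind: str      # hypothetical event kind, e.g. "NodeAdded"
    free_cpu: int  # millicores made available by the event


def cpu_fit_hint(pod: Pod, event: Event) -> Hint:
    """Hypothetical hint for a Pod that was rejected for lack of CPU."""
    if event.free_cpu >= pod.cpu_request:
        return Hint.QUEUE
    return Hint.SKIP


def requeue(unschedulable: list[Pod], event: Event) -> tuple[list[Pod], list[Pod]]:
    """Split waiting Pods into those worth retrying and those left waiting."""
    retry: list[Pod] = []
    wait: list[Pod] = []
    for pod in unschedulable:
        if cpu_fit_hint(pod, event) is Hint.QUEUE:
            retry.append(pod)
        else:
            wait.append(pod)
    return retry, wait


pending = [Pod("small", 500), Pod("large", 4000)]
retry, wait = requeue(pending, Event("NodeAdded", free_cpu=2000))
print([p.name for p in retry], [p.name for p in wait])  # ['small'] ['large']
```

Without a hint, both Pods would be retried on every event; here only the Pod that the event could plausibly help consumes a scheduling cycle.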
| 53 | +**AP: That sounds interesting! Are there any other interesting topics or projects you are currently |
| 54 | +working on within SIG Scheduling?** |
| 55 | + |
| 56 | +**KN**: I’m leading the development of `QueueingHint`, which I just mentioned. Because it’s a big new |
| 57 | +undertaking for us, we’ve been facing many unexpected challenges, especially around scalability, |
| 58 | +and we’re trying to solve each of them so that we can eventually enable it by default. |
| 59 | + |
| 60 | +I also believe that |
| 61 | +[kube-scheduler-wasm-extension](https://github.com/kubernetes-sigs/kube-scheduler-wasm-extension), |
| 62 | +a SIG subproject that I started last year, would be interesting to many people. Kubernetes offers |
| 63 | +extension points in many of its components. Traditionally, extensions are provided via webhooks |
| 64 | +([extender](https://github.com/kubernetes/design-proposals-archive/blob/main/scheduling/scheduler_extender.md) |
| 65 | +in the scheduler) or Go SDK ([Scheduling |
| 66 | +Framework](https://kubernetes.io/docs/concepts/scheduling-eviction/scheduling-framework/) in the |
| 67 | +scheduler). However, these come with drawbacks: webhooks have performance issues, and the Go SDK |
| 68 | +requires rebuilding and replacing the scheduler, which is difficult for those who want to extend the |
| 69 | +scheduler but are not familiar with it. The project is trying to introduce a new solution to |
| 70 | +this general challenge: a [WebAssembly](https://webassembly.org/) based extension. Wasm allows |
| 71 | +users to build plugins easily, without worrying about recompiling or replacing their scheduler, while |
| 72 | +sidestepping performance concerns. |
| 73 | + |
| 74 | +Through this project, SIG Scheduling has been gaining valuable insights into WebAssembly's |
| 75 | +interaction with large Kubernetes objects. I believe the experience that we’re gaining should be |
| 76 | +useful broadly within the community, beyond SIG Scheduling. |
| 77 | + |
| 78 | +**AP: Definitely! Now, there are currently 8 subprojects inside SIG Scheduling. Would you like to |
| 79 | +talk about them? Are there some interesting contributions by those teams you want to highlight?** |
| 80 | + |
| 81 | +**KN**: Let me pick three subprojects: Kueue, KWOK, and descheduler. |
| 82 | + |
| 83 | +[Kueue](https://github.com/kubernetes-sigs/kueue): |
| 84 | +: Recently, many people have been trying to manage batch workloads with Kubernetes, and in 2022, |
| 85 | +the Kubernetes community founded |
| 86 | +[WG-Batch](https://github.com/kubernetes/community/blob/master/wg-batch/README.md) for better |
| 87 | +support for such batch workloads in Kubernetes. [Kueue](https://github.com/kubernetes-sigs/kueue) |
| 88 | +is a project that plays a crucial role in that effort. It’s a job queueing controller, deciding when a job |
| 89 | +should wait, when a job should be admitted to start, and when a job should be preempted. Kueue aims |
| 90 | +to be installed on a vanilla Kubernetes cluster while cooperating with existing mature controllers |
| 91 | +(scheduler, cluster-autoscaler, kube-controller-manager, etc). |
| 92 | + |
| 93 | +[KWOK](https://github.com/kubernetes-sigs/kwok): |
| 94 | +: KWOK is a component with which you can create a cluster of thousands of Nodes in seconds. It’s |
| 95 | + mostly useful for simulation/testing as a lightweight cluster, and in fact another SIG subproject, |
| 96 | + [kube-scheduler-simulator](https://github.com/kubernetes-sigs/kube-scheduler-simulator), |
| 97 | + uses KWOK in the background. |
| 98 | + |
| 99 | +[descheduler](https://github.com/kubernetes-sigs/descheduler): |
| 100 | +: Descheduler is a component that recreates Pods running on undesired Nodes. In Kubernetes, |
| 101 | +scheduling constraints (`PodAffinity`, `NodeAffinity`, `PodTopologySpread`, etc) are honored only |
| 102 | +when a Pod is scheduled, and there is no guarantee that the constraints remain satisfied afterwards. |
| 103 | +Descheduler evicts Pods violating their scheduling constraints (or other undesired conditions) so |
| 104 | +that they’re recreated and rescheduled. |
| 105 | + |
| 106 | +[Descheduling Framework](https://github.com/kubernetes-sigs/descheduler/blob/master/keps/753-descheduling-framework/README.md): |
| 107 | +: One very interesting ongoing project, similar to the [Scheduling |
| 108 | + Framework](https://kubernetes.io/docs/concepts/scheduling-eviction/scheduling-framework/) in the |
| 109 | + scheduler, aiming to make descheduling logic extensible and allow maintainers to focus on building |
| 110 | + a core engine of descheduler. |
| 111 | + |
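The descheduler behaviour described above (constraints checked at scheduling time can later stop holding) can be sketched roughly as follows. This is a simplified Python illustration, not descheduler code: the `Node` and `Pod` classes, the `required_labels` field (a stand-in for a node affinity rule), and the helper names are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    labels: dict[str, str] = field(default_factory=dict)


@dataclass
class Pod:
    name: str
    node: str  # name of the Node the Pod is currently running on
    # Simplified stand-in for node affinity: labels the Node must carry.
    required_labels: dict[str, str] = field(default_factory=dict)


def violates(pod: Pod, nodes: dict[str, Node]) -> bool:
    """True if the Pod's Node no longer carries every label the Pod requires."""
    labels = nodes[pod.node].labels
    return any(labels.get(k) != v for k, v in pod.required_labels.items())


def pods_to_evict(pods: list[Pod], nodes: dict[str, Node]) -> list[str]:
    """Names of Pods whose current placement no longer satisfies their constraint."""
    return [p.name for p in pods if violates(p, nodes)]


nodes = {"n1": Node("n1", {"disk": "ssd"}), "n2": Node("n2", {"disk": "ssd"})}
pods = [Pod("db", "n1", {"disk": "ssd"}), Pod("web", "n2")]

# Both placements were valid at scheduling time; later the label on n1 changes.
nodes["n1"].labels["disk"] = "hdd"
print(pods_to_evict(pods, nodes))  # ['db']
```

In a real cluster, evicting `db` would let its controller recreate the Pod, and the scheduler would then place the replacement on a Node that satisfies the constraint again.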
| 112 | +**AP: Thank you for letting us know! And I have to ask, what are some of your favorite things about |
| 113 | +this SIG?** |
| 114 | + |
| 115 | +**KN**: What I really like about this SIG is how actively engaged everyone is. We come from various |
| 116 | +companies and industries, bringing diverse perspectives to the table. Instead of these differences |
| 117 | +causing division, they actually generate a wealth of opinions. Each view is respected, and this |
| 118 | +makes our discussions both rich and productive. |
| 119 | + |
| 120 | +I really appreciate this collaborative atmosphere, and I believe it has been key to continuously |
| 121 | +improving our components over the years. |
| 122 | + |
| 123 | +## Contributing to SIG Scheduling |
| 124 | + |
| 125 | +**AP: Kubernetes is a community-driven project. Any recommendations for new contributors or |
| 126 | +beginners looking to get involved and contribute to SIG scheduling? Where should they start?** |
| 127 | + |
| 128 | +**KN**: Let me start with a general recommendation for contributing to any SIG: a common approach is |
| 129 | +to look for |
| 130 | +[good-first-issue](https://github.com/kubernetes/kubernetes/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22). |
| 131 | +However, you'll soon realize that many people worldwide are trying to contribute to the Kubernetes |
| 132 | +repository. |
| 133 | + |
| 134 | +I suggest starting by examining the implementation of a component that interests you. If you have |
| 135 | +any questions about it, ask in the corresponding Slack channel (e.g., #sig-scheduling for the |
| 136 | +scheduler, #sig-node for kubelet, etc). Once you have a rough understanding of the implementation, |
| 137 | +look at issues within the SIG (e.g., |
| 138 | +[sig-scheduling](https://github.com/kubernetes/kubernetes/issues?q=is%3Aopen+is%3Aissue+label%3Asig%2Fscheduling)), |
| 139 | +where you'll find more unassigned issues compared to good-first-issue ones. You may also want to |
| 140 | +filter issues with the |
| 141 | +[kind/cleanup](https://github.com/kubernetes/kubernetes/issues?q=is%3Aopen+is%3Aissue++label%3Akind%2Fcleanup+) |
| 142 | +label, which often indicates lower-priority tasks and can be starting points. |
| 143 | + |
| 144 | +Specifically for SIG Scheduling, you should first understand the [Scheduling |
| 145 | +Framework](https://kubernetes.io/docs/concepts/scheduling-eviction/scheduling-framework/), which is |
| 146 | +the fundamental architecture of kube-scheduler. Most of the implementation is found in |
| 147 | +[pkg/scheduler](https://github.com/kubernetes/kubernetes/tree/master/pkg/scheduler). I suggest |
| 148 | +starting with the |
| 149 | +[ScheduleOne](https://github.com/kubernetes/kubernetes/blob/0590bb1ac495ae8af2a573f879408e48800da2c5/pkg/scheduler/schedule_one.go#L66) |
| 150 | +function and then exploring deeper from there. |
| 151 | + |
| 152 | +Additionally, apart from the main kubernetes/kubernetes repository, consider looking into |
| 153 | +sub-projects. These typically have fewer maintainers and offer more opportunities to make a |
| 154 | +significant impact. Despite being called "sub" projects, many have a large number of users and a |
| 155 | +considerable impact on the community. |
| 156 | + |
| 157 | +And last but not least, remember contributing to the community isn’t just about code. While I |
| 158 | +talked a lot about the implementation contribution, there are many ways to contribute, and each one |
| 159 | +is valuable. One comment on an issue, one piece of feedback on an existing feature, one review comment on a PR, |
| 160 | +one clarification on the documentation; every small contribution helps drive the Kubernetes |
| 161 | +ecosystem forward. |
| 162 | + |
| 163 | +**AP: Those are some pretty useful tips! And if I may ask, how do you assist new contributors in |
| 164 | +getting started, and what skills are contributors likely to learn by participating in SIG |
| 165 | +Scheduling?** |
| 166 | + |
| 167 | +**KN**: Our maintainers are available to answer your questions in the #sig-scheduling Slack |
| 168 | +channel. By participating, you'll gain a deeper understanding of Kubernetes scheduling and have the |
| 169 | +opportunity to collaborate and network with maintainers from diverse backgrounds. You'll learn not |
| 170 | +just how to write code, but also how to maintain a large project, design and discuss new features, |
| 171 | +address bugs, and much more. |
| 172 | + |
| 173 | +## Future Directions |
| 174 | + |
| 175 | +**AP: What are some Kubernetes-specific challenges in terms of scheduling? Are there any particular |
| 176 | +pain points?** |
| 177 | + |
| 178 | +**KN**: Scheduling in Kubernetes can be quite challenging because of the diverse needs of different |
| 179 | +organizations with different business requirements. Supporting all possible use cases in |
| 180 | +kube-scheduler is impossible. Therefore, extensibility is a key focus for us. A few years ago, we |
| 181 | +rearchitected kube-scheduler with the [Scheduling |
| 182 | +Framework](https://kubernetes.io/docs/concepts/scheduling-eviction/scheduling-framework/), which |
| 183 | +offers flexible extensibility for users to implement various scheduling needs through plugins. This |
| 184 | +allows maintainers to focus on the core scheduling features and the framework runtime. |
| 185 | + |
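As a rough picture of what plugin-based extensibility means here, the Python sketch below separates a small "runtime" (`schedule`) from pluggable filter and score functions. It is only an analogy under stated assumptions: the real Scheduling Framework is a Go API with more extension points, and every name in this example (`fits_cpu`, `prefers_zone`, `most_free_cpu`, `schedule`) is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Node:
    name: str
    free_cpu: int  # millicores
    zone: str


@dataclass(frozen=True)
class Pod:
    name: str
    cpu_request: int  # millicores
    preferred_zone: str


FilterPlugin = Callable[[Pod, Node], bool]  # can this Node run the Pod at all?
ScorePlugin = Callable[[Pod, Node], int]    # how good is this Node for the Pod?


def fits_cpu(pod: Pod, node: Node) -> bool:
    return node.free_cpu >= pod.cpu_request


def prefers_zone(pod: Pod, node: Node) -> int:
    return 10 if node.zone == pod.preferred_zone else 0


def most_free_cpu(pod: Pod, node: Node) -> int:
    return node.free_cpu // 1000


def schedule(pod: Pod, nodes: list[Node],
             filters: list[FilterPlugin],
             scorers: list[ScorePlugin]) -> Optional[str]:
    """Core loop: drop infeasible Nodes, then pick the highest total score."""
    feasible = [n for n in nodes if all(f(pod, n) for f in filters)]
    if not feasible:
        return None  # the Pod stays unschedulable for now
    best = max(feasible, key=lambda n: sum(s(pod, n) for s in scorers))
    return best.name


nodes = [Node("a", 1000, "x"), Node("b", 4000, "y"), Node("c", 3000, "x")]
pod = Pod("api", 2000, "x")

print(schedule(pod, nodes, [fits_cpu], [prefers_zone, most_free_cpu]))  # c
print(schedule(pod, nodes, [fits_cpu], [most_free_cpu]))                # b
```

Swapping the list of score functions changes the placement without touching `schedule` itself, which mirrors the division of labour described above: users supply plugins, and maintainers own the core loop.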
| 186 | +Another major issue is maintaining sufficient scheduling throughput. Typically, a Kubernetes cluster |
| 187 | +has only one kube-scheduler, so its throughput directly affects the overall scheduling scalability |
| 188 | +and, consequently, the cluster's scalability. Although we have an internal performance test |
| 189 | +([scheduler_perf](https://github.com/kubernetes/kubernetes/tree/master/test/integration/scheduler_perf)), |
| 190 | +unfortunately, we sometimes overlook performance degradation in less common scenarios. It’s |
| 191 | +difficult as even small changes, which look irrelevant to performance, can lead to degradation. |
| 192 | + |
| 193 | +**AP: What are some upcoming goals or initiatives for SIG Scheduling? How do you envision the SIG evolving in the future?** |
| 194 | + |
| 195 | +**KN**: Our primary goal is always to build and maintain an _extensible_ and _stable_ scheduling |
| 196 | +runtime, and I bet this goal will remain unchanged forever. |
| 197 | + |
| 198 | +As already mentioned, extensibility is key to solving the challenge of the diverse needs of |
| 199 | +scheduling. Rather than trying to support every different use case directly in kube-scheduler, we |
| 200 | +will continue to focus on enhancing extensibility so that it can accommodate various use |
| 201 | +cases. [kube-scheduler-wasm-extension](https://github.com/kubernetes-sigs/kube-scheduler-wasm-extension) |
| 202 | +that I mentioned is also part of this initiative. |
| 203 | + |
| 204 | +Regarding stability, introducing new optimizations like QueueingHint is one of our |
| 205 | +strategies. Additionally, maintaining throughput is also a crucial goal going forward. We’re |
| 206 | +planning to enhance our throughput monitoring |
| 207 | +([ref](https://github.com/kubernetes/kubernetes/issues/124774)), so that we can notice degradation |
| 208 | +as much as possible on our own before releasing. But, realistically, we can't cover every possible |
| 209 | +scenario. We highly appreciate any attention the community can give to scheduling throughput and |
| 210 | +encourage feedback and alerts regarding performance issues! |
| 211 | + |
| 212 | +## Closing Remarks |
| 213 | + |
| 214 | +**AP: Finally, what message would you like to convey to those who are interested in learning more |
| 215 | +about SIG Scheduling?** |
| 216 | + |
| 217 | +**KN**: Scheduling is one of the most complicated areas in Kubernetes, and you may find it difficult |
| 218 | +at first. But, as I shared earlier, you can find many opportunities for contributions, and many |
| 219 | +maintainers are willing to help you understand things. We know your unique perspective and skills |
| 220 | +are what make our open source so powerful :) |
| 221 | + |
| 222 | +Feel free to reach out to us in Slack |
| 223 | +([#sig-scheduling](https://kubernetes.slack.com/archives/C09TP78DV)) or |
| 224 | +[meetings](https://github.com/kubernetes/community/blob/master/sig-scheduling/README.md#meetings). |
| 225 | +I hope this article interests everyone and we can see new contributors! |
| 226 | + |
| 227 | +**AP: Thank you so much for taking the time to do this! I'm confident that many will find this |
| 228 | +information invaluable for understanding more about SIG Scheduling and for contributing to the SIG.** |