Commit 67e1ad6

Merge pull request #47104 from sanposhiho/simulator
blog: introducing kube-scheduler-simulator
2 parents 8e44486 + e3ea7ce commit 67e1ad6

File tree

2 files changed, +220 -0 lines changed
  • content/en/blog/_posts/2025-12-31-kube-scheduler-simulator
  • static/images/blog/2025-12-31-kube-scheduler-simulator

---
layout: blog
title: "Introducing kube-scheduler-simulator"
date: 2025-12-31
draft: true
slug: introducing-kube-scheduler-simulator
author: Kensei Nakada (Tetrate)
---

The Kubernetes Scheduler is a crucial control plane component that determines which node a Pod will run on.
Anyone who uses Kubernetes therefore relies on a scheduler.

The [kube-scheduler-simulator](https://sigs.k8s.io/kube-scheduler-simulator) is a simulator for the Kubernetes scheduler. It started as a [Google Summer of Code 2021](https://summerofcode.withgoogle.com/) project developed by me (Kensei Nakada) and has since received many contributions.
This tool allows users to closely examine the scheduler’s behavior and decisions.

It is useful both for casual users who employ scheduling constraints (e.g., [inter-Pod affinity](/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity))
and for experts who extend the scheduler with custom plugins.

## Motivation

The scheduler often appears as a black box,
composed of many plugins that each contribute to the scheduling decision from their own perspective.
Understanding its behavior can be challenging because of the many factors it considers.
Even if a Pod appears to be scheduled as expected in a simple test cluster,
the result may come from a different calculation than the one you expected,
which could lead to unexpected scheduling results in a large production environment.

Testing a scheduler is also a complex challenge.
There are countless patterns of operations executed within a real cluster, making it impractical to anticipate every scenario with a finite number of tests.
More often than not, bugs are discovered only when the scheduler is deployed in an actual cluster.
In fact, many bugs are found by users after a release has shipped,
even in the upstream kube-scheduler.

Having a development or sandbox environment for testing the scheduler (or, indeed, any Kubernetes controller) is a common practice.
However, this approach falls short of capturing all the potential scenarios that might arise in a production cluster,
because a development cluster is often much smaller, with notable differences in workload sizes and scaling dynamics.
It never sees the exact same use or exhibits the same behavior as its production counterpart.

kube-scheduler-simulator aims to solve those problems.
It enables users to test their scheduling constraints, scheduler configurations,
and custom plugins while checking every detailed part of scheduling decisions.
It also allows users to create a simulated cluster environment, where they can test their scheduler
with the same resources as their production cluster without affecting actual workloads.

## Features of the kube-scheduler-simulator

kube-scheduler-simulator’s core feature is its ability to expose the scheduler’s internal decisions.
The scheduler operates based on the [scheduling framework](/docs/concepts/scheduling-eviction/scheduling-framework/),
using various plugins at different extension points
to filter nodes (Filter phase), score nodes (Score phase), and ultimately determine the best node for the Pod.
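
To make the Filter and Score phases concrete, here is a toy model in Python. It is only a sketch: the plugin functions, the `free_cpu` and `cpu` fields, and the `schedule` helper are hypothetical names invented for this illustration and are not part of kube-scheduler or the simulator. The real scheduler also normalizes scores and breaks ties randomly, which this sketch does not attempt.

```python
from typing import Callable, Dict, List, Optional, Tuple

Pod = Dict[str, int]   # hypothetical Pod model, e.g. {"cpu": 2}
Node = Dict[str, int]  # hypothetical node model, e.g. {"free_cpu": 4}

FilterPlugin = Callable[[Pod, Node], bool]
ScorePlugin = Callable[[Pod, Node], int]


def fits_cpu(pod: Pod, node: Node) -> bool:
    """Hypothetical Filter plugin: keep nodes with enough free CPU."""
    return node["free_cpu"] >= pod["cpu"]


def most_free_cpu(pod: Pod, node: Node) -> int:
    """Hypothetical Score plugin: prefer nodes with more CPU left over."""
    return node["free_cpu"] - pod["cpu"]


def schedule(
    pod: Pod,
    nodes: Dict[str, Node],
    filters: List[FilterPlugin],
    scorers: List[Tuple[ScorePlugin, int]],  # (plugin, weight) pairs
) -> Optional[str]:
    # Filter phase: a node survives only if every Filter plugin accepts it.
    feasible = {
        name: node
        for name, node in nodes.items()
        if all(accepts(pod, node) for accepts in filters)
    }
    if not feasible:
        return None  # no node can run the Pod

    # Score phase: sum the weighted score of every Score plugin per node.
    totals = {
        name: sum(weight * score(pod, node) for score, weight in scorers)
        for name, node in feasible.items()
    }
    # Pick the highest total; ties go to the alphabetically first node
    # (a simplification made for this sketch).
    return max(sorted(totals), key=lambda name: totals[name])


nodes = {
    "node-a": {"free_cpu": 1},
    "node-b": {"free_cpu": 4},
    "node-c": {"free_cpu": 8},
}
chosen = schedule({"cpu": 2}, nodes, [fits_cpu], [(most_free_cpu, 1)])
print(chosen)
```

In this toy run, `node-a` is removed in the Filter phase because it lacks CPU, and the Score phase then ranks the remaining two nodes.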

The simulator allows users to create Kubernetes resources and observe how each plugin influences the scheduling decisions for Pods.
This visibility helps users understand the scheduler’s workings and define appropriate scheduling constraints.

{{< figure src="/images/blog/2025-01-22-kube-scheduler-simulator/simulator.png" alt="Screenshot of the simulator web frontend that shows the detailed scheduling results per node and per extension point" title="The simulator web frontend" >}}

Inside the simulator, a debuggable scheduler runs instead of the vanilla scheduler.
This debuggable scheduler writes the results of each scheduler plugin at every extension point to the Pod’s annotations, as the following YAML shows,
and the web frontend formats and visualizes the scheduling results based on these annotations.

```yaml
kind: Pod
apiVersion: v1
metadata:
  # The JSONs within these annotations are manually formatted for clarity in the blog post.
  annotations:
    kube-scheduler-simulator.sigs.k8s.io/bind-result: '{"DefaultBinder":"success"}'
    kube-scheduler-simulator.sigs.k8s.io/filter-result: >-
      {
        "node-jjfg5":{
          "NodeName":"passed",
          "NodeResourcesFit":"passed",
          "NodeUnschedulable":"passed",
          "TaintToleration":"passed"
        },
        "node-mtb5x":{
          "NodeName":"passed",
          "NodeResourcesFit":"passed",
          "NodeUnschedulable":"passed",
          "TaintToleration":"passed"
        }
      }
    kube-scheduler-simulator.sigs.k8s.io/finalscore-result: >-
      {
        "node-jjfg5":{
          "ImageLocality":"0",
          "NodeAffinity":"0",
          "NodeResourcesBalancedAllocation":"52",
          "NodeResourcesFit":"47",
          "TaintToleration":"300",
          "VolumeBinding":"0"
        },
        "node-mtb5x":{
          "ImageLocality":"0",
          "NodeAffinity":"0",
          "NodeResourcesBalancedAllocation":"76",
          "NodeResourcesFit":"73",
          "TaintToleration":"300",
          "VolumeBinding":"0"
        }
      }
    kube-scheduler-simulator.sigs.k8s.io/permit-result: '{}'
    kube-scheduler-simulator.sigs.k8s.io/permit-result-timeout: '{}'
    kube-scheduler-simulator.sigs.k8s.io/postfilter-result: '{}'
    kube-scheduler-simulator.sigs.k8s.io/prebind-result: '{"VolumeBinding":"success"}'
    kube-scheduler-simulator.sigs.k8s.io/prefilter-result: '{}'
    kube-scheduler-simulator.sigs.k8s.io/prefilter-result-status: >-
      {
        "AzureDiskLimits":"",
        "EBSLimits":"",
        "GCEPDLimits":"",
        "InterPodAffinity":"",
        "NodeAffinity":"",
        "NodePorts":"",
        "NodeResourcesFit":"success",
        "NodeVolumeLimits":"",
        "PodTopologySpread":"",
        "VolumeBinding":"",
        "VolumeRestrictions":"",
        "VolumeZone":""
      }
    kube-scheduler-simulator.sigs.k8s.io/prescore-result: >-
      {
        "InterPodAffinity":"",
        "NodeAffinity":"success",
        "NodeResourcesBalancedAllocation":"success",
        "NodeResourcesFit":"success",
        "PodTopologySpread":"",
        "TaintToleration":"success"
      }
    kube-scheduler-simulator.sigs.k8s.io/reserve-result: '{"VolumeBinding":"success"}'
    kube-scheduler-simulator.sigs.k8s.io/result-history: >-
      [
        {
          "kube-scheduler-simulator.sigs.k8s.io/bind-result":"{\"DefaultBinder\":\"success\"}",
          "kube-scheduler-simulator.sigs.k8s.io/filter-result":"{\"node-jjfg5\":{\"NodeName\":\"passed\",\"NodeResourcesFit\":\"passed\",\"NodeUnschedulable\":\"passed\",\"TaintToleration\":\"passed\"},\"node-mtb5x\":{\"NodeName\":\"passed\",\"NodeResourcesFit\":\"passed\",\"NodeUnschedulable\":\"passed\",\"TaintToleration\":\"passed\"}}",
          "kube-scheduler-simulator.sigs.k8s.io/finalscore-result":"{\"node-jjfg5\":{\"ImageLocality\":\"0\",\"NodeAffinity\":\"0\",\"NodeResourcesBalancedAllocation\":\"52\",\"NodeResourcesFit\":\"47\",\"TaintToleration\":\"300\",\"VolumeBinding\":\"0\"},\"node-mtb5x\":{\"ImageLocality\":\"0\",\"NodeAffinity\":\"0\",\"NodeResourcesBalancedAllocation\":\"76\",\"NodeResourcesFit\":\"73\",\"TaintToleration\":\"300\",\"VolumeBinding\":\"0\"}}",
          "kube-scheduler-simulator.sigs.k8s.io/permit-result":"{}",
          "kube-scheduler-simulator.sigs.k8s.io/permit-result-timeout":"{}",
          "kube-scheduler-simulator.sigs.k8s.io/postfilter-result":"{}",
          "kube-scheduler-simulator.sigs.k8s.io/prebind-result":"{\"VolumeBinding\":\"success\"}",
          "kube-scheduler-simulator.sigs.k8s.io/prefilter-result":"{}",
          "kube-scheduler-simulator.sigs.k8s.io/prefilter-result-status":"{\"AzureDiskLimits\":\"\",\"EBSLimits\":\"\",\"GCEPDLimits\":\"\",\"InterPodAffinity\":\"\",\"NodeAffinity\":\"\",\"NodePorts\":\"\",\"NodeResourcesFit\":\"success\",\"NodeVolumeLimits\":\"\",\"PodTopologySpread\":\"\",\"VolumeBinding\":\"\",\"VolumeRestrictions\":\"\",\"VolumeZone\":\"\"}",
          "kube-scheduler-simulator.sigs.k8s.io/prescore-result":"{\"InterPodAffinity\":\"\",\"NodeAffinity\":\"success\",\"NodeResourcesBalancedAllocation\":\"success\",\"NodeResourcesFit\":\"success\",\"PodTopologySpread\":\"\",\"TaintToleration\":\"success\"}",
          "kube-scheduler-simulator.sigs.k8s.io/reserve-result":"{\"VolumeBinding\":\"success\"}",
          "kube-scheduler-simulator.sigs.k8s.io/score-result":"{\"node-jjfg5\":{\"ImageLocality\":\"0\",\"NodeAffinity\":\"0\",\"NodeResourcesBalancedAllocation\":\"52\",\"NodeResourcesFit\":\"47\",\"TaintToleration\":\"0\",\"VolumeBinding\":\"0\"},\"node-mtb5x\":{\"ImageLocality\":\"0\",\"NodeAffinity\":\"0\",\"NodeResourcesBalancedAllocation\":\"76\",\"NodeResourcesFit\":\"73\",\"TaintToleration\":\"0\",\"VolumeBinding\":\"0\"}}",
          "kube-scheduler-simulator.sigs.k8s.io/selected-node":"node-mtb5x"
        }
      ]
    kube-scheduler-simulator.sigs.k8s.io/score-result: >-
      {
        "node-jjfg5":{
          "ImageLocality":"0",
          "NodeAffinity":"0",
          "NodeResourcesBalancedAllocation":"52",
          "NodeResourcesFit":"47",
          "TaintToleration":"0",
          "VolumeBinding":"0"
        },
        "node-mtb5x":{
          "ImageLocality":"0",
          "NodeAffinity":"0",
          "NodeResourcesBalancedAllocation":"76",
          "NodeResourcesFit":"73",
          "TaintToleration":"0",
          "VolumeBinding":"0"
        }
      }
    kube-scheduler-simulator.sigs.k8s.io/selected-node: node-mtb5x
```
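
Because the results are plain JSON strings in annotations, they are easy to inspect with a script. The following Python sketch sums the `finalscore-result` values per node and reports the node with the highest total. The helper name `total_final_scores` is hypothetical, and the sample data is copied from the annotation shown above; the sketch assumes every score is an integer encoded as a string, as in that example.

```python
import json
from typing import Dict

FINALSCORE_KEY = "kube-scheduler-simulator.sigs.k8s.io/finalscore-result"


def total_final_scores(annotations: Dict[str, str]) -> Dict[str, int]:
    """Sum the per-plugin final scores for each node (hypothetical helper)."""
    per_node = json.loads(annotations[FINALSCORE_KEY])
    return {
        node: sum(int(score) for score in plugins.values())
        for node, plugins in per_node.items()
    }


# Sample annotation value, copied from the example Pod in this post.
annotations = {
    FINALSCORE_KEY: json.dumps({
        "node-jjfg5": {
            "ImageLocality": "0",
            "NodeAffinity": "0",
            "NodeResourcesBalancedAllocation": "52",
            "NodeResourcesFit": "47",
            "TaintToleration": "300",
            "VolumeBinding": "0",
        },
        "node-mtb5x": {
            "ImageLocality": "0",
            "NodeAffinity": "0",
            "NodeResourcesBalancedAllocation": "76",
            "NodeResourcesFit": "73",
            "TaintToleration": "300",
            "VolumeBinding": "0",
        },
    })
}

totals = total_final_scores(annotations)
best = max(totals, key=lambda node: totals[node])
print(totals, best)
```

With the values from the example, the totals are 399 for `node-jjfg5` and 449 for `node-mtb5x`, which agrees with the `selected-node` annotation.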

Users can also integrate [their custom plugins](/docs/concepts/scheduling-eviction/scheduling-framework/) or [extenders](https://github.com/kubernetes/design-proposals-archive/blob/main/scheduling/scheduler_extender.md) into the debuggable scheduler and visualize their results.

This debuggable scheduler can also run standalone, e.g., on any Kubernetes cluster or in integration tests.
This is useful for custom plugin developers who want to test their plugins or examine their custom scheduler in a real cluster with better debuggability.
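
As one way to use these annotations in a test, the Python sketch below lists which Filter plugins rejected which nodes. It rests on an assumption not stated in this post: that any `filter-result` value other than `"passed"` means the plugin rejected the node and that the value carries the reason. The helper name `rejections`, the node names, and the sample message are made up for illustration.

```python
import json
from typing import Dict

FILTER_KEY = "kube-scheduler-simulator.sigs.k8s.io/filter-result"
PASSED = "passed"  # value recorded for a plugin that accepted the node


def rejections(annotations: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Map each rejected node to the plugins that rejected it (hypothetical helper).

    Assumption: any value other than "passed" is a rejection message.
    """
    per_node = json.loads(annotations.get(FILTER_KEY, "{}"))
    rejected = {}
    for node, plugins in per_node.items():
        failed = {name: msg for name, msg in plugins.items() if msg != PASSED}
        if failed:
            rejected[node] = failed
    return rejected


# Made-up sample data: node-b is rejected by one plugin.
sample = {
    FILTER_KEY: json.dumps({
        "node-a": {"NodeResourcesFit": "passed", "TaintToleration": "passed"},
        "node-b": {"NodeResourcesFit": "Insufficient cpu", "TaintToleration": "passed"},
    })
}

result = rejections(sample)
print(result)
```

A test for a custom plugin could assert on this mapping to check that the plugin rejects exactly the nodes it is expected to.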

## The simulator as a better dev cluster

As mentioned earlier, a limited set of tests cannot predict every possible scenario in a real-world cluster.
Typically, users test the scheduler in a small development cluster before deploying it to production, hoping that no issues arise.

[The simulator’s importing feature](https://github.com/kubernetes-sigs/kube-scheduler-simulator/blob/master/simulator/docs/import-cluster-resources.md)
provides a solution by allowing users to simulate deploying a new scheduler version in a production-like environment without impacting their live workloads.

By continuously syncing between a production cluster and the simulator, users can safely test a new scheduler version with the same resources their production cluster handles.
Once confident in its performance, they can proceed with the production deployment, reducing the risk of unexpected issues.

## What are the use cases?

1. **Cluster users**: Examine if scheduling constraints (e.g., PodAffinity, PodTopologySpread) work as intended.
1. **Cluster admins**: Assess how a cluster would behave with changes to the scheduler configuration.
1. **Scheduler plugin developers**: Test custom scheduler plugins or extenders, use the debuggable scheduler in integration tests or development clusters, or use the [syncing](https://github.com/kubernetes-sigs/kube-scheduler-simulator/blob/simulator/v0.3.0/simulator/docs/import-cluster-resources.md) feature for testing within a production-like environment.

## Getting started

The simulator only requires Docker to be installed on a machine; a Kubernetes cluster is not necessary.

```
git clone git@github.com:kubernetes-sigs/kube-scheduler-simulator.git
cd kube-scheduler-simulator
make docker_up
```

You can then access the simulator's web UI at `http://localhost:3000`.

Visit the [kube-scheduler-simulator repository](https://sigs.k8s.io/kube-scheduler-simulator) for more details!

## Getting involved

The scheduler simulator is developed by [Kubernetes SIG Scheduling](https://github.com/kubernetes/community/blob/master/sig-scheduling/README.md#kube-scheduler-simulator). Your feedback and contributions are welcome!

Open issues or PRs at the [kube-scheduler-simulator repository](https://sigs.k8s.io/kube-scheduler-simulator).
Join the conversation on the [#sig-scheduling](https://kubernetes.slack.com/messages/sig-scheduling) Slack channel.

## Acknowledgments

The simulator has been maintained by dedicated volunteer engineers, who overcame many challenges to bring it to its current form.

A big shout out to all [the awesome contributors](https://github.com/kubernetes-sigs/kube-scheduler-simulator/graphs/contributors)!