Commit a181d54

fix: apply suggestions
1 parent 8ba6d96 commit a181d54

1 file changed, +16 -14 lines changed

content/en/blog/_posts/2024-12-12-scheduler-queueinghint/index.md

Lines changed: 16 additions & 14 deletions
@@ -27,11 +27,11 @@ The scheduler stores all unscheduled Pods in an internal component called the _s
 The scheduling queue consists of the following data structures:
 - **ActiveQ**: holds newly created Pods or Pods that are ready to be retried for scheduling.
 - **BackoffQ**: holds Pods that are ready to be retried but are waiting for a backoff period to end. The
-backoff period depends on the number of times the failed scheduler attempted to schedule the Pod.
+backoff period depends on the number of unsuccessful scheduling attempts performed by the scheduler on that Pod.
 - **Unschedulable Pod Pool**: holds Pods that the scheduler won't attempt to schedule for one of the
 following reasons:
-- The scheduler previously attempted, and was unable to, schedule the Pods. Since that attempt, the cluster
-hasn't changed in a way that makes those Pods schedulable.
+- The scheduler previously attempted and was unable to schedule the Pods. Since that attempt, the cluster
+hasn't changed in a way that could make those Pods schedulable.
 - The Pods are blocked from entering the scheduling cycles by PreEnqueue Plugins,
 for example, they have a [scheduling gate](/docs/concepts/scheduling-eviction/pod-scheduling-readiness/#configuring-pod-schedulinggates),
 and get blocked by the scheduling gate plugin.
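
Reviewer's note: the three data structures described in this hunk can be pictured with a small model. The sketch below is only an illustration, not the scheduler's actual implementation. The class and function names are hypothetical, and the exponential backoff with its `initial` and `maximum` parameters is an assumption; the post itself says only that the backoff depends on the number of unsuccessful attempts.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class QueuedPod:
    """Hypothetical, simplified stand-in for a Pod tracked by the scheduling queue."""
    name: str
    attempts: int = 0    # unsuccessful scheduling attempts so far
    gated: bool = False  # True while a PreEnqueue plugin (e.g. a scheduling gate) blocks the Pod


def backoff_seconds(attempts: int, initial: float = 1.0, maximum: float = 10.0) -> float:
    """Assumed policy: the wait doubles with each failed attempt, capped at `maximum`."""
    if attempts <= 0:
        return 0.0
    return min(initial * 2 ** (attempts - 1), maximum)


@dataclass
class SchedulingQueue:
    """Illustrative model of ActiveQ, BackoffQ and the Unschedulable Pod Pool."""
    active_q: deque = field(default_factory=deque)
    backoff_q: list = field(default_factory=list)
    unschedulable_pods: dict = field(default_factory=dict)

    def add(self, pod: QueuedPod) -> None:
        # Gated Pods never enter a scheduling cycle; they wait in the pool.
        if pod.gated:
            self.unschedulable_pods[pod.name] = pod
        else:
            self.active_q.append(pod)

    def mark_unschedulable(self, pod: QueuedPod) -> None:
        # A failed scheduling cycle parks the Pod until the cluster changes.
        pod.attempts += 1
        self.unschedulable_pods[pod.name] = pod
```

In this model a Pod's time in the BackoffQ grows with `attempts`, which matches the wording introduced by the `+` line above.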
@@ -52,13 +52,13 @@ The scheduler processes pending Pods in phases called _cycles_ as follows:

 If the scheduler decides that a Pod can't be scheduled, that Pod enters the Unschedulable Pod Pool
 component of the scheduling queue. However, if the scheduler decides to place the Pod on a node,
-the next cycle executes for that Pod.
+the Pod goes to the binding cycle.

 1. **Binding cycle**: the scheduler communicates the node placement decision to the Kubernetes API
-server. The Pod is then bound to the selected node.
+server. This operation binds the Pod to the selected node.

 Aside from some exceptions, most unscheduled Pods enter the Unschedulable Pod Pool after each scheduling
-cycle. The Unschedulable Pod Pool component is crucial because of how the scheduling cycle processes Pods one by one. If the scheduler had to constantly retry placing unschedulable Pods instead of offloading those
+cycle. The Unschedulable Pod Pool component is crucial because of how the scheduling cycle processes Pods one by one. If the scheduler had to constantly retry placing unschedulable Pods, instead of offloading those
 Pods to the Unschedulable Pod Pool, multiple scheduling cycles would be wasted on those Pods.

 ## Improvements to retrying Pod scheduling with QueueingHint
@@ -75,9 +75,9 @@ For example, `preCheck` could filter out node-related events when the node statu

 However, we had two issues with those approaches:
 - Requeueing with events was too broad and could lead to scheduling retries for no reason.
-- For example, a new scheduled Pod _might_ solve the `InterPodAffinity`'s failure, but not all of them do,
-for example, if a new Pod is created, but without a label matching `InterPodAffinity` of the unschedulable pod, the pod wouldn't be schedulable.
-- `preCheck` relied on the logic of in-tree plugins and caused some issues for custom plugins,
+- A newly scheduled Pod _might_ solve the `InterPodAffinity`'s failure, but not all of them do.
+For example, if a new Pod is created, but without a label matching `InterPodAffinity` of the unschedulable pod, the pod wouldn't be schedulable.
+- `preCheck` relied on the logic of in-tree plugins and was not extensible to custom plugins,
 like in issue [#110175](https://github.com/kubernetes/kubernetes/issues/110175).

 Here QueueingHints come into play;
@@ -87,7 +87,7 @@ For example, consider a Pod named `pod-a` that has a required Pod affinity. `pod
 the scheduling cycle by the `InterPodAffinity` plugin because no node had an existing Pod that matched
 the Pod affinity specification for `pod-a`.

-![pod-a got rejected by InterPodAffinity](./queueinghint1.svg)
+{{< figure src="queueinghint1.svg" alt="A diagram showing the scheduling queue and pod-a rejected by InterPodAffinity plugin" caption="A diagram showing the scheduling queue and pod-a rejected by InterPodAffinity plugin" >}}

 `pod-a` moves into the Unschedulable Pod Pool. The scheduling queue records which plugin caused
 the scheduling failure for the Pod. For `pod-a`, the scheduling queue records that the `InterPodAffinity`
@@ -100,24 +100,26 @@ Then, if a Pod gets a label update that matches the Pod affinity requirement of
 plugin's `QueueingHint` prompts the scheduling queue to move `pod-a` back into the ActiveQ or
 the BackoffQ component.

-![pod-a is moved by InterPodAffinity QueueingHint](./queueinghint2.svg)
+{{< figure src="queueinghint2.svg" alt="A diagram showing the scheduling queue and pod-a being moved by InterPodAffinity QueueingHint" caption="A diagram showing the scheduling queue and pod-a being moved by InterPodAffinity QueueingHint" >}}

 ## QueueingHint's history and what's new in v1.32

-Within SIG Scheduling, we have been working on the development of QueueingHint since
+At SIG Scheduling, we have been working on the development of QueueingHint since
 Kubernetes v1.28.

 While QueueingHint isn't user-facing, we implemented the `SchedulerQueueingHints` feature gate as a
 safety measure when we originally added this feature. In v1.28, we implemented QueueingHints with a
 few in-tree plugins experimentally, and made the feature gate enabled by default.

-However, users reported a memory leak issue, and consequently we disabled the feature gate in a
+However, users reported a memory leak, and consequently we disabled the feature gate in a
 patch release of v1.28. From v1.28 until v1.31, we kept working on the QueueingHint implementation
 within the rest of the in-tree plugins and fixing bugs.

-In v1.32, we will make this feature enabled by default again. We finished implementing QueueingHints
+In v1.32, we made this feature enabled by default again. We finished implementing QueueingHints
 in all plugins and also identified the cause of the memory leak!

+We thank all the contributors who participated in the development of this feature and those who reported and investigated the earlier issues.
+
 ## Getting involved

 These features are managed by Kubernetes [SIG Scheduling](https://github.com/kubernetes/community/tree/master/sig-scheduling).
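
Reviewer's note: the `pod-a` walkthrough in the hunks above can be sketched as a per-plugin callback that decides whether a cluster event is worth a retry. This is a minimal illustration under stated assumptions, not the scheduler's code. The function names and the shape of the pool are hypothetical. The two hint values are modeled on the names `Queue` and `QueueSkip`, and requeueing a Pod whose rejecting plugin has no registered hint is an assumed conservative default.

```python
from enum import Enum


class QueueingHint(Enum):
    """Illustrative hint values; the names are modeled on `Queue` and `QueueSkip`."""
    QUEUE = "Queue"
    QUEUE_SKIP = "QueueSkip"


def inter_pod_affinity_hint(required_labels: dict, updated_pod_labels: dict) -> QueueingHint:
    """Hypothetical hint: retry only if the changed Pod carries every required label."""
    matches = all(updated_pod_labels.get(k) == v for k, v in required_labels.items())
    return QueueingHint.QUEUE if matches else QueueingHint.QUEUE_SKIP


def pods_to_requeue(pool: dict, hint_fns: dict, updated_pod_labels: dict) -> list:
    """Return names of Pods that should leave the Unschedulable Pod Pool.

    `pool` maps a Pod name to (rejecting plugin name, labels its affinity requires).
    """
    moved = []
    for name, (plugin, required) in pool.items():
        hint_fn = hint_fns.get(plugin)
        # Assumption: with no hint registered, requeue conservatively.
        if hint_fn is None or hint_fn(required, updated_pod_labels) is QueueingHint.QUEUE:
            moved.append(name)
    return moved


# pod-a was rejected by InterPodAffinity and needs a Pod labeled app=db nearby.
pool = {"pod-a": ("InterPodAffinity", {"app": "db"})}
hints = {"InterPodAffinity": inter_pod_affinity_hint}
print(pods_to_requeue(pool, hints, {"app": "web"}))  # unrelated label update
print(pods_to_requeue(pool, hints, {"app": "db"}))   # matching label update
```

Only the matching label update moves `pod-a` out of the pool, which is the narrowing that event-based requeueing alone could not provide.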
