---
title: Kubernetes 调度器
content_template: templates/concept
- weight: 60
+ weight: 50
---

<!--
---
title: Kubernetes Scheduler
content_template: templates/concept
- weight: 60
+ weight: 50
---
-->

{{% capture overview %}}
@@ -100,13 +100,11 @@ locality, inter-workload interference, and so on.
kube-scheduler selects a node for the pod in a 2-step operation:

1. Filtering
-
2. Scoring
-->
kube-scheduler 给一个 pod 做调度选择包含两个步骤:

1. 过滤
-
2. 打分
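
As an aside on how the two steps divide the work, here is a minimal, hypothetical Pod sketch (the name, image, and label key are illustrative, not taken from this page): the `resources.requests` fields feed filtering, which rules out any node lacking that much free capacity, while the `preferredDuringSchedulingIgnoredDuringExecution` affinity term feeds scoring only, so it can raise a feasible node's rank but never filters a node out.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: scheduling-demo            # hypothetical name
spec:
  containers:
    - name: app
      image: nginx:1.17            # hypothetical image
      resources:
        requests:
          cpu: "500m"              # filtering: nodes without 500m free CPU are excluded
          memory: 256Mi            # filtering: nodes without 256Mi free memory are excluded
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1                # scoring: matching nodes rank higher; non-matching nodes stay feasible
          preference:
            matchExpressions:
              - key: disktype      # hypothetical node label
                operator: In
                values: ["ssd"]
```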
<!--
@@ -133,184 +131,36 @@ one of these at random.
-->
最后,kube-scheduler 会将 Pod 调度到得分最高的 Node 上。如果存在多个得分最高的 Node,kube-scheduler 会从中随机选取一个。

- <!--
- ### Default policies
- -->
- ### 默认策略
-
- <!--
- kube-scheduler has a default set of scheduling policies.
- -->
- kube-scheduler 有一系列的默认调度策略。
-
- <!--
- ### Filtering
-
- - `PodFitsHostPorts`: Checks if a Node has free ports (the network protocol kind)
-   for the Pod ports that the Pod is requesting.
-
- - `PodFitsHost`: Checks if a Pod specifies a specific Node by its hostname.
-
- - `PodFitsResources`: Checks if the Node has free resources (e.g., CPU and Memory)
-   to meet the requirements of the Pod.
-
- - `PodMatchNodeSelector`: Checks if a Pod's Node {{< glossary_tooltip term_id="selector" >}}
-   matches the Node's {{< glossary_tooltip text="label(s)" term_id="label" >}}.
-
- - `NoVolumeZoneConflict`: Evaluates if the {{< glossary_tooltip text="Volumes" term_id="volume" >}}
-   that a Pod requests are available on the Node, given the failure zone restrictions for
-   that storage.
-
- - `NoDiskConflict`: Evaluates if a Pod can fit on a Node due to the volumes it requests,
-   and those that are already mounted.
-
- - `MaxCSIVolumeCount`: Decides how many {{< glossary_tooltip text="CSI" term_id="csi" >}}
-   volumes should be attached, and whether that's over a configured limit.
-
- - `CheckNodeMemoryPressure`: If a Node is reporting memory pressure, and there's no
-   configured exception, the Pod won't be scheduled there.
-
- - `CheckNodePIDPressure`: If a Node is reporting that process IDs are scarce, and
-   there's no configured exception, the Pod won't be scheduled there.
-
- - `CheckNodeDiskPressure`: If a Node is reporting storage pressure (a filesystem that
-   is full or nearly full), and there's no configured exception, the Pod won't be
-   scheduled there.
-
- - `CheckNodeCondition`: Nodes can report that they have a completely full filesystem,
-   that networking isn't available, or that kubelet is otherwise not ready to run Pods.
-   If such a condition is set for a Node, and there's no configured exception, the Pod
-   won't be scheduled there.
-
- - `PodToleratesNodeTaints`: Checks if a Pod's {{< glossary_tooltip text="tolerations" term_id="toleration" >}}
-   can tolerate the Node's {{< glossary_tooltip text="taints" term_id="taint" >}}.
-
- - `CheckVolumeBinding`: Evaluates if a Pod can fit due to the volumes it requests.
-   This applies for both bound and unbound
-   {{< glossary_tooltip text="PVCs" term_id="persistent-volume-claim" >}}.
+ <!--
+ There are two supported ways to configure the filtering and scoring behavior
+ of the scheduler:
-->
- ### 过滤策略
-
- - `PodFitsHostPorts`:如果 Pod 中定义了 hostPort 属性,那么需要先检查这个指定端口是否
-   已经被 Node 上其他服务占用了。
-
- - `PodFitsHost`:若 Pod 对象拥有 hostname 属性,则检查 Node 名称字符串与此属性是否匹配。
-
- - `PodFitsResources`:检查 Node 上是否有足够的资源(如 CPU 和内存)来满足 Pod 的资源请求。
-
- - `PodMatchNodeSelector`:检查 Pod 的节点{{< glossary_tooltip text="选择器" term_id="selector" >}}
-   能否匹配 Node 的{{< glossary_tooltip text="标签" term_id="label" >}}。
-
- - `NoVolumeZoneConflict`:检测 Pod 请求的 {{< glossary_tooltip text="Volumes" term_id="volume" >}} 在
-   Node 上是否可用,因为某些存储卷存在区域调度约束。
-
- - `NoDiskConflict`:检查 Pod 对象请求的存储卷在 Node 上是否可用,若不存在冲突则通过检查。
-
- - `MaxCSIVolumeCount`:检查 Node 上已经挂载的 {{< glossary_tooltip text="CSI" term_id="csi" >}}
-   存储卷数量是否超过了指定的最大值。
-
- - `CheckNodeMemoryPressure`:如果 Node 上报了内存资源压力过大,而且没有配置异常,那么 Pod 将不会被调度到这个 Node 上。
-
- - `CheckNodePIDPressure`:如果 Node 上报了 PID 资源压力过大,而且没有配置异常,那么 Pod 将不会被调度到这个 Node 上。
-
- - `CheckNodeDiskPressure`:如果 Node 上报了磁盘资源压力过大(文件系统已满或者将近满了),
-   而且没有配置异常,那么 Pod 将不会被调度到这个 Node 上。
+ 支持以下两种方式配置调度器的过滤和打分行为:

- - `CheckNodeCondition`:Node 可以上报其自身的状态,如磁盘、网络不可用,表明 kubelet 未准备好运行 Pod。
-   如果 Node 被设置成这种状态,那么 Pod 将不会被调度到这个 Node 上。
+ <!--
+ 1. [Scheduling Policies](/docs/reference/scheduling/policies) allow you to
+    configure _Predicates_ for filtering and _Priorities_ for scoring.
+ 1. [Scheduling Profiles](/docs/reference/scheduling/profiles) allow you to
+    configure Plugins that implement different scheduling stages, including:
+    `QueueSort`, `Filter`, `Score`, `Bind`, `Reserve`, `Permit`, and others. You
+    can also configure the kube-scheduler to run different profiles.
+ -->
+ 1. [调度策略](/docs/reference/scheduling/policies) 允许你配置过滤的 _谓词(Predicates)_ 和打分的 _优先级(Priorities)_ 。
+ 2. [调度配置](/docs/reference/scheduling/profiles) 允许你配置实现不同调度阶段的插件,包括:`QueueSort`、`Filter`、`Score`、`Bind`、`Reserve`、`Permit` 等等。你也可以配置 kube-scheduler 运行不同的配置文件。
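
To make the two mechanisms concrete, here are two hedged sketches; neither appears on this page, and the second profile name below is hypothetical. A Scheduling Policy file selects which _Predicates_ filter nodes and which _Priorities_ (with weights) score them; it was historically passed to kube-scheduler via the `--policy-config-file` flag:

```yaml
apiVersion: v1
kind: Policy
# Filtering uses only the predicates listed here.
predicates:
  - name: PodFitsHostPorts
  - name: PodFitsResources
  - name: PodMatchNodeSelector
# Scoring uses only the priorities listed here, weighted as given.
priorities:
  - name: LeastRequestedPriority
    weight: 1
  - name: NodeAffinityPriority
    weight: 1
```

A Scheduling Profiles configuration instead enables or disables plugins per extension point. The sketch below assumes the v1alpha2 API of this era (newer releases use v1beta1/v1) and defines a second, hypothetical profile that disables all scoring plugins, so Pods that set `spec.schedulerName: no-scoring-scheduler` are placed on any feasible node after filtering alone:

```yaml
apiVersion: kubescheduler.config.k8s.io/v1alpha2
kind: KubeSchedulerConfiguration
profiles:
  # Default profile: all default plugins stay enabled.
  - schedulerName: default-scheduler
  # Hypothetical profile that skips scoring entirely.
  - schedulerName: no-scoring-scheduler
    plugins:
      preScore:
        disabled:
          - name: '*'
      score:
        disabled:
          - name: '*'
```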
- - `PodToleratesNodeTaints`:检查 Pod 属性上的{{< glossary_tooltip text="容忍度" term_id="toleration" >}}能否容忍
-   Node 的{{< glossary_tooltip text="污点" term_id="taint" >}}。
-
- - `CheckVolumeBinding`:检查 Node 上已经绑定的和未绑定的 {{< glossary_tooltip text="PVCs" term_id="persistent-volume-claim" >}}
-   能否满足 Pod 对象的存储卷需求。
-
- <!--
- ### Scoring
-
- - `SelectorSpreadPriority`: Spreads Pods across hosts, considering Pods that
-   belong to the same {{< glossary_tooltip text="Service" term_id="service" >}},
-   {{< glossary_tooltip term_id="statefulset" >}} or
-   {{< glossary_tooltip term_id="replica-set" >}}.
-
- - `InterPodAffinityPriority`: Computes a sum by iterating through the elements
-   of weightedPodAffinityTerm and adding "weight" to the sum if the corresponding
-   PodAffinityTerm is satisfied for that node; the node(s) with the highest sum
-   are the most preferred.
-
- - `LeastRequestedPriority`: Favors nodes with fewer requested resources. In other
-   words, the more Pods that are placed on a Node, and the more resources those
-   Pods use, the lower the ranking this policy will give.
-
- - `MostRequestedPriority`: Favors nodes with most requested resources. This policy
-   will fit the scheduled Pods onto the smallest number of Nodes needed to run your
-   overall set of workloads.
-
- - `RequestedToCapacityRatioPriority`: Creates a requestedToCapacity based ResourceAllocationPriority using a default resource scoring function shape.
-
- - `BalancedResourceAllocation`: Favors nodes with balanced resource usage.
-
- - `NodePreferAvoidPodsPriority`: Prioritizes nodes according to the node annotation
-   `scheduler.alpha.kubernetes.io/preferAvoidPods`. You can use this to hint that
-   two different Pods shouldn't run on the same Node.
-
- - `NodeAffinityPriority`: Prioritizes nodes according to node affinity scheduling
-   preferences indicated in PreferredDuringSchedulingIgnoredDuringExecution.
-   You can read more about this in [Assigning Pods to Nodes](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/).
-
- - `TaintTolerationPriority`: Prepares the priority list for all the nodes, based on
-   the number of intolerable taints on the node. This policy adjusts a node's rank
-   taking that list into account.
-
- - `ImageLocalityPriority`: Favors nodes that already have the
-   {{< glossary_tooltip text="container images" term_id="image" >}} for that
-   Pod cached locally.
-
- - `ServiceSpreadingPriority`: For a given Service, this policy aims to make sure that
-   the Pods for the Service run on different nodes. It favors scheduling onto nodes
-   that don't have Pods for the service already assigned there. The overall outcome is
-   that the Service becomes more resilient to a single Node failure.
-
- - `CalculateAntiAffinityPriorityMap`: This policy helps implement
-   [pod anti-affinity](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity).
+ {{% /capture %}}
+ {{% capture whatsnext %}}

- - `EqualPriorityMap`: Gives an equal weight of one to all nodes.
+ <!--
+ * Read about [scheduler performance tuning](/docs/concepts/scheduling-eviction/scheduler-perf-tuning/)
+ * Read about [Pod topology spread constraints](/docs/concepts/workloads/pods/pod-topology-spread-constraints/)
+ * Read the [reference documentation](/docs/reference/command-line-tools-reference/kube-scheduler/) for kube-scheduler
+ * Learn about [configuring multiple schedulers](/docs/tasks/administer-cluster/configure-multiple-schedulers/)
+ * Learn about [topology management policies](/docs/tasks/administer-cluster/topology-manager/)
+ * Learn about [Pod Overhead](/docs/concepts/configuration/pod-overhead/)
-->
- ### 打分策略
-
- - `SelectorSpreadPriority`:尽量将归属于同一个 {{< glossary_tooltip text="Service" term_id="service" >}}、{{< glossary_tooltip term_id="statefulset" >}} 或 {{< glossary_tooltip term_id="replica-set" >}} 的 Pod 资源分散到不同的 Node 上。
-
- - `InterPodAffinityPriority`:遍历 Pod 对象的亲和性条目,并将那些能够匹配到给定 Node 的条目的权重相加,结果值越大的 Node 得分越高。
-
- - `LeastRequestedPriority`:空闲资源比例越高的 Node 得分越高。换句话说,Node 上的 Pod 越多,被占用的资源越多,这个 Node 的得分就越低。
-
- - `MostRequestedPriority`:空闲资源比例越低的 Node 得分越高。这个调度策略会把你的所有工作负载(Pod)调度到尽量少的 Node 上。
-
- - `RequestedToCapacityRatioPriority`:基于资源占用比例设定得分值,供资源打分函数在打分时使用。
-
- - `BalancedResourceAllocation`:优选那些资源利用率更为均衡的节点。
-
- - `NodePreferAvoidPodsPriority`:这个策略将根据 Node 的注解信息中是否含有 `scheduler.alpha.kubernetes.io/preferAvoidPods` 来
-   计算其优先级。可以用它来提示不要把两个不同的 Pod 运行在同一个 Node 上。
-
- - `NodeAffinityPriority`:基于 Pod 属性中的 PreferredDuringSchedulingIgnoredDuringExecution 来进行 Node 亲和性调度。
-   你可以通过 [Pods 到 Nodes 的分派](/zh/docs/concepts/configuration/assign-pod-node/) 了解更详细的内容。
-
- - `TaintTolerationPriority`:基于 Pod 对每个 Node 上污点的容忍程度进行优先级评估,这个策略能够调整待选 Node 的排名。
-
- - `ImageLocalityPriority`:已经在本地缓存了 Pod 所需{{< glossary_tooltip text="容器镜像" term_id="image" >}}的 Node 会有较高的优先级。
-
- - `ServiceSpreadingPriority`:这个调度策略的主要目的是确保将归属于同一个 Service 的 Pod 调度到不同的 Node 上。如果 Node 上
-   没有归属于同一个 Service 的 Pod,这个策略更倾向于将 Pod 调度到这类 Node 上。最终的目的是:即使某个 Node 宕机,Service 也具有很强的容灾能力。
-
- - `CalculateAntiAffinityPriorityMap`:这个策略主要用来实现 [Pod 反亲和性](/zh/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)。
-
- - `EqualPriorityMap`:为所有 Node 设置相同的权重 1。
-
- {{% /capture %}}
- {{% capture whatsnext %}}
- * 阅读关于 [调度器性能调优](/zh/docs/concepts/scheduling/scheduler-perf-tuning/)
+ * 阅读关于 [调度器性能调优](/zh/docs/concepts/scheduling-eviction/scheduler-perf-tuning/)

* 阅读关于 [Pod 拓扑分布约束](/zh/docs/concepts/workloads/pods/pod-topology-spread-constraints/)
* 阅读关于 kube-scheduler 的 [参考文档](/zh/docs/reference/command-line-tools-reference/kube-scheduler/)
* 了解关于 [配置多个调度器](/zh/docs/tasks/administer-cluster/configure-multiple-schedulers/) 的方式