---
layout: blog
title: "Kubernetes 1.26:Pod 调度就绪态"
date: 2022-12-26
slug: pod-scheduling-readiness-alpha
---

<!--
layout: blog
title: "Kubernetes 1.26: Pod Scheduling Readiness"
date: 2022-12-26
slug: pod-scheduling-readiness-alpha
-->

<!--
**Author:** Wei Huang (Apple), Abdullah Gharaibeh (Google)
-->
**作者:** Wei Huang (Apple), Abdullah Gharaibeh (Google)

**译者:** XiaoYang Zhang (HuaWei)

<!--
Kubernetes 1.26 introduced a new Pod feature: _scheduling gates_. In Kubernetes, scheduling gates
are keys that tell the scheduler when a Pod is ready to be considered for scheduling.
-->
Kubernetes 1.26 引入了一个新的 Pod 特性:**调度门控**。
在 Kubernetes 中,调度门控是一些键,用来告知调度器某个 Pod 何时可以开始被考虑调度。

<!--
## What problem does it solve?

When a Pod is created, the scheduler will continuously attempt to find a node that fits it. This
infinite loop continues until the scheduler either finds a node for the Pod, or the Pod gets deleted.
-->
## 它解决了什么问题?

当 Pod 被创建时,调度器会不断尝试寻找适合它的节点。这个无限循环会一直持续,直到调度器为 Pod 找到节点,或者 Pod 被删除。

<!--
Pods that remain unschedulable for long periods of time (e.g., ones that are blocked on some external event)
waste scheduling cycles. A scheduling cycle may take ≅20ms or more depending on the complexity of
the Pod's scheduling constraints. Therefore, at scale, those wasted cycles significantly impact the
scheduler's performance. See the arrows in the "scheduler" box below.
-->
长时间无法被调度的 Pod(例如,被某些外部事件阻塞的 Pod)会浪费调度周期。
一个调度周期可能需要约 20ms 或更长时间,这取决于 Pod 调度约束的复杂程度。
因此,在大规模集群中,这些被浪费的调度周期会严重影响调度器的性能。请参阅下面“调度器”框中的箭头。

{{< mermaid >}}
graph LR;
    pod((新 Pod))-->queue
    subgraph 调度器
    queue(调度器队列)
    sched_cycle[/调度周期/]
    schedulable{可调度?}

    queue==>|弹出|sched_cycle
    sched_cycle==>schedulable
    schedulable==>|否|queue
    subgraph note [循环浪费在不断重新调度 'unready' 状态的 Pod 上]
    end
    end

    classDef plain fill:#ddd,stroke:#fff,stroke-width:1px,color:#000;
    classDef k8s fill:#326ce5,stroke:#fff,stroke-width:1px,color:#fff;
    classDef Scheduler fill:#fff,stroke:#bbb,stroke-width:2px,color:#326ce5;
    classDef note fill:#edf2ae,stroke:#fff,stroke-width:1px;
    class queue,sched_cycle,schedulable k8s;
    class pod plain;
    class note note;
    class Scheduler Scheduler;
{{< /mermaid >}}

<!--
Scheduling gates helps address this problem. It allows declaring that newly created Pods are not
ready for scheduling. When scheduling gates are present on a Pod, the scheduler ignores the Pod
and therefore saves unnecessary scheduling attempts. Those Pods will also be ignored by Cluster
Autoscaler if you have it installed in the cluster.
-->
调度门控有助于解决这个问题。它允许声明新创建的 Pod 尚未准备好进行调度。
当 Pod 上设置了调度门控时,调度器会忽略该 Pod,从而避免不必要的调度尝试。
如果你在集群中安装了 Cluster Autoscaler,这些 Pod 也会被它忽略。

<!--
Clearing the gates is the responsibility of external controllers with knowledge of when the Pod
should be considered for scheduling (e.g., a quota manager).
-->
清除门控是外部控制器的责任,这类控制器知道何时应开始考虑调度该 Pod(例如,配额管理器)。

{{< mermaid >}}
graph LR;
    pod((新 Pod))-->queue
    subgraph 调度器
    queue(调度器队列)
    sched_cycle[/调度周期/]
    schedulable{可调度?}
    popout{弹出?}

    queue==>|PreEnqueue 检查|popout
    popout-->|是|sched_cycle
    popout==>|否|queue
    sched_cycle-->schedulable
    schedulable-->|否|queue
    subgraph note [控制 Pod 调度的开关]
    end
    end

    classDef plain fill:#ddd,stroke:#fff,stroke-width:1px,color:#000;
    classDef k8s fill:#326ce5,stroke:#fff,stroke-width:1px,color:#fff;
    classDef Scheduler fill:#fff,stroke:#bbb,stroke-width:2px,color:#326ce5;
    classDef note fill:#edf2ae,stroke:#fff,stroke-width:1px;
    classDef popout fill:#f96,stroke:#fff,stroke-width:1px;
    class queue,sched_cycle,schedulable k8s;
    class pod plain;
    class note note;
    class popout popout;
    class Scheduler Scheduler;
{{< /mermaid >}}

<!--
## How does it work?

Scheduling gates in general works very similar to Finalizers. Pods with a non-empty
`spec.schedulingGates` field will show as status `SchedulingGated` and be blocked from
scheduling. Note that more than one gate can be added, but they all should be added upon Pod
creation (e.g., you can add them as part of the spec or via a mutating webhook).
-->
## 它是如何工作的?

总体而言,调度门控的工作方式与 Finalizer 非常相似。具有非空 `spec.schedulingGates` 字段的 Pod
的状态将显示为 `SchedulingGated`,并被阻止调度。请注意,可以添加多个门控,但它们都应该在创建 Pod
时添加(例如,你可以将它们作为规约的一部分,或者通过变更性质的 Webhook 来添加)。

```
NAME       READY   STATUS            RESTARTS   AGE
test-pod   0/1     SchedulingGated   0          10s
```
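
作为示意,上面输出中的 `test-pod` 可以用类似下面的清单创建;其中的门控名称
`example.com/foo` 与 `example.com/bar` 均为虚构示例:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  schedulingGates:
  - name: example.com/foo   # 虚构的门控名称,由相应的外部控制器负责移除
  - name: example.com/bar
  containers:
  - name: pause
    image: registry.k8s.io/pause:3.6
```

由于 `schedulingGates` 非空,该 Pod 创建后会一直处于 `SchedulingGated` 状态,不会被调度。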

<!--
To clear the gates, you update the Pod by removing all of the items from the Pod's `schedulingGates`
field. The gates do not need to be removed all at once, but only when all the gates are removed the
scheduler will start to consider the Pod for scheduling.
-->
要清除这些门控,你可以通过移除 Pod 的 `schedulingGates` 字段中的所有条目来更新 Pod。
这些门控不必一次性全部移除,但只有当所有门控都被移除后,调度器才会开始考虑调度该 Pod。
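
例如,可以用 `kubectl patch` 应用类似下面的 JSON Patch,将 `schedulingGates`
置为空列表,一次清除所有门控(此处沿用上文示例中的 Pod 名称 `test-pod`,仅作示意):

```json
[
  { "op": "replace", "path": "/spec/schedulingGates", "value": [] }
]
```

将其保存为 `clear-gates.json` 后,可执行
`kubectl patch pod test-pod --type=json --patch-file=clear-gates.json`。
需要注意的是,Pod 创建之后只允许从 `schedulingGates` 中删除条目,不允许新增。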

<!--
Under the hood, scheduling gates are implemented as a PreEnqueue scheduler plugin, a new scheduler
framework extension point that is invoked at the beginning of each scheduling cycle.
-->
在底层实现上,调度门控是以 PreEnqueue 调度器插件的方式实现的;这是调度器框架的一个新扩展点,会在每个调度周期开始时被调用。
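
作为参考,调度器框架中 PreEnqueue 扩展点的接口大致形如下面的示意(依据
Kubernetes 1.26 调度器框架中的定义整理,具体细节可能随版本变化):

```go
// PreEnqueuePlugin 是在 Pod 被加入调度器活动队列之前调用的插件接口。
// 只有当所有 PreEnqueue 插件都返回 Success 时,Pod 才会进入队列等待调度;
// 调度门控插件即在此处检查 Pod 的 spec.schedulingGates 是否为空。
type PreEnqueuePlugin interface {
	Plugin
	// PreEnqueue 在 Pod 被加入内部活动队列之前被调用。
	PreEnqueue(ctx context.Context, p *v1.Pod) *Status
}
```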

<!--
## Use Cases

An important use case this feature enables is dynamic quota management. Kubernetes supports
[ResourceQuota](/docs/concepts/policy/resource-quotas/), however the API Server enforces quota at
the time you attempt Pod creation. For example, if a new Pod exceeds the CPU quota, it gets rejected.
The API Server doesn't queue the Pod; therefore, whoever created the Pod needs to continuously attempt
to recreate it again. This either means a delay between resources becoming available and the Pod
actually running, or it means load on the API server and Scheduler due to constant attempts.
-->
## 用例

此特性所支持的一个重要使用场景是动态配额管理。Kubernetes 支持[资源配额](/zh-cn/docs/concepts/policy/resource-quotas/),
但 API Server 是在你尝试创建 Pod 时强制执行配额的。例如,如果一个新的 Pod 超出了 CPU 配额,它就会被拒绝。
API Server 不会将这个 Pod 排队;因此,创建该 Pod 的一方需要不断尝试重新创建它。
这要么意味着在资源可用与 Pod 实际运行之间存在延迟,要么意味着不断的尝试会给 API Server 和调度器带来负载。

<!--
Scheduling gates allows an external quota manager to address the above limitation of ResourceQuota.
Specifically, the manager could add a `example.com/quota-check` scheduling gate to all Pods created in the
cluster (using a mutating webhook). The manager would then remove the gate when there is quota to
start the Pod.
-->
调度门控允许外部配额管理器解决 ResourceQuota 的上述限制。具体来说,
管理器可以(使用变更性质的 Webhook)为集群中创建的所有 Pod 添加一个
`example.com/quota-check` 调度门控。当存在可用于启动 Pod 的配额时,管理器再移除此门控。
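
这样的变更性质的 Webhook 在响应中返回的 JSON Patch 可以形如下面的示意(门控名称
`example.com/quota-check` 来自上文,其余细节均为假设):

```json
[
  {
    "op": "add",
    "path": "/spec/schedulingGates",
    "value": [ { "name": "example.com/quota-check" } ]
  }
]
```

当配额管理器判定有足够配额时,再通过一次 Pod 更新将该条目从 `schedulingGates` 中移除,Pod 随即进入正常的调度流程。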

<!--
## What's next?

To use this feature, the `PodSchedulingReadiness` feature gate must be enabled in the API Server
and scheduler. You're more than welcome to test it out and tell us (SIG Scheduling) what you think!
-->
## 接下来

要使用此特性,必须在 API Server 和调度器中启用 `PodSchedulingReadiness` 特性门控。
非常欢迎你对其进行测试并告诉我们(SIG Scheduling)你的想法!

<!--
## Additional resources

- [Pod Scheduling Readiness](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-scheduling-readiness/)
  in the Kubernetes documentation
- [Kubernetes Enhancement Proposal](https://github.com/kubernetes/enhancements/blob/master/keps/sig-scheduling/3521-pod-scheduling-readiness/README.md)
-->
## 附加资源

- Kubernetes 文档中的 [Pod 调度就绪态](/zh-cn/docs/concepts/scheduling-eviction/pod-scheduling-readiness/)
- [Kubernetes 增强提案](https://github.com/kubernetes/enhancements/blob/master/keps/sig-scheduling/3521-pod-scheduling-readiness/README.md)