|
| 1 | +--- |
| 2 | +layout: blog |
| 3 | +title: "Kubernetes 1.26:PodDisruptionBudget 守护的不健康 Pod 所用的驱逐策略" |
| 4 | +date: 2023-01-06 |
| 5 | +slug: "unhealthy-pod-eviction-policy-for-pdbs" |
| 6 | +--- |
| 7 | +<!-- |
| 8 | +layout: blog |
| 9 | +title: "Kubernetes 1.26: Eviction policy for unhealthy pods guarded by PodDisruptionBudgets" |
| 10 | +date: 2023-01-06 |
| 11 | +slug: "unhealthy-pod-eviction-policy-for-pdbs" |
| 12 | +--> |
| 13 | + |
| 14 | +<!-- |
| 15 | +**Authors:** Filip Křepinský (Red Hat), Morten Torkildsen (Google), Ravi Gudimetla (Apple) |
| 16 | +--> |
| 17 | +**作者:** Filip Křepinský (Red Hat), Morten Torkildsen (Google), Ravi Gudimetla (Apple) |
| 18 | + |
| 19 | +**译者:** Michael Yao (DaoCloud) |
| 20 | + |
| 21 | +<!-- |
| 22 | +Ensuring that disruptions to your applications do not affect their availability isn't a simple |
| 23 | +task. Last month's release of Kubernetes v1.26 lets you specify an _unhealthy pod eviction policy_ |
| 24 | +for [PodDisruptionBudgets](/docs/concepts/workloads/pods/disruptions/#pod-disruption-budgets) (PDBs) |
| 25 | +to help you maintain that availability during node management operations. |
| 26 | +In this article, we will dive deeper into what modifications were introduced for PDBs to |
| 27 | +give application owners greater flexibility in managing disruptions. |
| 28 | +--> |
| 29 | +确保对应用的干扰不影响其可用性不是一个简单的任务。 |
| 30 | +上个月发布的 Kubernetes v1.26 允许针对 |
| 31 | +[PodDisruptionBudget](/zh-cn/docs/concepts/workloads/pods/disruptions/#pod-disruption-budgets) (PDB) |
| 32 | +指定**不健康 Pod 驱逐策略**,这有助于在节点执行管理操作期间保持可用性。本文将深入探讨为 PDB 引入了哪些修改,以便应用所有者在管理干扰时拥有更大的灵活性。 |
| 33 | + |
| 34 | +<!-- |
| 35 | +## What problems does this solve? |
| 36 | +
|
| 37 | +API-initiated eviction of pods respects PodDisruptionBudgets (PDBs). This means that a requested [voluntary disruption](https://kubernetes.io/docs/concepts/scheduling-eviction/#pod-disruption) |
| 38 | +via an eviction of a Pod should not disrupt a guarded application, and `.status.currentHealthy` of a PDB should not fall |
| 39 | +below `.status.desiredHealthy`. Running pods that are [Unhealthy](/docs/tasks/run-application/configure-pdb/#healthiness-of-a-pod) |
| 40 | +do not count towards the PDB status, but eviction of these is only possible in case the application |
| 41 | +is not disrupted. This helps a disrupted or not yet started application to achieve availability |
| 42 | +as soon as possible without additional downtime that would be caused by evictions. |
| 43 | +--> |
| 44 | +## 这解决什么问题? {#what-problem-does-this-solve} |
| 45 | + |
| 46 | +API 发起的 Pod 驱逐尊重 PodDisruptionBudget (PDB) 约束。这意味着因驱逐 Pod |
| 47 | +而请求的[自愿干扰](/zh-cn/docs/concepts/scheduling-eviction/#pod-disruption)不应干扰守护的应用且 |
| 48 | +PDB 的 `.status.currentHealthy` 不应低于 `.status.desiredHealthy`。 |
| 49 | +如果正在运行的 Pod 状态为 [Unhealthy](/zh-cn/docs/tasks/run-application/configure-pdb/#healthiness-of-a-pod), |
| 50 | +则该 Pod 不计入 PDB 状态,只有在应用不受干扰时才可以驱逐这些 Pod。 |
| 51 | +这有助于受干扰或还未启动的应用尽快达到可用状态,而不会因驱逐造成额外的停机时间。 |
| 52 | + |
| 53 | +<!-- |
| 54 | +Unfortunately, this poses a problem for cluster administrators that would like to drain nodes |
| 55 | +without any manual interventions. Misbehaving applications with pods in `CrashLoopBackOff` |
| 56 | +state (due to a bug or misconfiguration) or pods that are simply failing to become ready |
| 57 | +make this task much harder. Any eviction request will fail due to violation of a PDB, |
| 58 | +when all pods of an application are unhealthy. Draining of a node cannot make any progress |
| 59 | +in that case. |
| 60 | +--> |
| 61 | +不幸的是,对于想要腾空节点但又不进行任何手动干预的集群管理员而言,这种机制是有问题的。 |
| 62 | +若一些应用因 Pod 处于 `CrashLoopBackOff` 状态(由于缺陷或配置错误)或 Pod 无法进入就绪状态而行为异常, |
| 63 | +会使这项任务变得更加困难。当某应用的所有 Pod 均不健康时,所有驱逐请求都会因违反 PDB 而失败。 |
| 64 | +在这种情况下,节点腾空操作无法取得任何进展。 |
| 65 | + |
| 66 | +<!-- |
| 67 | +On the other hand there are users that depend on the existing behavior, in order to: |
| 68 | +- prevent data-loss that would be caused by deleting pods that are guarding an underlying resource or storage |
| 69 | +- achieve the best availability possible for their application |
| 70 | +--> |
| 71 | +另一方面,有些用户依赖于现有行为,以便: |
| 72 | + |
| 73 | +- 防止因删除守护基础资源或存储的 Pod 而造成数据丢失 |
| 74 | +- 让应用达到最佳可用性 |
| 75 | + |
| 76 | +<!-- |
| 77 | +Kubernetes 1.26 introduced a new experimental field to the PodDisruptionBudget API: `.spec.unhealthyPodEvictionPolicy`. |
| 78 | +When enabled, this field lets you support both of those requirements. |
| 79 | +--> |
| 80 | +Kubernetes 1.26 为 PodDisruptionBudget API 引入了新的实验性字段: |
| 81 | +`.spec.unhealthyPodEvictionPolicy`。启用此字段后,将允许你支持上述两种需求。 |
| 82 | + |
| 83 | +<!-- |
| 84 | +## How does it work? |
| 85 | +
|
| 86 | +API-initiated eviction is the process that triggers graceful pod termination. |
| 87 | +The process can be initiated either by calling the API directly, |
| 88 | +by using a `kubectl drain` command, or other actors in the cluster. |
| 89 | +During this process every pod removal is consulted with appropriate PDBs, |
| 90 | +to ensure that a sufficient number of pods is always running in the cluster. |
| 91 | +--> |
| 92 | +## 工作原理 {#how-does-it-work} |
| 93 | + |
| 94 | +API 发起的驱逐是触发 Pod 优雅终止的一个过程。 |
| 95 | +这个过程可以通过直接调用 API 发起,也能使用 `kubectl drain` 命令或由集群中的其他主体来发起。 |
| 96 | +在这个过程中,移除每个 Pod 时将与对应的 PDB 协商,确保始终有足够数量的 Pod 在集群中运行。 |
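下面是一个最小示意,假设你直接调用驱逐 API:请求体是一个 `policy/v1` 的 `Eviction` 对象,需以 POST 方式提交到目标 Pod 的 `eviction` 子资源。其中 Pod 名称 `nginx-abc` 和命名空间 `default` 均为示例假设,并非来自原文。

```yaml
# 假设的驱逐请求体(此处以 YAML 形式展示)
# 提交路径示例:/api/v1/namespaces/default/pods/nginx-abc/eviction
apiVersion: policy/v1
kind: Eviction
metadata:
  name: nginx-abc       # 假设的 Pod 名称
  namespace: default    # 假设的命名空间
```

API 服务器收到该请求后会检查匹配此 Pod 的 PDB;若驱逐会违反 PDB,请求将被拒绝。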
| 97 | + |
| 98 | +<!-- |
| 99 | +The following policies give PDB authors greater control over how the process deals with unhealthy pods. |
| 100 | +
|
| 101 | +There are two policies `IfHealthyBudget` and `AlwaysAllow` to choose from. |
| 102 | +
|
| 103 | +The former, `IfHealthyBudget`, follows the existing behavior to achieve the best availability |
| 104 | +that you get by default. Unhealthy pods can be disrupted only if their application |
| 105 | +has a minimum available `.status.desiredHealthy` number of pods. |
| 106 | +--> |
| 107 | +以下策略允许 PDB 作者进一步控制此过程如何处理不健康的 Pod。 |
| 108 | + |
| 109 | +有两个策略可供选择:`IfHealthyBudget` 和 `AlwaysAllow`。 |
| 110 | + |
| 111 | +前者,`IfHealthyBudget` 采用现有行为以达到你默认可获得的最佳的可用性。 |
| 112 | +不健康的 Pod 只有在其应用中可用的 Pod 个数达到 `.status.desiredHealthy` 即最小可用个数时才会被干扰。 |
| 113 | + |
| 114 | +<!-- |
| 115 | +By setting the `spec.unhealthyPodEvictionPolicy` field of your PDB to `AlwaysAllow`, |
| 116 | +you are choosing the best effort availability for your application. |
| 117 | +With this policy it is always possible to evict unhealthy pods. |
| 118 | +This will make it easier to maintain and upgrade your clusters. |
| 119 | +
|
| 120 | +We think that `AlwaysAllow` will often be a better choice, but for some critical workloads you may |
| 121 | +still prefer to protect even unhealthy Pods from node drains or other forms of API-initiated |
| 122 | +eviction. |
| 123 | +--> |
| 124 | +通过将 PDB 的 `spec.unhealthyPodEvictionPolicy` 字段设置为 `AlwaysAllow`, |
| 125 | +即表示你为应用选择尽力而为的可用性。采用此策略时,始终能够驱逐不健康的 Pod。 |
| 126 | +这可以简化集群的维护和升级。 |
| 127 | + |
| 128 | +我们认为 `AlwaysAllow` 通常是一个更好的选择,但是对于某些关键工作负载, |
| 129 | +你可能仍然倾向于保护不健康的 Pod,使其免受节点腾空或其他形式的 API 发起的驱逐。 |
| 130 | + |
| 131 | +<!-- |
| 132 | +## How do I use it? |
| 133 | +
|
| 134 | +This is an alpha feature, which means you have to enable the `PDBUnhealthyPodEvictionPolicy` |
| 135 | +[feature gate](/docs/reference/command-line-tools-reference/feature-gates/), |
| 136 | +with the command line argument `--feature-gates=PDBUnhealthyPodEvictionPolicy=true` |
| 137 | +to the kube-apiserver. |
| 138 | +--> |
| 139 | +## 如何使用? {#how-do-i-use-it} |
| 140 | + |
| 141 | +这是一个 Alpha 特性,意味着你必须使用命令行参数 `--feature-gates=PDBUnhealthyPodEvictionPolicy=true` |
| 142 | +为 kube-apiserver 启用 `PDBUnhealthyPodEvictionPolicy` |
| 143 | +[特性门控](/zh-cn/docs/reference/command-line-tools-reference/feature-gates/)。 |
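如果你的集群由 kubeadm 管理(这只是一种假设的部署方式,原文并未指定),可以参考下面的配置片段,通过 `ClusterConfiguration` 为 kube-apiserver 传入该命令行参数。这仅是一个示意,实际做法取决于你的集群部署工具。

```yaml
# 假设使用 kubeadm 的 v1beta3 配置 API
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
apiServer:
  extraArgs:
    # 等价于 --feature-gates=PDBUnhealthyPodEvictionPolicy=true
    feature-gates: "PDBUnhealthyPodEvictionPolicy=true"
```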
| 144 | + |
| 145 | +<!-- |
| 146 | +Here's an example. Assume that you've enabled the feature gate in your cluster, and that you |
| 147 | +already defined a Deployment that runs a plain webserver. You labelled the Pods for that |
| 148 | +Deployment with `app: nginx`. |
| 149 | +You want to limit avoidable disruption, and you know that best effort availability is |
| 150 | +sufficient for this app. |
| 151 | +You decide to allow evictions even if those webserver pods are unhealthy. |
| 152 | +You create a PDB to guard this application, with the `AlwaysAllow` policy for evicting |
| 153 | +unhealthy pods: |
| 154 | +--> |
| 155 | +以下是一个例子。假设你已在集群中启用了此特性门控且你已定义了运行普通 Web 服务器的 Deployment。 |
| 156 | +你已为 Deployment 的 Pod 打了标签 `app: nginx`。 |
| 157 | +你想要限制可避免的干扰,你知道对于此应用而言尽力而为的可用性也是足够的。 |
| 158 | +你决定即使这些 Web 服务器 Pod 不健康也允许驱逐。 |
| 159 | +你创建 PDB 守护此应用,使用 `AlwaysAllow` 策略驱逐不健康的 Pod: |
| 160 | + |
| 161 | +```yaml |
| 162 | +apiVersion: policy/v1 |
| 163 | +kind: PodDisruptionBudget |
| 164 | +metadata: |
| 165 | + name: nginx-pdb |
| 166 | +spec: |
| 167 | + selector: |
| 168 | + matchLabels: |
| 169 | + app: nginx |
| 170 | + maxUnavailable: 1 |
| 171 | + unhealthyPodEvictionPolicy: AlwaysAllow |
| 172 | +``` |
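作为对照,下面给出一个假设的 PDB 示意,显式采用 `IfHealthyBudget` 策略,即保持现有的默认行为。其中名称 `nginx-pdb-strict` 为示例假设,其余字段沿用上文的例子。

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb-strict   # 假设的名称
spec:
  selector:
    matchLabels:
      app: nginx
  maxUnavailable: 1
  # 仅当应用的健康 Pod 数量达到 .status.desiredHealthy 时,
  # 才允许驱逐不健康的 Pod
  unhealthyPodEvictionPolicy: IfHealthyBudget
```

对于宁可阻塞节点腾空也要保护不健康 Pod 的关键工作负载,这种写法可能更合适。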
| 173 | +
|
| 174 | +<!-- |
| 175 | +## How can I learn more? |
| 176 | +
|
| 177 | +- Read the KEP: [Unhealthy Pod Eviction Policy for PDBs](https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/3017-pod-healthy-policy-for-pdb) |
| 178 | +- Read the documentation: [Unhealthy Pod Eviction Policy](/docs/tasks/run-application/configure-pdb/#unhealthy-pod-eviction-policy) for PodDisruptionBudgets |
| 179 | +- Review the Kubernetes documentation for [PodDisruptionBudgets](/docs/concepts/workloads/pods/disruptions/#pod-disruption-budgets), [draining of Nodes](/docs/tasks/administer-cluster/safely-drain-node/) and [evictions](/docs/concepts/scheduling-eviction/api-eviction/) |
| 180 | +--> |
| 181 | +## 查阅更多资料 {#how-can-i-learn-more} |
| 182 | +
|
| 183 | +- 阅读 KEP:[Unhealthy Pod Eviction Policy for PDBs](https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/3017-pod-healthy-policy-for-pdb) |
| 184 | +- 阅读针对 PodDisruptionBudget |
| 185 | + 的[不健康 Pod 驱逐策略](/zh-cn/docs/tasks/run-application/configure-pdb/#unhealthy-pod-eviction-policy)文档 |
| 186 | +- 参阅 [PodDisruptionBudget](/zh-cn/docs/concepts/workloads/pods/disruptions/#pod-disruption-budgets)、 |
| 187 | + [腾空节点](/zh-cn/docs/tasks/administer-cluster/safely-drain-node/)和[驱逐](/zh-cn/docs/concepts/scheduling-eviction/api-eviction/)等 Kubernetes 文档 |
| 188 | +
|
| 189 | +<!-- |
| 190 | +## How do I get involved? |
| 191 | +
|
| 192 | +If you have any feedback, please reach out to us in the [#sig-apps](https://kubernetes.slack.com/archives/C18NZM5K9) channel on Slack (visit https://slack.k8s.io/ for an invitation if you need one), or on the SIG Apps mailing list: [email protected] |
| 193 | +--> |
| 194 | +## 我如何参与? {#how-do-i-get-involved} |
| 195 | +
|
| 196 | +如果你有任何反馈,请通过 Slack [#sig-apps](https://kubernetes.slack.com/archives/C18NZM5K9) 频道 |
| 197 | +(如有需要,请访问 https://slack.k8s.io/ 获取邀请)或通过 SIG Apps 邮件列表 |
| 198 | +[[email protected]](https://groups.google.com/g/kubernetes-sig-apps) 联系我们。 |