|
| 1 | +--- |
| 2 | +title: 在遵从 PodDisruptionBudget 的前提下安全地清空一个节点 |
| 3 | +content_type: task |
| 4 | +--- |
| 5 | +<!-- |
| 6 | +reviewers: |
| 7 | +- davidopp |
| 8 | +- mml |
| 9 | +- foxish |
| 10 | +- kow3ns |
| 11 | +title: Safely Drain a Node while Respecting the PodDisruptionBudget |
| 12 | +content_type: task |
| 13 | +--> |
| 14 | + |
| 15 | +<!-- overview --> |
| 16 | +<!-- |
| 17 | +This page shows how to safely drain a node, respecting the PodDisruptionBudget you have defined. |
| 18 | + --> |
| 19 | +本页展示了如何在遵从你所定义的 PodDisruptionBudget 的前提下,安全地清空一个节点。 |
| 20 | + |
| 21 | +## {{% heading "prerequisites" %}} |
| 22 | + |
| 23 | +<!-- |
| 24 | +This task assumes that you have met the following prerequisites: |
| 25 | + |
| 26 | +* You are using Kubernetes release >= 1.5. |
| 27 | +* Either: |
| 28 | + 1. You do not require your applications to be highly available during the |
| 29 | + node drain, or |
| 30 | + 1. You have read about the [PodDisruptionBudget concept](/docs/concepts/workloads/pods/disruptions/) |
| 31 | + and [Configured PodDisruptionBudgets](/docs/tasks/run-application/configure-pdb/) for |
| 32 | + applications that need them. |
| 33 | +--> |
| 34 | +此任务假设你已经满足以下先决条件: |
| 35 | + |
| 36 | +* 使用的 Kubernetes 版本 >= 1.5。 |
| 37 | +* 以下两项,具备其一: |
| 38 | + 1. 在节点清空期间,不要求应用程序具有高可用性 |
| 39 | + 1. 你已经了解了 [PodDisruptionBudget 的概念](/zh/docs/concepts/workloads/pods/disruptions/),并为需要它的应用程序[配置了 PodDisruptionBudget](/zh/docs/tasks/run-application/configure-pdb/)。 |
| 40 | + |
| 41 | +<!-- steps --> |
| 42 | + |
| 43 | +<!-- |
| 44 | +## Use `kubectl drain` to remove a node from service |
| 45 | + |
| 46 | +You can use `kubectl drain` to safely evict all of your pods from a |
| 47 | +node before you perform maintenance on the node (e.g. kernel upgrade, |
| 48 | +hardware maintenance, etc.). Safe evictions allow the pod's containers |
| 49 | +to [gracefully terminate](/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination) |
| 50 | +and will respect the `PodDisruptionBudgets` you have specified. |
| 51 | +--> |
| 52 | +## 使用 `kubectl drain` 让节点停止服务 {#use-kubectl-drain-to-remove-a-node-from-service} |
| 53 | + |
| 54 | +在对节点执行维护(例如内核升级、硬件维护等)之前, |
| 55 | +可以使用 `kubectl drain` 从节点安全地逐出所有 Pods。 |
| 56 | +安全的驱逐过程允许 Pod 的容器 |
| 57 | +[体面地终止](/zh/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination), |
| 58 | +并遵从你所指定的 `PodDisruptionBudgets`。 |
| 59 | + |
| 60 | +<!-- |
| 61 | +By default `kubectl drain` will ignore certain system pods on the node |
| 62 | +that cannot be killed; see |
| 63 | +the [kubectl drain](/docs/reference/generated/kubectl/kubectl-commands/#drain) |
| 64 | +documentation for more details. |
| 65 | +--> |
| 66 | +{{< note >}} |
| 67 | +默认情况下, `kubectl drain` 将忽略节点上不能杀死的特定系统 Pod; |
| 68 | +有关更多细节,请参阅 |
| 69 | +[kubectl drain](/docs/reference/generated/kubectl/kubectl-commands/#drain) 文档。 |
| 70 | +{{< /note >}} |
| 71 | + |
| 72 | +<!-- |
| 73 | +When `kubectl drain` returns successfully, that indicates that all of |
| 74 | +the pods (except the ones excluded as described in the previous paragraph) |
| 75 | +have been safely evicted (respecting the desired graceful termination period, |
| 76 | +and respecting the PodDisruptionBudget you have defined). It is then safe to |
| 77 | +bring down the node by powering down its physical machine or, if running on a |
| 78 | +cloud platform, deleting its virtual machine. |
| 79 | + |
| 80 | +First, identify the name of the node you wish to drain. You can list all of the nodes in your cluster with |
| 81 | +--> |
| 82 | +`kubectl drain` 的成功返回,表明所有的 Pods(除了上一段中描述的被排除的那些), |
| 83 | +已经被安全地逐出(考虑到期望的终止宽限期和你定义的 PodDisruptionBudget)。 |
| 84 | +然后就可以安全地关闭节点, |
| 85 | +比如关闭物理机器的电源,如果它运行在云平台上,则删除它的虚拟机。 |
| 86 | + |
| | +首先,确定想要清空的节点的名称。可以用以下命令列出集群中的所有节点: |
| | + |
| 87 | +```shell |
| 88 | +kubectl get nodes |
| 89 | +``` |
| 90 | + |
| 91 | +<!-- |
| 92 | +Next, tell Kubernetes to drain the node: |
| 93 | +--> |
| 94 | +接下来,告诉 Kubernetes 清空节点: |
| 95 | + |
| 96 | +```shell |
| 97 | +kubectl drain <node name> |
| 98 | +``` |
| 99 | + |
| 100 | +<!-- |
| 101 | +Once it returns (without giving an error), you can power down the node |
| 102 | +(or equivalently, if on a cloud platform, delete the virtual machine backing the node). |
| 103 | +If you leave the node in the cluster during the maintenance operation, you need to run |
| 104 | +--> |
| 105 | +一旦它返回(没有报错), |
| 106 | +你就可以关闭此节点的电源(或者等价地,如果在云平台上,删除承载该节点的虚拟机)。 |
| 107 | +如果在维护操作期间将节点留在集群中,则需要在维护完成后运行: |
| 108 | + |
| 109 | +```shell |
| 110 | +kubectl uncordon <node name> |
| 111 | +``` |
| 112 | +<!-- |
| 113 | +afterwards to tell Kubernetes that it can resume scheduling new pods onto the node. |
| 114 | +--> |
| 115 | +这条命令告诉 Kubernetes,它可以继续在此节点上调度新的 Pod。 |
| 116 | + |
| 117 | +<!-- |
| 118 | +## Draining multiple nodes in parallel |
| 119 | + |
| 120 | +The `kubectl drain` command should only be issued to a single node at a |
| 121 | +time. However, you can run multiple `kubectl drain` commands for |
| 122 | +different nodes in parallel, in different terminals or in the |
| 123 | +background. Multiple drain commands running concurrently will still |
| 124 | +respect the `PodDisruptionBudget` you specify. |
| 125 | +--> |
| 126 | +## 并行清空多个节点 {#draining-multiple-nodes-in-parallel} |
| 127 | + |
| 128 | +`kubectl drain` 命令一次只应针对一个节点执行。 |
| 129 | +但是,你可以在不同的终端或后台为不同的节点并行地运行多个 `kubectl drain` 命令。 |
| 130 | +同时运行的多个 drain 命令仍然遵循你指定的 `PodDisruptionBudget`。 |
| 131 | + |
| 132 | +<!-- |
| 133 | +For example, if you have a StatefulSet with three replicas and have |
| 134 | +set a `PodDisruptionBudget` for that set specifying `minAvailable: |
| 135 | +2`. `kubectl drain` will only evict a pod from the StatefulSet if all |
| 136 | +three pods are ready, and if you issue multiple drain commands in |
| 137 | +parallel, Kubernetes will respect the PodDisruptionBudget and ensure |
| 138 | +that only one pod is unavailable at any given time. Any drains that |
| 139 | +would cause the number of ready replicas to fall below the specified |
| 140 | +budget are blocked. |
| 141 | +--> |
| 142 | +例如,假设你有一个三副本的 StatefulSet, |
| 143 | +并为它设置了一个 `PodDisruptionBudget`,指定 `minAvailable: 2`。 |
| 144 | +只有当三个 Pod 全部就绪时,`kubectl drain` 才会从 StatefulSet 中逐出一个 Pod; |
| 145 | +如果你并行地发出多个 drain 命令, |
| 146 | +Kubernetes 会遵守 PodDisruptionBudget,确保在任何时候只有一个 Pod 不可用。 |
| 147 | +任何会导致就绪副本数量低于指定预算的清空操作都将被阻止。 |
| 148 | + |
| 149 | +<!-- |
| 150 | +## The Eviction API |
| 151 | + |
| 152 | +If you prefer not to use [kubectl drain](/docs/reference/generated/kubectl/kubectl-commands/#drain) (such as |
| 153 | +to avoid calling to an external command, or to get finer control over the pod |
| 154 | +eviction process), you can also programmatically cause evictions using the eviction API. |
| 155 | +--> |
| 156 | +## 驱逐 API {#the-eviction-api} |
| 157 | +如果你不想使用 |
| 158 | +[kubectl drain](/zh/docs/reference/generated/kubectl/kubectl-commands/#drain) |
| 159 | +(比如为了避免调用外部命令,或者为了更精细地控制 Pod 驱逐过程), |
| 160 | +你也可以通过驱逐 API 以编程的方式发起驱逐。 |
| 161 | + |
| 162 | +<!-- |
| 163 | +You should first be familiar with using [Kubernetes language clients](/docs/tasks/administer-cluster/access-cluster-api/#programmatic-access-to-the-api). |
| 164 | + |
| 165 | +The eviction subresource of a |
| 166 | +pod can be thought of as a kind of policy-controlled DELETE operation on the pod |
| 167 | +itself. To attempt an eviction (perhaps more REST-precisely, to attempt to |
| 168 | +*create* an eviction), you POST an attempted operation. Here's an example: |
| 169 | +--> |
| 170 | +首先应该熟悉使用 |
| 171 | +[Kubernetes 语言客户端](/zh/docs/tasks/administer-cluster/access-cluster-api/#programmatic-access-to-the-api)。 |
| 172 | + |
| 173 | +Pod 的 Eviction 子资源可以看作是一种策略控制的 DELETE 操作,作用于 Pod 本身。 |
| 174 | +要尝试驱逐(更准确地说,尝试 *创建* 一个 Eviction),需要用 POST 发出所尝试的操作。这里有一个例子: |
| 175 | + |
| 176 | +```json |
| 177 | +{ |
| 178 | + "apiVersion": "policy/v1beta1", |
| 179 | + "kind": "Eviction", |
| 180 | + "metadata": { |
| 181 | + "name": "quux", |
| 182 | + "namespace": "default" |
| 183 | + } |
| 184 | +} |
| 185 | +``` |
| 186 | + |
| 187 | +<!-- |
| 188 | +You can attempt an eviction using `curl`: |
| 189 | +--> |
| 190 | +你可以使用 `curl` 尝试驱逐: |
| 191 | + |
| 192 | +```bash |
| 193 | +curl -v -H 'Content-type: application/json' http://127.0.0.1:8080/api/v1/namespaces/default/pods/quux/eviction -d @eviction.json |
| 194 | +``` |
| 195 | + |
| 196 | +<!-- |
| 197 | +The API can respond in one of three ways: |
| 198 | + |
| 199 | +- If the eviction is granted, then the pod is deleted just as if you had sent |
| 200 | + a `DELETE` request to the pod's URL and you get back `200 OK`. |
| 201 | +- If the current state of affairs wouldn't allow an eviction by the rules set |
| 202 | + forth in the budget, you get back `429 Too Many Requests`. This is |
| 203 | + typically used for generic rate limiting of *any* requests, but here we mean |
| 204 | + that this request isn't allowed *right now* but it may be allowed later. |
| 205 | + Currently, callers do not get any `Retry-After` advice, but they may in |
| 206 | + future versions. |
| 207 | +- If there is some kind of misconfiguration, like multiple budgets pointing at |
| 208 | + the same pod, you will get `500 Internal Server Error`. |
| 209 | +--> |
| 210 | +API 可以通过以下三种方式之一进行响应: |
| 211 | + |
| 212 | +- 如果驱逐被授权,那么 Pod 将被删掉,并且你会收到 `200 OK`, |
| 213 | + 就像你向 Pod 的 URL 发送了 `DELETE` 请求一样。 |
| 214 | +- 如果按照预算中的规定,当前状况不允许驱逐,你会收到 `429 Too Many Requests`。 |
| 215 | + 这通常用于对 *任何* 请求进行通用速率限制, |
| 216 | + 但这里我们的意思是:此请求 *现在* 不允许,但以后可能会允许。 |
| 217 | + 目前,调用者不会得到任何 `Retry-After` 的提示,但在将来的版本中可能会得到。 |
| 218 | +- 如果有一些错误的配置,比如多个预算指向同一个 Pod,你将得到 `500 Internal Server Error`。 |
| 219 | + |
| 220 | +<!-- |
| 221 | +For a given eviction request, there are two cases: |
| 222 | + |
| 223 | +- There is no budget that matches this pod. In this case, the server always |
| 224 | + returns `200 OK`. |
| 225 | +- There is at least one budget. In this case, any of the three above responses may |
| 226 | + apply. |
| 227 | +--> |
| 228 | +对于一个给定的驱逐请求,有两种情况: |
| 229 | + |
| 230 | +- 没有匹配这个 Pod 的预算。这种情况,服务器总是返回 `200 OK`。 |
| 231 | +- 至少匹配一个预算。在这种情况下,上述三种响应中的任何一种都可能适用。 |
| 232 | + |
| 233 | +<!-- |
| 234 | +In some cases, an application may reach a broken state where it will never return anything |
| 235 | +other than 429 or 500. This can happen, for example, if the replacement pod created by the |
| 236 | +application's controller does not become ready, or if the last pod evicted has a very long |
| 237 | +termination grace period. |
| 238 | + |
| 239 | +In this case, there are two potential solutions: |
| 240 | + |
| 241 | +- Abort or pause the automated operation. Investigate the reason for the stuck application, and restart the automation. |
| 242 | +- After a suitably long wait, `DELETE` the pod instead of using the eviction API. |
| 243 | + |
| 244 | +Kubernetes does not specify what the behavior should be in this case; it is up to the |
| 245 | +application owners and cluster owners to establish an agreement on behavior in these cases. |
| 246 | +--> |
| 247 | +在某些情况下,应用程序可能会进入一种损坏状态:针对它的驱逐请求除了 429 或 500 之外,永远不会得到其他响应。 |
| 248 | +例如,应用程序的控制器创建的替换 Pod 一直未能就绪,或者最后一个被驱逐的 Pod 有很长的终止宽限期,就会发生这种情况。 |
| 249 | + |
| 250 | +在这种情况下,有两种可能的解决方案: |
| 251 | + |
| 252 | +- 中止或暂停自动操作。调查应用程序卡住的原因,并重新启动自动化。 |
| 253 | +- 经过足够长时间的等待后,直接 `DELETE` 该 Pod,而不是使用驱逐 API。 |
| 254 | + |
| 255 | +Kubernetes 并没有具体说明在这种情况下应该采取什么行为; |
| 256 | +这应该由应用程序所有者和集群所有者紧密沟通,并就这些情况下的行为达成一致意见。 |
| 257 | + |
| 258 | +## {{% heading "whatsnext" %}} |
| 259 | + |
| 260 | + |
| 261 | +<!-- |
| 262 | +* Follow steps to protect your application by [configuring a Pod Disruption Budget](/docs/tasks/run-application/configure-pdb/). |
| 263 | +* Learn more about [maintenance on a node](/docs/tasks/administer-cluster/cluster-management/#maintenance-on-a-node). |
| 264 | +--> |
| 265 | +* 按照[配置 Pod 中断预算](/zh/docs/tasks/run-application/configure-pdb/)中的步骤保护你的应用程序。 |
| 266 | +* 进一步了解[节点维护](/zh/docs/tasks/administer-cluster/cluster-management/#maintenance-on-a-node)。 |
| 267 | + |
| 268 | + |