
Commit 09d3244

[zh] add page:/zh/docs/tasks/administer-cluster/safely-drain-node/ ,fix 24097
change according to tengqm's comment
1 parent cf30636 commit 09d3244

1 file changed: +268 −0
@@ -0,0 +1,268 @@
---
title: Safely Drain a Node while Respecting the PodDisruptionBudget
content_type: task
---
<!--
reviewers:
- davidopp
- mml
- foxish
- kow3ns
-->

<!-- overview -->
This page shows how to safely drain a node, respecting the PodDisruptionBudget you have defined.

## {{% heading "prerequisites" %}}

This task assumes that you have met the following prerequisites:

* You are using Kubernetes release >= 1.5.
* Either:
  1. You do not require your applications to be highly available during the
     node drain, or
  1. You have read about the [PodDisruptionBudget concept](/docs/concepts/workloads/pods/disruptions/)
     and [configured PodDisruptionBudgets](/docs/tasks/run-application/configure-pdb/) for
     applications that need them; a minimal example sketch follows below.
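
If you have not yet configured a budget, the following is a minimal sketch of a
`PodDisruptionBudget`; the name `my-app-pdb` and the `app: my-app` selector are
hypothetical placeholders, not names used elsewhere on this page:

```shell
# Minimal PodDisruptionBudget sketch; "my-app-pdb" and the
# "app: my-app" selector are hypothetical placeholders.
kubectl apply -f - <<EOF
{
  "apiVersion": "policy/v1beta1",
  "kind": "PodDisruptionBudget",
  "metadata": {
    "name": "my-app-pdb"
  },
  "spec": {
    "minAvailable": 2,
    "selector": {
      "matchLabels": {
        "app": "my-app"
      }
    }
  }
}
EOF
```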

<!-- steps -->

## Use `kubectl drain` to remove a node from service {#use-kubectl-drain-to-remove-a-node-from-service}

You can use `kubectl drain` to safely evict all of your pods from a
node before you perform maintenance on the node (e.g. kernel upgrade,
hardware maintenance, etc.). Safe evictions allow the pod's containers
to [gracefully terminate](/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination)
and will respect the `PodDisruptionBudgets` you have specified.

{{< note >}}
By default `kubectl drain` will ignore certain system pods on the node
that cannot be killed; see the
[kubectl drain](/docs/reference/generated/kubectl/kubectl-commands/#drain)
documentation for more details.
{{< /note >}}
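
For example, pods managed by a DaemonSet cannot be evicted this way; if the node
runs any, `kubectl drain` will not proceed unless you tell it to skip them:

```shell
# DaemonSet-managed pods cannot be evicted; skip them explicitly.
# <node name> is a placeholder for the node being drained.
kubectl drain <node name> --ignore-daemonsets
```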

When `kubectl drain` returns successfully, that indicates that all of
the pods (except the ones excluded as described in the previous paragraph)
have been safely evicted (respecting the desired graceful termination period,
and respecting the PodDisruptionBudget you have defined). It is then safe to
bring down the node by powering down its physical machine or, if running on a
cloud platform, deleting its virtual machine.

First, identify the name of the node you wish to drain. You can list all of the
nodes in your cluster with:

```shell
kubectl get nodes
```

Next, tell Kubernetes to drain the node:

```shell
kubectl drain <node name>
```

Once it returns (without giving an error), you can power down the node
(or equivalently, if on a cloud platform, delete the virtual machine backing the node).
If you leave the node in the cluster during the maintenance operation, you need to run:

```shell
kubectl uncordon <node name>
```

afterwards to tell Kubernetes that it can resume scheduling new pods onto the node.
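
To check where things stand at either step, you can inspect the node directly;
while it is cordoned its status includes `SchedulingDisabled`, and after
`kubectl uncordon` it reports plain `Ready` again:

```shell
# STATUS shows "Ready,SchedulingDisabled" while the node is cordoned.
kubectl get node <node name>
```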

## Draining multiple nodes in parallel {#draining-multiple-nodes-in-parallel}

The `kubectl drain` command should only be issued to a single node at a
time. However, you can run multiple `kubectl drain` commands for
different nodes in parallel, in different terminals or in the
background, for example as sketched below. Multiple drain commands running
concurrently will still respect the `PodDisruptionBudget` you specify.
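
A minimal sketch of draining two nodes concurrently from a single shell;
`node-1` and `node-2` are hypothetical node names:

```shell
# Each drain still targets a single node; run them as background jobs.
kubectl drain node-1 &
kubectl drain node-2 &
wait  # block until both drains have returned
```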

For example, if you have a StatefulSet with three replicas and have set a
`PodDisruptionBudget` for that set specifying `minAvailable: 2`,
`kubectl drain` will only evict a pod from the StatefulSet if all
three pods are ready; if you issue multiple drain commands in
parallel, Kubernetes will respect the PodDisruptionBudget and ensure
that only one pod is unavailable at any given time. Any drains that
would cause the number of ready replicas to fall below the specified
budget are blocked.
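
While the drains are running, you can watch how much disruption the budget
currently allows; with `minAvailable: 2` and three ready replicas, at most one
disruption is permitted at a time:

```shell
# The ALLOWED DISRUPTIONS column shows how many evictions the
# budget permits right now.
kubectl get poddisruptionbudgets
```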

## The Eviction API {#the-eviction-api}

If you prefer not to use [kubectl drain](/docs/reference/generated/kubectl/kubectl-commands/#drain) (such as
to avoid calling to an external command, or to get finer control over the pod
eviction process), you can also programmatically cause evictions using the eviction API.

You should first be familiar with using
[Kubernetes language clients](/docs/tasks/administer-cluster/access-cluster-api/#programmatic-access-to-the-api).

The eviction subresource of a
pod can be thought of as a kind of policy-controlled DELETE operation on the pod
itself. To attempt an eviction (perhaps more REST-precisely, to attempt to
*create* an eviction), you POST an attempted operation. Here's an example:

```json
{
  "apiVersion": "policy/v1beta1",
  "kind": "Eviction",
  "metadata": {
    "name": "quux",
    "namespace": "default"
  }
}
```

You can attempt an eviction using `curl`:

```bash
curl -v -H 'Content-type: application/json' http://127.0.0.1:8080/api/v1/namespaces/default/pods/quux/eviction -d @eviction.json
```

The API can respond in one of three ways:

- If the eviction is granted, then the pod is deleted just as if you had sent
  a `DELETE` request to the pod's URL, and you get back `200 OK`.
- If the current state of affairs wouldn't allow an eviction by the rules set
  forth in the budget, you get back `429 Too Many Requests`. This status is
  typically used for generic rate limiting of *any* requests, but here it means
  that this request isn't allowed *right now*, though it may be allowed later.
  Currently, callers do not get any `Retry-After` advice, but they may in
  future versions.
- If there is some kind of misconfiguration, like multiple budgets pointing at
  the same pod, you will get `500 Internal Server Error`.
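
Because `429` means the eviction may be allowed later, a caller can simply poll
until it succeeds. A rough sketch reusing the `curl` invocation above (the
5-second interval is an arbitrary choice, and a persistent `500` would loop
forever without further handling):

```bash
# Retry the eviction until the budget allows it (HTTP 200).
until [ "$(curl -s -o /dev/null -w '%{http_code}' \
    -H 'Content-type: application/json' \
    http://127.0.0.1:8080/api/v1/namespaces/default/pods/quux/eviction \
    -d @eviction.json)" = "200" ]; do
  sleep 5  # arbitrary backoff; 429 means "not right now"
done
```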

For a given eviction request, there are two cases:

- There is no budget that matches this pod. In this case, the server always
  returns `200 OK`.
- There is at least one budget. In this case, any of the three above responses may
  apply.

In some cases, an application may reach a broken state where it will never return anything
other than 429 or 500. This can happen, for example, if the replacement pod created by the
application's controller does not become ready, or if the last pod evicted has a very long
termination grace period.

In this case, there are two potential solutions:

- Abort or pause the automated operation. Investigate the reason for the stuck application, and restart the automation.
- After a suitably long wait, `DELETE` the pod instead of using the eviction API, as sketched below.
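
A rough sketch of that fallback, mirroring the earlier `curl` example; note that
a plain `DELETE` bypasses the PodDisruptionBudget entirely, which is why it
should only follow a suitably long wait:

```bash
# Direct DELETE of the pod; unlike the eviction subresource,
# this does not consult the PodDisruptionBudget.
curl -v -X DELETE http://127.0.0.1:8080/api/v1/namespaces/default/pods/quux
```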

Kubernetes does not specify what the behavior should be in this case; it is up to the
application owners and cluster owners to establish an agreement on behavior in these cases.

## {{% heading "whatsnext" %}}

* Follow steps to protect your application by [configuring a Pod Disruption Budget](/docs/tasks/run-application/configure-pdb/).
* Learn more about [maintenance on a node](/docs/tasks/administer-cluster/cluster-management/#maintenance-on-a-node).