@@ -18,20 +18,20 @@ weight: 60
  <!--
  This guide is for application owners who want to build
  highly available applications, and thus need to understand
- what types of Disruptions can happen to Pods.
+ what types of disruptions can happen to Pods.
  -->
- 本指南针对的是希望构建高可用性应用程序的应用所有者,他们有必要了解可能发生在 Pod 上的干扰类型。
+ 本指南针对的是希望构建高可用性应用的应用所有者,他们有必要了解可能发生在 Pod 上的干扰类型。

  <!--
- It is also for Cluster Administrators who want to perform automated
+ It is also for cluster administrators who want to perform automated
  cluster actions, like upgrading and autoscaling clusters.
  -->
  文档同样适用于想要执行自动化集群操作(例如升级和自动扩展集群)的集群管理员。

  <!-- body -->

  <!--
- ## Voluntary and Involuntary Disruptions
+ ## Voluntary and involuntary disruptions

  Pods do not disappear until someone (a person or a controller) destroys them, or
  there is an unavoidable hardware or system software error.
@@ -44,7 +44,7 @@ Pod 不会消失,除非有人(用户或控制器)将其销毁,或者出
  We call these unavoidable cases *involuntary disruptions* to
  an application. Examples are:
  -->
- 我们把这些不可避免的情况称为应用的*非自愿干扰(Involuntary Disruptions)*。例如:
+ 我们把这些不可避免的情况称为应用的**非自愿干扰(Involuntary Disruptions)**。例如:

  <!--
  - a hardware failure of the physical machine backing the node
@@ -74,9 +74,9 @@ We call other cases *voluntary disruptions*. These include both
  actions initiated by the application owner and those initiated by a Cluster
  Administrator. Typical application owner actions include:
  -->
- 我们称其他情况为*自愿干扰(Voluntary Disruptions)*。
- 包括由应用程序所有者发起的操作和由集群管理员发起的操作。典型的应用程序所有者的操
- 作包括:
+ 我们称其他情况为**自愿干扰(Voluntary Disruptions)**。
+ 包括由应用所有者发起的操作和由集群管理员发起的操作。
+ 典型的应用所有者的操作包括:

  <!--
  - deleting the deployment or other controller that manages the pod
@@ -88,7 +88,7 @@ Administrator. Typical application owner actions include:
  - 直接删除 Pod(例如,因为误操作)

  <!--
- Cluster Administrator actions include:
+ Cluster administrator actions include:

  - [Draining a node](/docs/tasks/administer-cluster/safely-drain-node/) for repair or upgrade.
  - Draining a node from a cluster to scale the cluster down (learn about
@@ -126,7 +126,7 @@ deleting deployments or pods bypasses Pod Disruption Budgets.
  {{< /caution >}}

  <!--
- ## Dealing with Disruptions
+ ## Dealing with disruptions

  Here are some ways to mitigate involuntary disruptions:
  -->
@@ -135,7 +135,7 @@ Here are some ways to mitigate involuntary disruptions:
  以下是减轻非自愿干扰的一些方法:

  <!--
- - Ensure your pod [requests the resources](/docs/tasks/configure-pod-container/assign-cpu-ram-container) it needs.
+ - Ensure your pod [requests the resources](/docs/tasks/configure-pod-container/assign-memory-resource) it needs.
  - Replicate your application if you need higher availability. (Learn about running replicated
  [stateless](/docs/tasks/run-application/run-stateless-application-deployment/)
  and [stateful](/docs/tasks/run-application/run-replicated-stateful-application/) applications.)
@@ -146,12 +146,12 @@ and [stateful](/docs/tasks/run-application/run-replicated-stateful-application/)
  [multi-zone cluster](/docs/setup/multiple-zones).)
  -->
  - 确保 Pod 在请求中给出[所需资源](/zh-cn/docs/tasks/configure-pod-container/assign-memory-resource/)。
- - 如果需要更高的可用性,请复制应用程序。
+ - 如果需要更高的可用性,请复制应用。
  (了解有关运行多副本的[无状态](/zh-cn/docs/tasks/run-application/run-stateless-application-deployment/)
- 和[有状态](/zh-cn/docs/tasks/run-application/run-replicated-stateful-application/)应用程序的信息。)
- - 为了在运行复制应用程序时获得更高的可用性,请跨机架(使用
+ 和[有状态](/zh-cn/docs/tasks/run-application/run-replicated-stateful-application/)应用的信息。)
+ - 为了在运行复制应用时获得更高的可用性,请跨机架(使用
  [反亲和性](/zh-cn/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity)
- 或跨区域(如果使用[多区域集群](/zh-cn/docs/setup/best-practices/multiple-zones/))扩展应用程序。
+ 或跨区域(如果使用[多区域集群](/zh-cn/docs/setup/best-practices/multiple-zones/))扩展应用。

  <!--
  The frequency of voluntary disruptions varies. On a basic Kubernetes cluster, there are
@@ -178,18 +178,18 @@ Kubernetes offers features to help run highly available applications at the same
  time as frequent voluntary disruptions. We call this set of features
  *Disruption Budgets*.
  -->
- Kubernetes 提供特性来满足在出现频繁自愿干扰的同时运行高可用的应用程序。我们称这些特性为
- *干扰预算(Disruption Budget)*。
+ Kubernetes 提供特性来满足在出现频繁自愿干扰的同时运行高可用的应用。我们称这些特性为
+ **干扰预算(Disruption Budget)**。

  <!--
  ## Pod disruption budgets

  Kubernetes offers features to help you run highly available applications even when you
  introduce frequent voluntary disruptions.

- An Application Owner can create a `PodDisruptionBudget` object (PDB) for each application.
- A PDB limits the number of pods of a replicated application that are down simultaneously from
- voluntary disruptions. For example, a quorum-based application would
+ As an application owner, you can create a PodDisruptionBudget (PDB) for each application.
+ A PDB limits the number of Pods of a replicated application that are down simultaneously from
+ voluntary disruptions. For example, a quorum-based application would
  like to ensure that the number of replicas running is never brought below the
  number needed for a quorum. A web front end might want to
  ensure that the number of replicas serving load never falls below a certain
@@ -199,18 +199,17 @@ percentage of the total.

  {{< feature-state for_k8s_version="v1.21" state="stable" >}}

- 即使你会经常引入自愿性干扰,Kubernetes 也能够支持你运行高度可用的应用。
+ 即使你会经常引入自愿性干扰,Kubernetes 提供的功能也能够支持你运行高度可用的应用。

- 应用程序所有者可以为每个应用程序创建 `PodDisruptionBudget` 对象 (PDB)。
- PDB 将限制在同一时间因自愿干扰导致的复制应用程序中宕机的 pod 数量。
- 例如,基于票选机制的应用程序希望确保运行的副本数永远不会低于仲裁所需的数量。
+ 作为一个应用的所有者,你可以为每个应用创建一个 `PodDisruptionBudget` (PDB)。
+ PDB 将限制在同一时间因自愿干扰导致的多副本应用中发生宕机的 Pod 数量。
+ 例如,基于票选机制的应用希望确保运行中的副本数永远不会低于票选所需的数量。
  Web 前端可能希望确保提供负载的副本数量永远不会低于总数的某个百分比。

  <!--
  Cluster managers and hosting providers should use tools which
  respect PodDisruptionBudgets by calling the [Eviction API](/docs/tasks/administer-cluster/safely-drain-node/#eviction-api)
- instead of directly deleting pods or deployments. Examples are the `kubectl drain` command
- and the Kubernetes-on-GCE cluster upgrade script (`cluster/gce/upgrade.sh`).
+ instead of directly deleting pods or deployments.
  -->
  集群管理员和托管提供商应该使用遵循 PodDisruptionBudgets 的接口
  (通过调用[Eviction API](/zh-cn/docs/tasks/administer-cluster/safely-drain-node/#the-eviction-api)),
@@ -219,38 +218,41 @@ and the Kubernetes-on-GCE cluster upgrade script (`cluster/gce/upgrade.sh`).
  <!--
  For example, the `kubectl drain` subcommand lets you mark a node as going out of
  service. When you run `kubectl drain`, the tool tries to evict all of the Pods on
- the Node you'are taking out of service. The eviction request may be temporarily rejected,
- and the tool periodically retries all failed requests until all pods
- are terminated, or until a configurable timeout is reached.
+ the Node you're taking out of service. The eviction request that `kubectl` submits on
+ your behalf may be temporarily rejected, so the tool periodically retries all failed
+ requests until all Pods on the target node are terminated, or until a configurable timeout is reached.
  -->
  例如,`kubectl drain` 命令可以用来标记某个节点即将停止服务。
- 运行 `kubectl drain` 命令时,工具会尝试驱逐机器上的所有 Pod。
- `kubectl` 所提交的驱逐请求可能会暂时被拒绝,所以该工具会定时重试失败的请求,
- 直到所有的 Pod 都被终止,或者达到配置的超时时间。
+ 运行 `kubectl drain` 命令时,工具会尝试驱逐你所停服的节点上的所有 Pod。
+ `kubectl` 代表你所提交的驱逐请求可能会暂时被拒绝,
+ 所以该工具会周期性地重试所有失败的请求,
+ 直到目标节点上的所有的 Pod 都被终止,或者达到配置的超时时间。

  <!--
  A PDB specifies the number of replicas that an application can tolerate having, relative to how
  many it is intended to have. For example, a Deployment which has a `.spec.replicas: 5` is
  supposed to have 5 pods at any given time. If its PDB allows for there to be 4 at a time,
- then the Eviction API will allow voluntary disruption of one, but not two pods, at a time.
+ then the Eviction API will allow voluntary disruption of one (but not two) pods at a time.
  -->
- PDB 指定应用程序可以容忍的副本数量 (相当于应该有多少副本)。
+ PDB 指定应用可以容忍的副本数量 (相当于应该有多少副本)。
  例如,具有 `.spec.replicas: 5` 的 Deployment 在任何时间都应该有 5 个 Pod。
- 如果 PDB 允许其在某一时刻有 4 个副本,那么驱逐 API 将允许同一时刻仅有一个而不是两个 Pod 自愿干扰。
+ 如果 PDB 允许其在某一时刻有 4 个副本,那么驱逐 API 将允许同一时刻仅有一个(而不是两个) Pod 自愿干扰。

  <!--
  The group of pods that comprise the application is specified using a label selector, the same
  as the one used by the application's controller (deployment, stateful-set, etc).
  -->
- 使用标签选择器来指定构成应用程序的一组 Pod,这与应用程序的控制器 (Deployment,StatefulSet 等)
+ 使用标签选择器来指定构成应用的一组 Pod,这与应用的控制器 (Deployment,StatefulSet 等)
  选择 Pod 的逻辑一样。

  <!--
- The "intended" number of pods is computed from the `.spec.replicas` of the pods controller.
- The controller is discovered from the pods using the `.metadata.ownerReferences` of the object.
+ The "intended" number of pods is computed from the `.spec.replicas` of the workload resource
+ that is managing those pods. The control plane discovers the owning workload resource by
+ examining the `.metadata.ownerReferences` of the Pod.
  -->
- Pod 控制器的 `.spec.replicas` 计算“预期的” Pod 数量。
- 根据 Pod 对象的 `.metadata.ownerReferences` 字段来发现控制器。
+ Pod 的“预期”数量由管理这些 Pod 的工作负载资源的 `.spec.replicas` 参数计算出来的。
+ 控制平面通过检查 Pod 的
+ `.metadata.ownerReferences` 来发现关联的工作负载资源。

  <!--
  [Involuntary disruptions](#voluntary-and-involuntary-disruptions) cannot be prevented by PDBs; however they
@@ -262,13 +264,14 @@ PDB 无法防止[非自愿干扰](#voluntary-and-involuntary-disruptions);

  <!--
  Pods which are deleted or unavailable due to a rolling upgrade to an application do count
- against the disruption budget, but controllers (like deployment and stateful-set)
- are not limited by PDBs when doing rolling upgrades - the handling of failures
- during application updates is configured in spec for the specific workload resource.
+ against the disruption budget, but workload resources (such as Deployment and StatefulSet)
+ are not limited by PDBs when doing rolling upgrades. Instead, the handling of failures
+ during application updates is configured in the spec for the specific workload resource.
  -->
- 由于应用程序的滚动升级而被删除或不可用的 Pod 确实会计入干扰预算,
- 但是控制器(如 Deployment 和 StatefulSet)在进行滚动升级时不受 PDB
- 的限制。应用程序更新期间的故障处理方式是在对应的工作负载资源的 `spec` 中配置的。
+ 由于应用的滚动升级而被删除或不可用的 Pod 确实会计入干扰预算,
+ 但是工作负载资源(如 Deployment 和 StatefulSet)
+ 在进行滚动升级时不受 PDB 的限制。
+ 应用更新期间的故障处理方式是在对应的工作负载资源的 `spec` 中配置的。

  <!--
  When a pod is evicted using the eviction API, it is gracefully
@@ -282,14 +285,13 @@ hornoring the
  中的 `terminationGracePeriodSeconds` 配置值。

  <!--
- ## PDB Example
-
+ ## PodDisruptionBudget example {#pdb-example}
  Consider a cluster with 3 nodes, `node-1` through `node-3`.
  The cluster is running several applications. One of them has 3 replicas initially called
  `pod-a`, `pod-b`, and `pod-c`. Another, unrelated pod without a PDB, called `pod-x`, is also shown.
  Initially, the pods are laid out as follows:
  -->
- ## PDB 例子 {#pdb-example}
+ ## PodDisruptionBudget 例子 {#pdb-example}

  假设集群有 3 个节点,`node-1` 到 `node-3`。集群上运行了一些应用。
  其中一个应用有 3 个副本,分别是 `pod-a`,`pod-b` 和 `pod-c`。
@@ -316,7 +318,7 @@ This puts the cluster in this state:
  -->

  例如,假设集群管理员想要重启系统,升级内核版本来修复内核中的缺陷。
- 集群管理员首先使用 `kubectl drain` 命令尝试排空 `node-1` 节点。
+ 集群管理员首先使用 `kubectl drain` 命令尝试腾空 `node-1` 节点。
  命令尝试驱逐 `pod-a` 和 `pod-x`。操作立即就成功了。
  两个 Pod 同时进入 `terminating` 状态。这时的集群处于下面的状态:

@@ -426,7 +428,7 @@ can happen, according to:
  - the type of controller
  - the cluster's resource capacity
  -->
- - 应用程序需要多少个副本
+ - 应用需要多少个副本
  - 优雅关闭应用实例需要多长时间
  - 启动应用新实例需要多长时间
  - 控制器的类型
@@ -531,7 +533,7 @@ may make sense in these scenarios:
  there is natural specialization of roles
  - when third-party tools or services are used to automate cluster management
  -->
- - 当有许多应用程序团队共用一个 Kubernetes 集群,并且有自然的专业角色
+ - 当有许多应用团队共用一个 Kubernetes 集群,并且有自然的专业角色
  - 当第三方工具或服务用于集群自动化管理

  <!--
@@ -573,11 +575,11 @@ the nodes in your cluster, such as a node or system software upgrade, here are s
  - 接受升级期间的停机时间。
  - 故障转移到另一个完整的副本集群。
  - 没有停机时间,但是对于重复的节点和人工协调成本可能是昂贵的。
- - 编写可容忍干扰的应用程序和使用 PDB。
+ - 编写可容忍干扰的应用和使用 PDB。
  - 不停机。
  - 最小的资源重复。
  - 允许更多的集群管理自动化。
- - 编写可容忍干扰的应用程序是棘手的,但对于支持容忍自愿干扰所做的工作,和支持自动扩缩和容忍非
+ - 编写可容忍干扰的应用是棘手的,但对于支持容忍自愿干扰所做的工作,和支持自动扩缩和容忍非
  自愿干扰所做工作相比,有大量的重叠

## {{% heading "whatsnext" %}}
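
Review note on the PDB paragraph touched in the `-219,38 +218,41` hunk: the case it describes (a Deployment with `.spec.replicas: 5` whose budget keeps 4 Pods available) could be written as the manifest sketched below. This is only an illustration and is not part of the change. The name `example-pdb` and the label `app: example` are hypothetical, and `policy/v1` is assumed because the page marks the feature as stable in v1.21.

```yaml
# Hypothetical PodDisruptionBudget for a 5-replica Deployment.
# The object name and the label are illustrative, not taken from the diff.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-pdb
spec:
  # Keep at least 4 of the selected Pods available during voluntary disruptions.
  minAvailable: 4
  # Assumed to match the label selector used by the owning Deployment.
  selector:
    matchLabels:
      app: example
```

With five healthy replicas matching the selector, a budget like this would leave room for one voluntary eviction at a time. A second concurrent eviction request would be refused until a replacement Pod becomes available.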