@@ -49,6 +49,13 @@ You should already be familiar with the basic use of [Job](/docs/concepts/worklo
{{< include "task-tutorial-prereqs.md" >}} {{< version-check >}}
+ <!--
+ Ensure that the [feature gates](/docs/reference/command-line-tools-reference/feature-gates/)
+ `PodDisruptionConditions` and `JobPodFailurePolicy` are both enabled in your cluster.
+ -->
+ 确保[特性门控](/zh-cn/docs/reference/command-line-tools-reference/feature-gates/)
+ `PodDisruptionConditions` 和 `JobPodFailurePolicy` 在你的集群中均已启用。
+
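+ How you enable these gates depends on how your cluster is deployed. As a rough
+ sketch only (adapt it to your own setup, for example kubeadm static Pod manifests),
+ for control plane components you start yourself they can be passed via the standard
+ `--feature-gates` flag:
+
+ ```sh
+ # Sketch: pass both gates to the relevant control plane components.
+ kube-apiserver --feature-gates=PodDisruptionConditions=true,JobPodFailurePolicy=true ...
+ kube-controller-manager --feature-gates=PodDisruptionConditions=true,JobPodFailurePolicy=true ...
+ ```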
<!--
## Using Pod failure policy to avoid unnecessary Pod retries
@@ -218,6 +225,180 @@ The cluster automatically cleans up the Pods.
-->
集群自动清理 Pod。

+ <!--
+ ## Using Pod failure policy to avoid unnecessary Pod retries based on custom Pod Conditions
+
+ With the following example, you can learn how to use Pod failure policy to
+ avoid unnecessary Pod restarts based on custom Pod Conditions.
+ -->
+ ## 基于自定义 Pod 状况使用 Pod 失效策略避免不必要的 Pod 重试 {#avoid-pod-retries-based-on-custom-conditions}
+
+ 根据以下示例,你可以学习如何基于自定义 Pod 状况使用 Pod 失效策略避免不必要的 Pod 重启。
+
+ {{< note >}}
+ <!--
+ The example below works since version 1.27 as it relies on transitioning of
+ deleted pods, in the `Pending` phase, to a terminal phase
+ (see: [Pod Phase](/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase)).
+ -->
+ 以下示例自 v1.27 起生效,因为它依赖于将已删除的、处于 `Pending` 阶段的 Pod 过渡到终止阶段
+ (参阅 [Pod 阶段](/zh-cn/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase))。
+ {{< /note >}}
+
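+ Before walking through the steps, it helps to know roughly what the Job manifest
+ used below defines: a `podFailurePolicy` with a `FailJob` rule that matches a
+ custom `ConfigIssue` Pod condition, and a container image that deliberately does
+ not exist. The snippet below is an approximate sketch; the file embedded in
+ step 1 is the authoritative version.
+
+ ```yaml
+ apiVersion: batch/v1
+ kind: Job
+ metadata:
+   name: job-pod-failure-policy-config-issue
+ spec:
+   backoffLimit: 6          # retry budget that the policy below short-circuits
+   template:
+     spec:
+       restartPolicy: Never
+       containers:
+       - name: main
+         # Deliberately misconfigured: this image does not exist.
+         image: non-existing-repo/non-existing-image:example
+   podFailurePolicy:
+     rules:
+     - action: FailJob      # fail the whole Job ...
+       onPodConditions:
+       - type: ConfigIssue  # ... as soon as a Pod reports this custom condition
+ ```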
+ <!--
+ 1. First, create a Job based on the config:
+ -->
+ 1. 首先基于配置创建一个 Job:
+
+ {{< codenew file="/controllers/job-pod-failure-policy-config-issue.yaml" >}}
+
+ <!--
+ by running:
+ -->
+ 执行以下命令:
+
+ ```sh
+ kubectl create -f job-pod-failure-policy-config-issue.yaml
+ ```
+
+ <!--
+ Note that the image is misconfigured, as it does not exist.
+ -->
+ 请注意,镜像配置不正确,因为该镜像不存在。
+
+ <!--
+ 2. Inspect the status of the job's Pods by running:
+ -->
+ 2. 通过执行以下命令检查 Job 的 Pod 的状态:
+
+ ```sh
+ kubectl get pods -l job-name=job-pod-failure-policy-config-issue -o yaml
+ ```
+
+ <!--
+ You will see output similar to this:
+ -->
+ 你将看到类似以下输出:
+
+ ```yaml
+ containerStatuses:
+ - image: non-existing-repo/non-existing-image:example
+   ...
+   state:
+     waiting:
+       message: Back-off pulling image "non-existing-repo/non-existing-image:example"
+       reason: ImagePullBackOff
+ ...
+ phase: Pending
+ ```
+
+ <!--
+ Note that the pod remains in the `Pending` phase as it fails to pull the
+ misconfigured image. This, in principle, could be a transient issue and the
+ image could get pulled. However, in this case, the image does not exist so
+ we indicate this fact by a custom condition.
+ -->
+ 请注意,Pod 依然处于 `Pending` 阶段,因为它无法拉取错误配置的镜像。
+ 原则上讲这可能是一个暂时问题,镜像还是会被拉取。然而在这种情况下,
+ 镜像并不存在,因此我们通过一个自定义状况来表明这一事实。
+
+ <!--
+ 3. Add the custom condition. First prepare the patch by running:
+ -->
+ 3. 添加自定义状况。首先执行以下命令准备补丁:
+
+ ```sh
+ cat <<EOF > patch.yaml
+ status:
+   conditions:
+   - type: ConfigIssue
+     status: "True"
+     reason: "NonExistingImage"
+     lastTransitionTime: "$(date -u +"%Y-%m-%dT%H:%M:%SZ")"
+ EOF
+ ```
+
+ <!--
+ Second, select one of the pods created by the job by running:
+ -->
+ 其次,执行以下命令选择 Job 所创建的其中一个 Pod:
+
+ ```sh
+ podName=$(kubectl get pods -l job-name=job-pod-failure-policy-config-issue -o jsonpath='{.items[0].metadata.name}')
+ ```
+
+ <!--
+ Then, apply the patch on one of the pods by running the following command:
+ -->
+ 随后执行以下命令将补丁应用到其中一个 Pod 上:
+
+ ```sh
+ kubectl patch pod $podName --subresource=status --patch-file=patch.yaml
+ ```
+
+ <!--
+ If applied successfully, you will get a notification like this:
+ -->
+ 如果被成功应用,你将看到类似以下的一条通知:
+
+ ```sh
+ pod/job-pod-failure-policy-config-issue-k6pvp patched
+ ```
+
+ <!--
+ 4. Delete the pod to transition it to the `Failed` phase by running the command:
+ -->
+ 4. 执行以下命令删除此 Pod,将其过渡到 `Failed` 阶段:
+
+ ```sh
+ kubectl delete pods/$podName
+ ```
+
+ <!--
+ 5. Inspect the status of the Job by running:
+ -->
+ 5. 执行以下命令查验 Job 的状态:
+
+ ```sh
+ kubectl get jobs -l job-name=job-pod-failure-policy-config-issue -o yaml
+ ```
+
+ <!--
+ In the Job status, you will see a Job `Failed` condition with the field `reason`
+ equal to `PodFailurePolicy`. Additionally, the `message` field contains
+ more detailed information about the Job termination, such as:
+ `Pod default/job-pod-failure-policy-config-issue-k6pvp has condition ConfigIssue matching FailJob rule at index 0`.
+ -->
+ 在 Job 状态中,你可以看到 Job 的 `Failed` 状况,其 `reason` 字段等于 `PodFailurePolicy`。
+ 此外,`message` 字段包含了与 Job 终止相关的更多详细信息,例如:
+ `Pod default/job-pod-failure-policy-config-issue-k6pvp has condition ConfigIssue matching FailJob rule at index 0`。
+
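+ For reference, the relevant part of the Job status might look roughly like the
+ snippet below; this is an illustrative sketch, and details such as timestamps and
+ the exact `message` wording will differ in your cluster:
+
+ ```yaml
+ status:
+   conditions:
+   - type: Failed
+     status: "True"
+     reason: PodFailurePolicy
+     message: Pod default/job-pod-failure-policy-config-issue-k6pvp has condition ConfigIssue matching FailJob rule at index 0
+ ```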
+ {{< note >}}
+ <!--
+ In a production environment, steps 3 and 4 should be automated by a
+ user-provided controller.
+ -->
+ 在生产环境中,第 3 和 4 步应由用户提供的控制器进行自动化处理。
+ {{< /note >}}
+
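+ As a rough illustration of what such automation could look like, the loop below
+ mimics that controller using the same kubectl commands as steps 2-4. It is a
+ sketch only: a real controller would watch the API (for example via informers)
+ rather than poll, and would apply stronger checks than just observing
+ `ImagePullBackOff` before declaring a permanent configuration issue:
+
+ ```sh
+ # Sketch: periodically look for Pods of the Job stuck on image pulls,
+ # mark them with the custom ConfigIssue condition, and delete them.
+ while true; do
+   for pod in $(kubectl get pods -l job-name=job-pod-failure-policy-config-issue \
+       -o jsonpath='{.items[*].metadata.name}'); do
+     reason=$(kubectl get pod "$pod" \
+       -o jsonpath='{.status.containerStatuses[0].state.waiting.reason}')
+     if [ "$reason" = "ImagePullBackOff" ]; then
+       kubectl patch pod "$pod" --subresource=status --patch-file=patch.yaml
+       kubectl delete pod "$pod"
+     fi
+   done
+   sleep 10
+ done
+ ```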
+ <!--
+ ### Cleaning up
+
+ Delete the Job you created:
+ -->
+ ### 清理
+
+ 删除你创建的 Job:
+
+ ```sh
+ kubectl delete jobs/job-pod-failure-policy-config-issue
+ ```
+
+ <!--
+ The cluster automatically cleans up the Pods.
+ -->
+ 集群自动清理 Pod。
+
<!--
## Alternatives