
Commit 39141bf

Merge pull request #40873 from windsonsea/podfai
[zh] sync pod-failure-policy.md
2 parents 2134bf5 + 2559d03

File tree

2 files changed: +200 -0 lines changed

content/zh-cn/docs/tasks/job/pod-failure-policy.md

Lines changed: 181 additions & 0 deletions
@@ -49,6 +49,13 @@ You should already be familiar with the basic use of [Job](/docs/concepts/worklo
{{< include "task-tutorial-prereqs.md" >}} {{< version-check >}}

<!--
Ensure that the [feature gates](/docs/reference/command-line-tools-reference/feature-gates/)
`PodDisruptionConditions` and `JobPodFailurePolicy` are both enabled in your cluster.
-->
Ensure that the [feature gates](/zh-cn/docs/reference/command-line-tools-reference/feature-gates/)
`PodDisruptionConditions` and `JobPodFailurePolicy` are both enabled in your cluster.
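
One quick way to verify this is to inspect the API server command line. This is
a minimal sketch, assuming a kubeadm-style cluster where the API server runs as
a static Pod labeled `component=kube-apiserver`; other setups expose the flags
differently:

```sh
# Assumes a kubeadm-style cluster. If the flag is absent, the gates are at
# their defaults for your server version.
kubectl -n kube-system get pod -l component=kube-apiserver \
  -o jsonpath='{.items[0].spec.containers[0].command}' | tr ',' '\n' | grep feature-gates
```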
<!--
## Using Pod failure policy to avoid unnecessary Pod retries

@@ -218,6 +225,180 @@ The cluster automatically cleans up the Pods.
-->
The cluster automatically cleans up the Pods.

<!--
## Using Pod failure policy to avoid unnecessary Pod retries based on custom Pod Conditions

With the following example, you can learn how to use Pod failure policy to
avoid unnecessary Pod restarts based on custom Pod Conditions.
-->
## Using Pod failure policy to avoid unnecessary Pod retries based on custom Pod Conditions {#avoid-pod-retries-based-on-custom-conditions}

With the following example, you can learn how to use Pod failure policy to
avoid unnecessary Pod restarts based on custom Pod Conditions.

{{< note >}}
<!--
The example below works since version 1.27 as it relies on transitioning of
deleted pods, in the `Pending` phase, to a terminal phase
(see: [Pod Phase](/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase)).
-->
The example below works since version 1.27, as it relies on the transition of
deleted Pods, in the `Pending` phase, to a terminal phase
(see [Pod Phase](/zh-cn/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase)).
{{< /note >}}

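To confirm the control plane is on v1.27 or later before trying this, one quick
check (an added convenience, not part of the original walkthrough) is:

```sh
# The serverVersion block should report minor version 27 or higher.
kubectl version -o json | grep -A3 serverVersion
```
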
<!--
1. First, create a Job based on the config:
-->
1. First, create a Job based on the config:

   {{< codenew file="/controllers/job-pod-failure-policy-config-issue.yaml" >}}

   <!--
   by running:
   -->
   by running:

   ```sh
   kubectl create -f job-pod-failure-policy-config-issue.yaml
   ```

   <!--
   Note that the image is misconfigured, as it does not exist.
   -->
   Note that the image is misconfigured, as it does not exist.

<!--
2. Inspect the status of the job's Pods by running:
-->
2. Inspect the status of the job's Pods by running:

   ```sh
   kubectl get pods -l job-name=job-pod-failure-policy-config-issue -o yaml
   ```

   <!--
   You will see output similar to this:
   -->
   You will see output similar to this:

   ```yaml
   containerStatuses:
   - image: non-existing-repo/non-existing-image:example
     ...
     state:
       waiting:
         message: Back-off pulling image "non-existing-repo/non-existing-image:example"
         reason: ImagePullBackOff
   ...
   phase: Pending
   ```

   <!--
   Note that the pod remains in the `Pending` phase as it fails to pull the
   misconfigured image. This, in principle, could be a transient issue and the
   image could get pulled. However, in this case, the image does not exist so
   we indicate this fact by a custom condition.
   -->
   Note that the Pod remains in the `Pending` phase as it fails to pull the
   misconfigured image. This could, in principle, be a transient issue and the
   image could still get pulled. In this case, however, the image does not exist,
   so we indicate this fact by a custom condition.
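
   If you only need the phases at a glance, a compact alternative (an added
   convenience, not part of the original walkthrough) is a custom-columns query:

   ```sh
   kubectl get pods -l job-name=job-pod-failure-policy-config-issue \
     -o custom-columns=NAME:.metadata.name,PHASE:.status.phase
   ```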
<!--
3. Add the custom condition. First prepare the patch by running:
-->
3. Add the custom condition. First prepare the patch by running:

   ```sh
   cat <<EOF > patch.yaml
   status:
     conditions:
     - type: ConfigIssue
       status: "True"
       reason: "NonExistingImage"
       lastTransitionTime: "$(date -u +"%Y-%m-%dT%H:%M:%SZ")"
   EOF
   ```

   <!--
   Second, select one of the pods created by the job by running:
   -->
   Second, select one of the pods created by the job by running:

   ```sh
   podName=$(kubectl get pods -l job-name=job-pod-failure-policy-config-issue -o jsonpath='{.items[0].metadata.name}')
   ```

   <!--
   Then, apply the patch on one of the pods by running the following command:
   -->
   Then, apply the patch on one of the pods by running the following command:

   ```sh
   kubectl patch pod $podName --subresource=status --patch-file=patch.yaml
   ```

   <!--
   If applied successfully, you will get a notification like this:
   -->
   If applied successfully, you will get a notification like this:

   ```sh
   pod/job-pod-failure-policy-config-issue-k6pvp patched
   ```
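
   To double-check that the condition landed in the Pod's status, one possible
   query (an added convenience, not part of the original walkthrough) is:

   ```sh
   kubectl get pod $podName -o jsonpath='{.status.conditions[?(@.type=="ConfigIssue")]}'
   ```
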
<!--
4. Delete the pod to transition it to `Failed` phase, by running the command:
-->
4. Delete the Pod to transition it to the `Failed` phase, by running the command:

   ```sh
   kubectl delete pods/$podName
   ```
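
   Once the Pod reaches a terminal phase, the Job controller evaluates the Pod
   failure policy. If you want to block until the Job is marked failed, a
   convenience (not part of the original walkthrough) is `kubectl wait`:

   ```sh
   kubectl wait --for=condition=Failed --timeout=120s job/job-pod-failure-policy-config-issue
   ```
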
<!--
5. Inspect the status of the Job by running:
-->
5. Inspect the status of the Job by running:

   ```sh
   kubectl get jobs -l job-name=job-pod-failure-policy-config-issue -o yaml
   ```

   <!--
   In the Job status, see a job `Failed` condition with the field `reason`
   equal `PodFailurePolicy`. Additionally, the `message` field contains
   more detailed information about the Job termination, such as:
   `Pod default/job-pod-failure-policy-config-issue-k6pvp has condition ConfigIssue matching FailJob rule at index 0`.
   -->
   In the Job status, you will see a Job `Failed` condition with the `reason`
   field equal to `PodFailurePolicy`. Additionally, the `message` field contains
   more detailed information about the Job termination, such as:
   `Pod default/job-pod-failure-policy-config-issue-k6pvp has condition ConfigIssue matching FailJob rule at index 0`.
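
   To extract just that condition's message from the status, one possible query
   (an added convenience, not part of the original walkthrough) is:

   ```sh
   kubectl get job job-pod-failure-policy-config-issue \
     -o jsonpath='{.status.conditions[?(@.type=="Failed")].message}'
   ```
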
{{< note >}}
<!--
In a production environment, steps 3 and 4 should be automated by a
user-provided controller.
-->
In a production environment, steps 3 and 4 should be automated by a
user-provided controller.
{{< /note >}}
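
As a rough illustration, a minimal shell sketch of such automation follows. It
assumes the `patch.yaml` from step 3 is on disk and that `ImagePullBackOff` is
the only failure you want to treat as fatal; a real controller would watch the
API rather than poll:

```sh
# Hypothetical sketch of steps 3 and 4; not from the original text.
for podName in $(kubectl get pods -l job-name=job-pod-failure-policy-config-issue \
    -o jsonpath='{.items[*].metadata.name}'); do
  reason=$(kubectl get pod "$podName" \
    -o jsonpath='{.status.containerStatuses[0].state.waiting.reason}')
  if [ "$reason" = "ImagePullBackOff" ]; then
    kubectl patch pod "$podName" --subresource=status --patch-file=patch.yaml
    kubectl delete pod "$podName"
  fi
done
```
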
<!--
### Cleaning up

Delete the Job you created:
-->
### Cleaning up

Delete the Job you created:

```sh
kubectl delete jobs/job-pod-failure-policy-config-issue
```

<!--
The cluster automatically cleans up the Pods.
-->
The cluster automatically cleans up the Pods.

<!--
## Alternatives
content/zh-cn/examples/controllers/job-pod-failure-policy-config-issue.yaml

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
apiVersion: batch/v1
kind: Job
metadata:
  name: job-pod-failure-policy-config-issue
spec:
  completions: 8
  parallelism: 2
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: "non-existing-repo/non-existing-image:example"
  backoffLimit: 6
  podFailurePolicy:
    rules:
    - action: FailJob
      onPodConditions:
      - type: ConfigIssue
