@@ -53,6 +53,7 @@ kubectl create -f job-pod-failure-policy-failjob.yaml
```

After around 30s the entire Job should be terminated. Inspect the status of the Job by running:
+
``` sh
kubectl get jobs -l job-name=job-pod-failure-policy-failjob -o yaml
```
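For orientation, here is an illustrative excerpt of the kind of status you should see once the `FailJob` rule has matched. The Pod name and message wording below are examples rather than literal output; the thing to look for is a `Failed` condition whose `reason` is `PodFailurePolicy`.

``` yaml
# Illustrative excerpt of the Job status; names and message text will differ in your cluster.
status:
  conditions:
  - type: Failed
    status: "True"
    reason: PodFailurePolicy
    message: Container main for pod default/job-pod-failure-policy-failjob-8ckj8 failed with exit code 42 matching FailJob rule at index 0
  failed: 1
```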
@@ -68,9 +69,11 @@ of the Pod, taking at least 2 minutes.
### Clean up

Delete the Job you created:
+
``` sh
kubectl delete jobs/job-pod-failure-policy-failjob
```
+
The cluster automatically cleans up the Pods.

## Using Pod failure policy to ignore Pod disruptions
@@ -87,34 +90,37 @@ node while the Pod is running on it (within 90s since the Pod is scheduled).
1. Create a Job based on the config:

- {{< codenew file="/controllers/job-pod-failure-policy-ignore.yaml" >}}
+ {{< codenew file="/controllers/job-pod-failure-policy-ignore.yaml" >}}

- by running:
+ by running:

- ``` sh
- kubectl create -f job-pod-failure-policy-ignore.yaml
- ```
+ ``` sh
+ kubectl create -f job-pod-failure-policy-ignore.yaml
+ ```

2. Run this command to check the `nodeName` the Pod is scheduled to:

- ``` sh
- nodeName=$(kubectl get pods -l job-name=job-pod-failure-policy-ignore -o jsonpath='{.items[0].spec.nodeName}')
- ```
+ ``` sh
+ nodeName=$(kubectl get pods -l job-name=job-pod-failure-policy-ignore -o jsonpath='{.items[0].spec.nodeName}')
+ ```

3. Drain the node to evict the Pod before it completes (within 90s):
- ``` sh
- kubectl drain nodes/$nodeName --ignore-daemonsets --grace-period=0
- ```
+
+ ``` sh
+ kubectl drain nodes/$nodeName --ignore-daemonsets --grace-period=0
+ ```

4. Inspect `.status.failed` to check that the counter for the Job is not incremented:
- ``` sh
- kubectl get jobs -l job-name=job-pod-failure-policy-ignore -o yaml
- ```
+
+ ``` sh
+ kubectl get jobs -l job-name=job-pod-failure-policy-ignore -o yaml
+ ```

5. Uncordon the node:
- ``` sh
- kubectl uncordon nodes/$nodeName
- ```
+
+ ``` sh
+ kubectl uncordon nodes/$nodeName
+ ```

The Job resumes and succeeds.
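As a sanity check, the end state should reflect that the eviction in step 3 was ignored rather than counted as a failure. Here is a minimal sketch of the kind of status to expect; the values are illustrative, and a zero `failed` counter may simply be omitted from the output.

``` yaml
# Illustrative sketch of the final Job status after the node is uncordoned.
status:
  conditions:
  - type: Complete
    status: "True"
  succeeded: 1   # the Job finished despite the mid-run eviction
  failed: 0      # the drained Pod matched the Ignore rule, so it is not counted
```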
@@ -124,16 +130,18 @@ result in terminating the entire Job (as the `.spec.backoffLimit` is set to 0).
### Cleaning up

Delete the Job you created:
+
``` sh
kubectl delete jobs/job-pod-failure-policy-ignore
```
+
The cluster automatically cleans up the Pods.

## Alternatives

You could rely solely on the
[Pod backoff failure policy](/docs/concepts/workloads/controllers/job#pod-backoff-failure-policy),
by specifying the Job's `.spec.backoffLimit` field. However, in many situations
- it is problematic to find a balance between setting the a low value for `.spec.backoffLimit`
+ it is problematic to find a balance between setting a low value for `.spec.backoffLimit`
to avoid unnecessary Pod retries, yet high enough to make sure the Job would
not be terminated by Pod disruptions.
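To make the trade-off concrete, below is an abridged sketch (not one of the files used above; the name, image, and command are placeholders) of how a Pod failure policy can sit alongside a deliberately low `.spec.backoffLimit`: a known software failure exit code fails the Job immediately, while disruption-induced Pod terminations are ignored and never consume the retry budget.

``` yaml
# Abridged Job spec sketch; only the fields relevant to the trade-off are shown.
apiVersion: batch/v1
kind: Job
metadata:
  name: example-with-pod-failure-policy   # hypothetical name
spec:
  backoffLimit: 0          # fail fast on genuine software errors...
  podFailurePolicy:
    rules:
    - action: FailJob      # ...treat this exit code as non-retriable
      onExitCodes:
        containerName: main
        operator: In
        values: [42]
    - action: Ignore       # ...but do not count disruptions against the limit
      onPodConditions:
      - type: DisruptionTarget
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: docker.io/library/bash:5
        command: ["bash", "-c", "sleep 30 && exit 0"]
```

Rules are evaluated in order, so the exit-code rule is checked before the disruption rule and the first matching rule applies.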