1
1
---
2
2
title: Parallel Processing using Expansions
3
- content_template: templates/concept
3
+ content_template: templates/task
4
4
min-kubernetes-server-version: v1.8
5
5
weight: 20
6
6
---
7
7
8
8
{{% capture overview %}}
9
9
10
- In this example, we will run multiple Kubernetes Jobs created from
11
- a common template. You may want to be familiar with the basic,
12
- non-parallel, use of [Jobs](/docs/concepts/workloads/controllers/jobs-run-to-completion/) first.
10
+ This task demonstrates running multiple {{< glossary_tooltip text="Jobs" term_id="job" >}}
11
+ based on a common template. You can use this approach to process batches of work in
12
+ parallel.
13
13
14
+ For this example there are only three items: _apple_, _banana_, and _cherry_.
15
+ The sample Jobs process each item simply by printing a string then pausing.
16
+
17
+ See [using Jobs in real workloads](#using-jobs-in-real-workloads) to learn about how
18
+ this pattern fits more realistic use cases.
14
19
{{% /capture %}}
15
20
21
+ {{% capture prerequisites %}}
16
22
17
- {{% capture body %}}
23
+ You should be familiar with the basic,
24
+ non-parallel, use of [Job](/docs/concepts/jobs/run-to-completion-finite-workloads/).
18
25
19
- ## Basic Template Expansion
26
+ {{< include "task-tutorial-prereqs.md" >}}
20
27
21
- First, download the following template of a job to a file called `job-tmpl.yaml`
28
+ For basic templating you need the command-line utility `sed`.
22
29
23
- {{< codenew file="application/job/job-tmpl.yaml" >}}
30
+ To follow the advanced templating example, you need a working installation of
31
+ [Python](https://www.python.org/), and the Jinja2 template
32
+ library for Python.
33
+
34
+ Once you have Python set up, you can install Jinja2 by running:
35
+ ``` shell
36
+ pip install --user jinja2
37
+ ```
38
+ {{% /capture %}}
24
39
25
- Unlike a *pod template*, our *job template* is not a Kubernetes API type. It is just
26
- a yaml representation of a Job object that has some placeholders that need to be filled
27
- in before it can be used. The `$ITEM` syntax is not meaningful to Kubernetes.
28
40
29
- In this example, the only processing the container does is to `echo` a string and sleep for a bit.
30
- In a real use case, the processing would be some substantial computation, such as rendering a frame
31
- of a movie, or processing a range of rows in a database. The `$ITEM` parameter would specify for
32
- example, the frame number or the row range.
41
+ {{% capture steps %}}
33
42
34
- This Job and its Pod template have a label: `jobgroup=jobexample`. There is nothing special
35
- to the system about this label. This label
36
- makes it convenient to operate on all the jobs in this group at once.
37
- We also put the same label on the pod template so that we can check on all Pods of these Jobs
38
- with a single command.
39
- After the job is created, the system will add more labels that distinguish one Job's pods
40
- from another Job's pods.
41
- Note that the label key `jobgroup` is not special to Kubernetes. You can pick your own label scheme.
43
+ ## Create Jobs based on a template
42
44
43
- Next, expand the template into multiple files, one for each item to be processed.
45
+ First, download the following template of a Job to a file called `job-tmpl.yaml`.
46
+ Here's what you'll download:
47
+
48
+ {{< codenew file="application/job/job-tmpl.yaml" >}}
44
49
45
50
``` shell
46
- # Download job-templ.yaml
51
+ # Use curl to download job-tmpl.yaml
47
52
curl -L -s -O https://k8s.io/examples/application/job/job-tmpl.yaml
53
+ ```
54
+
55
+ The file you downloaded is not yet a valid Kubernetes
56
+ {{< glossary_tooltip text="manifest" term_id="manifest" >}}.
57
+ Instead that template is a YAML representation of a Job object with some placeholders
58
+ that need to be filled in before it can be used. The `$ITEM` syntax is not meaningful to Kubernetes.
59
+
48
60
49
- # Expand files into a temporary directory
61
+ ### Create manifests from the template
62
+
63
+ The following shell snippet uses `sed` to replace the string `$ITEM` with the loop
64
+ variable, writing into a temporary directory named `jobs`. Run this now:
65
+
66
+ ``` shell
67
+ # Expand the template into multiple files, one for each item to be processed.
50
68
mkdir ./jobs
51
69
for i in apple banana cherry
52
70
do
@@ -68,11 +86,12 @@ job-banana.yaml
68
86
job-cherry.yaml
69
87
```
70
88
71
- Here, we used `sed` to replace the string `$ITEM` with the loop variable.
72
- You could use any type of template language (jinja2, erb) or write a program
73
- to generate the Job objects.
89
+ You could use any type of template language (for example: Jinja2; ERB), or
90
+ write a program to generate the Job manifests.
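
As one illustration of "write a program", the `sed` loop can be approximated with Python's standard-library `string.Template`, which treats `$ITEM` as a placeholder. This is a minimal sketch, not part of the task itself. The `JOB_TEMPLATE` text is an abbreviated stand-in (an assumption) rather than the real `job-tmpl.yaml`, and the `expand` helper is a hypothetical name.

```python
from string import Template

# Abbreviated stand-in for job-tmpl.yaml (an assumption for illustration;
# use the file you downloaded for real work).
JOB_TEMPLATE = """\
apiVersion: batch/v1
kind: Job
metadata:
  name: process-item-$ITEM
  labels:
    jobgroup: jobexample
"""


def expand(template_text, items):
    """Return a dict mapping output filename to the expanded manifest text."""
    template = Template(template_text)
    # safe_substitute leaves any other "$" placeholders in the text untouched.
    return {
        f"job-{item}.yaml": template.safe_substitute(ITEM=item)
        for item in items
    }


if __name__ == "__main__":
    for filename in expand(JOB_TEMPLATE, ["apple", "banana", "cherry"]):
        print(filename)
```

Unlike `sed`, `string.Template` has its own rules for `$` (for example, `$$` is an escape), so the two approaches are not guaranteed to produce identical output for every template.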
74
91
75
- Next, create all the jobs with one kubectl command:
92
+ ### Create Jobs from the manifests
93
+
94
+ Next, create all the Jobs with one kubectl command:
76
95
77
96
``` shell
78
97
kubectl create -f ./jobs
@@ -96,22 +115,23 @@ The output is similar to this:
96
115
97
116
```
98
117
NAME COMPLETIONS DURATION AGE
99
- process-item-apple 1/1 14s 20s
100
- process-item-banana 1/1 12s 20s
118
+ process-item-apple 1/1 14s 22s
119
+ process-item-banana 1/1 12s 21s
101
120
process-item-cherry 1/1 12s 20s
102
121
```
103
122
104
- Here we use the `-l` option to select all jobs that are part of this
105
- group of jobs. (There might be other unrelated jobs in the system that we
106
- do not care to see.)
123
+ Using the `-l` option to kubectl selects only the Jobs that are part
124
+ of this group of jobs (there might be other unrelated jobs in the system).
125
+
126
+ You can check on the Pods as well using the same
127
+ {{< glossary_tooltip text="label selector" term_id="selector" >}}:
107
128
108
- We can check on the pods as well using the same label selector:
109
129
110
130
``` shell
111
131
kubectl get pods -l jobgroup=jobexample
112
132
```
113
133
114
- The output is similar to this:
134
+ The output is similar to:
115
135
116
136
```
117
137
NAME READY STATUS RESTARTS AGE
@@ -126,34 +146,48 @@ We can use this single command to check on the output of all jobs at once:
126
146
kubectl logs -f -l jobgroup=jobexample
127
147
```
128
148
129
- The output is:
149
+ The output should be:
130
150
131
151
```
132
152
Processing item apple
133
153
Processing item banana
134
154
Processing item cherry
135
155
```
136
156
137
- ## Multiple Template Parameters
157
+ ### Clean up {#cleanup-1}
158
+
159
+ ``` shell
160
+ # Remove the Jobs you created
161
+ # Your cluster automatically cleans up their Pods
162
+ kubectl delete job -l jobgroup=jobexample
163
+ ```
164
+
165
+ ## Use advanced template parameters
166
+
167
+ In the [first example](#create-jobs-based-on-a-template), each instance of the template had one
168
+ parameter, and that parameter was also used in the Job's name. However,
169
+ [names](/docs/concepts/overview/working-with-objects/names/#names) are restricted
170
+ to contain only certain characters.
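
To make the restriction concrete, here is a hedged sketch of a name check. It assumes the RFC 1123 DNS label rules (lowercase alphanumerics and `-`, at most 63 characters), which are stricter than what some resource types accept. The helper name `is_dns_label` is hypothetical.

```python
import re

# RFC 1123 DNS label: lowercase alphanumerics and "-", starting and ending
# with an alphanumeric character. Assumption: this sketch applies the label
# rules, which may be stricter than the rules for a given resource type.
DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def is_dns_label(name):
    """Return True if name is a valid RFC 1123 DNS label."""
    return len(name) <= 63 and DNS_LABEL.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["process-item-apple", "process-item-http://example"]:
        print(candidate, is_dns_label(candidate))
```

A URL fails this check because of characters such as `:` and `/`, which is why the second example passes the URL as a separate parameter instead of embedding it in the Job's name.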
138
171
139
- In the first example, each instance of the template had one parameter, and that parameter was also
140
- used as a label. However label keys are limited in [what characters they can
141
- contain](/docs/concepts/overview/working-with-objects/labels/#syntax-and-character-set).
172
+ This slightly more complex example uses the
173
+ [Jinja template language](https://palletsprojects.com/p/jinja/) to generate manifests
174
+ and then objects from those manifests, with multiple parameters for each Job.
142
175
143
- This slightly more complex example uses the jinja2 template language to generate our objects.
144
- We will use a one-line python script to convert the template to a file.
176
+ For this part of the task, you are going to use a one-line Python script to
177
+ convert the template to a set of manifests.
145
178
146
179
First, copy and paste the following template of a Job object, into a file called `job.yaml.jinja2`:
147
180
148
181
149
182
``` liquid
150
- {%- set params = [{ "name": "apple", "url": "https://www.orangepippin.com/varieties/apples", },
151
- { "name": "banana", "url": "https://en.wikipedia.org/wiki/Banana", },
152
- { "name": "raspberry", "url": "https://www.raspberrypi.org/" }]
183
+ {%- set params = [{ "name": "apple", "url": "http://dbpedia.org/resource/Apple", },
184
+ { "name": "banana", "url": "http://dbpedia.org/resource/Banana", },
185
+ { "name": "cherry", "url": "http://dbpedia.org/resource/Cherry" }]
153
186
%}
154
187
{%- for p in params %}
155
188
{%- set name = p["name"] %}
156
189
{%- set url = p["url"] %}
190
+ ---
157
191
apiVersion: batch/v1
158
192
kind: Job
159
193
metadata:
@@ -172,51 +206,108 @@ spec:
172
206
image: busybox
173
207
command: ["sh", "-c", "echo Processing URL {{ url }} && sleep 5"]
174
208
restartPolicy: Never
175
- ---
176
209
{%- endfor %}
177
-
178
210
```
179
211
180
- The above template defines parameters for each job object using a list of
181
- python dicts (lines 1-4). Then a for loop emits one job yaml object
182
- for each set of parameters (remaining lines).
183
- We take advantage of the fact that multiple yaml documents can be concatenated
184
- with the ` --- ` separator (second to last line).
185
- .) We can pipe the output directly to kubectl to
186
- create the objects.
212
+ The above template defines two parameters for each Job object using a list of
213
+ Python dicts (lines 1-4). A `for` loop emits one Job manifest for each
214
+ set of parameters (remaining lines).
187
215
188
- You will need the jinja2 package if you do not already have it: ` pip install --user jinja2 ` .
189
- Now, use this one-line python program to expand the template:
216
+ This example relies on a feature of YAML. One YAML file can contain multiple
217
+ documents (Kubernetes manifests, in this case), separated by `---` on a line
218
+ by itself.
219
+ You can pipe the output directly to `kubectl` to create the Jobs.
220
+
221
+ Next, use this one-line Python program to expand the template:
190
222
191
223
``` shell
192
224
alias render_template='python -c "from jinja2 import Template; import sys; print(Template(sys.stdin.read()).render());"'
193
225
```
194
226
195
-
196
-
197
- The output can be saved to a file, like this:
227
+ Use `render_template` to convert the parameters and template into a single
228
+ YAML file containing Kubernetes manifests:
198
229
199
230
``` shell
231
+ # This requires the alias you defined earlier
200
232
cat job.yaml.jinja2 | render_template > jobs.yaml
201
233
```
202
234
203
- Or sent directly to kubectl, like this:
235
+ You can view `jobs.yaml` to verify that the `render_template` script worked
236
+ correctly.
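
One rough way to check the rendered output is to count the YAML documents it contains. The sketch below splits text on lines that consist only of `---`. It is a simplification that ignores other YAML stream features, and `split_documents` is a hypothetical helper. The sample text is made up for illustration and is not the real `jobs.yaml`.

```python
def split_documents(yaml_text):
    """Split a YAML stream into documents on lines that are exactly '---'."""
    documents, current = [], []
    for line in yaml_text.splitlines():
        if line.rstrip() == "---":
            # Close the current document, skipping empty ones.
            if any(part.strip() for part in current):
                documents.append("\n".join(current))
            current = []
        else:
            current.append(line)
    if any(part.strip() for part in current):
        documents.append("\n".join(current))
    return documents


if __name__ == "__main__":
    # Made-up sample in the same shape as the rendered output.
    sample = "---\nkind: Job\n---\nkind: Job\n---\nkind: Job\n"
    print(len(split_documents(sample)))
```

For the three parameter sets in the template, you would expect three documents.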
237
+
238
+ Once you are happy that `render_template` is working how you intend,
239
+ you can pipe its output into `kubectl`:
204
240
205
241
``` shell
206
242
cat job.yaml.jinja2 | render_template | kubectl apply -f -
207
243
```
208
244
245
+ Kubernetes accepts and runs the Jobs you created.
246
+
247
+ ### Clean up {#cleanup-2}
248
+
249
+ ``` shell
250
+ # Remove the Jobs you created
251
+ # Your cluster automatically cleans up their Pods
252
+ kubectl delete job -l jobgroup=jobexample
253
+ ```
254
+
255
+ {{% /capture %}}
256
+ {{% capture discussion %}}
257
+
258
+ ## Using Jobs in real workloads
259
+
260
+ In a real use case, each Job performs some substantial computation, such as rendering a frame
261
+ of a movie, or processing a range of rows in a database. If you were rendering a movie
262
+ you would set `$ITEM` to the frame number. If you were processing rows from a database
263
+ table, you would set `$ITEM` to represent the range of database rows to process.
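
As a hedged illustration of the database case, the following sketch produces one item per range of rows. The `row_ranges` helper and the `start-end` string format are assumptions made for this example. The format uses only digits and `-`, so each item can also appear in a Job name.

```python
def row_ranges(total_rows, chunk_size):
    """Yield "start-end" strings (inclusive) covering rows 0..total_rows-1."""
    for start in range(0, total_rows, chunk_size):
        end = min(start + chunk_size, total_rows) - 1
        yield f"{start}-{end}"


if __name__ == "__main__":
    # Each printed value could replace $ITEM in one Job manifest.
    for item in row_ranges(2500, 1000):
        print(item)
```

Each value could then be substituted for `$ITEM`, one Job per range.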
264
+
265
+ In the task, you ran a command to collect the output from Pods by fetching
266
+ their logs. In a real use case, each Pod for a Job writes its output to
267
+ durable storage before completing. You can use a PersistentVolume for each Job,
268
+ or an external storage service. For example, if you are rendering frames for a movie,
269
+ use HTTP to `PUT` the rendered frame data to a URL, using a different URL for each
270
+ frame.
271
+
272
+ ## Labels on Jobs and Pods
273
+
274
+ After you create a Job, Kubernetes automatically adds additional
275
+ {{< glossary_tooltip text="labels" term_id="label" >}} that
276
+ distinguish one Job's pods from another Job's pods.
277
+
278
+ In this example, each Job and its Pod template have a label:
279
+ `jobgroup=jobexample`.
280
+
281
+ Kubernetes itself pays no attention to labels named `jobgroup`. Setting a label
282
+ for all the Jobs you create from a template makes it convenient to operate on all
283
+ those Jobs at once.
284
+ In the [first example](#create-jobs-based-on-a-template) you used a template to
285
+ create several Jobs. The template ensures that each Pod also gets the same label, so
286
+ you can check on all Pods for these templated Jobs with a single command.
287
+
288
+ {{< note >}}
289
+ The label key `jobgroup` is not special or reserved.
290
+ You can pick your own labelling scheme.
291
+ There are [recommended labels](/docs/concepts/overview/working-with-objects/common-labels/#labels)
292
+ that you can use if you wish.
293
+ {{< /note >}}
294
+
209
295
## Alternatives
210
296
211
- If you have a large number of job objects, you may find that:
297
+ If you plan to create a large number of Job objects, you may find that:
212
298
213
- - Even using labels, managing so many Job objects is cumbersome.
214
- - You exceed resource quota when creating all the Jobs at once,
215
- and do not want to wait to create them incrementally.
216
- - Very large numbers of jobs created at once overload the
217
- Kubernetes apiserver, controller, or scheduler.
299
+ - Even using labels, managing so many Jobs is cumbersome.
300
+ - If you create many Jobs in a batch, you might place high load
301
+ on the Kubernetes control plane. Alternatively, the Kubernetes API
302
+ server could rate limit you, temporarily rejecting your requests with a 429 status.
303
+ - You are limited by a {{< glossary_tooltip text="resource quota" term_id="resource-quota" >}}
304
+ on Jobs: the API server permanently rejects some of your requests
305
+ when you create a great deal of work in one batch.
218
306
219
- In this case, you can consider one of the
220
- other [job patterns](/docs/concepts/jobs/run-to-completion-finite-workloads/#job-patterns).
307
+ There are other [job patterns](/docs/concepts/jobs/run-to-completion-finite-workloads/#job-patterns)
308
+ that you can use to process large amounts of work without creating very many Job
309
+ objects.
221
310
311
+ You could also consider writing your own [controller](/docs/concepts/architecture/controller/)
312
+ to manage Job objects automatically.
222
313
{{% /capture %}}