Commit 460b65f

Merge pull request #17688 from sftim/20191120_improve_job_parallel_processing_expansion_task
Improve Job parallel processing expansion task
2 parents 78c6606 + 67e49d6 commit 460b65f

1 file changed: 161 additions and 70 deletions.

@@ -1,52 +1,70 @@
 ---
 title: Parallel Processing using Expansions
-content_template: templates/concept
+content_template: templates/task
 min-kubernetes-server-version: v1.8
 weight: 20
 ---
 
 {{% capture overview %}}
 
-In this example, we will run multiple Kubernetes Jobs created from
-a common template. You may want to be familiar with the basic,
-non-parallel, use of [Jobs](/docs/concepts/workloads/controllers/jobs-run-to-completion/) first.
+This task demonstrates running multiple {{< glossary_tooltip text="Jobs" term_id="job" >}}
+based on a common template. You can use this approach to process batches of work in
+parallel.
 
+For this example there are only three items: _apple_, _banana_, and _cherry_.
+The sample Jobs process each item simply by printing a string then pausing.
+
+See [using Jobs in real workloads](#using-jobs-in-real-workloads) to learn about how
+this pattern fits more realistic use cases.
 {{% /capture %}}
 
+{{% capture prerequisites %}}
 
-{{% capture body %}}
+You should be familiar with the basic,
+non-parallel, use of [Job](/docs/concepts/jobs/run-to-completion-finite-workloads/).
 
-## Basic Template Expansion
+{{< include "task-tutorial-prereqs.md" >}}
 
-First, download the following template of a job to a file called `job-tmpl.yaml`
+For basic templating you need the command-line utility `sed`.
 
-{{< codenew file="application/job/job-tmpl.yaml" >}}
+To follow the advanced templating example, you need a working installation of
+[Python](https://www.python.org/), and the Jinja2 template
+library for Python.
+
+Once you have Python set up, you can install Jinja2 by running:
+```shell
+pip install --user jinja2
+```
+{{% /capture %}}
 
-Unlike a *pod template*, our *job template* is not a Kubernetes API type. It is just
-a yaml representation of a Job object that has some placeholders that need to be filled
-in before it can be used. The `$ITEM` syntax is not meaningful to Kubernetes.
 
-In this example, the only processing the container does is to `echo` a string and sleep for a bit.
-In a real use case, the processing would be some substantial computation, such as rendering a frame
-of a movie, or processing a range of rows in a database. The `$ITEM` parameter would specify for
-example, the frame number or the row range.
+{{% capture steps %}}
 
-This Job and its Pod template have a label: `jobgroup=jobexample`. There is nothing special
-to the system about this label. This label
-makes it convenient to operate on all the jobs in this group at once.
-We also put the same label on the pod template so that we can check on all Pods of these Jobs
-with a single command.
-After the job is created, the system will add more labels that distinguish one Job's pods
-from another Job's pods.
-Note that the label key `jobgroup` is not special to Kubernetes. You can pick your own label scheme.
+## Create Jobs based on a template
 
-Next, expand the template into multiple files, one for each item to be processed.
+First, download the following template of a Job to a file called `job-tmpl.yaml`.
+Here's what you'll download:
+
+{{< codenew file="application/job/job-tmpl.yaml" >}}
 
 ```shell
-# Download job-templ.yaml
+# Use curl to download job-tmpl.yaml
 curl -L -s -O https://k8s.io/examples/application/job/job-tmpl.yaml
+```
+
+The file you downloaded is not yet a valid Kubernetes
+{{< glossary_tooltip text="manifest" term_id="manifest" >}}.
+Instead that template is a YAML representation of a Job object with some placeholders
+that need to be filled in before it can be used. The `$ITEM` syntax is not meaningful to Kubernetes.
+
 
-# Expand files into a temporary directory
+### Create manifests from the template
+
+The following shell snippet uses `sed` to replace the string `$ITEM` with the loop
+variable, writing into a temporary directory named `jobs`. Run this now:
+
+```shell
+# Expand the template into multiple files, one for each item to be processed.
 mkdir ./jobs
 for i in apple banana cherry
 do
@@ -68,11 +86,12 @@ job-banana.yaml
 job-cherry.yaml
 ```
 
-Here, we used `sed` to replace the string `$ITEM` with the loop variable.
-You could use any type of template language (jinja2, erb) or write a program
-to generate the Job objects.
+You could use any type of template language (for example: Jinja2; ERB), or
+write a program to generate the Job manifests.
 
-Next, create all the jobs with one kubectl command:
+### Create Jobs from the manifests
+
+Next, create all the Jobs with one kubectl command:
 
 ```shell
 kubectl create -f ./jobs
@@ -96,22 +115,23 @@ The output is similar to this:
 
 ```
 NAME COMPLETIONS DURATION AGE
-process-item-apple 1/1 14s 20s
-process-item-banana 1/1 12s 20s
+process-item-apple 1/1 14s 22s
+process-item-banana 1/1 12s 21s
 process-item-cherry 1/1 12s 20s
 ```
 
-Here we use the `-l` option to select all jobs that are part of this
-group of jobs. (There might be other unrelated jobs in the system that we
-do not care to see.)
+Using the `-l` option to kubectl selects only the Jobs that are part
+of this group of jobs (there might be other unrelated jobs in the system).
+
+You can check on the Pods as well using the same
+{{< glossary_tooltip text="label selector" term_id="selector" >}}:
 
-We can check on the pods as well using the same label selector:
 
 ```shell
 kubectl get pods -l jobgroup=jobexample
 ```
 
-The output is similar to this:
+The output is similar to:
 
 ```
 NAME READY STATUS RESTARTS AGE
@@ -126,34 +146,48 @@ We can use this single command to check on the output of all jobs at once:
 kubectl logs -f -l jobgroup=jobexample
 ```
 
-The output is:
+The output should be:
 
 ```
 Processing item apple
 Processing item banana
 Processing item cherry
 ```
 
-## Multiple Template Parameters
+### Clean up {#cleanup-1}
+
+```shell
+# Remove the Jobs you created
+# Your cluster automatically cleans up their Pods
+kubectl delete job -l jobgroup=jobexample
+```
+
+## Use advanced template parameters
+
+In the [first example](#create-jobs-based-on-a-template), each instance of the template had one
+parameter, and that parameter was also used in the Job's name. However,
+[names](/docs/concepts/overview/working-with-objects/names/#names) are restricted
+to contain only certain characters.
 
-In the first example, each instance of the template had one parameter, and that parameter was also
-used as a label. However label keys are limited in [what characters they can
-contain](/docs/concepts/overview/working-with-objects/labels/#syntax-and-character-set).
+This slightly more complex example uses the
+[Jinja template language](https://palletsprojects.com/p/jinja/) to generate manifests
+and then objects from those manifests, with a multiple parameters for each Job.
 
-This slightly more complex example uses the jinja2 template language to generate our objects.
-We will use a one-line python script to convert the template to a file.
+For this part of the task, you are going to use a one-line Python script to
+convert the template to a set of manifests.
 
 First, copy and paste the following template of a Job object, into a file called `job.yaml.jinja2`:
 
 
 ```liquid
-{%- set params = [{ "name": "apple", "url": "https://www.orangepippin.com/varieties/apples", },
-{ "name": "banana", "url": "https://en.wikipedia.org/wiki/Banana", },
-{ "name": "raspberry", "url": "https://www.raspberrypi.org/" }]
+{%- set params = [{ "name": "apple", "url": "http://dbpedia.org/resource/Apple", },
+{ "name": "banana", "url": "http://dbpedia.org/resource/Banana", },
+{ "name": "cherry", "url": "http://dbpedia.org/resource/Cherry" }]
 %}
 {%- for p in params %}
 {%- set name = p["name"] %}
 {%- set url = p["url"] %}
+---
 apiVersion: batch/v1
 kind: Job
 metadata:
@@ -172,51 +206,108 @@ spec:
 image: busybox
 command: ["sh", "-c", "echo Processing URL {{ url }} && sleep 5"]
 restartPolicy: Never
----
 {%- endfor %}
-
 ```
 
-The above template defines parameters for each job object using a list of
-python dicts (lines 1-4). Then a for loop emits one job yaml object
-for each set of parameters (remaining lines).
-We take advantage of the fact that multiple yaml documents can be concatenated
-with the `---` separator (second to last line).
-.) We can pipe the output directly to kubectl to
-create the objects.
+The above template defines two parameters for each Job object using a list of
+python dicts (lines 1-4). A `for` loop emits one Job manifest for each
+set of parameters (remaining lines).
 
-You will need the jinja2 package if you do not already have it: `pip install --user jinja2`.
-Now, use this one-line python program to expand the template:
+This example relies on a feature of YAML. One YAML file can contain multiple
+documents (Kubernetes manifests, in this case), separated by `---` on a line
+by itself.
+You can pipe the output directly to `kubectl` to create the Jobs.
+
+Next, use this one-line Python program to expand the template:
 
 ```shell
 alias render_template='python -c "from jinja2 import Template; import sys; print(Template(sys.stdin.read()).render());"'
 ```
 
-
-
-The output can be saved to a file, like this:
+Use `render_template` to convert the parameters and template into a single
+YAML file containing Kubernetes manifests:
 
 ```shell
+# This requires the alias you defined earlier
 cat job.yaml.jinja2 | render_template > jobs.yaml
 ```
 
-Or sent directly to kubectl, like this:
+You can view `jobs.yaml` to verify that the `render_template` script worked
+correctly.
+
+Once you are happy that `render_template` is working how you intend,
+you can pipe its output into `kubectl`:
 
 ```shell
 cat job.yaml.jinja2 | render_template | kubectl apply -f -
 ```
 
+Kubernetes accepts and runs the Jobs you created.
+
+### Clean up {#cleanup-2}
+
+```shell
+# Remove the Jobs you created
+# Your cluster automatically cleans up their Pods
+kubectl delete job -l jobgroup=jobexample
+```
+
+{{% /capture %}}
+{{% capture discussion %}}
+
+## Using Jobs in real workloads
+
+In a real use case, each Job performs some substantial computation, such as rendering a frame
+of a movie, or processing a range of rows in a database. If you were rendering a movie
+you would set `$ITEM` to the frame number. If you were processing rows from a database
+table, you would set `$ITEM` to represent the range of database rows to process.
+
+In the task, you ran a command to collect the output from Pods by fetching
+their logs. In a real use case, each Pod for a Job writes its output to
+durable storage before completing. You can use a PersistentVolume for each Job,
+or an external storage service. For example, if you are rendering frames for a movie,
+use HTTP to `PUT` the rendered frame data to a URL, using a different URL for each
+frame.
+
+## Labels on Jobs and Pods
+
+After you create a Job, Kubernetes automatically adds additional
+{{< glossary_tooltip text="labels" term_id="label" >}} that
+distinguish one Job's pods from another Job's pods.
+
+In this example, each Job and its Pod template have a label:
+`jobgroup=jobexample`.
+
+Kubernetes itself pays no attention to labels named `jobgroup`. Setting a label
+for all the Jobs you create from a template makes it convenient to operate on all
+those Jobs at once.
+In the [first example](#create-jobs-based-on-a-template) you used a template to
+create several Jobs. The template ensures that each Pod also gets the same label, so
+you can check on all Pods for these templated Jobs with a single command.
+
+{{< note >}}
+The label key `jobgroup` is not special or reserved.
+You can pick your own labelling scheme.
+There are [recommended labels](/docs/concepts/overview/working-with-objects/common-labels/#labels)
+that you can use if you wish.
+{{< /note >}}
+
 ## Alternatives
 
-If you have a large number of job objects, you may find that:
+If you plan to create a large number of Job objects, you may find that:
 
-- Even using labels, managing so many Job objects is cumbersome.
-- You exceed resource quota when creating all the Jobs at once,
-and do not want to wait to create them incrementally.
-- Very large numbers of jobs created at once overload the
-Kubernetes apiserver, controller, or scheduler.
+- Even using labels, managing so many Jobs is cumbersome.
+- If you create many Jobs in a batch, you might place high load
+on the Kubernetes control plane. Alternatively, the Kubernetes API
+server could rate limit you, temporarily rejecting your requests with a 429 status.
+- You are limited by a {{< glossary_tooltip text="resource quota" term_id="resource-quota" >}}
+on Jobs: the API server permanently rejects some of your requests
+when you create a great deal of work in one batch.
 
-In this case, you can consider one of the
-other [job patterns](/docs/concepts/jobs/run-to-completion-finite-workloads/#job-patterns).
+There are other [job patterns](/docs/concepts/jobs/run-to-completion-finite-workloads/#job-patterns)
+that you can use to process large amounts of work without creating very many Job
+objects.
 
+You could also consider writing your own [controller](/docs/concepts/architecture/controller/)
+to manage Job objects automatically.
 {{% /capture %}}
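
The page changed in this commit expands the template by replacing the string `$ITEM` once per item and writing one `job-<item>.yaml` file per item into `./jobs`, and it notes that a program could generate the manifests instead of `sed`. As a rough illustration of that alternative, the Python sketch below does the same substitution. It rests on assumptions: `TEMPLATE` is a shortened stand-in written for this example, not the contents of `job-tmpl.yaml` (which this diff does not show), and `expand` and `write_manifests` are hypothetical helper names.

```python
from pathlib import Path

# Stand-in template text (an assumption for illustration only; the real
# job-tmpl.yaml is not reproduced in this diff).
TEMPLATE = """\
apiVersion: batch/v1
kind: Job
metadata:
  name: process-item-$ITEM
  labels:
    jobgroup: jobexample
spec:
  template:
    metadata:
      labels:
        jobgroup: jobexample
    spec:
      containers:
      - name: c
        image: busybox
        command: ["sh", "-c", "echo Processing item $ITEM && sleep 5"]
      restartPolicy: Never
"""


def expand(template, items):
    """Return {filename: manifest text}, with $ITEM replaced for each item."""
    return {
        "job-{}.yaml".format(item): template.replace("$ITEM", item)
        for item in items
    }


def write_manifests(manifests, out_dir):
    """Write each rendered manifest into out_dir, creating it if needed."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for filename, text in manifests.items():
        (out / filename).write_text(text)


# Render the three example items in memory; nothing is written to disk here.
manifests = expand(TEMPLATE, ["apple", "banana", "cherry"])
```

Calling `write_manifests(manifests, "./jobs")` would produce the same file names as the shell loop, after which `kubectl create -f ./jobs` applies as in the page. Plain string replacement is used here because it mirrors what `sed` does; it performs no YAML validation.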
