---
title: Coarse Parallel Processing Using a Work Queue
content_type: task
weight: 20
---

<!-- overview -->

In this example, you will run a Kubernetes Job with multiple parallel
worker processes.

In this example, as each pod is created, it picks up one unit of work
from a task queue, completes it, deletes it from the queue, and exits.

Here is an overview of the steps in this example:

1. **Start a message queue service.** In this example, you use RabbitMQ, but you could use another
   one. In practice you would set up a message queue service once and reuse it for many jobs.
1. **Create a queue, and fill it with messages.** Each message represents one task to be done. In
   this example, a message is a string that a worker will print as its task.

## {{% heading "prerequisites" %}}

You should already be familiar with the basic,
non-parallel, use of [Job](/docs/concepts/workloads/controllers/job/).

{{< include "task-tutorial-prereqs.md" >}}

You will need a container image registry where you can upload images to run in your cluster.

This task example also assumes that you have Docker installed locally.

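Optionally, you can confirm up front that the command line tools used below are installed:

```shell
# optional: verify the local client tools used in this task
kubectl version --client
docker version
```
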
<!-- steps -->

## Starting a message queue service

Start RabbitMQ as follows:

```shell
# make a Service for the StatefulSet to use
kubectl create -f https://kubernetes.io/examples/application/job/rabbitmq-service.yaml
```
```
service "rabbitmq-service" created
```

```shell
kubectl create -f https://kubernetes.io/examples/application/job/rabbitmq-statefulset.yaml
```
```
statefulset "rabbitmq" created
```
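
If you want to check that RabbitMQ is up before you continue, you can watch the rollout of the StatefulSet created above (an optional check; the StatefulSet is named `rabbitmq`, as the output shows):

```shell
# optional: wait until the rabbitmq StatefulSet reports its Pod as ready
kubectl rollout status statefulset/rabbitmq --timeout=120s
```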

## Testing the message queue service

Now, you can experiment with accessing the message queue.
First create a temporary interactive Pod.

```shell
# Create a temporary interactive container
kubectl run -i --tty temp --image ubuntu:22.04
```
```
Waiting for pod default/temp-loe07 to be running, status is Pending, pod ready: false
```

Note that your pod name and command prompt will be different.

Next install the `amqp-tools` so you can work with message queues.
The next commands show what you need to run inside the interactive shell in that Pod:

```shell
apt-get update && apt-get install -y curl ca-certificates amqp-tools python dnsutils
```

Later, you will make a container image that includes these packages.

Next, you will check that you can discover the Service for RabbitMQ:

```
# Run these commands inside the Pod
# Note the rabbitmq-service has a DNS name, provided by Kubernetes:
nslookup rabbitmq-service
```
```
Server:    10.0.0.10
Address:   10.0.0.10#53

Name:   rabbitmq-service.default.svc.cluster.local
Address: 10.0.147.152
```
(the IP addresses will vary)

If the kube-dns addon is not set up correctly, the previous step may not work for you.
You can also find the IP address for that Service in an environment variable:

```shell
# run this check inside the Pod
env | grep RABBITMQ_SERVICE | grep HOST
```
```
RABBITMQ_SERVICE_SERVICE_HOST=10.0.147.152
```
(the IP address will vary)

Next you will verify that you can create a queue, and publish and consume messages.

```shell
# Run these commands inside the Pod
# In the next line, rabbitmq-service is the hostname where the rabbitmq-service
# can be reached. 5672 is the standard port for rabbitmq.
export BROKER_URL=amqp://guest:guest@rabbitmq-service:5672

# If you could not resolve "rabbitmq-service" in the previous step,
# then use this command instead:
BROKER_URL=amqp://guest:guest@$RABBITMQ_SERVICE_SERVICE_HOST:5672

# Now create a queue:

/usr/bin/amqp-declare-queue --url=$BROKER_URL -q foo -d
```
```
foo
```

Publish one message to the queue:
```shell
/usr/bin/amqp-publish --url=$BROKER_URL -r foo -p -b Hello

# And get it back.

/usr/bin/amqp-consume --url=$BROKER_URL -q foo -c 1 cat && echo 1>&2
```
```
Hello
```

In the last command, the `amqp-consume` tool took one message (`-c 1`)
from the queue, and passed that message to the standard input of an arbitrary command.
In this case, the program `cat` prints out the characters read from standard input, and
the echo adds a carriage return so the example is readable.

## Fill the queue with tasks

Now, fill the queue with some simulated tasks. In this example, the tasks are strings to be
printed.

In practice, the content of the messages might be:

- configuration parameters to a simulation
- frame numbers of a scene to be rendered

If there is large data that is needed in a read-only mode by all pods
of the Job, you typically put that in a shared file system like NFS and mount
that read-only on all the pods, or write the program in the pod so that it can natively read
data from a cluster file system (for example: HDFS).

For this example, you will create the queue and fill it using the AMQP command line tools.
In practice, you might write a program to fill the queue using an AMQP client library.

```shell
# Run this on your computer, not in the Pod
/usr/bin/amqp-declare-queue --url=$BROKER_URL -q job1 -d
```
```
job1
```

Add items to the queue:

```shell
for f in apple banana cherry date fig grape lemon melon
do
  /usr/bin/amqp-publish --url=$BROKER_URL -r job1 -p -b $f
done
```

You added 8 messages to the queue.
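
If you want to double-check how many messages are waiting, you can ask RabbitMQ directly. This is an optional check, and it assumes the first Pod of the StatefulSet is named `rabbitmq-0` (the default naming for a StatefulSet called `rabbitmq`):

```shell
# optional: list queues and their message counts inside the RabbitMQ Pod
# rabbitmq-0 is assumed to be the first Pod of the StatefulSet created earlier
kubectl exec rabbitmq-0 -- rabbitmqctl list_queues name messages
```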

## Create a container image

Now you are ready to create an image that you will run as a Job.

The job will use the `amqp-consume` utility to read the message
from the queue and run the actual work. Here is a very simple
example program:

{{% code_sample language="python" file="application/job/rabbitmq/worker.py" %}}

Give the script execution permission:

```shell
chmod +x worker.py
```

Now, build an image. Make a temporary directory, change to it, and
download the [Dockerfile](/examples/application/job/rabbitmq/Dockerfile)
and [worker.py](/examples/application/job/rabbitmq/worker.py). Then build the image.
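
A minimal build command, assuming you run it from the temporary directory that holds the Dockerfile and `worker.py` (the local image name `job-wq-1` matches the tag used below):

```shell
# build the worker image from the downloaded Dockerfile
docker build -t job-wq-1 .
```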

For the Docker Hub, tag your app image with your username and push to the Hub
with the commands below, replacing `<username>` with your Hub username:

```shell
docker tag job-wq-1 <username>/job-wq-1
docker push <username>/job-wq-1
```

If you are using an alternative container image registry, tag the
image and push it there instead.

## Defining a Job

Here is a manifest for a Job. You'll need to make a copy of the Job manifest
(call it `./job.yaml`), and edit the name of the container image to match the name you used.

{{% code_sample file="application/job/rabbitmq/job.yaml" %}}
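
One way to make that copy is sketched below. It assumes the manifest is published at the usual `kubernetes.io/examples` path and that the image you pushed is `<username>/job-wq-1`; adjust both to match your setup:

```shell
# download the example Job manifest
curl -Lo ./job.yaml https://kubernetes.io/examples/application/job/rabbitmq/job.yaml
# then edit ./job.yaml and set .spec.template.spec.containers[0].image to <username>/job-wq-1
```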

In this example, each pod works on one item from the queue and then exits.
So, the completion count of the Job corresponds to the number of work items
done. That is why the example manifest has `.spec.completions` set to `8`.

## Running the Job

Now, run the Job:

```shell
# this assumes you downloaded and then edited the manifest already
kubectl apply -f ./job.yaml
```
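
If you prefer to block until the Job finishes rather than polling, you can use `kubectl wait`. This is an optional extra; the Job name `job-wq-1` comes from the manifest, as the output below also shows:

```shell
# optional: wait up to 5 minutes for all 8 completions to finish
kubectl wait --for=condition=complete --timeout=300s job/job-wq-1
```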

Then check on the Job with `kubectl describe jobs/job-wq-1`. The output
looks something like this:

```
Annotations:    <none>
Parallelism:    2
Completions:    8
Start Time:     Wed, 06 Sep 2022 16:42:02 +0000
Pods Statuses:  0 Running / 8 Succeeded / 0 Failed
Pod Template:
  Labels:       controller-uid=41d75705-92df-11e7-b85e-fa163ee3c11f
                job-name=job-wq-1
  Containers:
   c:
    Image:      container-registry.example/causal-jigsaw-637/job-wq-1
    Port:
    Environment:
      BROKER_URL:       amqp://guest:guest@rabbitmq-service:5672
```

All the pods for that Job succeeded! You're done.
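
To see what each worker printed, you can fetch the logs from the Job's Pods. This quick sketch relies on the `job-name=job-wq-1` label shown in the output above:

```shell
# print the output of every Pod created by the Job
for p in $(kubectl get pods -l job-name=job-wq-1 -o name)
do
  kubectl logs $p
done
```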

<!-- discussion -->

## Alternatives

This approach has the advantage that you do not need to modify your "worker" program to be
aware that there is a work queue. You can include the worker program unmodified in your container
image.

Using this approach does require that you run a message queue service.
If running a queue service is inconvenient, you may
want to consider one of the other [job patterns](/docs/concepts/workloads/controllers/job/#job-patterns).

This approach creates a pod for every work item. If your work items only take a few seconds,
though, creating a Pod for every work item may add a lot of overhead. Consider another
design, such as in the [fine parallel work queue example](/docs/tasks/job/fine-parallel-processing-work-queue/),
that executes multiple work items per Pod.

In this example, you used the `amqp-consume` utility to read the message
from the queue and run the actual program. This has the advantage that you
do not need to modify your program to be aware of the queue.
The [fine parallel work queue example](/docs/tasks/job/fine-parallel-processing-work-queue/)
shows how to communicate with the work queue using a client library.

## Caveats

If the number of completions is set to more than the number of items in the queue,
then the Job will not appear to be completed, even though all items in the queue
have been processed. It will start additional pods which will block waiting
for a message.
You would need to make your own mechanism to spot when there is work
to do and measure the size of the queue, setting the number of completions to match.

There is an unlikely race with this pattern. If the container is killed in between the time
that the message is acknowledged by the `amqp-consume` command and the time that the container
exits with success, or if the node crashes before the kubelet is able to post the success of the pod
back to the API server, then the Job will not appear to be complete, even though all items
in the queue have been processed.