Commit 295d2f6

backout switching gpu workers to docker + new worker image (#1120)
* backout switching gpu workers to docker + new worker image
* upgrade to taskgraph 14.2.1

This picks up taskcluster/taskgraph#691, which is needed to revert back to the old GPU worker image.
1 parent b7a7f77 commit 295d2f6

File tree

27 files changed: +214 -383 lines changed

.taskcluster.yml

Lines changed: 1 addition & 1 deletion
@@ -227,7 +227,7 @@ tasks:
 taskclusterProxy: true
 chainOfTrust: true

-image: mozillareleases/taskgraph:decision-v14.1.1@sha256:a3f03dbaf6b52733e16e52cdf6fc547193839ed825e1c3740d2b0515f65c7d73
+image: mozillareleases/taskgraph:decision-v14.2.1@sha256:f4e3a22df9ec0017a2534b3a7b4cd9b60318f86619e0c2156c12c1ec1a0e32cb
 maxRunTime: 1800
 onExitStatus:
 retry:

docs/training/task-cluster.md

Lines changed: 12 additions & 4 deletions
@@ -136,16 +136,24 @@ To start an interactive task, follow these steps:

 5. Reduce the maxRunTime to a best guess at how long you'll need the task and worker running for. (We pay for every minute a worker runs - so they should not be kept running, eg: overnight.)

-6. Adjust the payload to simply run bash and sleep (instead of a full pipeline step):
+6. Adjust the payload to simply run bash and sleep (instead of a full pipeline step). For docker-worker tasks use something like:
 ```
 command:
 - bash
 - '-c'
 - 'sleep 7200'
 ```

-7. Click "Create Task"
+For generic-worker tasks (those needing a GPU), use:
+```
+command:
+- - bash
+- '-c'
+- 'sleep 7200'
+```
+
+(docker-worker tasks have an `image` section in the payload)

-After a few minutes you should be able to get a shell (a link will show up in the tab when it's ready). This shell should drop you inside of docker container as root, running the same image as the task you started this process with. Most tasks drop privileges to the `worker` user before doing any work, so you may want to run `su - worker` before doing anything of note.
+7. Click "Create Task"

-When you are done with the worker you can use "Cancel" from the three dots menu to immediately shut it down. (This should happen within a few minutes of closing your last shell to the worker, but it's good practice to do it yourself to minimize costs.)
+After a few minutes you should be able to get a shell (a link will show up in the tab when it's ready).

pipeline/bicleaner/bicleaner.sh

Lines changed: 4 additions & 0 deletions
@@ -13,6 +13,10 @@ echo "###### Bicleaner filtering"
 test -v SRC
 test -v TRG
 test -v CUDA_DIR
+test -v CUDNN_DIR
+
+# cuda and cudnn libs
+export LD_LIBRARY_PATH=${CUDA_DIR}/lib64:${CUDNN_DIR}:${LD_LIBRARY_PATH:+LD_LIBRARY_PATH:}

 corpus_prefix=$1
 output_prefix=$2
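
Review note on the added `export` line: `${LD_LIBRARY_PATH:+LD_LIBRARY_PATH:}` expands to the literal text `LD_LIBRARY_PATH:` whenever the variable is already set, because the inner name has no `$`. The previous value is therefore dropped and a trailing colon is left behind. The sketch below reproduces that behaviour next to one possible correction. The paths and both function names are hypothetical and exist only for illustration; whether the commit intended to append the old value is an assumption.

```shell
#!/bin/sh
# Hypothetical install locations, for illustration only.
CUDA_DIR=/opt/cuda
CUDNN_DIR=/opt/cudnn

# Builds the value exactly as written in the diff above.
build_path_as_committed() {
  local LD_LIBRARY_PATH="$1"
  printf '%s\n' "${CUDA_DIR}/lib64:${CUDNN_DIR}:${LD_LIBRARY_PATH:+LD_LIBRARY_PATH:}"
}

# Possible fix (an assumption about intent): append ":<old value>"
# only when the old value is set and non-empty.
build_path_suggested() {
  local LD_LIBRARY_PATH="$1"
  printf '%s\n' "${CUDA_DIR}/lib64:${CUDNN_DIR}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
}

build_path_as_committed ""          # prints /opt/cuda/lib64:/opt/cudnn:
build_path_as_committed "/usr/lib"  # prints /opt/cuda/lib64:/opt/cudnn:LD_LIBRARY_PATH:
build_path_suggested ""             # prints /opt/cuda/lib64:/opt/cudnn
build_path_suggested "/usr/lib"     # prints /opt/cuda/lib64:/opt/cudnn:/usr/lib
```

The suggested form also avoids the trailing colon, which the dynamic linker treats as an empty entry meaning the current directory.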

Lines changed: 0 additions & 1 deletion
@@ -1,4 +1,3 @@
 ctranslate2==4.3.1
 sentencepiece==0.2.0
 gpustat==1.1.1
-requests==2.32.3

pipeline/translate/requirements/translate-ctranslate2.txt

Lines changed: 120 additions & 242 deletions
Large diffs are not rendered by default.

poetry.lock

Lines changed: 5 additions & 10 deletions
Some generated files are not rendered by default.

pyproject.toml

Lines changed: 2 additions & 2 deletions
@@ -44,7 +44,7 @@ PyGithub="2.4.0"
 pyperclip="1.9.0"
 ruamel-yaml = "^0.18.6"
 taskcluster = "^56.0.3"
-taskcluster-taskgraph = "^14.1.1"
+taskcluster-taskgraph = "^14.2.1"
 kinto-http="11.7.1"
 # Use an outdated version of pydantic due to dependency requirements conflict.
 pydantic="1.10.19"
@@ -69,7 +69,7 @@ requests-mock = "^1.11.0"
 sh = "^2.0.6"
 zstandard = "^0.22.0"
 translations_parser = {path="./tracking/", develop=true}
-taskcluster-taskgraph = "^14.1.1"
+taskcluster-taskgraph = "^14.2.1"
 translations_taskgraph = {path="./taskcluster/", develop=true}
 sacremoses = "0.1.1"
 hanzidentifier = "1.2.0"

taskcluster/config.yml

Lines changed: 14 additions & 14 deletions
@@ -94,39 +94,39 @@ workers:
 worker-type: 'b-linux-large-gcp-1tb-64-512-std-d2g'
 b-linux-v100-gpu:
 provisioner: '{trust-domain}-{level}'
-implementation: docker-worker
+implementation: generic-worker
 os: linux
-worker-type: 'b-linux-v100-gpu-d2g'
+worker-type: '{alias}'
 b-linux-v100-gpu-4:
 provisioner: '{trust-domain}-{level}'
-implementation: docker-worker
+implementation: generic-worker
 os: linux
-worker-type: 'b-linux-v100-gpu-d2g-4'
+worker-type: '{alias}'
 b-linux-v100-gpu-4-300gb:
 provisioner: '{trust-domain}-{level}'
-implementation: docker-worker
+implementation: generic-worker
 os: linux
-worker-type: 'b-linux-v100-gpu-d2g-4-300gb'
+worker-type: '{alias}'
 b-linux-v100-gpu-4-300gb-standard:
 provisioner: '{trust-domain}-{level}'
-implementation: docker-worker
+implementation: generic-worker
 os: linux
-worker-type: 'b-linux-v100-gpu-d2g-4-300gb-standard'
+worker-type: '{alias}'
 b-linux-v100-gpu-4-1tb:
 provisioner: '{trust-domain}-{level}'
-implementation: docker-worker
+implementation: generic-worker
 os: linux
-worker-type: 'b-linux-v100-gpu-d2g-4-1tb'
+worker-type: '{alias}'
 b-linux-v100-gpu-4-2tb:
 provisioner: '{trust-domain}-{level}'
-implementation: docker-worker
+implementation: generic-worker
 os: linux
-worker-type: 'b-linux-v100-gpu-d2g-4-2tb'
+worker-type: '{alias}'
 b-linux-v100-gpu-4-1tb-standard:
 provisioner: '{trust-domain}-{level}'
-implementation: docker-worker
+implementation: generic-worker
 os: linux
-worker-type: 'b-linux-v100-gpu-d2g-4-1tb-standard'
+worker-type: '{alias}'
 images:
 provisioner: '{trust-domain}-{level}'
 implementation: docker-worker
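
Review note on `worker-type: '{alias}'`: the change appears to rely on the `{alias}` placeholder being replaced with the worker alias, meaning the key under `workers:`, so that each GPU pool resolves to a worker type named after its own key rather than the `-d2g` variant. The sketch below only mimics that assumed substitution with `sed`; `resolve_worker_type` is a hypothetical helper, not taskgraph code.

```shell
#!/bin/sh
# Illustration only: mimics the assumed `{alias}` substitution.
# $1: worker alias (the key under `workers:`)
# $2: worker-type template from config.yml
resolve_worker_type() {
  printf '%s\n' "$2" | sed "s/{alias}/$1/g"
}

# New config: the template is just the placeholder.
resolve_worker_type b-linux-v100-gpu-4-1tb '{alias}'
# prints b-linux-v100-gpu-4-1tb

# Old config: a literal name with no placeholder passes through unchanged.
resolve_worker_type b-linux-v100-gpu-4-1tb 'b-linux-v100-gpu-d2g-4-1tb'
# prints b-linux-v100-gpu-d2g-4-1tb
```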

taskcluster/docker/base/Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -59,6 +59,6 @@ ENV SHELL=/bin/bash \
 TERM=hterm-256color

 VOLUME /builds/worker/checkouts
-VOLUME /builds/worker/.task-cache/pip
+VOLUME /builds/worker/.cache

 USER root

taskcluster/docker/inference/Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -68,6 +68,6 @@ ENV SHELL=/bin/bash \
 PATH="/builds/worker/.local/bin:$PATH"

 VOLUME /builds/worker/checkouts
-VOLUME /builds/worker/.task-cache/pip
+VOLUME /builds/worker/.cache

 USER root
