Commit 2133010

akihironitta authored and lantiga committed
CI/CD: Add CUDA version to docker image tags (#13831)
* append cuda version to tags
* revertme: push to hub
* Update docker readme
* Build base-conda-py3.9-torch1.12-cuda11.3.1
* Use new images in conda tests
* revertme: push to hub
* Revert "revertme: push to hub" This reverts commit 0f7d534.
* Revert "revertme: push to hub" This reverts commit 46a05fc.
* Run conda if workflow edited
* Run gpu testing if workflow edited
* Use new tags in release/Dockerfile
* Build base-cuda and PL release images with all combinations
* Update release docker
* Update conda from py3.9-torch1.12 to py3.10-torch.1.12
* Fix ubuntu version
* Revert conda
* revertme: push to hub
* Don't build Python 3.10 for now...
* Fix pl release builder
* updating version contribute to the error? docker/buildx#456
* Update actions' versions
* Update slack user to notify
* Don't use 11.6.0 to avoid bagua incompatibility
* Don't use 11.1, and use 11.1.1
* Update .github/workflows/ci-pytorch_test-conda.yml Co-authored-by: Luca Medeiros <[email protected]>
* Update trigger
* Ignore artfacts from tutorials
* Trim docker images to distribute
* Add an image for tutorials
* Update conda image 3.8x1.10
* Try different conda variants
* No need to set cuda for conda jobs
* Update who to notify ipu failure
* Don't push
* update filenaem

Co-authored-by: Luca Medeiros <[email protected]>
(cherry picked from commit d5f35ec)
1 parent 3d1054c commit 2133010

8 files changed: +87 -89 lines changed


.azure/gpu-benchmark.yml

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ jobs:
     cancelTimeoutInMinutes: "2"
     pool: azure-jirka-spot
     container:
-      image: "pytorchlightning/pytorch_lightning:base-cuda-py3.9-torch1.11"
+      image: "pytorchlightning/pytorch_lightning:base-cuda-py3.9-torch1.12-cuda11.3.1"
       options: "--runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all --shm-size=32g"
     workspace:
       clean: all

.azure/gpu-tests.yml

Lines changed: 2 additions & 2 deletions
@@ -26,7 +26,7 @@ jobs:
     strategy:
       matrix:
         'PyTorch - stable':
-          image: "pytorchlightning/pytorch_lightning:base-cuda-py3.9-torch1.11"
+          image: "pytorchlightning/pytorch_lightning:base-cuda-py3.9-torch1.12-cuda11.3.1"
     # how long to run the job before automatically cancelling
     timeoutInMinutes: "80"
     # how much time to give 'run always even if cancelled tasks' before stopping them
@@ -44,7 +44,7 @@ jobs:
 
     - bash: |
         CHANGED_FILES=$(git diff --name-status origin/master -- . | awk '{print $2}')
-        FILTER='src/pytorch_lightning|requirements/pytorch|tests/tests_pytorch|examples/pl_*'
+        FILTER='src/pytorch_lightning|requirements/pytorch|tests/tests_pytorch|examples/pl_*|.azure/gpu-tests.yml'
         echo $CHANGED_FILES > changed_files.txt
         MATCHES=$(cat changed_files.txt | grep -E $FILTER)
         echo $MATCHES
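
Review note (not part of the commit): the gate in the hunk above can be tried locally. The sketch below is only an illustration under stated assumptions: the two paths are invented, and the `name_status` function merely imitates the shape of `git diff --name-status` output. It suggests that `awk '{print $2}'` keeps the path column, and that the unquoted `echo $CHANGED_FILES` joins every path onto a single line, so `grep -E` returns that whole line as soon as any alternative in `FILTER` matches.

```bash
# Illustrative stand-in for `git diff --name-status origin/master -- .`
# (status letter, a tab, then the path); both paths are invented.
name_status() {
  printf 'M\tsrc/pytorch_lightning/trainer/trainer.py\n'
  printf 'A\tdocs/index.rst\n'
}

# Filter string copied from the hunk above.
FILTER='src/pytorch_lightning|requirements/pytorch|tests/tests_pytorch|examples/pl_*|.azure/gpu-tests.yml'

# Same extraction as the workflow step, with the stand-in as input.
CHANGED_FILES=$(name_status | awk '{print $2}')

# Unquoted on purpose, as in the workflow: word splitting puts all paths on one line.
ONE_LINE=$(echo $CHANGED_FILES)

# grep selects whole lines, so a single match returns every path on that line.
MATCHES=$(echo "$ONE_LINE" | grep -E "$FILTER" || true)

echo "$MATCHES"
```

Because the result is only tested for being empty or not, the one-line behaviour is harmless for the gate, but `MATCHES` should not be read as the list of matching files.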

.github/workflows/ci-pytorch_test-conda.yml

Lines changed: 1 addition & 3 deletions
@@ -22,13 +22,11 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        # nightly: add when there's a release candidate
         include:
           - {python-version: "3.8", pytorch-version: "1.9"}
           - {python-version: "3.8", pytorch-version: "1.10"}
           - {python-version: "3.9", pytorch-version: "1.11"}
           - {python-version: "3.9", pytorch-version: "1.12"}
-
     timeout-minutes: 30
 
     steps:
@@ -45,7 +43,7 @@ jobs:
         id: skip
         shell: bash -l {0}
         run: |
-          FILTER='src/pytorch_lightning|requirements/pytorch|tests/tests_pytorch|examples/pl_*'
+          FILTER='src/pytorch_lightning|requirements/pytorch|tests/tests_pytorch|examples/pl_*|.github/workflows/ci-pytorch-test-conda.yml'
           echo "${{ steps.changed-files.outputs.all_changed_files }}" | tr " " "\n" > changed_files.txt
           MATCHES=$(cat changed_files.txt | grep -E $FILTER)
           echo $MATCHES
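
Review note (not part of the commit): the filter added in this hunk spells the workflow as `ci-pytorch-test-conda.yml`, while the file touched by this commit is `ci-pytorch_test-conda.yml`. The sketch below checks the filter against both spellings. `matches_filter` is a hypothetical helper and the source path is made up; only the `FILTER` string is copied from the hunk.

```bash
# Hypothetical helper (not in the workflow): print the paths selected by an
# extended-regex filter, one per line; print nothing when none match.
matches_filter() {
  _filter=$1
  shift
  printf '%s\n' "$@" | grep -E "$_filter" || true
}

# Filter string copied from the conda workflow hunk above.
FILTER='src/pytorch_lightning|requirements/pytorch|tests/tests_pytorch|examples/pl_*|.github/workflows/ci-pytorch-test-conda.yml'

# A change under src/pytorch_lightning is selected; the docs path is not.
matches_filter "$FILTER" "src/pytorch_lightning/trainer/trainer.py" "docs/index.rst"

# The real workflow file name uses an underscore, so the hyphenated
# alternative does not select it and this call prints nothing.
matches_filter "$FILTER" ".github/workflows/ci-pytorch_test-conda.yml"
```

If the intent is "run conda if workflow edited", as the commit message says, the alternative would need the underscore spelling. The unescaped `.` characters are regex wildcards, which only makes the filter slightly more permissive.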

.github/workflows/cicd-pytorch_dockers.yml

Lines changed: 42 additions & 39 deletions
@@ -30,17 +30,22 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        # the config used in '.azure-pipelines/gpu-tests.yml' since the Dockerfile uses the cuda image
-        python_version: ["3.9"]
-        pytorch_version: ["1.10", "1.11"]
+        include:
+          # We only release one docker image per PyTorch version.
+          # The matrix here is the same as the one in release-docker.yml.
+          - {python_version: "3.9", pytorch_version: "1.9", cuda_version: "11.1.1"}
+          - {python_version: "3.9", pytorch_version: "1.10", cuda_version: "11.3.1"}
+          - {python_version: "3.9", pytorch_version: "1.11", cuda_version: "11.3.1"}
+          - {python_version: "3.9", pytorch_version: "1.12", cuda_version: "11.3.1"}
     steps:
-      - uses: actions/checkout@v2
+      - uses: actions/checkout@v3
       - uses: docker/setup-buildx-action@v2
-      - uses: docker/build-push-action@v2
+      - uses: docker/build-push-action@v3
         with:
           build-args: |
             PYTHON_VERSION=${{ matrix.python_version }}
             PYTORCH_VERSION=${{ matrix.pytorch_version }}
+            CUDA_VERSION=${{ matrix.cuda_version }}
           file: dockers/release/Dockerfile
           push: false # pushed in release-docker.yml only when PL is released
         timeout-minutes: 50
@@ -54,14 +59,14 @@ jobs:
         python_version: ["3.7"]
         xla_version: ["1.11"]
     steps:
-      - uses: actions/checkout@v2
+      - uses: actions/checkout@v3
       - uses: docker/setup-buildx-action@v2
-      - uses: docker/login-action@v1
+      - uses: docker/login-action@v2
         if: env.PUSH_TO_HUB == 'true'
         with:
           username: ${{ secrets.DOCKER_USERNAME }}
           password: ${{ secrets.DOCKER_PASSWORD }}
-      - uses: docker/build-push-action@v2
+      - uses: docker/build-push-action@v3
         with:
           build-args: |
             PYTHON_VERSION=${{ matrix.python_version }}
@@ -86,31 +91,31 @@ jobs:
       fail-fast: false
       matrix:
         include:
-          # the config used in '.azure-pipelines/gpu-tests.yml'
-          - {python_version: "3.7", pytorch_version: "1.10", cuda_version: "11.1", ubuntu_version: "20.04"}
-          - {python_version: "3.7", pytorch_version: "1.11", cuda_version: "11.3.1", ubuntu_version: "20.04"}
-          # latest (used in Tutorials)
-          - {python_version: "3.8", pytorch_version: "1.9", cuda_version: "11.1", ubuntu_version: "20.04"}
-          - {python_version: "3.9", pytorch_version: "1.10", cuda_version: "11.1", ubuntu_version: "20.04"}
-          - {python_version: "3.9", pytorch_version: "1.11", cuda_version: "11.3.1", ubuntu_version: "20.04"}
+          # These are the base images for PL release docker images,
+          # so include at least all of the combinations in release-dockers.yml.
+          - {python_version: "3.9", pytorch_version: "1.9", cuda_version: "11.1.1"}
+          - {python_version: "3.9", pytorch_version: "1.10", cuda_version: "11.3.1"}
+          - {python_version: "3.9", pytorch_version: "1.11", cuda_version: "11.3.1"}
+          - {python_version: "3.9", pytorch_version: "1.12", cuda_version: "11.3.1"}
+          # Used in Lightning-AI/tutorials
+          - {python_version: "3.8", pytorch_version: "1.9", cuda_version: "11.1.1"}
     steps:
-      - uses: actions/checkout@v2
+      - uses: actions/checkout@v3
       - uses: docker/setup-buildx-action@v2
-      - uses: docker/login-action@v1
+      - uses: docker/login-action@v2
         if: env.PUSH_TO_HUB == 'true'
         with:
           username: ${{ secrets.DOCKER_USERNAME }}
           password: ${{ secrets.DOCKER_PASSWORD }}
-      - uses: docker/build-push-action@v2
+      - uses: docker/build-push-action@v3
         with:
           build-args: |
             PYTHON_VERSION=${{ matrix.python_version }}
             PYTORCH_VERSION=${{ matrix.pytorch_version }}
             CUDA_VERSION=${{ matrix.cuda_version }}
-            UBUNTU_VERSION=${{ matrix.ubuntu_version }}
           file: dockers/base-cuda/Dockerfile
           push: ${{ env.PUSH_TO_HUB }}
-          tags: pytorchlightning/pytorch_lightning:base-cuda-py${{ matrix.python_version }}-torch${{ matrix.pytorch_version }}
+          tags: pytorchlightning/pytorch_lightning:base-cuda-py${{ matrix.python_version }}-torch${{ matrix.pytorch_version }}-cuda${{ matrix.cuda_version }}
         timeout-minutes: 95
       - uses: ravsamhq/notify-slack-action@v1
         if: failure() && env.PUSH_TO_HUB == 'true'
@@ -128,25 +133,23 @@ jobs:
       fail-fast: false
       matrix:
         include:
-          - {python_version: "3.8", pytorch_version: "1.9", cuda_version: "11.1"}
-          - {python_version: "3.8", pytorch_version: "1.10", cuda_version: "11.1"}
-          - {python_version: "3.9", pytorch_version: "1.11", cuda_version: "11.3.1"}
-          # nightly: add when there's a release candidate
-          # - {python_version: "3.9", pytorch_version: "1.12"}
+          - {python_version: "3.8", pytorch_version: "1.9"}
+          - {python_version: "3.8", pytorch_version: "1.10"}
+          - {python_version: "3.9", pytorch_version: "1.11"}
+          - {python_version: "3.9", pytorch_version: "1.12"}
     steps:
-      - uses: actions/checkout@v2
+      - uses: actions/checkout@v3
       - uses: docker/setup-buildx-action@v2
-      - uses: docker/login-action@v1
+      - uses: docker/login-action@v2
         if: env.PUSH_TO_HUB == 'true'
         with:
           username: ${{ secrets.DOCKER_USERNAME }}
           password: ${{ secrets.DOCKER_PASSWORD }}
-      - uses: docker/build-push-action@v2
+      - uses: docker/build-push-action@v3
         with:
           build-args: |
             PYTHON_VERSION=${{ matrix.python_version }}
             PYTORCH_VERSION=${{ matrix.pytorch_version }}
-            CUDA_VERSION=${{ matrix.cuda_version }}
           file: dockers/base-conda/Dockerfile
           push: ${{ env.PUSH_TO_HUB }}
           tags: pytorchlightning/pytorch_lightning:base-conda-py${{ matrix.python_version }}-torch${{ matrix.pytorch_version }}
@@ -170,14 +173,14 @@ jobs:
           # the config used in 'dockers/ci-runner-ipu/Dockerfile'
           - {python_version: "3.9", pytorch_version: "1.9"}
     steps:
-      - uses: actions/checkout@v2
+      - uses: actions/checkout@v3
       - uses: docker/setup-buildx-action@v2
-      - uses: docker/login-action@v1
+      - uses: docker/login-action@v2
         if: env.PUSH_TO_HUB == 'true'
         with:
           username: ${{ secrets.DOCKER_USERNAME }}
           password: ${{ secrets.DOCKER_PASSWORD }}
-      - uses: docker/build-push-action@v2
+      - uses: docker/build-push-action@v3
         with:
           build-args: |
             PYTHON_VERSION=${{ matrix.python_version }}
@@ -186,7 +189,7 @@ jobs:
           push: ${{ env.PUSH_TO_HUB }}
           tags: pytorchlightning/pytorch_lightning:base-ipu-py${{ matrix.python_version }}-torch${{ matrix.pytorch_version }}
         timeout-minutes: 100
-      - uses: docker/build-push-action@v2
+      - uses: docker/build-push-action@v3
         with:
           build-args: |
             PYTHON_VERSION=${{ matrix.python_version }}
@@ -201,7 +204,7 @@ jobs:
           status: ${{ job.status }}
           token: ${{ secrets.GITHUB_TOKEN }}
           notification_title: ${{ format('IPU; {0} py{1} for *{2}*', runner.os, matrix.python_version, matrix.pytorch_version) }}
-          message_format: '{emoji} *{workflow}* {status_message}, see <{run_url}|detail>, cc: <@U01BULUS2BG>' # SeanNaren
+          message_format: '{emoji} *{workflow}* {status_message}, see <{run_url}|detail>, cc: <@U01GD29QCAV>' # kaushikb11
         env:
           SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
 
@@ -214,14 +217,14 @@ jobs:
           # the config used in 'dockers/ci-runner-hpu/Dockerfile'
           - {gaudi_version: "1.5.0", pytorch_version: "1.11.0"}
     steps:
-      - uses: actions/checkout@v2
+      - uses: actions/checkout@v3
       - uses: docker/setup-buildx-action@v2
-      - uses: docker/login-action@v1
+      - uses: docker/login-action@v2
         if: env.PUSH_TO_HUB == 'true'
         with:
           username: ${{ secrets.DOCKER_USERNAME }}
           password: ${{ secrets.DOCKER_PASSWORD }}
-      - uses: docker/build-push-action@v2
+      - uses: docker/build-push-action@v3
         with:
           build-args: |
             DIST=latest
@@ -245,10 +248,10 @@ jobs:
     runs-on: ubuntu-20.04
     steps:
       - name: Checkout
-        uses: actions/checkout@v2
+        uses: actions/checkout@v3
       - name: Build Conda Docker
         # publish master/release
-        uses: docker/build-push-action@v2
+        uses: docker/build-push-action@v3
         with:
           file: dockers/nvidia/Dockerfile
           push: false
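
Review note (not part of the commit): the new `tags:` template in the base-cuda job appends the CUDA version to the image tag. The sketch below expands that template for each row of the base-cuda matrix shown in this file. `base_cuda_tag` is a hypothetical helper, written only to illustrate the naming scheme; the matrix rows are copied from the diff.

```bash
# Hypothetical helper mirroring the `tags:` template of the base-cuda job.
# Arguments: python_version, pytorch_version, cuda_version.
base_cuda_tag() {
  printf 'pytorchlightning/pytorch_lightning:base-cuda-py%s-torch%s-cuda%s\n' "$1" "$2" "$3"
}

# Matrix rows copied from the base-cuda job: python, pytorch, cuda.
while read -r py torch cuda; do
  base_cuda_tag "$py" "$torch" "$cuda"
done <<'EOF'
3.9 1.9 11.1.1
3.9 1.10 11.3.1
3.9 1.11 11.3.1
3.9 1.12 11.3.1
3.8 1.9 11.1.1
EOF
```

The fourth row yields the same tag that `.azure/gpu-benchmark.yml` and `.azure/gpu-tests.yml` now reference, which is a quick way to confirm that consumers and the builder agree on the name.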

.github/workflows/release-docker.yml

Lines changed: 22 additions & 9 deletions
@@ -1,6 +1,5 @@
 name: Docker
-# https://www.docker.com/blog/first-docker-github-action-is-here
-# https://github.com/docker/build-push-action
+
 on:
   push:
     branches: [master, "release/*"]
@@ -15,8 +14,12 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        python_version: ["3.7", "3.8", "3.9"]
-        pytorch_version: ["1.9", "1.10"]
+        include:
+          # We only release one docker image per PyTorch version.
+          - {python_version: "3.9", pytorch_version: "1.9", cuda_version: "11.1.1"}
+          - {python_version: "3.9", pytorch_version: "1.10", cuda_version: "11.3.1"}
+          - {python_version: "3.9", pytorch_version: "1.11", cuda_version: "11.3.1"}
+          - {python_version: "3.9", pytorch_version: "1.12", cuda_version: "11.3.1"}
     steps:
       - name: Checkout
         uses: actions/checkout@v2
@@ -32,19 +35,29 @@ jobs:
           username: ${{ secrets.DOCKER_USERNAME }}
           password: ${{ secrets.DOCKER_PASSWORD }}
           dockerfile: dockers/release/Dockerfile
-          build_args: PYTHON_VERSION=${{ matrix.python_version }},PYTORCH_VERSION=${{ matrix.pytorch_version }},LIGHTNING_VERSION=${{ steps.get_version.outputs.RELEASE_VERSION }}
-          tags: "${{ steps.get_version.outputs.RELEASE_VERSION }}-py${{ matrix.python_version }}-torch${{ matrix.pytorch_version }},latest-py${{ matrix.python_version }}-torch${{ matrix.pytorch_version }}"
+          build_args: |
+            PYTHON_VERSION=${{ matrix.python_version }}
+            PYTORCH_VERSION=${{ matrix.pytorch_version }}
+            CUDA_VERSION=${{ matrix.cuda_version }}
+            LIGHTNING_VERSION=${{ steps.get_version.outputs.RELEASE_VERSION }}
+          tags: |
+            ${{ steps.get_version.outputs.RELEASE_VERSION }}-py${{ matrix.python_version }}-torch${{ matrix.pytorch_version }}-cuda${{ matrix.cuda_version }}
+            latest-py${{ matrix.python_version }}-torch${{ matrix.pytorch_version }}-cuda${{ matrix.cuda_version }}
         timeout-minutes: 55
 
       - name: Publish Latest to Docker
         uses: docker/[email protected]
-        # only on releases and latest Python and PyTorch
-        if: matrix.python_version == '3.9' && matrix.pytorch_version == '1.10'
+        # Only latest Python and PyTorch
+        if: matrix.python_version == '3.9' && matrix.pytorch_version == '1.12'
         with:
           repository: pytorchlightning/pytorch_lightning
           username: ${{ secrets.DOCKER_USERNAME }}
           password: ${{ secrets.DOCKER_PASSWORD }}
           dockerfile: dockers/release/Dockerfile
-          build_args: PYTHON_VERSION=${{ matrix.python_version }},PYTORCH_VERSION=${{ matrix.pytorch_version }},LIGHTNING_VERSION=${{ steps.get_version.outputs.RELEASE_VERSION }}
+          build_args: |
+            PYTHON_VERSION=${{ matrix.python_version }}
+            PYTORCH_VERSION=${{ matrix.pytorch_version }}
+            CUDA_VERSION=${{ matrix.cuda_version }}
+            LIGHTNING_VERSION=${{ steps.get_version.outputs.RELEASE_VERSION }}
           tags: "latest"
         timeout-minutes: 55
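
Review note (not part of the commit): after this change each release matrix row publishes two CUDA-suffixed tags, and only the Python 3.9 / PyTorch 1.12 row also publishes the bare `latest` tag. The sketch below lists the tag names (without the repository prefix) for one row. `release_tags` is a hypothetical helper, and the release version `1.6.5` is just an illustrative value standing in for `steps.get_version.outputs.RELEASE_VERSION`.

```bash
# Hypothetical helper: print the tag names the release job would publish
# for one matrix row. Arguments: release version, python, pytorch, cuda.
release_tags() {
  _version=$1
  _suffix="py$2-torch$3-cuda$4"
  # Tags from the first publish step.
  printf '%s-%s\n' "$_version" "$_suffix"
  printf 'latest-%s\n' "$_suffix"
  # The "Publish Latest" step is gated on the newest Python/PyTorch pair.
  if [ "$2" = "3.9" ] && [ "$3" = "1.12" ]; then
    printf 'latest\n'
  fi
}

# Row that satisfies the `if:` condition: three tags.
release_tags 1.6.5 3.9 1.12 11.3.1

# Any other row: two tags, no bare `latest`.
release_tags 1.6.5 3.9 1.10 11.3.1
```

Keeping the `if:` condition in sync with the newest matrix row is a manual step; this commit moves it from PyTorch 1.10 to 1.12 together with the matrix.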

.gitignore

Lines changed: 6 additions & 0 deletions
@@ -160,3 +160,9 @@ tags
 .tags
 src/lightning_app/ui/*
 *examples/template_react_ui*
+
+# tutorials
+our_model.tar
+test.png
+saved_models
+data/

dockers/README.md

Lines changed: 11 additions & 34 deletions
@@ -1,36 +1,17 @@
 # Docker images
 
-## Builds images form attached Dockerfiles
+## Build images from Dockerfiles
 
 You can build it on your own, note it takes lots of time, be prepared.
 
 ```bash
-git clone <git-repository>
-docker image build -t pytorch-lightning:latest -f dockers/conda/Dockerfile .
-```
-
-or with specific arguments
-
-```bash
-git clone <git-repository>
-docker image build \
-    -t pytorch-lightning:base-cuda-py3.9-pt1.10 \
-    -f dockers/base-cuda/Dockerfile \
-    --build-arg PYTHON_VERSION=3.9 \
-    --build-arg PYTORCH_VERSION=1.10 \
-    .
-```
+git clone https://github.com/Lightning-AI/lightning.git
 
-or nightly version from Conda
+# build with the default arguments
+docker image build -t pytorch-lightning:latest -f dockers/base-cuda/Dockerfile .
 
-```bash
-git clone <git-repository>
-docker image build \
-    -t pytorch-lightning:base-conda-py3.9-pt1.11 \
-    -f dockers/base-conda/Dockerfile \
-    --build-arg PYTHON_VERSION=3.9 \
-    --build-arg PYTORCH_VERSION=1.11 \
-    .
+# build with specific arguments
+docker image build -t pytorch-lightning:base-cuda-py3.9-torch1.11-cuda11.3.1 -f dockers/base-cuda/Dockerfile --build-arg PYTHON_VERSION=3.9 --build-arg PYTORCH_VERSION=1.11 --build-arg CUDA_VERSION=11.3.1 .
 ```
 
 To run your docker use
@@ -49,7 +30,7 @@ docker image rm pytorch-lightning:latest
 
 ## Run docker image with GPUs
 
-To run docker image with access to you GPUs you need to install
+To run docker image with access to your GPUs, you need to install
 
 ```bash
 # Add the package repositories
@@ -61,10 +42,10 @@ sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
 sudo systemctl restart docker
 ```
 
-and later run the docker image with `--gpus all` so for example
+and later run the docker image with `--gpus all`. For example,
 
 ```
-docker run --rm -it --gpus all pytorchlightning/pytorch_lightning:base-cuda-py3.9-torch1.10
+docker run --rm -it --gpus all pytorchlightning/pytorch_lightning:base-cuda-py3.9-torch1.11-cuda11.3.1
 ```
 
 ## Run Jupyter server
@@ -73,15 +54,11 @@ Inspiration comes from https://u.group/thinking/how-to-put-jupyter-notebooks-in-
 
 1. Build the docker image:
    ```bash
-   docker image build \
-       -t pytorch-lightning:v1.3.1 \
-       -f dockers/nvidia/Dockerfile \
-       --build-arg LIGHTNING_VERSION=1.3.1 \
-       .
+   docker image build -t pytorch-lightning:v1.6.5 -f dockers/nvidia/Dockerfile --build-arg LIGHTNING_VERSION=1.6.5 .
    ```
 1. start the server and map ports:
    ```bash
-   docker run --rm -it --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all -p 8888:8888 pytorch-lightning:v1.3.1
+   docker run --rm -it --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all -p 8888:8888 pytorch-lightning:v1.6.5
    ```
 1. Connect in local browser:
    - copy the generated path e.g. `http://hostname:8888/?token=0719fa7e1729778b0cec363541a608d5003e26d4910983c6`
