Commit f08226e

Retry docker build step to handle network error (#5759)
I frequently see network errors when building the Docker image, and they are subsequently fixed by retrying the job. So it is a good idea to support retries in this step. For example:

* Failure: https://github.com/pytorch/pytorch/actions/runs/11329053976/job/31503695304#step:6:5836
* Retry: https://github.com/pytorch/pytorch/actions/runs/11329053976/job/31509384226

### Testing

Tested the change with PyTorch PR pytorch/pytorch#137896 at https://github.com/pytorch/pytorch/actions/runs/11331057887
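The retry behaviour this change relies on can be sketched in plain bash. This is only an illustration of the attempt-and-wait idea, not how `nick-fields/retry` is implemented: `retry_command` and `flaky_build` are hypothetical names, and the per-attempt timeout that the action also applies is left out.

```shell
#!/usr/bin/env bash

# Hypothetical helper: rerun a command up to max_attempts times,
# sleeping wait_seconds after each failed attempt.
retry_command() {
  local max_attempts="$1" wait_seconds="$2"
  shift 2
  local attempt=1
  until "$@"; do
    if [ "${attempt}" -ge "${max_attempts}" ]; then
      echo "Giving up after ${attempt} attempts" >&2
      return 1
    fi
    echo "Attempt ${attempt} failed; retrying in ${wait_seconds}s" >&2
    sleep "${wait_seconds}"
    attempt=$((attempt + 1))
  done
}

# Stand-in for a build that hits a network error twice, then succeeds.
calls=0
flaky_build() {
  calls=$((calls + 1))
  [ "${calls}" -ge 3 ]
}

# The build step in this commit uses 3 attempts and a 90-second wait;
# the wait is 0 here so the demo finishes immediately.
retry_command 3 0 flaky_build && echo "build succeeded on attempt ${calls}"
```

With three attempts allowed, a build that fails twice still passes on the third run; a build that keeps failing makes the helper return a non-zero status after the last attempt.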
1 parent c7e5294 commit f08226e

1 file changed, +38 -11 lines changed

.github/actions/calculate-docker-image/action.yml

Lines changed: 38 additions & 11 deletions
@@ -137,15 +137,11 @@ runs:
 
         echo "rebuild=true" >> "${GITHUB_OUTPUT}"
 
-    - name: Build and push docker image
+    - name: Login to ECR
+      if: ${{ steps.calculate-image.outputs.skip != 'true' && (inputs.always-rebuild || steps.check-image.outputs.rebuild) }}
       shell: bash
       working-directory: ${{ inputs.working-directory }}/${{ inputs.docker-build-dir }}
-      if: ${{ steps.calculate-image.outputs.skip != 'true' && (inputs.always-rebuild || steps.check-image.outputs.rebuild) }}
       env:
-        REPO_NAME: ${{ github.event.repository.name }}
-        DOCKER_PUSH: ${{ inputs.push }}
-        DOCKER_FORCE_PUSH: ${{ inputs.force-push }}
-        DOCKER_IMAGE: ${{ steps.calculate-image.outputs.docker-image }}
         DOCKER_REGISTRY: ${{ inputs.docker-registry }}
       run: |
         set -x
@@ -161,11 +157,42 @@ runs:
 
         retry login "${DOCKER_REGISTRY}"
 
-        set -e
-
-        IMAGE_NAME=$(echo ${DOCKER_IMAGE#"${DOCKER_REGISTRY}/${REPO_NAME}/"} | awk -F '[:,]' '{print $1}')
-        # Build new image
-        ./build.sh "${IMAGE_NAME}" -t "${DOCKER_IMAGE}"
+    - name: Build docker image
+      if: ${{ steps.calculate-image.outputs.skip != 'true' && (inputs.always-rebuild || steps.check-image.outputs.rebuild) }}
+      env:
+        REPO_NAME: ${{ github.event.repository.name }}
+        DOCKER_IMAGE: ${{ steps.calculate-image.outputs.docker-image }}
+        DOCKER_REGISTRY: ${{ inputs.docker-registry }}
+        WORKING_DIRECTORY: ${{ inputs.working-directory }}/${{ inputs.docker-build-dir }}
+      # NB: Retry here as this step frequently fails with network error downloading various stuffs
+      uses: nick-fields/[email protected]
+      with:
+        shell: bash
+        timeout_minutes: 90
+        max_attempts: 3
+        retry_wait_seconds: 90
+        command: |
+          set -ex
+
+          # NB: Setting working directory on the step doesn't work with nick-fields/retry https://github.com/nick-fields/retry/issues/89
+          pushd "${WORKING_DIRECTORY}"
+
+          IMAGE_NAME=$(echo ${DOCKER_IMAGE#"${DOCKER_REGISTRY}/${REPO_NAME}/"} | awk -F '[:,]' '{print $1}')
+          # Build new image
+          ./build.sh "${IMAGE_NAME}" -t "${DOCKER_IMAGE}"
+
+          popd
+
+    - name: Push to ECR
+      if: ${{ steps.calculate-image.outputs.skip != 'true' && (inputs.always-rebuild || steps.check-image.outputs.rebuild) }}
+      shell: bash
+      working-directory: ${{ inputs.working-directory }}/${{ inputs.docker-build-dir }}
+      env:
+        DOCKER_PUSH: ${{ inputs.push }}
+        DOCKER_FORCE_PUSH: ${{ inputs.force-push }}
+        DOCKER_IMAGE: ${{ steps.calculate-image.outputs.docker-image }}
+      run: |
+        set -ex
 
         if [ "${DOCKER_PUSH:-false}" == "true" ]; then
           # Only push if docker image doesn't exist already
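
The `IMAGE_NAME` line in the build step combines a bash prefix removal with an `awk` field split. A minimal sketch of that two-stage extraction follows; the `image_name` wrapper, the registry `registry.example.com`, the repository `myrepo`, and the tag are all made-up values for illustration and do not come from this commit.

```shell
#!/usr/bin/env bash

# Hypothetical wrapper around the extraction used in the build step.
image_name() {
  local docker_image="$1" registry="$2" repo="$3"
  # Stage 1: strip the leading "<registry>/<repo>/" prefix, if it is present.
  local stripped="${docker_image#"${registry}/${repo}/"}"
  # Stage 2: keep only the text before the first ':' or ','.
  echo "${stripped}" | awk -F '[:,]' '{print $1}'
}

# Made-up example value; prints "my-image".
image_name "registry.example.com/myrepo/my-image:abc123" "registry.example.com" "myrepo"
```

If the prefix does not match, the `#` expansion leaves the string unchanged, so only the `awk` split applies.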
