Commit 07da8a4

chuanqi129 authored and pytorchmergebot committed
[CI] fix xpu-smi hang issue on some xpu runners (pytorch#155194)
Works around the xpu-smi hang issue on some XPU runners. See https://github.com/pytorch/pytorch/actions/runs/15431583674/job/43431289026?pr=154962

Pull Request resolved: pytorch#155194
Approved by: https://github.com/EikanWang, https://github.com/malfet
1 parent e694280 commit 07da8a4

1 file changed, +2 -2 lines changed


.github/actions/setup-xpu/action.yml

Lines changed: 2 additions & 2 deletions
@@ -29,13 +29,13 @@ runs:
       if: always()
       shell: bash
       run: |
-        xpu-smi discovery
+        timeout 30 xpu-smi discovery || true
 
     - name: Runner health check GPU count
       if: always()
       shell: bash
       run: |
-        ngpu=$(xpu-smi discovery | grep -c -E 'Device Name')
+        ngpu=$(timeout 30 xpu-smi discovery | grep -c -E 'Device Name')
         msg="Please file an issue on pytorch/pytorch reporting the faulty runner. Include a link to the runner logs so the runner can be identified"
         if [[ $ngpu -eq 0 ]]; then
           echo "Error: Failed to detect any GPUs on the runner"
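
The effect of the two changed lines can be reproduced without XPU hardware. The following is a minimal sketch, not part of the commit: `sleep 5` is a hypothetical stand-in for a hung `xpu-smi discovery`, `printf` stands in for a healthy one, and the limit is shortened from 30 seconds to 1 second. It assumes the GNU coreutils `timeout`, which exits with status 124 when the limit is reached.

```shell
#!/usr/bin/env bash
# Sketch only: `sleep 5` is a hypothetical stand-in for a hung `xpu-smi discovery`,
# and `printf` stands in for a healthy one. Assumes GNU coreutils `timeout`.

# 1. A hung command is killed at the limit; GNU timeout then exits with status 124.
hang_status=0
timeout 1 sleep 5 || hang_status=$?

# 2. Appending `|| true` masks that failure, so the step keeps going.
masked_status=0
{ timeout 1 sleep 5 || true; } || masked_status=$?

# 3. In the counting pipeline, a timed-out producer writes nothing, so grep -c prints 0.
#    grep exits non-zero on zero matches, hence the trailing `|| true` in this sketch.
ngpu_hang=$(timeout 1 sleep 5 | grep -c -E 'Device Name') || true

# 4. A healthy producer finishes before the limit and is counted as usual.
ngpu_ok=$(timeout 5 printf 'Device Name: A\nDevice Name: B\n' | grep -c -E 'Device Name') || true

echo "hang_status=$hang_status masked_status=$masked_status ngpu_hang=$ngpu_hang ngpu_ok=$ngpu_ok"
```

Under these assumptions, a hang in the first step no longer blocks the job, and a hang in the second step shows up as a GPU count of 0, which falls into the existing "Failed to detect any GPUs" branch instead of stalling the runner.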
