[CI] Auto detect the available GPU devices and distribute them with CPUs #2173
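The title describes the idea: let the CI runner discover how many XPU devices are present and split the host CPUs among them, so each test shard gets its own device plus a dedicated block of cores. The sketch below only illustrates that idea and is not the PR's actual script; in particular, launching shards via `ZE_AFFINITY_MASK` and `taskset` is an assumption about how the pinning could be done.

```python
# Illustrative sketch only (not the CI script from this PR): detect XPU
# devices and assign each one a contiguous block of CPU cores.
import os
import torch

num_gpus = torch.xpu.device_count() if torch.xpu.is_available() else 0
num_cpus = os.cpu_count() or 1
cores_per_gpu = max(1, num_cpus // max(1, num_gpus))

for gpu in range(num_gpus):
    first = gpu * cores_per_gpu
    last = first + cores_per_gpu - 1
    # A shard could then be launched roughly as:
    #   ZE_AFFINITY_MASK=<gpu> taskset -c <first>-<last> pytest ...
    print(f"device {gpu}: ZE_AFFINITY_MASK={gpu}, CPU cores {first}-{last}")
```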
…2171) In https://github.com/intel/torch-xpu-ops/actions/runs/18370317029/job/52569740992?pr=1992#step:4:1135, pytest does not actually run, yet the job result is reported as pass. disable_e2e disable_ut
Clean up the skip list in skip_list_common.py by removing cases that now pass. disable_e2e disable_distributed
Signed-off-by: chunhuanMeng <[email protected]>
Co-authored-by: chunhuanMeng <[email protected]>
Co-authored-by: Daisy Deng <[email protected]>
1. Add an all-Hugging-Face-models list file for nightly runs.
2. Add an all-TIMM-models list file for nightly runs.
3. Add an all-TorchBench-models list file for nightly runs.
4. Remove Hugging Face CamemBert, since it was removed in [pytorch/pytorch#164815](https://github.com/pytorch/pytorch/pull/164815/files#diff-004303ad6116d64ab2a8356469ccb11b32d8caca702e1ad65cc0538600a76d2dL170).
5. Add `--disable-cudagraphs` to reduce the impact of CUDA-specific behavior on XPU tests (an illustrative invocation follows this list).
6. Align the model lists with PyTorch for CI tests.

disable_build disable_ut disable_distributed
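For item 5, the flag is passed to PyTorch's dynamo benchmark runners. The following is a hedged example of what such an invocation might look like; the script path, model name, and accompanying flags are illustrative assumptions rather than copied from this PR's workflow.

```python
# Hedged example: shape of a benchmark run that passes --disable-cudagraphs.
# Script path, model name, and other flags are illustrative assumptions.
import subprocess

cmd = [
    "python", "benchmarks/dynamo/huggingface.py",
    "--accuracy", "--inference", "--bfloat16",
    "--backend", "inductor",
    "--device", "xpu",
    "--disable-cudagraphs",       # keep CUDA-graph-specific behavior out of XPU results
    "--only", "BertForMaskedLM",  # illustrative model name
]
subprocess.run(cmd, check=True)
```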
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* __->__ #2179
# Motivation

I would like to clarify that, according to the [FP64 Partial Emulation Proposal](https://intel.sharepoint.com/:w:/s/MLTSHdGPU/EaroFZY371hOqNL9182g2_EBOe83qGYTriAavPB6WTWXYg?e=XSRnKt), the SYCL compiler and IGC only perform FP64 conversion emulation on the DG2 and ATS-M architectures, and only when AOT compilation is enabled. If AOT is not enabled, many warnings like the following are emitted:

```bash
icx: warning: '-fsycl-fp64-conv-emu' option is supported only for AOT compilation of Intel GPUs. It will be ignored for other targets
```

To avoid these warnings, the `-fsycl-fp64-conv-emu` flag should only be added when AOT is enabled for the supported target architectures.
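A minimal sketch of that gating logic follows. It is not the repository's actual build code; the helper name and the way AOT targets are passed in are assumptions. The point is only that the flag is emitted solely when the AOT device list contains an architecture the emulation supports.

```python
# Minimal sketch (hypothetical helper, not the repo's build logic): emit
# -fsycl-fp64-conv-emu only when the AOT target list includes DG2 / ATS-M.
def fp64_conv_emu_flags(aot_targets: list[str]) -> list[str]:
    """aot_targets: AOT device list, e.g. ["dg2", "ats-m150"]; empty means JIT-only."""
    supported_prefixes = ("dg2", "ats-m")
    if any(target.startswith(supported_prefixes) for target in aot_targets):
        return ["-fsycl-fp64-conv-emu"]
    # JIT-only builds or unsupported targets: the flag would be ignored and
    # the icx warning shown above would be printed, so skip it entirely.
    return []

print(fp64_conv_emu_flags(["dg2"]))   # ['-fsycl-fp64-conv-emu']
print(fp64_conv_emu_flags(["pvc"]))   # []
print(fp64_conv_emu_flags([]))        # []
```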
Since more FP8 ops are being supported on XPU, the basic FP8 cases should be activated. This PR removes the following cases from the skip list:

```
TestCommonXPU::test_compare_cpu_torch__scaled_mm_xpu_float8_e4m3fn SKIPPED (test doesn't work on XPU backend)
TestCommonXPU::test_out_torch__scaled_mm_xpu_float8_e4m3fn SKIPPED (Skipped!)
TestCommonXPU::test_python_ref__refs_eye_xpu_float8_e4m3fn SKIPPED (test doesn't work on XPU backend)
TestCommonXPU::test_python_ref__refs_eye_xpu_float8_e4m3fnuz SKIPPED (test doesn't work on XPU backend)
TestCommonXPU::test_python_ref__refs_eye_xpu_float8_e5m2 SKIPPED (test doesn't work on XPU backend)
TestCommonXPU::test_python_ref__refs_eye_xpu_float8_e5m2fnuz SKIPPED (test doesn't work on XPU backend)
TestCommonXPU::test_python_ref_executor__refs_eye_executor_aten_xpu_float8_e4m3fn PASSED
TestCommonXPU::test_python_ref_executor__refs_eye_executor_aten_xpu_float8_e4m3fnuz PASSED
TestCommonXPU::test_python_ref_executor__refs_eye_executor_aten_xpu_float8_e5m2 PASSED
TestCommonXPU::test_python_ref_executor__refs_eye_executor_aten_xpu_float8_e5m2fnuz PASSED
TestCommonXPU::test_python_ref_meta__refs_eye_xpu_float8_e4m3fn SKIPPED (test doesn't work on XPU backend)
TestCommonXPU::test_python_ref_meta__refs_eye_xpu_float8_e4m3fnuz SKIPPED (test doesn't work on XPU backend)
TestCommonXPU::test_python_ref_meta__refs_eye_xpu_float8_e5m2 SKIPPED (test doesn't work on XPU backend)
TestCommonXPU::test_python_ref_meta__refs_eye_xpu_float8_e5m2fnuz SKIPPED (test doesn't work on XPU backend)
TestCommonXPU::test_python_ref_torch_fallback__refs_eye_xpu_float8_e4m3fn SKIPPED (test doesn't work on XPU backend)
TestCommonXPU::test_python_ref_torch_fallback__refs_eye_xpu_float8_e4m3fnuz SKIPPED (test doesn't work on XPU backend)
TestCommonXPU::test_python_ref_torch_fallback__refs_eye_xpu_float8_e5m2 SKIPPED (test doesn't work on XPU backend)
TestCommonXPU::test_python_ref_torch_fallback__refs_eye_xpu_float8_e5m2fnuz SKIPPED (test doesn't work on XPU backend)
```
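The mechanical change is just deleting those entries from the skip list. Below is a hedged sketch of the idea using a made-up slice of data; the structure mimics, but is not copied from, skip_list_common.py.

```python
# Hedged illustration: drop FP8 cases that now pass on the XPU backend from
# a skip-list mapping of test file -> tuple of skipped test names.
skip_dict = {
    "test_ops_xpu.py": (
        "test_compare_cpu_torch__scaled_mm_xpu_float8_e4m3fn",
        "test_python_ref__refs_eye_xpu_float8_e4m3fn",
        "some_unrelated_case_that_still_fails",  # stays in the list
    ),
}

now_passing = {
    "test_compare_cpu_torch__scaled_mm_xpu_float8_e4m3fn",
    "test_python_ref__refs_eye_xpu_float8_e4m3fn",
}

skip_dict = {
    test_file: tuple(case for case in cases if case not in now_passing)
    for test_file, cases in skip_dict.items()
}
print(skip_dict)  # only the still-failing case remains
```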