[distributed] test_c10d_spawn_nccl.py RuntimeError: Backend not supported! #2369

@zxd1997066

🐛 Describe the bug

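Please get the wheels from https://github.com/intel/torch-xpu-ops/actions/runs/19357390650, or download them with gh as in the first command below, then run the remaining commands to reproduce the failure.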

gh run download 19357390650 --repo intel/torch-xpu-ops --name Torch-XPU-Wheel-1826 --dir path --pattern "*.zip"
git clone -b distributed_2.10 https://github.com/daisyden/pytorch.git
cd pytorch
pip install -r requirements.txt
pip install pytest expecttest
pytest -v test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_all_gather_base
FAILED [3.9087s] ../../../../test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_all_gather_base - RuntimeError: Process 0 exited with error code 10 and exception:
Traceback (most recent call last):
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 968, in run_test
    getattr(self, test_name)()
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 816, in wrapper
    fn()
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3336, in wrapper
    method(*args, **kwargs)
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 233, in wrapper
    return func(*args, **kwargs)
  File "/home/jenkins/actions-runner/_work/torch-xpu-ops/torch-xpu-ops/pytorch/test/distributed/test_c10d_spawn_nccl.py", line 194, in test_all_gather_base
    z.backward()
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/_tensor.py", line 629, in backward
    torch.autograd.backward(
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/autograd/__init__.py", line 364, in backward
    _engine_run_backward(
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/autograd/graph.py", line 865, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/autograd/function.py", line 317, in apply
    return user_fn(self, *args)
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/distributed/nn/functional.py", line 382, in backward
    raise RuntimeError("Backend not supported!")
RuntimeError: Backend not supported!
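
The traceback ends in the backward() of an autograd function in torch/distributed/nn/functional.py, which raises unconditionally once it decides the backend is unsupported. The sketch below is a torch-free stand-in for that kind of gate, not the actual PyTorch source. It assumes the check is a comparison of the process group's backend name against NCCL, and that the XPU build reports a different backend name ("xccl" is used here as an assumed example). The function names are hypothetical.

```python
# Hypothetical stand-in for the backend gate the traceback points at.
# Nothing here is imported from torch; all names are illustrative only.

NCCL = "nccl"


def all_gather_base_backward(backend: str) -> str:
    """Mimic a backward() that only implements the NCCL path."""
    if backend == NCCL:
        # Placeholder for the real collective call on the supported path.
        return "reduce_scatter"
    # Any other backend name falls through to the error seen in this issue.
    raise RuntimeError("Backend not supported!")


def try_backend(backend: str) -> str:
    """Run the gate and return either its result or the error text."""
    try:
        return all_gather_base_backward(backend)
    except RuntimeError as exc:
        return f"RuntimeError: {exc}"


print(try_backend("nccl"))  # prints "reduce_scatter"
print(try_backend("xccl"))  # prints "RuntimeError: Backend not supported!"
```

If the real check has this shape, the test would fail on any non-NCCL backend regardless of whether that backend implements the underlying collective, so the fix would belong in the gate, not in the test.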

Versions

https://github.com/daisyden/pytorch/tree/distributed_2.10

Labels: bug (Something isn't working), module: distributed (For distributed feature issue)