Labels
bug (Something isn't working) · module: distributed (For distributed feature issue)
Description
🐛 Describe the bug
Get wheels from https://github.com/intel/torch-xpu-ops/actions/runs/19357390650, or download them with `gh`:

```shell
gh run download 19357390650 --repo intel/torch-xpu-ops --name Torch-XPU-Wheel-1826 --dir path --pattern "*.zip"
```

Then set up the test environment and run the failing test:

```shell
git clone -b distributed_2.10 https://github.com/daisyden/pytorch.git
cd pytorch
pip install -r requirements.txt
pip install pytest expecttest
pytest -v test/distributed/tensor/test_convolution_ops.py::DistConvolutionOpsTest::test_conv_backward_none_grad_inp
```
```
Traceback (most recent call last):
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/unittest/case.py", line 59, in testPartExecutor
    yield
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/unittest/case.py", line 591, in run
    self._callTestMethod(testMethod)
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/unittest/case.py", line 549, in _callTestMethod
    method()
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 814, in wrapper
    self._join_processes(fn)
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1082, in _join_processes
    self._check_return_codes(fn, elapsed_time)
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1122, in _check_return_codes
    raise RuntimeError(error)
RuntimeError: Process 0 exited with error code 10 and exception:

Traceback (most recent call last):
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 968, in run_test
    getattr(self, test_name)()
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 816, in wrapper
    fn()
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3336, in wrapper
    method(*args, **kwargs)
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/testing/_internal/distributed/_tensor/common_dtensor.py", line 534, in wrapper
    raise e
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/testing/_internal/distributed/_tensor/common_dtensor.py", line 531, in wrapper
    func(self, *args, **kwargs)  # type: ignore[misc]
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 233, in wrapper
    return func(*args, **kwargs)
  File "/home/jenkins/actions-runner/_work/torch-xpu-ops/torch-xpu-ops/pytorch/test/distributed/tensor/test_convolution_ops.py", line 202, in test_conv_backward_none_grad_inp
    res.backward(dres)
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/_tensor.py", line 629, in backward
    torch.autograd.backward(
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/autograd/__init__.py", line 364, in backward
    _engine_run_backward(
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/autograd/graph.py", line 865, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/distributed/tensor/_tp_conv.py", line 291, in convolution_backward_handler
    return dtensor.DTensor._op_dispatcher.wrap(
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/distributed/tensor/_dispatch.py", line 558, in wrap
    res_list.append(OpDispatcher.wrap(e, s))
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/distributed/tensor/_dispatch.py", line 550, in wrap
    assert res.ndim == 0, "output tensor should be scalar!"
AssertionError: output tensor should be scalar!
```
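For context, the assertion originates in `OpDispatcher.wrap` (`torch/distributed/tensor/_dispatch.py`, line 550 in this build), which treats any returned tensor that arrives without an output sharding spec as a scalar. A minimal stand-alone sketch of that check; `FakeTensor` and this `wrap` are hypothetical stand-ins, not the real DTensor code:

```python
from dataclasses import dataclass

@dataclass
class FakeTensor:
    ndim: int  # number of dimensions, mirroring torch.Tensor.ndim

def wrap(res, spec):
    # Hypothetical stand-in for OpDispatcher.wrap: when a result has no
    # accompanying output spec, the dispatcher asserts it is 0-dim.
    if spec is None:
        assert res.ndim == 0, "output tensor should be scalar!"
    return res

wrap(FakeTensor(ndim=0), None)  # a 0-dim (scalar) tensor passes

try:
    # A non-scalar result paired with a None spec -- e.g. a 4-D convolution
    # gradient whose spec was dropped -- trips the same AssertionError
    # reported in the traceback above.
    wrap(FakeTensor(ndim=4), None)
except AssertionError as e:
    print(e)
```

This suggests `convolution_backward_handler` is handing the dispatcher a non-scalar output whose spec is `None` when `grad_input` is skipped, though that diagnosis is an assumption based on the traceback, not a confirmed root cause.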
Versions