Open
Labels: bug (Something isn't working), module: distributed (For distributed feature issue)
Description
🐛 Describe the bug
Download the wheels from https://github.com/intel/torch-xpu-ops/actions/runs/19357390650, or fetch them with the GitHub CLI (gh run download):
gh run download 19357390650 --repo intel/torch-xpu-ops --name Torch-XPU-Wheel-1826 --dir path --pattern "*.zip"
git clone -b distributed_2.10 https://github.com/daisyden/pytorch.git
cd pytorch
pip install -r requirements.txt
pip install pytest expecttest
pytest -v test/distributed/test_composability.py::ComposabilityTest::test_pp_ddp_ScheduleClass0
The test fails with the following traceback:
Traceback (most recent call last):
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/unittest/case.py", line 59, in testPartExecutor
    yield
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/unittest/case.py", line 591, in run
    self._callTestMethod(testMethod)
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/unittest/case.py", line 549, in _callTestMethod
    method()
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1968, in wrapper
    raise rv
RuntimeError: Exception in worker process:
Traceback (most recent call last):
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1817, in _worker_loop
    cls._run_test_given_id(test_id)
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1790, in _run_test_given_id
    test_fn(**kwargs)
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1977, in wrapper
    fn()
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3336, in wrapper
    method(*args, **kwargs)
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 587, in instantiated_test
    test(self, **param_kwargs)
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 233, in wrapper
    return func(*args, **kwargs)
  File "/home/jenkins/actions-runner/_work/torch-xpu-ops/torch-xpu-ops/pytorch/test/distributed/test_composability.py", line 273, in test_pp_ddp
    torch.testing.assert_close(p.grad, ref_p.grad)
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/testing/_comparison.py", line 1600, in assert_close
    raise error_metas[0].to_error(msg)
AssertionError: Tensor-likes are not close!

Mismatched elements: 18 / 100 (18.0%)
Greatest absolute difference: 1.758510188665241e-05 at index (3, 1) (up to 1e-05 allowed)
Greatest relative difference: 4.611908912658691 at index (9, 6) (up to 1.3e-06 allowed)
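For readers interpreting the numbers above: torch.testing.assert_close documents its per-element criterion as |actual - expected| <= atol + rtol * |expected|, and the error message reports atol = 1e-05 and rtol = 1.3e-06. The following is a minimal scalar sketch of that check in plain Python, not the test's actual code. The helper name is_close and the reference value 1.0 are assumptions made for illustration; the real gradient values at the failing indices are not included in this report.

```python
def is_close(actual: float, expected: float,
             rtol: float = 1.3e-6, atol: float = 1e-5) -> bool:
    """Scalar form of the documented assert_close criterion (hypothetical helper).

    NaN and infinity handling is intentionally left out of this sketch.
    """
    return abs(actual - expected) <= atol + rtol * abs(expected)


# Hypothetical reference gradient value; the real one is not in the report.
expected = 1.0

# Greatest absolute difference taken from the error message.
reported_abs_diff = 1.758510188665241e-05

# Allowed deviation here is 1e-05 + 1.3e-06 * 1.0 = 1.13e-05, so this fails.
print(is_close(expected + reported_abs_diff, expected))  # False

# A deviation of 5e-06 stays inside the allowed band.
print(is_close(expected + 5e-06, expected))  # True
```

Since the reported absolute difference is only about 1.8 times the absolute tolerance while the relative difference is large, the mismatched elements are presumably small in magnitude, where the atol term dominates the check.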
Versions