Open
Labels
bug (Something isn't working), module: distributed (For distributed feature issue)
Description
🐛 Describe the bug
Get the wheels from https://github.com/intel/torch-xpu-ops/actions/runs/19357390650, or download them with the GitHub CLI, then clone the test branch and run the failing tests:
gh run download 19357390650 --repo intel/torch-xpu-ops --name Torch-XPU-Wheel-1826 --dir path --pattern "*.zip"
git clone -b distributed_2.10 https://github.com/daisyden/pytorch.git
cd pytorch
pip install -r requirements.txt
pip install pytest expecttest
pytest -v test/distributed/algorithms/test_join.py::TestJoin::test_join_kwargs
pytest -v test/distributed/algorithms/test_join.py::TestJoin::test_multiple_joinables
pytest -v test/distributed/algorithms/test_join.py::TestJoin::test_single_joinable
pytest -v test/distributed/algorithms/test_join.py::TestJoin::test_single_joinable_main_hooks
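The four pytest invocations above can also be driven from one script, which makes it easier to see at a glance which tests fail. This is only a sketch: it assumes it is launched from the root of the pytorch checkout, and `FAILING_TESTS`, `build_command`, and `run_all` are hypothetical helper names, not part of the PyTorch test suite.

```python
import subprocess
import sys

TEST_FILE = "test/distributed/algorithms/test_join.py"

# The four TestJoin tests listed in the reproduction steps.
FAILING_TESTS = [
    "test_join_kwargs",
    "test_multiple_joinables",
    "test_single_joinable",
    "test_single_joinable_main_hooks",
]


def build_command(test_name):
    """Return the pytest argv for one TestJoin test.

    Uses `python -m pytest` so the interpreter that has the XPU wheel
    installed is the one that runs the test (an assumption of this sketch).
    """
    node_id = f"{TEST_FILE}::TestJoin::{test_name}"
    return [sys.executable, "-m", "pytest", "-v", node_id]


def run_all():
    """Run each test in its own pytest process and collect the exit codes."""
    results = {}
    for name in FAILING_TESTS:
        results[name] = subprocess.run(build_command(name)).returncode
    return results
```

Calling `run_all()` returns a dict mapping each test name to the pytest exit code; a non-zero value means that test failed or errored.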
FAILED [15.4239s] ../../../../test/distributed/algorithms/test_join.py::TestJoin::test_join_kwargs - RuntimeError: Process 3 exited with error code 10 and exception:
Traceback (most recent call last):
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 968, in run_test
    getattr(self, test_name)()
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 816, in wrapper
    fn()
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3336, in wrapper
    method(*args, **kwargs)
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 233, in wrapper
    return func(*args, **kwargs)
  File "/home/jenkins/actions-runner/_work/torch-xpu-ops/torch-xpu-ops/pytorch/test/distributed/algorithms/test_join.py", line 498, in test_join_kwargs
    self._test_join_base(
  File "/home/jenkins/actions-runner/_work/torch-xpu-ops/torch-xpu-ops/pytorch/test/distributed/algorithms/test_join.py", line 276, in _test_join_base
    self.assertEqual(allreduce_total, expected_total)
  File "/tmp/xpu-tool/Python/3.10.19/x64/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4291, in assertEqual
    raise error_metas.pop()[0].to_error( # type: ignore[index]
AssertionError: Scalars are not close!

Expected 36 but got 42.0.
Absolute difference: 6.0 (up to 1e-05 allowed)
Relative difference: 0.16666666666666666 (up to 1.3e-06 allowed)
To execute this test, run the following from the base repo dir:
PYTORCH_TEST_WITH_SLOW=1 python test/distributed/algorithms/test_join.py TestJoin.test_join_kwargs
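The numbers in the assertion message can be checked by hand. The sketch below assumes the comparison follows the usual `|actual - expected| <= atol + rtol * |expected|` rule documented for `torch.testing.assert_close`; the values and tolerances are copied from the message above, and the variable names are only illustrative.

```python
# Values reported by the failing assertion on process 3.
expected = 36
actual = 42.0

# Tolerances quoted in the assertion message.
atol = 1e-05
rtol = 1.3e-06

abs_diff = abs(actual - expected)          # absolute difference
rel_diff = abs_diff / abs(expected)        # relative difference w.r.t. expected
is_close = abs_diff <= atol + rtol * abs(expected)

print(f"abs diff = {abs_diff}, rel diff = {rel_diff}, close = {is_close}")
```

The difference is a whole 6, many orders of magnitude above the allowed tolerance, which suggests the all-reduce total itself is wrong on this rank rather than a floating-point precision issue.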
Versions