Commit 4aecc4c

MoFHeka authored and rhdong committed
[fix] HorovodJoin CPU kernels are missing when Horovod is installed with NCCL, which makes it impossible to run horovod_sync_train_test.
1 parent 265e1ed commit 4aecc4c

1 file changed, +12 -0 lines changed

tools/testing/build_and_run_tests.sh

Lines changed: 12 additions & 0 deletions
@@ -46,8 +46,20 @@ if ! [ -x "$(command -v nvidia-smi)" ]; then
   EXTRA_ARGS="-n auto"
 fi
 
+# Lack of HorovodJoin CPU kernels when install Horovod with NCCL
+if [ "$(uname)" != "Darwin" ]; then
+  # Mac only with MPI
+  python -m pip uninstall horovod -y
+  bash /install/install_horovod.sh $HOROVOD_VERSION --only-cpu
+fi
 # TODO(jamesrong): Test on GPU.
 CUDA_VISIBLE_DEVICES="" mpirun -np 2 -H localhost:2 --allow-run-as-root pytest -v ./tensorflow_recommenders_addons/dynamic_embedding/python/kernel_tests/horovod_sync_train_test.py
+# Reinstall Horovod after tests
+if [ "$(uname)" != "Darwin" ]; then
+  # Mac only with MPI
+  python -m pip uninstall horovod -y
+  bash /install/install_horovod.sh $HOROVOD_VERSION
+fi
 
 # Only use GPU 0 if available.
 if [ -x "$(command -v nvidia-smi)" ]; then
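
Note: the two added blocks differ only in the `--only-cpu` argument, so the swap could be factored into one helper. The sketch below is not part of this commit. The names `run`, `reinstall_horovod`, `DRY_RUN`, and `OS_NAME` are hypothetical and exist only for illustration; the install script path, `$HOROVOD_VERSION`, and the `--only-cpu` flag are taken from the diff as-is.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the commit: swap the installed Horovod build.

# Print the command instead of executing it when DRY_RUN=1 (assumed debugging aid).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Uninstall Horovod and reinstall it through the install script from the diff.
# Any extra arguments (for example --only-cpu) are forwarded to that script.
reinstall_horovod() {
  # OS_NAME is an assumed override so the platform check can be exercised anywhere.
  local os="${OS_NAME:-$(uname)}"
  if [ "$os" = "Darwin" ]; then
    return 0  # the commit skips the swap on macOS
  fi
  run python -m pip uninstall horovod -y
  run bash /install/install_horovod.sh "${HOROVOD_VERSION}" "$@"
}
```

With such a helper, the script would call `reinstall_horovod --only-cpu` before the `mpirun` line and `reinstall_horovod` after it. Quoting `${HOROVOD_VERSION}` also avoids word splitting if the variable is ever empty or contains spaces.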
