Hi,
Command -- CUDA_VISIBLE_DEVICES=4,5,6,7 mpirun -np 4 dp train --mpi-log=workers input.json
Output --
Problem -- Unable to run parallel training via the mpirun command. The Horovod installation fails with an error, which is why I am trying with mpirun only.
System Configuration -- DGX V100 GPU
Replies: 1 comment
The documentation clearly states that without Horovod installed, training falls back to serial mode.
Link -- https://docs.deepmodeling.com/projects/deepmd/en/master/install/install-from-source.html
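
To see which case applies on a given machine, one option is to check whether Horovod can be imported by the same Python interpreter that runs dp. The snippet below is only a minimal sketch using the standard library; the helper name module_available is hypothetical and is not part of DeePMD-kit or Horovod, and it only tests that the package can be found, not that it was built with working MPI or GPU support.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module called `name` can be found.

    Hypothetical helper for illustration; not a DeePMD-kit API.
    """
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # "horovod" is the top-level package name of Horovod.
    if module_available("horovod"):
        print("horovod found: parallel training may be possible")
    else:
        print("horovod not found: expect serial training per the docs")
```

Run it with the Python from the environment in which dp is installed; a different interpreter may give a different answer.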