Error about DeePMD-kit: dp train #986
walkjoker-c asked this question in Q&A (unanswered, 2 comments)
-
The command that triggers the error is: nohup dp train input.json 1>>train.log 2>>train.log &
-
The RTX 3090 (compute capability 8.6) is not compatible with CUDA 10.1; support for it starts with CUDA 11.1. Use a build for CUDA Toolkit 11.3 instead.
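To illustrate the compatibility point in this reply, here is a minimal Python sketch that reads the CUDA version out of an installer file name and compares it with the earliest CUDA toolkit release able to target a given GPU. The function names and the lookup table are illustrative assumptions, not part of DeePMD-kit. The table lists only a few compute capabilities, and the `cuda11.3` file name in the last line is hypothetical. Passing this check is necessary but not sufficient: whether a package actually ships kernels for a given GPU also depends on how it was compiled.

```python
import re

# Earliest CUDA toolkit release that can target each compute capability.
# Illustrative subset only; extend as needed.
MIN_CUDA_FOR_CC = {
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing
    (8, 0): (11, 0),  # Ampere (A100)
    (8, 6): (11, 1),  # Ampere (e.g. RTX 3090)
}


def cuda_version_from_installer(filename):
    """Extract (major, minor) from a name such as '...-cuda10.1_gpu-...'."""
    match = re.search(r"cuda(\d+)\.(\d+)", filename)
    if match is None:
        raise ValueError(f"no CUDA version found in {filename!r}")
    return int(match.group(1)), int(match.group(2))


def build_supports_gpu(filename, compute_capability):
    """Return True if the installer's CUDA version is new enough for the GPU."""
    required = MIN_CUDA_FOR_CC[compute_capability]
    return cuda_version_from_installer(filename) >= required


RTX_3090 = (8, 6)
installer = "deepmd-kit-2.0.0.b4-cuda10.1_gpu-Linux-x86_64.sh"

print(cuda_version_from_installer(installer))
print(build_supports_gpu(installer, RTX_3090))
# Hypothetical file name, used only to show the passing case.
print(build_supports_gpu("deepmd-kit-x.y.z-cuda11.3_gpu-Linux-x86_64.sh", RTX_3090))
```

A CUDA 10.1 build fails this check for the RTX 3090, which is consistent with the "no kernel image is available for execution on the device" error reported in the question.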
-
I set up a workstation with an RTX 3090, installed CentOS Linux and the graphics driver NVIDIA-Linux-x86_64-470.63.01, and installed no other software. I then downloaded DeePMD-kit and ran the dp train command, but the errors below prevented me from training the model.
DeePMD-kit version: deepmd-kit-2.0.0.b2-cuda10.1_gpu-Linux-x86_64
Installed with: ./deepmd-kit-2.0.0.b2-cuda10.1_gpu-Linux-x86_64.sh
ERROR: cuda assert: DeePMD-kit: illegal nbor list sorting /home/conda/feedstock_root/build_artifacts/libdeepmd_1623852038677/work/source/lib/src/cuda/prod_env_mat.cu 509
DeePMD-kit version: deepmd-kit-2.0.0.b4-cuda10.1_gpu-Linux-x86_64
Installed with: ./deepmd-kit-2.0.0.b4-cuda10.1_gpu-Linux-x86_64.sh
Warning: WARNING:deepmd.train.run_options:Switch to serial execution due to lack of horovod module
ERROR: cuda assert: no kernel image is available for execution on the device /home/conda/feedstock_root/build_artifacts/libdeepmd_1628190504746/work/source/lib/src/cuda/coord.cu 367
Is my system missing required components, and what should I do?