-
It seems to me you got an out-of-memory error, so the GPU may not actually be used for the computation.
By the way, it looks strange that your two cards report different available memory.
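One way to see why the allocation fails, assuming nvidia-smi is available on the node (the device index below is only an example):
# check how much memory is free on each card and which processes occupy them
nvidia-smi
# if one card is already occupied by another job, restrict the run to the free one
export CUDA_VISIBLE_DEVICES=1
# optionally let TensorFlow grow its GPU memory on demand instead of pre-allocating
export TF_FORCE_GPU_ALLOW_GROWTH=true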
-
Dear developers,
I have compiled deepmd-kit and LAMMPS using the commands below, but the molecular dynamics (MD) speed is only about 25% of what I get with a direct conda installation. Because I use a modified PLUMED, I had to compile everything myself. I would appreciate your help in identifying and addressing the underlying issue.
Installation commands:
conda create -n cuda11
conda activate cuda11
conda install python==3.11.5
conda install cuda-nvcc
pip install --upgrade pip
pip install nvidia-cudnn-cu11==8.6.0.163 protobuf==4.23.4 tensorflow==2.13.*
#open a new terminal
CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/usergpu/soft/anaconda/install/envs/cuda11/lib/:$CUDNN_PATH/lib
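To verify that the pip-installed TensorFlow actually finds the CUDA/cuDNN libraries set up above, a quick sanity check using the standard tf.config API (nothing DeePMD-specific is assumed here):
# should list both A800 cards if the GPU build of TensorFlow is working
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"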
## deepmd-kit
tar
cd source
mkdir build
cd build
export PATH=/home/usergpu/soft/cmake/cmake-3.30.0-rc2-linux-x86_64/bin:$PATH
cmake -DUSE_TF_PYTHON_LIBS=TRUE -DCMAKE_INSTALL_PREFIX=/home/usergpu/soft/deepmd-kit/install/ -DTENSORFLOW_ROOT=/home/usergpu/soft/anaconda/install/envs/cuda11/lib/python3.11/site-packages/tensorflow/ ..
make -j12
make install -j12
make lammps
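Note that the cmake line above does not explicitly request the CUDA customized OPs. The DeePMD-kit install-from-source documentation describes a CUDA toolkit option for this; a hedged variant of the same command (the USE_CUDA_TOOLKIT flag is taken from those docs and should be checked against the 2.2.7 documentation, paths unchanged):
# same cmake call, additionally enabling the CUDA customized OPs
cmake -DUSE_TF_PYTHON_LIBS=TRUE -DUSE_CUDA_TOOLKIT=TRUE -DCMAKE_INSTALL_PREFIX=/home/usergpu/soft/deepmd-kit/install/ -DTENSORFLOW_ROOT=/home/usergpu/soft/anaconda/install/envs/cuda11/lib/python3.11/site-packages/tensorflow/ ..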
## lammps
cd lammps-stable_2Aug2023_update2/
cd src/
cp -r /home/usergpu/soft/deepmd-kit/deepmd-kit-2.2.7/source/build/USER-DEEPMD/ .
make yes-kspace
make yes-extra-fix
make yes-user-deepmd
source /home/usergpu/soft/plumed-2.8.1/sourceme.sh
make lib-plumed args='-p /home/usergpu/xyliu/soft/plumed-2.8.1/bilud/ -m shared'
make yes-user-deepmd
make mpi -j 12
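For completeness, the DeePMD-kit warnings in the output below ask for the thread-related environment variables to be tuned; a minimal job-script fragment with placeholder values (see https://deepmd.rtfd.io/parallelism/ for how to choose them):
# placeholder thread settings referenced by the warnings below
export OMP_NUM_THREADS=12
export TF_INTRA_OP_PARALLELISM_THREADS=12
export TF_INTER_OP_PARALLELISM_THREADS=1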
Screen output when I submit a task:
2024-06-18 20:02:33.307208: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-06-18 20:02:33.344105: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
.DeePMD-kit WARNING: Environmental variable TF_INTRA_OP_PARALLELISM_THREADS is not set. Tune TF_INTRA_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable TF_INTER_OP_PARALLELISM_THREADS is not set. Tune TF_INTER_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable TF_INTRA_OP_PARALLELISM_THREADS is not set. Tune TF_INTRA_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable TF_INTER_OP_PARALLELISM_THREADS is not set. Tune TF_INTER_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
2024-06-18 20:02:34.312498: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
DeePMD-kit WARNING: Environmental variable TF_INTRA_OP_PARALLELISM_THREADS is not set. Tune TF_INTRA_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable TF_INTER_OP_PARALLELISM_THREADS is not set. Tune TF_INTER_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
2024-06-18 20:02:34.333658: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-06-18 20:02:35.269270: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 36881 MB memory: -> device: 0, name: NVIDIA A800 80GB PCIe, pci bus id: 0000:18:00.0, compute capability: 8.0
2024-06-18 20:02:35.276366: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 77825 MB memory: -> device: 1, name: NVIDIA A800 80GB PCIe, pci bus id: 0000:86:00.0, compute capability: 8.0
2024-06-18 20:02:35.289328: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 36881 MB memory: -> device: 0, name: NVIDIA A800 80GB PCIe, pci bus id: 0000:18:00.0, compute capability: 8.0
2024-06-18 20:02:35.291170: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 77825 MB memory: -> device: 1, name: NVIDIA A800 80GB PCIe, pci bus id: 0000:86:00.0, compute capability: 8.0
2024-06-18 20:02:35.325452: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:375] MLIR V1 optimization pass is not enabled
2024-06-18 20:02:35.358886: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:375] MLIR V1 optimization pass is not enabled
2024-06-18 20:02:35.508553: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 36.02GiB (38673055744 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.512437: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 32.42GiB (34805747712 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.516274: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 29.17GiB (31325171712 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.520040: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 26.26GiB (28192653312 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.524024: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 23.63GiB (25373386752 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.528826: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 21.27GiB (22836047872 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.534870: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 19.14GiB (20552441856 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.540422: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 17.23GiB (18497198080 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.544968: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 15.50GiB (16647478272 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.548758: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 13.95GiB (14982729728 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.552603: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 12.56GiB (13484455936 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.556746: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 11.30GiB (12136009728 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.562263: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 10.17GiB (10922408960 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.567612: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 9.15GiB (9830167552 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.571641: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 8.24GiB (8847150080 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.576848: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 7.42GiB (7962435072 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.580606: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 6.67GiB (7166191616 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.584758: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 6.01GiB (6449572352 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.588573: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 5.41GiB (5804615168 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.594112: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 4.87GiB (5224153600 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.599475: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 4.38GiB (4701737984 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.603644: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 3.94GiB (4231564032 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.607448: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 3.55GiB (3808407552 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.611319: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 3.19GiB (3427566592 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.615339: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 2.87GiB (3084809728 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.620897: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 2.58GiB (2776328704 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.626355: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 2.33GiB (2498695680 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.631273: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 2.09GiB (2248826112 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.635073: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 1.88GiB (2023943424 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.638871: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 1.70GiB (1821549056 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.643169: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 1.53GiB (1639394048 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
DeePMD-kit WARNING: Environmental variable TF_INTRA_OP_PARALLELISM_THREADS is not set. Tune TF_INTRA_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable TF_INTER_OP_PARALLELISM_THREADS is not set. Tune TF_INTER_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable TF_INTRA_OP_PARALLELISM_THREADS is not set. Tune TF_INTRA_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable TF_INTER_OP_PARALLELISM_THREADS is not set. Tune TF_INTER_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
2024-06-18 20:02:36.492733: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 36881 MB memory: -> device: 0, name: NVIDIA A800 80GB PCIe, pci bus id: 0000:18:00.0, compute capability: 8.0
2024-06-18 20:02:36.494282: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 77825 MB memory: -> device: 1, name: NVIDIA A800 80GB PCIe, pci bus id: 0000:86:00.0, compute capability: 8.0
2024-06-18 20:02:36.497800: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 36881 MB memory: -> device: 0, name: NVIDIA A800 80GB PCIe, pci bus id: 0000:18:00.0, compute capability: 8.0
2024-06-18 20:02:36.499288: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 77825 MB memory: -> device: 1, name: NVIDIA A800 80GB PCIe, pci bus id: 0000:86:00.0, compute capability: 8.0
2024-06-18 20:02:36.532222: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 36881 MB memory: -> device: 0, name: NVIDIA A800 80GB PCIe, pci bus id: 0000:18:00.0, compute capability: 8.0
2024-06-18 20:02:36.549077: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 77825 MB memory: -> device: 1, name: NVIDIA A800 80GB PCIe, pci bus id: 0000:86:00.0, compute capability: 8.0
2024-06-18 20:02:36.558621: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 36881 MB memory: -> device: 0, name: NVIDIA A800 80GB PCIe, pci bus id: 0000:18:00.0, compute capability: 8.0
2024-06-18 20:02:36.560058: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 77825 MB memory: -> device: 1, name: NVIDIA A800 80GB PCIe, pci bus id: 0000:86:00.0, compute capability: 8.0
log file