Some problems encountered when using GPUs to accelerate LAMMPS #3033
SEU-NiuWenLong asked this question in Q&A
@Yi-FanLi Is this the error you got?
I have two GPU cards. When I run with GPU acceleration, LAMMPS always aborts after running for a while and reports the following error:
Possible remote error message: ==> /home/gcniu/workspace/deepmd/23-32/run/temp/81c427e9fd55ff100029be97075854c91642ee29/task.002.000055/model_devi.log <==
ibdeepmd_1697184996481/work/source/lib/src/gpu/prod_env_mat.cu: 625, in file /home/conda/feedstock_root/build_artifacts/libdeepmd_1697184996481/work/source/op/custom_op.cc:18
[[{{node ProdEnvMatA}}]]
[[o_energy/_31]]
(1) INTERNAL: Operation received an exception: DeePMD-kit Error: CUDA Runtime library throws an error: an illegal memory access was encountered, in file /home/conda/feedstock_root/build_artifacts/libdeepmd_1697184996481/work/source/lib/src/gpu/prod_env_mat.cu: 625, in file /home/conda/feedstock_root/build_artifacts/libdeepmd_1697184996481/work/source/op/custom_op.cc:18
[[{{node ProdEnvMatA}}]]
0 successful operations.
0 derived errors ignored. (/home/conda/feedstock_root/build_artifacts/libdeepmd_1697184996481/work/source/lmp/pair_deepmd.cpp:634)
Last command: run ${NSTEPS} upto
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1
:
system msg for write_line failure : Bad file descriptor
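When many model_devi tasks fail, it can help to pull the CUDA error and its source location out of each log automatically. The following is only a minimal sketch: the function name `extract_cuda_errors` and the regular expression are hypothetical, and it assumes other logs use the same "CUDA Runtime library throws an error: ..., in file <path>: <line>" wording seen in the message above.

```python
import re

# Pattern for the DeePMD-kit CUDA error line as it appears in the log above.
# Assumption: other logs follow the same "..., in file <path>: <line>" wording.
CUDA_ERROR_RE = re.compile(
    r"CUDA Runtime library throws an error: (?P<message>.+?), "
    r"in file (?P<path>\S+?): ?(?P<line>\d+)"
)


def extract_cuda_errors(log_text):
    """Return a list of (message, source_file, line_number) found in log_text."""
    found = []
    for match in CUDA_ERROR_RE.finditer(log_text):
        # Keep only the file name; the build path is specific to the package build.
        source_file = match.group("path").rsplit("/", 1)[-1]
        found.append((match.group("message"), source_file, int(match.group("line"))))
    return found


# The error line from the log above, used as sample input.
sample = (
    "(1) INTERNAL: Operation received an exception: DeePMD-kit Error: "
    "CUDA Runtime library throws an error: an illegal memory access was "
    "encountered, in file /home/conda/feedstock_root/build_artifacts/"
    "libdeepmd_1697184996481/work/source/lib/src/gpu/prod_env_mat.cu: 625, "
    "in file /home/conda/feedstock_root/build_artifacts/"
    "libdeepmd_1697184996481/work/source/op/custom_op.cc:18"
)
print(extract_cuda_errors(sample))
# prints [('an illegal memory access was encountered', 'prod_env_mat.cu', 625)]
```

This only summarizes what the log already says (an illegal memory access raised from prod_env_mat.cu while evaluating the ProdEnvMatA node); it does not identify the cause.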
This is my machine.json file:
"model_devi": [
  {
    "command": "lmp",
    "machine": {
      "context_type": "local",
      "batch_type": "Slurm",
      "local_root": "./",
      "remote_root": "/home/gcniu/workspace/deepmd/23-32/run/temp"
    },
    "resources": {
      "number_node": 1,
      "cpu_per_node": 16,
      "gpu_per_node": 2,
      "queue_name": "GPU",
      "strategy": {"if_cuda_multi_devices": true},
      "custom_flags": [
        "#SBATCH -J gcniu",
        "#SBATCH -n 16",
        "#SBATCH -o %j.log",
        "#SBATCH -e %j.log"
      ],
      "group_size": 1000,
      "_source_list": ["/home/gcniu/workspace/deepmd/23-32/run/envs.sh"]
    }
  }
]
Is there something wrong with my parameter file configuration? Or is it something else?
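One way to rule out simple inconsistencies in the resources block is a small sanity check like the sketch below. The helper `check_resources` is hypothetical and is not part of any scheduler or dispatcher tool; it only compares fields that appear in the file above (cpu_per_node against an explicit "#SBATCH -n" flag, and gpu_per_node against if_cuda_multi_devices). It cannot detect the cause of a CUDA illegal memory access.

```python
import json
import re


def check_resources(resources):
    """Return a list of human-readable warnings for one `resources` block."""
    warnings = []
    cpus = resources.get("cpu_per_node")
    for flag in resources.get("custom_flags", []):
        # Look for an explicit Slurm task count such as "#SBATCH -n 16".
        match = re.match(r"#SBATCH\s+(?:-n|--ntasks)[=\s]+(\d+)", flag)
        if match and cpus is not None and int(match.group(1)) != cpus:
            warnings.append(
                f"'{flag}' requests {match.group(1)} tasks but cpu_per_node is {cpus}"
            )
    gpus = resources.get("gpu_per_node", 0)
    multi = resources.get("strategy", {}).get("if_cuda_multi_devices", False)
    if multi and gpus < 2:
        warnings.append("if_cuda_multi_devices is true but gpu_per_node < 2")
    return warnings


# A trimmed copy of the posted configuration, wrapped in braces to make it valid JSON.
config = json.loads("""
{
  "model_devi": [
    {
      "resources": {
        "number_node": 1,
        "cpu_per_node": 16,
        "gpu_per_node": 2,
        "strategy": {"if_cuda_multi_devices": true},
        "custom_flags": ["#SBATCH -J gcniu", "#SBATCH -n 16"]
      }
    }
  ]
}
""")

for entry in config["model_devi"]:
    print(check_resources(entry["resources"]))
# prints []
```

For the values posted here the check returns no warnings, so under these assumptions the fields are at least consistent with each other.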