Replies: 1 comment
https://github.com/deepmodeling/dpgen/blob/master/doc/run/overview-of-the-run-process.md
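For context, the linked overview describes how `dpgen run` tracks progress in a `record.dpgen` file in the working directory, so that re-running the same `dpgen run` command resumes after the last recorded step. The snippet below is a minimal sketch for inspecting that file, not part of DP-GEN itself. It assumes each record line holds two integers (iteration index, step index) and that the nine steps are numbered as in that overview; the helper names `last_finished_step` and `next_step` are hypothetical. It does not cover how a training checkpoint is picked up inside an interrupted task.

```python
from pathlib import Path

# Step numbering as described in the DP-GEN run-process overview (assumed here).
STEP_NAMES = [
    "make_train", "run_train", "post_train",
    "make_model_devi", "run_model_devi", "post_model_devi",
    "make_fp", "run_fp", "post_fp",
]


def last_finished_step(record_text):
    """Return (iteration, step) from the last valid record line, or None."""
    last = None
    for line in record_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        last = (int(parts[0]), int(parts[1]))
    return last


def next_step(last):
    """Return the (iteration, step) that would follow the last finished one."""
    if last is None:
        return (0, 0)
    iteration, step = last
    if step + 1 < len(STEP_NAMES):
        return (iteration, step + 1)
    return (iteration + 1, 0)


if __name__ == "__main__":
    record = Path("record.dpgen")  # assumed to sit in the dpgen working directory
    if record.exists():
        last = last_finished_step(record.read_text())
        iteration, step = next_step(last)
        print(f"last finished: {last}")
        print(f"next to run: iteration {iteration}, step {step} ({STEP_NAMES[step]})")
    else:
        print("record.dpgen not found in the current directory")
```

If the last recorded line is `0 0`, for example, the sketch reports step 1 (`run_train`) of iteration 0 as the next one, which would match a run that was interrupted during training.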
Hi folks,
I am a new user. I am training a DPA-2 model with DP-GEN, and the training has not finished. I set `"stop_batch": 500000`, and the `001` and `002` models have already finished. `003` was still running when the job was stopped, and `000` has not started yet.
The command I am using in machine.json is `"command": "CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nnodes=1 --nproc_per_node=auto --no-python dp --pt"`.
If I would like to restart the DP-GEN job, what should I do in this case? Is `CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nnodes=1 --nproc_per_node=auto --no-python dp --restart --pt` enough? Do I need to specify the `model.ckpt.pt`? How do I specify the `model.ckpt.pt` precisely?
I know that for DeePMD-kit I should use "dp train --restart model.ckpt". I am just not sure what I should do in DP-GEN. I am REALLY confused.
Could anyone please give me one example to show how to set a restart task in DP-GEN?
One more question: if the DP-GEN task was stopped at the LAMMPS or VASP stage, what should I do to restart it?
Thanks for your time.
The tree structure of DeePMD-kit is here.