Regarding this problem, you adopted the old version of [...]. You can refer to https://github.com/deepmodeling/dpgen/blob/master/examples/run/dp2.x-lammps-vasp/param_CH4_deepmd-kit-2.0.1.json and https://github.com/deepmodeling/deepmd-kit/blob/master/examples/water/se_e2_a/input.json and compare them with your own files. Besides, could you provide the reference of your [...]
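One way to make the suggested comparison is to list the key paths that appear in your own JSON file but not in the reference example, and vice versa. The following is only a minimal sketch: the helper names `key_paths` and `compare_keys` are hypothetical, and the two inline fragments are made-up stand-ins rather than copies of the linked files, so the key names in them are illustrative. In practice you would load your own file and the downloaded reference file with `json.load` instead.

```python
import json


def key_paths(node, prefix=""):
    """Collect slash-separated key paths from nested dicts (lists are not descended into)."""
    paths = set()
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}/{key}" if prefix else key
            paths.add(path)
            paths |= key_paths(value, path)
    return paths


def compare_keys(mine, reference):
    """Return (keys only in `mine`, keys only in `reference`), each sorted."""
    mine_paths = key_paths(mine)
    ref_paths = key_paths(reference)
    return sorted(mine_paths - ref_paths), sorted(ref_paths - mine_paths)


# Illustrative fragments only (assumption): replace these with
# json.load(open(...)) on your own input and on the reference example.
mine = json.loads('{"training": {"stop_batch": 1000, "disp_freq": 100}}')
reference = json.loads('{"training": {"numb_steps": 1000, "disp_freq": 100}}')

only_mine, only_ref = compare_keys(mine, reference)
print("only in my file:  ", only_mine)
print("only in reference:", only_ref)
```

Keys reported as "only in my file" are the first candidates to check against the error message in train.log, since strict mode rejects any key the installed deepmd-kit version does not define.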
Teacher, when I run the command "dpgen run param.json machine.json", I get the following error:
"Traceback (most recent call last):
File "/public/home/duanxiangmei/softwore/dpgen/lib/python3.8/site-packages/dpdispatcher/submission.py", line 215, in handle_unexpected_submission_state
job.handle_unexpected_job_state()
File "/public/home/duanxiangmei/softwore/dpgen/lib/python3.8/site-packages/dpdispatcher/submission.py", line 532, in handle_unexpected_job_state
raise RuntimeError(f"job:{self.job_hash} {self.job_id} failed {self.fail_count} times.job_detail:{self}")
RuntimeError: job:72173a39f8ec32e18711bd340dfc0bcf6068dacc 20661 failed 6 times.job_detail:{'72173a39f8ec32e18711bd340dfc0bcf6068dacc': {'job_task_list': [{'command': "/bin/sh -c '{ if [ ! -f model.ckpt.index ]; then dp train input.json; else dp train input.json --restart model.ckpt; fi }'&&dp freeze", 'task_work_path': '002', 'forward_files': ['input.json'], 'backward_files': ['frozen_model.pb', 'lcurve.out', 'train.log', 'model.ckpt.meta', 'model.ckpt.index', 'model.ckpt.data-00000-of-00001', 'checkpoint'], 'outlog': 'train.log', 'errlog': 'train.log'}], 'resources': {'number_node': 1, 'cpu_per_node': 4, 'gpu_per_node': 0, 'queue_name': 'train', 'group_size': 1, 'custom_flags': [], 'strategy': {'if_cuda_multi_devices': False}, 'para_deg': 1, 'module_unload_list': [], 'module_list': [], 'source_list': ['/public/software/profile.d/compiler_intel-compiler-2021.3.0.sh', '/public/software/profile.d/mpi_intelmpi-2021.3.0.sh', '/public/home/duanxiangmei/.bashrc'], 'envs': {}, 'kwargs': {}}, 'job_state': <JobStatus.terminated: 4>, 'job_id': 20661, 'fail_count': 6}}
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/public/home/duanxiangmei/softwore/dpgen/bin/dpgen", line 10, in <module>
sys.exit(main())
File "/public/home/duanxiangmei/softwore/dpgen/lib/python3.8/site-packages/dpgen/main.py", line 175, in main
args.func(args)
File "/public/home/duanxiangmei/softwore/dpgen/lib/python3.8/site-packages/dpgen/generator/run.py", line 2944, in gen_run
run_iter (args.PARAM, args.MACHINE)
File "/public/home/duanxiangmei/softwore/dpgen/lib/python3.8/site-packages/dpgen/generator/run.py", line 2909, in run_iter
run_train (ii, jdata, mdata)
File "/public/home/duanxiangmei/softwore/dpgen/lib/python3.8/site-packages/dpgen/generator/run.py", line 607, in run_train
submission.run_submission()
File "/public/home/duanxiangmei/softwore/dpgen/lib/python3.8/site-packages/dpdispatcher/submission.py", line 164, in run_submission
self.handle_unexpected_submission_state()
File "/public/home/duanxiangmei/softwore/dpgen/lib/python3.8/site-packages/dpdispatcher/submission.py", line 219, in handle_unexpected_submission_state
f"Meet errors will handle unexpected submission state.\n"
AttributeError: 'Submission' object has no attribute 'remote_root'"
Then I checked the train.log file in ch4/run/work/789b60381b5f7811b1e59f4b19fcbb340b2316a6/000:
"WARNING:tensorflow:From /public/home/duanxiangmei/softwore/deepmd-kit/lib/python3.10/site-packages/tensorflow/python/compat/v2_compat.py:107: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
WARNING:root:To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, TF_INTRA_OP_PARALLELISM_THREADS, and TF_INTER_OP_PARALLELISM_THREADS.
WARNING:root:Environment variable KMP_BLOCKTIME is empty. Use the default value 0
WARNING:root:Environment variable KMP_AFFINITY is empty. Use the default value granularity=fine,verbose,compact,1,0
/public/home/duanxiangmei/softwore/deepmd-kit/lib/python3.10/importlib/__init__.py:169: UserWarning: The NumPy module was reloaded (imported a second time). This can in some cases result in small but subtle issues and is discouraged.
_bootstrap._exec(spec, module)
/public/home/duanxiangmei/softwore/deepmd-kit/lib/python3.10/site-packages/deepmd/utils/compat.py:316: UserWarning: It seems that you are using a deepmd-kit input of version 1.x.x, which is deprecated. we have converted the input to >2.0.0 compatible
warnings.warn(msg)
Traceback (most recent call last):
File "/public/home/duanxiangmei/softwore/deepmd-kit/bin/dp", line 10, in <module>
sys.exit(main())
File "/public/home/duanxiangmei/softwore/deepmd-kit/lib/python3.10/site-packages/deepmd/entrypoints/main.py", line 562, in main
train_dp(**dict_args)
File "/public/home/duanxiangmei/softwore/deepmd-kit/lib/python3.10/site-packages/deepmd/entrypoints/train.py", line 91, in train
jdata = normalize(jdata)
File "/public/home/duanxiangmei/softwore/deepmd-kit/lib/python3.10/site-packages/deepmd/utils/argcheck.py", line 782, in normalize
base.check_value(data, strict=True)
File "/public/home/duanxiangmei/softwore/deepmd-kit/lib/python3.10/site-packages/dargs/dargs.py", line 278, in check_value
self.traverse_value(argdict,
File "/public/home/duanxiangmei/softwore/deepmd-kit/lib/python3.10/site-packages/dargs/dargs.py", line 241, in traverse_value
self._traverse_sub(value,
File "/public/home/duanxiangmei/softwore/deepmd-kit/lib/python3.10/site-packages/dargs/dargs.py", line 260, in _traverse_sub
subarg.traverse(value,
File "/public/home/duanxiangmei/softwore/deepmd-kit/lib/python3.10/site-packages/dargs/dargs.py", line 228, in traverse
self.traverse_value(value,
File "/public/home/duanxiangmei/softwore/deepmd-kit/lib/python3.10/site-packages/dargs/dargs.py", line 241, in traverse_value
self._traverse_sub(value,
File "/public/home/duanxiangmei/softwore/deepmd-kit/lib/python3.10/site-packages/dargs/dargs.py", line 256, in _traverse_sub
sub_hook(self, value, path)
File "/public/home/duanxiangmei/softwore/deepmd-kit/lib/python3.10/site-packages/dargs/dargs.py", line 307, in _check_strict
raise ArgumentKeyError(path,
dargs.dargs.ArgumentKeyError: [at location `training`] undefined key `stop_batch` is not allowed in strict mode"
I hope you can give me some tips on what to do next. Thank you.