Error at the first step when running MD simulation using i-pi with DeePMD model #2655
plumbum082 asked this question in Q&A · Unanswered · Replies: 1 comment
-
In your code, you initialize |
-
Hello developers,
I have a problem running MD simulations using i-pi with my DeePMD model: it fails with an error at the first step. A single-point calculation with the same DeepPot code runs without problems. I have confirmed that the shape of the data received by DPDriver.grad is correct, and feeding that same data into a single-point calculation also gives a correct result, so I cannot understand why the MD run fails. I have also checked with 'nvidia-smi' that my driver is compatible with cudatoolkit 11.3.1. My client_DP.py and the error output are below. Could you help me with this? Thank you very much!
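Beyond the array shapes, a few other properties of the incoming frame can be checked before it reaches the model. The following is a minimal sketch, assuming the frame arrives as NumPy-compatible coordinates, a 3x3 cell, and a list of atom type indices; `check_frame` is a hypothetical helper, not part of DeePMD-kit or i-pi, and the checks it performs are only examples of things that might be worth ruling out.

```python
import numpy as np


def check_frame(coord, box, atype):
    """Return a list of human-readable problems found in one frame (empty if none)."""
    coord = np.asarray(coord, dtype=float)
    box = np.asarray(box, dtype=float)
    atype = np.asarray(atype)
    problems = []
    natoms = atype.size

    # Sizes: 3 coordinates per atom, 9 cell components.
    if coord.size != natoms * 3:
        problems.append(f"coord has {coord.size} values, expected {natoms * 3}")
    if box.size != 9:
        problems.append(f"box has {box.size} values, expected 9")

    # Non-finite values can pass a shape check unnoticed.
    if not np.all(np.isfinite(coord)):
        problems.append("coord contains NaN or inf")
    if not np.all(np.isfinite(box)):
        problems.append("box contains NaN or inf")
    elif box.size == 9 and abs(np.linalg.det(box.reshape(3, 3))) < 1e-8:
        problems.append("box is degenerate (near-zero volume)")

    # Type indices are expected to be non-negative integers.
    if natoms and atype.min() < 0:
        problems.append("atype contains negative type indices")
    return problems
```

Calling `check_frame(coord, box, atype)` inside the driver just before the model evaluation and printing the returned list would show whether the first frame sent by i-pi differs from the one used in the single-point test.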
This is my client_DP.py:
import os
import sys
import driver
import numpy as np
from deepmd.infer import DeepPot

class DPDriver(driver.BaseDriver):
    def __init__(self, addr, port, socktype):
        driver.BaseDriver.__init__(self, port, addr, socktype)
        return

if __name__ == '__main__':
    addr = sys.argv[1]
    port = int(sys.argv[2])
    socktype = sys.argv[3]
    driver_dp = DPDriver(addr, port, socktype)
    while True:
        driver_dp.parse()
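Since the failure only shows up inside the i-pi loop, it may help to save the exact arrays that triggered it so they can be replayed in a standalone script. A minimal sketch, assuming the model evaluation is a callable that takes (coord, box, atype); `eval_or_dump` and the file name `failed_frame.npz` are hypothetical and not part of DeePMD-kit or i-pi.

```python
import numpy as np


def eval_or_dump(eval_fn, coord, box, atype, dump_path="failed_frame.npz"):
    """Call eval_fn(coord, box, atype); on any exception, save the inputs and re-raise."""
    try:
        return eval_fn(coord, box, atype)
    except Exception:
        # Keep the offending frame on disk so it can be inspected or replayed later.
        np.savez(
            dump_path,
            coord=np.asarray(coord),
            box=np.asarray(box),
            atype=np.asarray(atype),
        )
        raise
```

The saved file can then be loaded with `np.load("failed_frame.npz")` and its arrays passed to a single-point script, which gives a byte-for-byte comparison with the frame that worked.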
The error output is:
2023-07-04 12:01:51.234067: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at custom_op.cc:15 : INTERNAL: Operation received an exception: DeePMD-kit Error: CUDA Assert, in file /project/source/op/custom_op.cc:17
2023-07-04 12:01:51.234106: I tensorflow/core/common_runtime/executor.cc:1197] [/job:localhost/replica:0/task:0/device:GPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INTERNAL: Operation received an exception: DeePMD-kit Error: CUDA Assert, in file /project/source/op/custom_op.cc:17
[[{{node load/ProdEnvMatA}}]]
2023-07-04 12:01:51.234125: I tensorflow/core/common_runtime/executor.cc:1197] [/job:localhost/replica:0/task:0/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INTERNAL: Operation received an exception: DeePMD-kit Error: CUDA Assert, in file /project/source/op/custom_op.cc:17
[[{{node load/ProdEnvMatA}}]]
[[load/o_virial/_27]]
Traceback (most recent call last):
File "/opt/mamba/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 1378, in _do_call
return fn(*args)
File "/opt/mamba/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 1361, in _run_fn
return self._call_tf_sessionrun(options, feed_dict, fetch_list,
File "/opt/mamba/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 1454, in _call_tf_sessionrun
return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) INTERNAL: Operation received an exception: DeePMD-kit Error: CUDA Assert, in file /project/source/op/custom_op.cc:17
[[{{node load/ProdEnvMatA}}]]
[[load/o_virial/_27]]
(1) INTERNAL: Operation received an exception: DeePMD-kit Error: CUDA Assert, in file /project/source/op/custom_op.cc:17
[[{{node load/ProdEnvMatA}}]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/input_lbg-12765-7738088/client_DP.py", line 58, in <module>
driver_dp.parse()
File "/home/input_lbg-12765-7738088/driver.py", line 195, in parse
self.posdata()
File "/home/input_lbg-12765-7738088/driver.py", line 143, in posdata
energy, grad, virial = self.grad(self.crd, self.cell)
File "/home/input_lbg-12765-7738088/client_DP.py", line 34, in grad
energy, grad, virial = dp.eval(coord, box, atype)
File "/opt/mamba/lib/python3.10/site-packages/deepmd/infer/deep_pot.py", line 373, in eval
output = self._eval_func(self._eval_inner, numb_test, natoms)(
File "/opt/mamba/lib/python3.10/site-packages/deepmd/infer/deep_pot.py", line 288, in eval_func
return self.auto_batch_size.execute_all(
File "/opt/mamba/lib/python3.10/site-packages/deepmd/utils/batch_size.py", line 191, in execute_all
n_batch, result = self.execute(execute_with_batch_size, index, natoms)
File "/opt/mamba/lib/python3.10/site-packages/deepmd/utils/batch_size.py", line 103, in execute
n_batch, result = callable(
File "/opt/mamba/lib/python3.10/site-packages/deepmd/utils/batch_size.py", line 169, in execute_with_batch_size
return (end_index - start_index), callable(
File "/opt/mamba/lib/python3.10/site-packages/deepmd/infer/deep_pot.py", line 526, in _eval_inner
v_out = run_sess(self.sess, t_out, feed_dict=feed_dict_test)
File "/opt/mamba/lib/python3.10/site-packages/deepmd/utils/sess.py", line 30, in run_sess
return sess.run(*args, **kwargs)
File "/opt/mamba/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 968, in run
result = self._run(None, fetches, feed_dict, options_ptr,
File "/opt/mamba/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 1191, in _run
results = self._do_run(handle, final_targets, final_fetches,
File "/opt/mamba/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 1371, in _do_run
return self._do_call(_run_fn, feeds, fetches, targets, options,
File "/opt/mamba/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 1397, in _do_call
raise type(e)(node_def, op, message) # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.InternalError: Graph execution error:
Detected at node 'load/ProdEnvMatA' defined at (most recent call last):
Node: 'load/ProdEnvMatA'
Detected at node 'load/ProdEnvMatA' defined at (most recent call last):
Node: 'load/ProdEnvMatA'
2 root error(s) found.
(0) INTERNAL: Operation received an exception: DeePMD-kit Error: CUDA Assert, in file /project/source/op/custom_op.cc:17
[[{{node load/ProdEnvMatA}}]]
[[load/o_virial/_27]]
(1) INTERNAL: Operation received an exception: DeePMD-kit Error: CUDA Assert, in file /project/source/op/custom_op.cc:17
[[{{node load/ProdEnvMatA}}]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'load/ProdEnvMatA':
2023-07-04 12:02:06.641526: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
WARNING:tensorflow:From /opt/mamba/lib/python3.10/site-packages/tensorflow/python/compat/v2_compat.py:107: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
WARNING:root:To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, TF_INTRA_OP_PARALLELISM_THREADS, and TF_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
WARNING:root:Environment variable KMP_BLOCKTIME is empty. Use the default value 0
WARNING:root:Environment variable KMP_AFFINITY is empty. Use the default value granularity=fine,verbose,compact,1,0
/opt/mamba/lib/python3.10/importlib/__init__.py:169: UserWarning: The NumPy module was reloaded (imported a second time). This can in some cases result in small but subtle issues and is discouraged.
_bootstrap._exec(spec, module)
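One possible way to narrow down a CUDA assert like the one in this log is to repeat the same run on CPU and compare the behavior. A minimal sketch, assuming the client is started as a fresh Python process; `CUDA_VISIBLE_DEVICES` is the standard CUDA environment variable, and whether the CPU run produces a more specific message for this particular model is not something the log above can confirm.

```python
import os

# Hide all GPUs from this process. This must run before TensorFlow
# (and therefore deepmd) is imported, so it belongs at the very top
# of the client script.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

# The rest of the client would follow unchanged, for example:
# from deepmd.infer import DeepPot
```

If the CPU run completes the first step, the problem is more likely specific to the GPU code path; if it fails too, the resulting message may point more directly at the input.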