An error occurred while running task07 #785
-
err.txt
While using dp-gen to iterate on a system, I set up 8 MD processes. After I submit the run with dpgen run param.json machine.json, the program reports an error when it reaches task07. Sometimes this happens in iter0, sometimes in iter1, and sometimes in iter2. The corresponding input file and error file are attached. I checked the Gaussian part and found one task that did not finish, but I don't know why. I hope you can help me check it.
INFO:dpgen:-------------------------iter.000000 task 07--------------------------
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
Replies: 5 comments 12 replies
-
There's something wrong with dpdispatcher.
-
The relevant files are attached.
-
I have uploaded more of the run results to Baidu Netdisk.
-
I checked your log file. It shows that in task07, 8 model_devi_jobs were running in parallel, and 2 of them failed and were resubmitted at the same time. Because they came from the same directory, their submission_hash values were identical, which raised the error. This is a bug in dpdispatcher, and the contributors are already aware of it.
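To illustrate why two resubmissions from the same directory can collide, here is a minimal sketch. It is not dpdispatcher's actual code: the helper content_hash, the field names, and the paths are all hypothetical, and I am only assuming that the hash is derived from the submission's content (working directory, tasks, command) rather than from anything unique to each attempt.

```python
import hashlib
import json


def content_hash(work_base, task_dirs, command):
    """Hypothetical helper: hash only the content that defines a submission.

    Nothing unique to a particular attempt (timestamp, retry counter)
    goes into the payload, so identical inputs give identical hashes.
    """
    payload = json.dumps(
        {"work_base": work_base, "tasks": sorted(task_dirs), "command": command},
        sort_keys=True,
    )
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


# Illustrative paths and command only; they are not taken from the attached logs.
first = content_hash("work_dir", ["task.000"], "run_job")
retry = content_hash("work_dir", ["task.000"], "run_job")
other = content_hash("work_dir", ["task.001"], "run_job")

print(first == retry)  # same directory and tasks: the hashes collide
print(first == other)  # a different task directory gives a different hash
```

Under that assumption, two jobs that fail and are resubmitted at the same moment from the same directory would end up with the same identifier, which matches the symptom in the log.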
-
Hi, I ran into the same problem as you. Could you explain more about how you solved it? I am a little confused by https://docs.deepmodeling.com/projects/dpdispatcher/en/latest/examples/g16.html. It seems that I should add some new commands to machine.json, but where should they go? I don't have a "task" class in machine.json. I have tried several approaches, but it still doesn't work, so could you share your experience with me? Thanks.