Low CPU usage while training #1207
-
When I was using DP v1.3.3, the "dp" process used all CPUs for training (maybe via OpenMP?). However, dp v2.0.2 uses only one CPU core, and it takes half an hour before training starts (also when restarting from a checkpoint). I set environment variables like below.
[2021-10-12 22:37:49,429] DEEPMD INFO deepmd.entrypoints.train [DeePMD-kit ASCII banner follows]
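For reference, here is a minimal Python sketch of the kind of thread-control variables typically used with OpenMP/TensorFlow builds. The variable names (OMP_NUM_THREADS, TF_INTRA_OP_PARALLELISM_THREADS, TF_INTER_OP_PARALLELISM_THREADS), the chosen values, and the file name input.json are illustrative assumptions, not the poster's exact settings.

```python
# Sketch only: set thread-related environment variables, then launch `dp train`.
# The variable names are common OpenMP/TensorFlow conventions and are assumed here,
# not quoted from the original post; "input.json" is a placeholder.
import os
import subprocess

env = dict(os.environ)
env.update({
    "OMP_NUM_THREADS": "64",                   # OpenMP threads for CPU kernels
    "TF_INTRA_OP_PARALLELISM_THREADS": "64",   # threads used inside a single op
    "TF_INTER_OP_PARALLELISM_THREADS": "1",    # ops executed concurrently
})

# The settings must be in the environment before training starts.
subprocess.run(["dp", "train", "input.json"], env=env, check=True)
```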
-
The log shows you are using a GPU instead.
-
Do you mean that OpenMP being inactive and a half-hour wait before training are perfectly normal?
That isn't the case in v1.3.3 (I'm still using that version of DP too).
Below is the log from v1.3.3: it uses all CPUs (6400%) and training starts in under a second.
All training parameters are set identically.
# DEEPMD: [DeePMD-kit ASCII banner]
# DEEPMD:
# DEEPMD: Please read and cite:
# DEEPMD: Wang, Zhang, Han and E, Comput.Phys.Comm. 228, 178-184 (2018)
# DEEPMD:
# DEEPMD: ---Summary of the training---------------------------------------
# DEEPMD: installed to:         /home/grad/ssrokyz/src/deepmd-kit/_skbuild/linux-x86_64-3.8/cmake-install
# DEEPMD: source :              v1.3.3
# DEEPMD: source brach:         HEAD
# DEEPMD: source commit:        3a59596
# DEEPMD: source commit at:     2021-03-20 00:53:44 +0800
# DEEPMD: build float prec:     double
# DEEPMD: build with tf inc:    /home/grad/ssrokyz/.conda/envs/dp2/lib/python3.8/site-packages/tensorflow/include;/home/grad/ssrokyz/.conda/envs/dp2/lib/python3.8/site-packages/tensorflow/include
# DEEPMD: build with tf lib:
# DEEPMD: running on:           pcs_gpu9
# DEEPMD: CUDA_VISIBLE_DEVICES: [0]
# DEEPMD: num_intra_threads:    64
# DEEPMD: num_inter_threads:    1
# DEEPMD: -----------------------------------------------------------------
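If the same variables are exported under v2.0.2, one rough sanity check is to ask TensorFlow what threading configuration it reports. This is a generic TensorFlow 2.x sketch, not a DeePMD-kit API; DeePMD-kit may configure its session separately, so treat the output only as a hint (0 means TensorFlow chooses the thread count itself).

```python
# Generic TensorFlow 2.x check of the reported threading configuration.
# A value of 0 means "let TensorFlow decide".
import tensorflow as tf

print("intra-op threads:", tf.config.threading.get_intra_op_parallelism_threads())
print("inter-op threads:", tf.config.threading.get_inter_op_parallelism_threads())
```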
-
The operators that used to run on the CPU are all moved to the GPU now if you have a GPU.
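To confirm that the job really picked up the GPU, or to force the old CPU-only behaviour for a comparison run, here is a generic TensorFlow sketch. CUDA_VISIBLE_DEVICES is standard CUDA behaviour, not a DeePMD-specific option.

```python
import os

# Uncomment to hide all GPUs *before* TensorFlow is imported,
# which forces CPU execution for a comparison run.
# os.environ["CUDA_VISIBLE_DEVICES"] = ""

import tensorflow as tf

print("Visible GPUs:", tf.config.list_physical_devices("GPU"))
```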
-
That totally makes sense.
Thank you sincerely for your answer!
-
The operators that used to run on the CPU are all moved to the GPU now if you have a GPU.
The new version also adds an extra step that checks whether `sel` is large enough, and this step may stall for some time if you have a lot of data... We may consider adding an option to skip it in a following version.
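For anyone wondering why that pre-training check scales with the dataset: it has to visit every frame and count neighbors within the cutoff. The sketch below is a rough toy illustration of that idea, not the actual DeePMD-kit implementation; the real code handles periodic cells and per-type counts, and the function name, cutoff, and random data here are made up for illustration.

```python
# Toy illustration of an "is sel large enough?" scan: count neighbors within
# rcut for every atom of every frame and track the worst case. Ignores periodic
# boundary conditions and atom types, unlike the real check.
import numpy as np

def max_neighbors(frames, rcut):
    """frames: iterable of (natoms, 3) coordinate arrays; returns the largest
    neighbor count within rcut seen over all frames and atoms."""
    worst = 0
    for coords in frames:
        diff = coords[:, None, :] - coords[None, :, :]   # pairwise displacement
        dist = np.linalg.norm(diff, axis=-1)             # pairwise distances
        np.fill_diagonal(dist, np.inf)                   # an atom is not its own neighbor
        counts = (dist < rcut).sum(axis=1)               # neighbors per atom
        worst = max(worst, int(counts.max()))
    return worst

# Example: 1000 random frames of 64 atoms -- every frame must be scanned,
# which is why the check can take a while on large datasets.
frames = [np.random.rand(64, 3) * 10.0 for _ in range(1000)]
print("max neighbors within rcut=6.0:", max_neighbors(frames, 6.0))
```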