How to improve the performance of multi-gpu #1690
Replies: 4 comments 2 replies
-
My case is similar: GPU power usage is only 40-50%. When power usage is low, there is some room for optimization, as mentioned in this post: The optimization may take months, I think.
-
For multi-GPU usage, refer to https://docs.deepmodeling.com/projects/deepmd/en/master/train/parallel-training.html.
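As a rough sketch of what that page describes, parallel training is launched with one worker per GPU through Horovod. The snippet below assumes DeePMD-kit was installed with Horovod support and that the training input is named `input.json` (a placeholder file name); it only prints the command so it can be checked before running.

```shell
# Sketch only: one training worker per GPU, launched through Horovod.
# Assumptions: DeePMD-kit built with Horovod support; input.json is a placeholder name.
NUM_GPUS=4
export CUDA_VISIBLE_DEVICES=0,1,2,3

# --mpi-log=workers writes a separate log per worker process.
LAUNCH_CMD="horovodrun -np ${NUM_GPUS} dp train --mpi-log=workers input.json"
echo "${LAUNCH_CMD}"

# Uncomment to actually start training:
# ${LAUNCH_CMD}
```

Check the linked documentation for the exact options supported by your version, since they may differ between releases.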
-
"it cost about 10s per 100 steps" What type of GPU cards are you using? Also, what are the training parameters? The speed seems slower than expected.
-
Thanks to all of you. I think the slow speed is probably because the GPU cards are outdated: I used four 1080Ti cards for training. On a Tesla V100 it takes only about 2 seconds to train 100 steps. Thanks again for your concern.
-
deepmd-kit version 2.1.1
First of all, thank you for your attention to my question. I have 4 GPU cards in one node and 2 CPUs with 12 cores each. I want to use this well-regarded code to train a model, but I don't think my computing resources are being fully utilized: utilization is low and training takes about 10 s per 100 steps, which seems very inefficient to me.

I tried modifying batch_size to change the utilization rate, but an error appeared:
"batch_size" is not defined in the strict model
So I had to comment it out to keep the code running.
I have also set the environment variables as follows:
export OMP_NUM_THREADS=24; export TF_INTRA_OP_PARALLELISM_THREADS=12; export TF_INTER_OP_PARALLELISM_THREADS=2; export CUDA_VISIBLE_DEVICES=0,1,2,3
So my question is: how can I improve GPU performance with my existing computer configuration? Thank you for your reply.
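For readers with a similar setup: if training is launched as several parallel workers on one node, one common starting point is to split the CPU cores among the workers rather than giving each worker all 24 threads. The sketch below only illustrates that arithmetic; the variable names `TOTAL_CORES`, `NUM_WORKERS`, and `CORES_PER_WORKER` are hypothetical, and the resulting values are an assumption to be tuned, not a recommendation from the DeePMD-kit developers.

```shell
# Sketch only: divide CPU cores evenly among parallel training workers.
# TOTAL_CORES, NUM_WORKERS and CORES_PER_WORKER are illustrative names.
TOTAL_CORES=24   # 2 CPUs x 12 cores, as described in the question
NUM_WORKERS=4    # one worker per GPU (assumption)

CORES_PER_WORKER=$(( TOTAL_CORES / NUM_WORKERS ))

export OMP_NUM_THREADS=${CORES_PER_WORKER}
export TF_INTRA_OP_PARALLELISM_THREADS=${CORES_PER_WORKER}
export TF_INTER_OP_PARALLELISM_THREADS=2   # value kept from the question
export CUDA_VISIBLE_DEVICES=0,1,2,3

echo "threads per worker: ${CORES_PER_WORKER}"
```

Whether this helps depends on the model and data; comparing the time per 100 steps before and after the change is the simplest check.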