Tuning CPU training performance #2210
hacksparr0w asked this question in Q&A · Unanswered
Replies: 1 comment
You can use |
Hello,

I'm trying to utilize a 36-CPU node to train my DeepMD model, but the performance gains I'm getting compared to a 12-CPU node are quite poor, in my opinion.

I'm using the DeepMD-kit Docker image `deepmodeling/deepmd-kit:2.1.1_cpu` to train my model on both machines. When running the training process on the 12-CPU node, I'm using the following command:
Using this setup, I can get the following timings:
When running the training process on the 36-CPU node, I'm using the following command:
The best training times I'm able to get with this setup are around 33.14 s.

I've followed this document while trying to optimize the `OMP_NUM_THREADS`, `TF_INTRA_OP_PARALLELISM_THREADS`, and `TF_INTER_OP_PARALLELISM_THREADS` parameters, but I have also done some empirical measurements to arrive at the exact numbers I'm currently using. I've also tried using Horovod, but haven't had much luck with it (as far as I understand, Horovod is meant to be used in multi-node setups).

I'd like to ask whether I have any more options to try when tuning the performance. I hoped for much lower training times on the 36-CPU node, but perhaps my expectations were inaccurate?
The following is the output of `lscpu` on the 36-CPU node:

This is the output of `lscpu` on the 12-CPU node:

I'm also including the configuration of my model:
Any comments would be highly appreciated.
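One detail worth checking in the `lscpu` outputs above is whether the 36 "CPUs" are physical cores or hyper-threaded logical CPUs, since thread counts above the physical core count often oversubscribe the machine. A quick way to compute this (a sketch, assuming the field names printed by util-linux `lscpu`):

```shell
# Derive the physical core count from lscpu: sockets x cores per socket.
# If this is smaller than the logical CPU count, hyper-threading is on.
sockets=$(lscpu | awk -F: '/^Socket\(s\)/ {gsub(/ /, "", $2); print $2}')
cores_per_socket=$(lscpu | awk -F: '/^Core\(s\) per socket/ {gsub(/ /, "", $2); print $2}')
physical_cores=$((sockets * cores_per_socket))
echo "physical cores: $physical_cores"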