ProdEnvMatA
#1714
Replies: 2 comments 1 reply
-
@denghuilu Please take a look at the question. Thanks!
-
This op computes the environment matrix. See eq. 5 on this page. I strongly agree that it can be further optimized.
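For readers without the linked page at hand, here is a rough NumPy sketch of what an environment matrix for a single atom looks like, based on my understanding of the smooth-edition DeePMD descriptor: each neighbor within the cutoff contributes a row (s, s·x/r, s·y/r, s·z/r), where s(r) is a smoothly switched 1/r. The function names, the parameters `rcut_smth` and `rcut`, and the sample coordinates are illustrative assumptions. This is not the actual ProdEnvMatA kernel, which, as far as I know, also builds neighbor lists under periodic boundaries, sorts and pads neighbors by type, and normalizes the result.

```python
import numpy as np

def switch(r, rcut_smth, rcut):
    """Smooth weight s(r): equals 1/r below rcut_smth and decays to 0 at rcut.

    Assumed form: s(r) = (x^3 (-6x^2 + 15x - 10) + 1) / r,
    with x = (r - rcut_smth) / (rcut - rcut_smth) clipped to [0, 1].
    """
    r = np.asarray(r, dtype=float)
    x = np.clip((r - rcut_smth) / (rcut - rcut_smth), 0.0, 1.0)
    poly = x**3 * (-6.0 * x**2 + 15.0 * x - 10.0) + 1.0
    return np.where(r < rcut, poly / r, 0.0)

def env_matrix(coords, i, rcut_smth, rcut):
    """Return one row (s, s*x/r, s*y/r, s*z/r) per neighbor of atom i.

    Simplifications (assumptions of this sketch): open boundaries,
    a brute-force neighbor search, no type sorting, padding, or normalization.
    """
    coords = np.asarray(coords, dtype=float)
    rel = np.delete(coords, i, axis=0) - coords[i]  # vectors from atom i to the others
    r = np.linalg.norm(rel, axis=1)
    mask = r < rcut                                 # keep only neighbors inside the cutoff
    rel, r = rel[mask], r[mask]
    s = switch(r, rcut_smth, rcut)
    return np.column_stack([s, s[:, None] * rel / r[:, None]])

# Hypothetical four-atom configuration; the last atom lies outside the cutoff.
coords = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 10.0]]
R = env_matrix(coords, i=0, rcut_smth=2.0, rcut=6.0)
print(R.shape)
```

Even in this simplified form, the work per frame scales with the number of atoms times the number of neighbors, and the descriptor network cannot start until the matrix exists, which would be consistent with the op sitting at the front of the trace.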
-
Good afternoon,
I hope you are doing well.

I noticed something interesting in the TensorBoard trace when using se_e2_a as the descriptor. After tuning the OMP, TF_Intra_OP, and TF_Inter_OP parallelism settings, the remaining bottleneck is ProdEnvMatA. Please find the TensorBoard trace attached.
This was run on a CPU node with 8 workers and a local batch size of 16. As the trace shows, all tf_computes are initially held up until ProdEnvMatA completes. May I please ask two questions regarding this op:
Thank you very much!