I collected a profile trace with the following snippet

where `stateless_update` is a complex jitted function. I am trying to use the profile trace to find the most time-consuming part of the code.

However, the trace shows many small gaps between GPU kernels in the GPU stream, and the summed kernel time (~4.7 ms) is, of course, much smaller than the total time of the function call (~11.5 ms). Since I expect most of the computation to take place on the GPU, what am I missing here? Am I misinterpreting the results?
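For readers who want to reproduce this kind of measurement, here is a minimal sketch of collecting a trace around a single jitted call and comparing it with the wall-clock time. It is not the snippet from the post: the body of `stateless_update`, the input shape, the annotation name, and the temporary log directory are all hypothetical stand-ins. It assumes only `jax.profiler.trace`, `jax.profiler.TraceAnnotation`, and `block_until_ready`.

```python
import tempfile
import time

import jax
import jax.numpy as jnp


@jax.jit
def stateless_update(x):
    # Hypothetical stand-in for the real (much more complex) jitted function.
    for _ in range(4):
        x = jnp.tanh(x @ x)
    return x


x = jnp.ones((256, 256), dtype=jnp.float32)

# Warm-up call so compilation time is not included in the trace or the timing.
stateless_update(x).block_until_ready()

# Hypothetical output location; any writable directory works.
log_dir = tempfile.mkdtemp(prefix="jax_trace_")

with jax.profiler.trace(log_dir):
    # Label the host-side region so it is easy to find in the trace viewer.
    with jax.profiler.TraceAnnotation("stateless_update_call"):
        start = time.perf_counter()
        out = stateless_update(x)
        # JAX dispatches work asynchronously; wait for the device to finish
        # so the wall-clock time covers the whole computation.
        out.block_until_ready()
        elapsed_ms = (time.perf_counter() - start) * 1e3

print(f"wall-clock time: {elapsed_ms:.3f} ms, trace written to {log_dir}")
```

The wall-clock figure measured this way includes host-side dispatch and any idle time between kernels, so it is expected to be larger than the sum of the kernel durations shown in the GPU stream.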