torch.Tensor.grad spike in compute time on Gaudi #1
Summary
The purpose of this PR is to demonstrate the spikes in compute time when calling torch.Tensor.grad on Gaudi. The plot below shows these spikes over a full run of this benchmark:

This benchmark uses GradCAM to explain ResNet50 image classifications on the Intel Image Classification dataset. I've isolated the spike in compute time to the input_img.grad execution in guided_backprop.py:104. The screenshot below shows the output for 3 individual examples collected in the middle of the benchmark. Gaudi metrics were collected for each execution of input_img.grad. The first and third examples show regular compute times with no spike. The second example shows a spike. Highlighted in blue are the graph compilation metrics, which show that it took 1.341 seconds to compile a graph during the gradient calculation. Notice that in the other two examples no graphs were compiled, so the autograd time was only 0.0044 seconds.

What could be causing this sporadic recompilation of a graph, and is there a way to reduce or avoid it?
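The per-call comparison described above (regular calls versus the occasional slow one) can be reproduced with a small timing harness. The following is a minimal sketch using only the Python standard library, not the code used in this PR: `time_calls`, `find_spikes`, and `fake_grad_step` are hypothetical names, and `fake_grad_step` merely sleeps to imitate a slow call. In a real run the timed callable would wrap the code around input_img.grad, and on a device that executes asynchronously you would likely need to synchronize before reading the clock (an assumption about the backend, not something shown here).

```python
import statistics
import time
from typing import Callable, List, Tuple


def time_calls(fn: Callable[[int], object], n_calls: int) -> List[float]:
    """Return the wall-clock duration in seconds of each call fn(i)."""
    durations = []
    for i in range(n_calls):
        start = time.perf_counter()
        fn(i)
        durations.append(time.perf_counter() - start)
    return durations


def find_spikes(durations: List[float], factor: float = 10.0) -> List[Tuple[int, float]]:
    """Return (index, duration) for every call slower than factor * median."""
    threshold = factor * statistics.median(durations)
    return [(i, d) for i, d in enumerate(durations) if d > threshold]


def fake_grad_step(i: int) -> None:
    # Hypothetical stand-in for the gradient computation:
    # every 5th call sleeps much longer to imitate a recompilation.
    time.sleep(0.2 if i % 5 == 4 else 0.001)


durations = time_calls(fake_grad_step, 10)
for index, seconds in find_spikes(durations):
    print(f"call {index} took {seconds:.4f} s")
```

Comparing against the median rather than the mean keeps a few very slow calls from raising the threshold, which matters when spikes are two or more orders of magnitude above the regular time.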
Steps to reproduce
pytorch-grad-cam. We will only be using seg-pred for this benchmark.
cam_multiple_images.py is a modification of cam.py to include multiple GradCAM calls on a Gaudi machine.
Examples in the first minute take longer than usual because of warmup, but you can still begin to see spikes similar to the one shown in the screenshot.