daniel-de-leon-user293 commented Dec 5, 2024

Summary

This PR demonstrates the spikes in compute time that occur when accessing torch.Tensor.grad on Gaudi. The plot below shows these spikes over a full run of the benchmark:
[image: plot of compute time over a full run of the benchmark]

This benchmark uses GradCAM to explain ResNet50 image classifications on the Intel Image Classification dataset. I've isolated the spike in compute time to the input_img.grad execution at guided_backprop.py:104. The screenshot below shows the output for three individual examples collected in the middle of the benchmark, with Gaudi metrics collected for each execution of input_img.grad. The first and third examples show regular compute times with no spike; the second shows a spike. Highlighted in blue are the graph compilation metrics, which show that compiling a graph during the gradient calculation took 1.341 seconds. In the other two examples no graph was compiled, and the autograd time was only 0.0044 seconds.

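[image: screenshot of Gaudi metrics for three examples, with the graph compilation metrics of the second example highlighted in blue]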
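
For anyone who wants to reproduce the measurement, the per-call timing can be sketched roughly as follows. This is a minimal sketch, not the code in guided_backprop.py: the helper names `timed` and `find_spikes` and the 10x-median threshold are assumptions made for illustration. The idea is to time only the gradient step and then flag calls that take far longer than the typical one. On a device that executes lazily, the timed block should also force the result to materialize, otherwise the timer may capture only the cost of enqueuing the work.

```python
import statistics
import time
from contextlib import contextmanager


@contextmanager
def timed(durations):
    """Append the wall-clock duration of the enclosed block to `durations`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        durations.append(time.perf_counter() - start)


def find_spikes(durations, factor=10.0):
    """Return indices of calls slower than `factor` times the median duration.

    The factor of 10 is an arbitrary illustrative threshold.
    """
    if not durations:
        return []
    baseline = statistics.median(durations)
    return [i for i, d in enumerate(durations) if d > factor * baseline]


# Usage: wrap only the step being measured.
durations = []
with timed(durations):
    sum(range(1000))  # stand-in for the gradient computation being measured

# The three autograd times described above (seconds): regular, spike, regular.
print(find_spikes([0.0044, 1.341, 0.0044]))
```

A median baseline is used here instead of a mean so that a handful of very slow calls do not raise the threshold they are compared against.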

What could be causing this sporadic recompilation of a graph, and is there a way to reduce or avoid it?

Steps to reproduce

  1. Clone this branch.
git clone -b daniel/ht_core https://github.com/daniel-de-leon-user293/pytorch-grad-cam.git
  2. Download the Intel Image Classification dataset from Kaggle (https://www.kaggle.com/datasets/puneet6060/intel-image-classification) and save it in the root of pytorch-grad-cam. Only seg_pred is used for this benchmark.
  3. Run an interactive Gaudi container.
cd pytorch-grad-cam

docker run -d --rm -it --name gradcam-bm \
-v ${PWD}:/workdir \
--runtime=habana \
-e http_proxy=$http_proxy \
-e https_proxy=$https_proxy \
-e HABANA_VISIBLE_DEVICES=all \
-e OMPI_MCA_btl_vader_single_copy_mechanism=none \
--cap-add=sys_nice \
--net=host \
--ipc=host \
vault.habana.ai/gaudi-docker/1.16.2/ubuntu22.04/habanalabs/pytorch-installer-2.2.2:latest
  4. Exec into the container.
docker exec -it gradcam-bm bash
  5. Navigate to the repo and run the benchmark. cam_multiple_images.py is a modification of cam.py that makes multiple GradCAM calls on a Gaudi machine.
cd workdir
pip install -e .
python cam_multiple_images.py --image-path intel_image_classification_dataset/seg_pred/seg_pred/ --device hpu

Examples in the first minute take longer than usual because of warmup, but you can still begin to see spikes similar to those shown in the screenshot.
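
When comparing timings, it may help to set the warmup period aside so that slow warmup examples are not mistaken for recompilation spikes. The following is a minimal sketch under stated assumptions: the function name `drop_warmup`, the `(start_time, duration)` record layout, and the 60-second cutoff (taken from the "first minute" above) are illustrative and not part of the benchmark script.

```python
def drop_warmup(records, warmup_seconds=60.0):
    """Discard records that started within `warmup_seconds` of the first one.

    `records` is a list of (start_time, duration) tuples in seconds.
    The record layout is a hypothetical one chosen for this sketch.
    """
    if not records:
        return []
    first_start = min(start for start, _ in records)
    return [
        (start, duration)
        for start, duration in records
        if start - first_start >= warmup_seconds
    ]


# Illustrative records: the first two durations are made-up warmup values,
# the last two are the regular and spike times quoted in the summary.
records = [(0.0, 2.0), (30.0, 1.5), (61.0, 0.0044), (62.0, 1.341)]
steady_state = drop_warmup(records)
print(steady_state)
```

Only the records that remain after this filter would then be checked for spikes.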
