How to delete all the gradients in between operations #13728
Hi! I am currently working on a project where, for a given trained model, I run inference on batches of inputs and compute (and store) the gradients of the output with respect to the inputs:

```python
def test_step(self, batch, batch_idx):
    with torch.set_grad_enabled(True):
        gradients_list = []
        for batch_of_inputs in batches:
            batch_of_inputs.requires_grad_()
            output = self(batch_of_inputs)
            # torch.autograd.grad returns a tuple with one gradient per input
            gradients = torch.autograd.grad(
                outputs=output,
                inputs=batch_of_inputs,
                grad_outputs=torch.ones_like(output),
                retain_graph=False,
            )
            gradients_list.append(gradients[0].detach())
```

The thing is that the used memory keeps increasing until an OOM error is raised. I have tried to use …
If the OOM error is on the GPU, you can move your gradients from GPU to CPU while storing them, releasing some GPU memory.
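A minimal sketch of that suggestion, rewritten as a standalone function rather than a `test_step` method (the `model` and `batches` names stand in for your LightningModule and your iterable of input tensors): each gradient is detached and copied to the CPU before being stored, so the GPU tensor and its graph can be freed between iterations.

```python
import torch

def compute_input_gradients(model, batches):
    """Collect d(output)/d(input) per batch, storing results on the CPU."""
    gradients_list = []
    with torch.set_grad_enabled(True):
        for batch_of_inputs in batches:
            batch_of_inputs.requires_grad_()
            output = model(batch_of_inputs)
            # torch.autograd.grad returns a tuple, one entry per input tensor
            (grad,) = torch.autograd.grad(
                outputs=output,
                inputs=batch_of_inputs,
                grad_outputs=torch.ones_like(output),
                retain_graph=False,
            )
            # Detach and move to CPU so the GPU copy can be garbage-collected
            gradients_list.append(grad.detach().cpu())
    return gradients_list
```

Since `.cpu()` already returns a new tensor on host memory, only the small stored copies accumulate; the per-batch GPU allocations are released once each loop iteration ends.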