Problem with inplace operation when training with Sketchy dataset #30

@minhkhoi1026

Description

Hi sir,

While searching for an interesting image retrieval idea, I came across your project. It is wonderful; I tested your model and it works like a charm!

However, a problem came up when I tried to train with the Sketchy dataset. Based on the instructions in the README, I tried to train with the following command:
>>> python3 train.py --dataset Sketchy --dim-out 64 --semantic-models word2vec-google-news --epochs 1 --early-stop 10 --lr 0.0001

Here I ran into a weird problem; training failed with the message below:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [288, 64]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient

I tried to fix it myself, but it didn't work. Could you help me with this problem? Thank you so much!

My workspace is Colab, with PyTorch 1.10.0+cu111.

The detailed error message (with torch.autograd.set_detect_anomaly(True)):

Parameters:	Namespace(batch_size=128, dataset='Sketchy', dim_out=64, early_stop=10, epoch_size=100, epochs=1, filter_sketch=False, gamma=0.1, gzs_sbir=False, im_sz=224, lambda_disc_im=0.5, lambda_disc_se=0.25, lambda_disc_sk=0.5, lambda_gen_adv=1.0, lambda_gen_cls=1.0, lambda_gen_cyc=1.0, lambda_gen_reg=0.1, lambda_im=10.0, lambda_regular=0.001, lambda_se=10.0, lambda_sk=10.0, log_interval=1, lr=0.0001, milestones=[], momentum=0.9, ngpu=1, num_workers=4, number_qualit_results=200, save_best_results=False, save_image_results=False, semantic_models=['word2vec-google-news'], sk_sz=224, split_eccv_2018=False, test=False)
Checkpoint path: /content/drive/MyDrive/sem-pcyc/auxs/CheckPoints/Sketchy/word2vec-google-news/64
Logger path: /content/drive/MyDrive/sem-pcyc/auxs/LogFiles/Sketchy/word2vec-google-news/64
Result path: /content/drive/MyDrive/sem-pcyc/auxs/Results/Sketchy/word2vec-google-news/64
Loading data...Done
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:481: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  cpuset_checked))
Initializing model variables...Done
Initializing trainable models...Done
Defining optimizers...Done
Defining losses...Done
Initializing variables...Done
Setting logger...Done
Checking cuda...*Cuda exists*...Done
***Train***
/usr/local/lib/python3.7/dist-packages/torch/optim/lr_scheduler.py:134: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
[W python_anomaly_mode.cpp:104] Warning: Error detected in AddmmBackward0. Traceback of forward call that caused the error:
  File "src/train.py", line 358, in <module>
    main()
  File "src/train.py", line 230, in main
    losses = train(train_loader, sem_pcyc_model, epoch, args)
  File "src/train.py", line 323, in train
    loss = sem_pcyc_model.optimize_params(sk, im, cl)
  File "/content/drive/My Drive/sem-pcyc/src/models.py", line 368, in optimize_params
    self.forward(sk, im, se)
  File "/content/drive/My Drive/sem-pcyc/src/models.py", line 259, in forward
    self.sk2se_em = self.gen_sk2se(self.sk_fe)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/drive/My Drive/sem-pcyc/src/models.py", line 64, in forward
    return self.gen(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py", line 1848, in linear
    return torch._C._nn.linear(input, weight, bias)
 (function _print_stack)
Traceback (most recent call last):
  File "src/train.py", line 358, in <module>
    main()
  File "src/train.py", line 230, in main
    losses = train(train_loader, sem_pcyc_model, epoch, args)
  File "src/train.py", line 323, in train
    loss = sem_pcyc_model.optimize_params(sk, im, cl)
  File "/content/drive/My Drive/sem-pcyc/src/models.py", line 371, in optimize_params
    loss = self.backward(se, num_cls)
  File "/content/drive/My Drive/sem-pcyc/src/models.py", line 325, in backward
    loss_disc_se.backward(retain_graph=True)
  File "/usr/local/lib/python3.7/dist-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py", line 156, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [288, 64]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
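For reference, this looks like the general behaviour of newer PyTorch versions where an optimizer.step() updates a weight in place between two backward passes that share a retained graph, so the second backward sees a newer version of the saved tensor than the graph recorded. Below is a minimal standalone sketch (all names here are hypothetical, not taken from sem-pcyc) that seems to reproduce the same message on my setup:

import torch
import torch.nn as nn

lin = nn.Linear(8, 4)
opt = torch.optim.SGD(lin.parameters(), lr=0.1)

# The input requires grad, so backward through the linear layer
# needs the weight tensor that was saved in the graph.
x = torch.randn(16, 8, requires_grad=True)
y = lin(x)

loss1 = y.sum()
loss1.backward(retain_graph=True)  # first backward: fine, graph kept alive
opt.step()                         # in-place weight update bumps its version counter

loss2 = (2 * y).sum()
loss2.backward()                   # RuntimeError: "... modified by an inplace operation"

If something like this happens inside backward()/optimize_params(), the usual workaround is to run every backward pass that reuses the graph before calling any optimizer.step(), or to redo the forward pass for the second loss; I am not sure which variant fits this code base, though.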
