Training Error #94

@Arknightpzb

Description

When I followed GETTING_STARTED.md to finetune the VQModel on Windows, I ran into the following error:

```
Traceback (most recent call last):
  File "E:\codehub\LlamaGen\tokenizer\tokenizer_image\vq_train.py", line 329, in <module>
    main(args)
  File "E:\codehub\LlamaGen\tokenizer\tokenizer_image\vq_train.py", line 205, in main
    recons_imgs, codebook_loss = vq_model(imgs)
  File "D:\Webdownload\envs\StableDiff\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\Webdownload\envs\StableDiff\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Webdownload\envs\StableDiff\lib\site-packages\torch\nn\parallel\distributed.py", line 1515, in forward
    inputs, kwargs = self._pre_forward(*inputs, **kwargs)
  File "D:\Webdownload\envs\StableDiff\lib\site-packages\torch\nn\parallel\distributed.py", line 1416, in _pre_forward
    self._sync_buffers()
  File "D:\Webdownload\envs\StableDiff\lib\site-packages\torch\nn\parallel\distributed.py", line 2041, in _sync_buffers
    self._sync_module_buffers(authoritative_rank)
  File "D:\Webdownload\envs\StableDiff\lib\site-packages\torch\nn\parallel\distributed.py", line 2045, in _sync_module_buffers
    self._default_broadcast_coalesced(authoritative_rank=authoritative_rank)
  File "D:\Webdownload\envs\StableDiff\lib\site-packages\torch\nn\parallel\distributed.py", line 2066, in _default_broadcast_coalesced
    self._distributed_broadcast_coalesced(bufs, bucket_size, authoritative_rank)
  File "D:\Webdownload\envs\StableDiff\lib\site-packages\torch\nn\parallel\distributed.py", line 1982, in _distributed_broadcast_coalesced
    dist._broadcast_coalesced(
RuntimeError: A view was created in no_grad mode and is being modified inplace with grad mode enabled. Given that this use case is ambiguous and error-prone, it is forbidden. You can clarify your code by moving both the view and the inplace either both inside the no_grad block (if you don't want the inplace to be tracked) or both outside (if you want the inplace to be tracked).
```
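
For reference, the autograd rule this message describes can be reproduced in isolation with a few lines; `base` and `view` below are illustrative names, unrelated to LlamaGen:

```python
import torch

base = torch.ones(3, requires_grad=True)
with torch.no_grad():
    view = base.view(-1)    # view created while grad mode is disabled

try:
    view.add_(1.0)          # in-place write with grad mode enabled
except RuntimeError as e:
    print(e)                # "A view was created in no_grad mode ..."

# The arrangement the message suggests: keep the view and the in-place
# write together, here both inside no_grad, so the write is not tracked.
with torch.no_grad():
    view = base.view(-1)
    view.add_(1.0)
```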

The PyTorch version I used was 2.1.0+cu118. However, I could not figure out what happened or how to fix it. Has anyone run into the same issue?
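
Judging from the traceback, the in-place write happens during DDP's buffer synchronization (`_sync_buffers` → `dist._broadcast_coalesced`), which suggests one of `vq_model`'s registered buffers is such a view. As an untested workaround sketch, DDP's `broadcast_buffers` argument can skip that sync path; `wrap_model`, `model`, and `device_id` are placeholder names, not code from vq_train.py:

```python
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_model(model, device_id):
    # broadcast_buffers=False makes DDP skip the buffer broadcast in
    # _pre_forward (the call shown raising the RuntimeError above), at the
    # cost of buffers no longer being kept in sync across ranks.
    return DDP(model, device_ids=[device_id], broadcast_buffers=False)
```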
