Training Error #94

@Arknightpzb

Description

When I followed GETTING_STARTED.md to finetune the VQModel on Windows, I ran into the following error:

```
Traceback (most recent call last):
  File "E:\codehub\LlamaGen\tokenizer\tokenizer_image\vq_train.py", line 329, in <module>
    main(args)
  File "E:\codehub\LlamaGen\tokenizer\tokenizer_image\vq_train.py", line 205, in main
    recons_imgs, codebook_loss = vq_model(imgs)
  File "D:\Webdownload\envs\StableDiff\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\Webdownload\envs\StableDiff\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Webdownload\envs\StableDiff\lib\site-packages\torch\nn\parallel\distributed.py", line 1515, in forward
    inputs, kwargs = self._pre_forward(*inputs, **kwargs)
  File "D:\Webdownload\envs\StableDiff\lib\site-packages\torch\nn\parallel\distributed.py", line 1416, in _pre_forward
    self._sync_buffers()
  File "D:\Webdownload\envs\StableDiff\lib\site-packages\torch\nn\parallel\distributed.py", line 2041, in _sync_buffers
    self._sync_module_buffers(authoritative_rank)
  File "D:\Webdownload\envs\StableDiff\lib\site-packages\torch\nn\parallel\distributed.py", line 2045, in _sync_module_buffers
    self._default_broadcast_coalesced(authoritative_rank=authoritative_rank)
  File "D:\Webdownload\envs\StableDiff\lib\site-packages\torch\nn\parallel\distributed.py", line 2066, in _default_broadcast_coalesced
    self._distributed_broadcast_coalesced(bufs, bucket_size, authoritative_rank)
  File "D:\Webdownload\envs\StableDiff\lib\site-packages\torch\nn\parallel\distributed.py", line 1982, in _distributed_broadcast_coalesced
    dist._broadcast_coalesced(
RuntimeError: A view was created in no_grad mode and is being modified inplace with grad mode enabled. Given that this use case is ambiguous and error-prone, it is forbidden. You can clarify your code by moving both the view and the inplace either both inside the no_grad block (if you don't want the inplace to be tracked) or both outside (if you want the inplace to be tracked).
```
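
For reference, the autograd rule this message describes can be reproduced in isolation with a few lines; `base` and `view` below are illustrative names, unrelated to LlamaGen:

```python
import torch

base = torch.ones(3, requires_grad=True)
with torch.no_grad():
    view = base.view(-1)    # view created while grad mode is disabled

try:
    view.add_(1.0)          # in-place write with grad mode enabled
except RuntimeError as e:
    print(e)                # "A view was created in no_grad mode ..."

# The arrangement the message suggests: keep the view and the in-place
# write together, here both inside no_grad, so the write is not tracked.
with torch.no_grad():
    view = base.view(-1)
    view.add_(1.0)
```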

The PyTorch version I used was 2.1.0+cu118. However, I could not figure out what happened or how to fix it. Has anyone run into the same issue?
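
Judging from the traceback, the in-place write happens during DDP's buffer synchronization (`_sync_buffers` → `dist._broadcast_coalesced`), which suggests one of `vq_model`'s registered buffers is such a view. As an untested workaround sketch, DDP's `broadcast_buffers` argument can skip that sync path; `wrap_model`, `model`, and `device_id` are placeholder names, not code from vq_train.py:

```python
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_model(model, device_id):
    # broadcast_buffers=False makes DDP skip the buffer broadcast in
    # _pre_forward (the call shown raising the RuntimeError above), at the
    # cost of buffers no longer being kept in sync across ranks.
    return DDP(model, device_ids=[device_id], broadcast_buffers=False)
```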
