torch.AcceleratorError: CUDA error: invalid resource handle #125

@IgnorAnsel

Description

... loading model from checkpoints/MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric.pth
2.8.0+cu128
True
12.8
Warning, cannot find cuda-compiled version of RoPE2D, using a slow pytorch version instead
Traceback (most recent call last):
File "", line 1, in
File "/home/ansel/anaconda3/envs/mast3r-slam/lib/python3.11/multiprocessing/spawn.py", line 122, in spawn_main
exitcode = _main(fd, parent_sentinel)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ansel/anaconda3/envs/mast3r-slam/lib/python3.11/multiprocessing/spawn.py", line 132, in _main
self = reduction.pickle.load(from_parent)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ansel/anaconda3/envs/mast3r-slam/lib/python3.11/site-packages/torch/multiprocessing/reductions.py", line 181, in rebuild_cuda_tensor
storage = storage_cls._new_shared_cuda(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ansel/anaconda3/envs/mast3r-slam/lib/python3.11/site-packages/torch/storage.py", line 1457, in _new_shared_cuda
return torch.UntypedStorage._new_shared_cuda(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.AcceleratorError: CUDA error: invalid resource handle
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

instantiating : AsymmetricMASt3R(enc_depth=24, dec_depth=12, enc_embed_dim=1024, dec_embed_dim=768, enc_num_heads=16, dec_num_heads=12, pos_embed='RoPE100',img_size=(512, 512), head_type='catmlp+dpt', output_mode='pts3d+desc24', depth_mode=('exp', -inf, inf), conf_mode=('exp', 1, inf), patch_embed_cls='PatchEmbedDust3R', two_confs=True, desc_conf_mode=('exp', 0, inf), landscape_only=False)

2.8.0+cu128
True
12.8
Warning, cannot find cuda-compiled version of RoPE2D, using a slow pytorch version instead
Traceback (most recent call last):
File "", line 1, in
File "/home/ansel/anaconda3/envs/mast3r-slam/lib/python3.11/multiprocessing/spawn.py", line 122, in spawn_main
exitcode = _main(fd, parent_sentinel)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ansel/anaconda3/envs/mast3r-slam/lib/python3.11/multiprocessing/spawn.py", line 132, in _main
self = reduction.pickle.load(from_parent)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ansel/anaconda3/envs/mast3r-slam/lib/python3.11/site-packages/torch/multiprocessing/reductions.py", line 181, in rebuild_cuda_tensor
storage = storage_cls._new_shared_cuda(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ansel/anaconda3/envs/mast3r-slam/lib/python3.11/site-packages/torch/storage.py", line 1457, in _new_shared_cuda
return torch.UntypedStorage._new_shared_cuda(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.AcceleratorError: CUDA error: invalid resource handle
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

^CTraceback (most recent call last):
File "/home/ansel/works/MASt3R-SLAM/main.py", line 229, in
backend.start()
File "/home/ansel/anaconda3/envs/mast3r-slam/lib/python3.11/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
^^^^^^^^^^^^^^^^^
File "/home/ansel/anaconda3/envs/mast3r-slam/lib/python3.11/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ansel/anaconda3/envs/mast3r-slam/lib/python3.11/multiprocessing/context.py", line 288, in _Popen
return Popen(process_obj)
^^^^^^^^^^^^^^^^^^
File "/home/ansel/anaconda3/envs/mast3r-slam/lib/python3.11/multiprocessing/popen_spawn_posix.py", line 32, in init
super().init(process_obj)
File "/home/ansel/anaconda3/envs/mast3r-slam/lib/python3.11/multiprocessing/popen_fork.py", line 19, in init
self._launch(process_obj)
File "/home/ansel/anaconda3/envs/mast3r-slam/lib/python3.11/multiprocessing/popen_spawn_posix.py", line 62, in _launch
f.write(fp.getbuffer())
KeyboardInterrupt
[W1030 14:08:55.330710990 CudaIPCTypes.cpp:100] Producer process tried to deallocate over 1000 memory blocks referred by consumer processes. Deallocation might be significantly slowed down. We assume it will never going to be the case, but if it is, please file but to https://github.com/pytorch/pytorch
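
For context, the failure happens while the spawned backend process rebuilds a CUDA tensor shared by the main process over torch's CUDA IPC path (rebuild_cuda_tensor -> UntypedStorage._new_shared_cuda). The following is a minimal, standalone sketch (not MASt3R-SLAM code) that exercises the same path; it assumes at least one visible CUDA device and the same conda environment, and the helper name `consumer` plus the tensor contents are arbitrary:

```python
# Standalone diagnostic: exercises the same spawn + CUDA-IPC path that fails
# in the traceback above (reduction.pickle -> rebuild_cuda_tensor -> _new_shared_cuda).
import torch
import torch.multiprocessing as mp


def consumer(t):
    # The spawned child rebuilds the shared CUDA storage when it unpickles `t`;
    # an "invalid resource handle" from CUDA IPC would surface at this point.
    print("consumer sees:", t.device, t.sum().item())


if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)
    x = torch.ones(4, device="cuda")   # assumes a CUDA device is visible
    p = mp.Process(target=consumer, args=(x,))
    p.start()
    p.join()                           # keep the producer alive until the child is done
```

If this sketch reproduces the same AcceleratorError, the problem is likely environmental (driver / CUDA IPC support on this machine) rather than specific to MASt3R-SLAM; if it passes, the issue is more likely in how main.py shares tensors with the backend process.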
