torch.AcceleratorError: CUDA error: invalid resource handle #125

@IgnorAnsel

Description

... loading model from checkpoints/MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric.pth
2.8.0+cu128
True
12.8
Warning, cannot find cuda-compiled version of RoPE2D, using a slow pytorch version instead
Traceback (most recent call last):
File "", line 1, in
File "/home/ansel/anaconda3/envs/mast3r-slam/lib/python3.11/multiprocessing/spawn.py", line 122, in spawn_main
exitcode = _main(fd, parent_sentinel)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ansel/anaconda3/envs/mast3r-slam/lib/python3.11/multiprocessing/spawn.py", line 132, in _main
self = reduction.pickle.load(from_parent)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ansel/anaconda3/envs/mast3r-slam/lib/python3.11/site-packages/torch/multiprocessing/reductions.py", line 181, in rebuild_cuda_tensor
storage = storage_cls._new_shared_cuda(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ansel/anaconda3/envs/mast3r-slam/lib/python3.11/site-packages/torch/storage.py", line 1457, in _new_shared_cuda
return torch.UntypedStorage._new_shared_cuda(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.AcceleratorError: CUDA error: invalid resource handle
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

instantiating : AsymmetricMASt3R(enc_depth=24, dec_depth=12, enc_embed_dim=1024, dec_embed_dim=768, enc_num_heads=16, dec_num_heads=12, pos_embed='RoPE100',img_size=(512, 512), head_type='catmlp+dpt', output_mode='pts3d+desc24', depth_mode=('exp', -inf, inf), conf_mode=('exp', 1, inf), patch_embed_cls='PatchEmbedDust3R', two_confs=True, desc_conf_mode=('exp', 0, inf), landscape_only=False)

2.8.0+cu128
True
12.8
Warning, cannot find cuda-compiled version of RoPE2D, using a slow pytorch version instead
Traceback (most recent call last):
File "", line 1, in
File "/home/ansel/anaconda3/envs/mast3r-slam/lib/python3.11/multiprocessing/spawn.py", line 122, in spawn_main
exitcode = _main(fd, parent_sentinel)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ansel/anaconda3/envs/mast3r-slam/lib/python3.11/multiprocessing/spawn.py", line 132, in _main
self = reduction.pickle.load(from_parent)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ansel/anaconda3/envs/mast3r-slam/lib/python3.11/site-packages/torch/multiprocessing/reductions.py", line 181, in rebuild_cuda_tensor
storage = storage_cls._new_shared_cuda(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ansel/anaconda3/envs/mast3r-slam/lib/python3.11/site-packages/torch/storage.py", line 1457, in _new_shared_cuda
return torch.UntypedStorage._new_shared_cuda(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.AcceleratorError: CUDA error: invalid resource handle
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

^CTraceback (most recent call last):
File "/home/ansel/works/MASt3R-SLAM/main.py", line 229, in
backend.start()
File "/home/ansel/anaconda3/envs/mast3r-slam/lib/python3.11/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
^^^^^^^^^^^^^^^^^
File "/home/ansel/anaconda3/envs/mast3r-slam/lib/python3.11/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ansel/anaconda3/envs/mast3r-slam/lib/python3.11/multiprocessing/context.py", line 288, in _Popen
return Popen(process_obj)
^^^^^^^^^^^^^^^^^^
File "/home/ansel/anaconda3/envs/mast3r-slam/lib/python3.11/multiprocessing/popen_spawn_posix.py", line 32, in init
super().init(process_obj)
File "/home/ansel/anaconda3/envs/mast3r-slam/lib/python3.11/multiprocessing/popen_fork.py", line 19, in init
self._launch(process_obj)
File "/home/ansel/anaconda3/envs/mast3r-slam/lib/python3.11/multiprocessing/popen_spawn_posix.py", line 62, in _launch
f.write(fp.getbuffer())
KeyboardInterrupt
[W1030 14:08:55.330710990 CudaIPCTypes.cpp:100] Producer process tried to deallocate over 1000 memory blocks referred by consumer processes. Deallocation might be significantly slowed down. We assume it will never going to be the case, but if it is, please file but to https://github.com/pytorch/pytorch
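
For context, the failure happens while the spawned backend process rebuilds a CUDA tensor shared by the main process over torch's CUDA IPC path (rebuild_cuda_tensor -> UntypedStorage._new_shared_cuda). The following is a minimal, standalone sketch (not MASt3R-SLAM code) that exercises the same path; it assumes at least one visible CUDA device and the same conda environment, and the helper name `consumer` plus the tensor contents are arbitrary:

```python
# Standalone diagnostic: exercises the same spawn + CUDA-IPC path that fails
# in the traceback above (reduction.pickle -> rebuild_cuda_tensor -> _new_shared_cuda).
import torch
import torch.multiprocessing as mp


def consumer(t):
    # The spawned child rebuilds the shared CUDA storage when it unpickles `t`;
    # an "invalid resource handle" from CUDA IPC would surface at this point.
    print("consumer sees:", t.device, t.sum().item())


if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)
    x = torch.ones(4, device="cuda")   # assumes a CUDA device is visible
    p = mp.Process(target=consumer, args=(x,))
    p.start()
    p.join()                           # keep the producer alive until the child is done
```

If this sketch reproduces the same AcceleratorError, the problem is likely environmental (driver / CUDA IPC support on this machine) rather than specific to MASt3R-SLAM; if it passes, the issue is more likely in how main.py shares tensors with the backend process.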
