CUDA error: operation not permitted #125

@zengruizhao

Description

The following error occasionally occurs when running inference on CUDA.
GPU: NVIDIA L20

2025-12-17 02:44:08,499 - __main__ - ERROR - [worker] Error while handling request: CUDA error: operation not permitted
Search for `cudaErrorNotPermitted' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Traceback (most recent call last):
  File "/opt/voxcpm/main.py", line 148, in worker
    for chunk in tts.generate_streaming(**kwargs):
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/voxcpm/src/voxcpm/core.py", line 269, in _generate
    for wav, _, _ in generate_result:
                     ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 38, in generator_context
    response = gen.send(None)
               ^^^^^^^^^^^^^^
  File "/opt/voxcpm/src/voxcpm/model/voxcpm.py", line 670, in _generate_with_prompt_cache
    for latent_pred, pred_audio_feat in inference_result:
                                        ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 38, in generator_context
    response = gen.send(None)
               ^^^^^^^^^^^^^^
  File "/opt/voxcpm/src/voxcpm/model/voxcpm.py", line 742, in _inference
    feat_embed = self.feat_encoder(feat)  # [b, t, h_feat]
                 ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__
    return super().__call__(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 832, in compile_wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/voxcpm/src/voxcpm/modules/locenc/local_encoder.py", line 17, in forward
    def forward(self, x):
  File "/usr/local/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1130, in forward
    return compiled_fn(full_args)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 353, in runtime_wrapper
    all_outs = call_func_at_runtime_with_args(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 129, in call_func_at_runtime_with_args
    out = normalize_as_list(f(args))
                            ^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 724, in inner_fn
    outs = compiled_fn(args)
           ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 526, in wrapper
    return compiled_fn(runtime_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/_inductor/output_code.py", line 613, in __call__
    return self.current_callable(inputs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/torchinductor_root/u7/cu7iuglfv4fiy7yukmpkycnjjtrlt6ttrgfzj3ssjb5o5l4av6w4.py", line 2190, in call
    (buf182,) = self.partitions[0](partition0_args)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1772, in run
    return compiled_fn(new_inputs)  # type: ignore[arg-type]
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/_inductor/cudagraph_trees.py", line 388, in deferred_cudagraphify
    return fn(inputs)
           ^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/_inductor/utils.py", line 3017, in run
    out = model(new_inputs)
          ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/_inductor/cudagraph_trees.py", line 2012, in run
    out = self._run(new_inputs, function_id)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/_inductor/cudagraph_trees.py", line 2182, in _run
    return self.record_function(new_inputs, function_id)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/_inductor/cudagraph_trees.py", line 2219, in record_function
    node = CUDAGraphNode(
           ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/_inductor/cudagraph_trees.py", line 1037, in __init__
    self.recording_outputs: Optional[OutputType] = self._record(
                                                   ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/_inductor/cudagraph_trees.py", line 1268, in _record
    torch.cuda.graph(
  File "/usr/local/lib/python3.12/site-packages/torch/cuda/graphs.py", line 265, in __exit__
    self.cuda_graph.capture_end()
  File "/usr/local/lib/python3.12/site-packages/torch/cuda/graphs.py", line 128, in capture_end
    super().capture_end()
torch.AcceleratorError: CUDA error: operation not permitted
Search for `cudaErrorNotPermitted' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
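The failure is raised from `capture_end()` during Inductor's CUDA graph recording, so one way to narrow it down is to follow the trace's own suggestion (`CUDA_LAUNCH_BLOCKING=1`) and, as an assumption rather than a confirmed fix, bypass TorchDynamo entirely so the CUDA graph capture path in `cudagraph_trees.py` is never reached. Both are environment variables that must be set before `torch` is imported, e.g. at the top of `main.py`:

```python
import os

# Must be set before torch is imported.
# CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, so the
# reported stacktrace points at the real failing op (as the error
# message above suggests).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Hedged workaround: TORCHDYNAMO_DISABLE=1 turns off torch.compile
# entirely, which also skips the Inductor CUDA graph capture where
# capture_end() raised cudaErrorNotPermitted. This trades speed for
# ruling graph capture in or out as the cause.
os.environ["TORCHDYNAMO_DISABLE"] = "1"

print(os.environ["CUDA_LAUNCH_BLOCKING"], os.environ["TORCHDYNAMO_DISABLE"])
```

If `torch.compile` should stay enabled, a narrower alternative may be disabling only CUDA graphs via `torch._inductor.config.triton.cudagraphs = False` before the first compiled call; whether that option applies should be checked against the installed torch version.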
