CUDA error: operation not permitted #125

@zengruizhao

Description

The following error occasionally occurs when running inference on CUDA.
GPU: NVIDIA L20

2025-12-17 02:44:08,499 - __main__ - ERROR - [worker] Error while handling request: CUDA error: operation not permitted
Search for `cudaErrorNotPermitted' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Traceback (most recent call last):
  File "/opt/voxcpm/main.py", line 148, in worker
    for chunk in tts.generate_streaming(**kwargs):
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/voxcpm/src/voxcpm/core.py", line 269, in _generate
    for wav, _, _ in generate_result:
                     ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 38, in generator_context
    response = gen.send(None)
               ^^^^^^^^^^^^^^
  File "/opt/voxcpm/src/voxcpm/model/voxcpm.py", line 670, in _generate_with_prompt_cache
    for latent_pred, pred_audio_feat in inference_result:
                                        ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 38, in generator_context
    response = gen.send(None)
               ^^^^^^^^^^^^^^
  File "/opt/voxcpm/src/voxcpm/model/voxcpm.py", line 742, in _inference
    feat_embed = self.feat_encoder(feat)  # [b, t, h_feat]
                 ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__
    return super().__call__(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 832, in compile_wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/voxcpm/src/voxcpm/modules/locenc/local_encoder.py", line 17, in forward
    def forward(self, x):
  File "/usr/local/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1130, in forward
    return compiled_fn(full_args)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 353, in runtime_wrapper
    all_outs = call_func_at_runtime_with_args(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 129, in call_func_at_runtime_with_args
    out = normalize_as_list(f(args))
                            ^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 724, in inner_fn
    outs = compiled_fn(args)
           ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 526, in wrapper
    return compiled_fn(runtime_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/_inductor/output_code.py", line 613, in __call__
    return self.current_callable(inputs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/torchinductor_root/u7/cu7iuglfv4fiy7yukmpkycnjjtrlt6ttrgfzj3ssjb5o5l4av6w4.py", line 2190, in call
    (buf182,) = self.partitions[0](partition0_args)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1772, in run
    return compiled_fn(new_inputs)  # type: ignore[arg-type]
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/_inductor/cudagraph_trees.py", line 388, in deferred_cudagraphify
    return fn(inputs)
           ^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/_inductor/utils.py", line 3017, in run
    out = model(new_inputs)
          ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/_inductor/cudagraph_trees.py", line 2012, in run
    out = self._run(new_inputs, function_id)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/_inductor/cudagraph_trees.py", line 2182, in _run
    return self.record_function(new_inputs, function_id)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/_inductor/cudagraph_trees.py", line 2219, in record_function
    node = CUDAGraphNode(
           ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/_inductor/cudagraph_trees.py", line 1037, in __init__
    self.recording_outputs: Optional[OutputType] = self._record(
                                                   ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/_inductor/cudagraph_trees.py", line 1268, in _record
    torch.cuda.graph(
  File "/usr/local/lib/python3.12/site-packages/torch/cuda/graphs.py", line 265, in __exit__
    self.cuda_graph.capture_end()
  File "/usr/local/lib/python3.12/site-packages/torch/cuda/graphs.py", line 128, in capture_end
    super().capture_end()
torch.AcceleratorError: CUDA error: operation not permitted
Search for `cudaErrorNotPermitted' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
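The failure is raised from `capture_end()` during Inductor's CUDA graph recording, so one way to narrow it down is to follow the trace's own suggestion (`CUDA_LAUNCH_BLOCKING=1`) and, as an assumption rather than a confirmed fix, bypass TorchDynamo entirely so the CUDA graph capture path in `cudagraph_trees.py` is never reached. Both are environment variables that must be set before `torch` is imported, e.g. at the top of `main.py`:

```python
import os

# Must be set before torch is imported.
# CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, so the
# reported stacktrace points at the real failing op (as the error
# message above suggests).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Hedged workaround: TORCHDYNAMO_DISABLE=1 turns off torch.compile
# entirely, which also skips the Inductor CUDA graph capture where
# capture_end() raised cudaErrorNotPermitted. This trades speed for
# ruling graph capture in or out as the cause.
os.environ["TORCHDYNAMO_DISABLE"] = "1"

print(os.environ["CUDA_LAUNCH_BLOCKING"], os.environ["TORCHDYNAMO_DISABLE"])
```

If `torch.compile` should stay enabled, a narrower alternative may be disabling only CUDA graphs via `torch._inductor.config.triton.cudagraphs = False` before the first compiled call; whether that option applies should be checked against the installed torch version.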
