This repository was archived by the owner on Jan 13, 2026. It is now read-only.

CUDA error: no kernel image is available for execution on the device #16

@XuWuLingYu


python3 launch.py --config configs/$method.yaml --train --gpu 0 name="imagedream-sd21-shading" tag="astronaut" system.prompt_processor.prompt="an astronaut riding a horse" system.prompt_processor.image_path="${image_file}" system.guidance.ckpt_path="${ckpt_file}" system.guidance.config_path="${cfg_file}"
['0']
Seed set to 0
[INFO] Loading Multiview Diffusion ...
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
making attention of type 'vanilla-xformers' with 512 in_channels
building MemoryEfficientAttnBlock with 512 in_channels...
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla-xformers' with 512 in_channels
building MemoryEfficientAttnBlock with 512 in_channels...
[INFO] Loaded Multiview Diffusion!
[INFO] Using prompt [an astronaut riding a horse] and negative prompt [ugly, bad anatomy, blurry, pixelated obscure, unnatural colors, poor lighting, dull, and unclear, cropped, lowres, low quality, artifacts, duplicate, morbid, mutilated, poorly drawn face, deformed, dehydrated, bad proportions]
[INFO] Using view-dependent prompts [side]:[an astronaut riding a horse, side view] [front]:[an astronaut riding a horse, front view] [back]:[an astronaut riding a horse, back view] [overhead]:[an astronaut riding a horse, overhead view]
[INFO] Using 16bit Automatic Mixed Precision (AMP)
[INFO] GPU available: True (cuda), used: True
[INFO] TPU available: False, using: 0 TPU cores
[INFO] HPU available: False, using: 0 HPUs
[INFO] You are using a CUDA device ('NVIDIA A800-SXM4-80GB') that has Tensor Cores. To properly utilize them, you should set torch.set_float32_matmul_precision('medium' | 'high') which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
[INFO] LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
fatal: detected dubious ownership in repository at '/root/paddlejob/workspace/env_run'
To add an exception for this directory, call:

    git config --global --add safe.directory /root/paddlejob/workspace/env_run

/root/paddlejob/workspace/env_run/threestudio/utils/callbacks.py:92: UserWarning: Code snapshot is not saved. Please make sure you have git installed and are in a git repository.
rank_zero_warn(
[INFO]
  | Name       | Type                           | Params | Mode
-------------------------------------------------------------------
0 | geometry   | ImplicitVolume                 | 12.6 M | train
1 | material   | DiffuseWithPointLightMaterial  | 0      | train
2 | background | NeuralEnvironmentMapBackground | 448    | train
3 | renderer   | NeRFVolumeRenderer             | 0      | train
4 | guidance   | MultiviewDiffusionGuidance     | 2.0 B  | train

12.6 M Trainable params
2.0 B Non-trainable params
2.0 B Total params
8,096.164 Total estimated model params size (MB)
[INFO] Validation results will be saved to outputs/imagedream-sd21-shading/astronaut@20240805-231756/save
/root/anaconda3/envs/ImageDream/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:424: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the num_workers argument to num_workers=127 in the DataLoader to improve performance.
/root/anaconda3/envs/ImageDream/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:424: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the num_workers argument to num_workers=127 in the DataLoader to improve performance.
Epoch 0: | | 0/? [00:00<?, ?it/s]
Traceback (most recent call last):
File "/root/paddlejob/workspace/env_run/launch.py", line 238, in <module>
main(args, extras)
File "/root/paddlejob/workspace/env_run/launch.py", line 181, in main
trainer.fit(system, datamodule=dm, ckpt_path=cfg.resume)
File "/root/anaconda3/envs/ImageDream/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 543, in fit
call._call_and_handle_interrupt(
File "/root/anaconda3/envs/ImageDream/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 44, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/root/anaconda3/envs/ImageDream/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 579, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/root/anaconda3/envs/ImageDream/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 986, in _run
results = self._run_stage()
File "/root/anaconda3/envs/ImageDream/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1030, in _run_stage
self.fit_loop.run()
File "/root/anaconda3/envs/ImageDream/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 205, in run
self.advance()
File "/root/anaconda3/envs/ImageDream/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 363, in advance
self.epoch_loop.run(self._data_fetcher)
File "/root/anaconda3/envs/ImageDream/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 140, in run
self.advance(data_fetcher)
File "/root/anaconda3/envs/ImageDream/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 250, in advance
batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs)
File "/root/anaconda3/envs/ImageDream/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 190, in run
self._optimizer_step(batch_idx, closure)
File "/root/anaconda3/envs/ImageDream/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 268, in _optimizer_step
call._call_lightning_module_hook(
File "/root/anaconda3/envs/ImageDream/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 159, in _call_lightning_module_hook
output = fn(*args, **kwargs)
File "/root/anaconda3/envs/ImageDream/lib/python3.10/site-packages/pytorch_lightning/core/module.py", line 1308, in optimizer_step
optimizer.step(closure=optimizer_closure)
File "/root/anaconda3/envs/ImageDream/lib/python3.10/site-packages/pytorch_lightning/core/optimizer.py", line 153, in step
step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs)
File "/root/anaconda3/envs/ImageDream/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 238, in optimizer_step
return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs)
File "/root/anaconda3/envs/ImageDream/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/amp.py", line 77, in optimizer_step
closure_result = closure()
File "/root/anaconda3/envs/ImageDream/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 144, in __call__
self._result = self.closure(*args, **kwargs)
File "/root/anaconda3/envs/ImageDream/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/anaconda3/envs/ImageDream/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 129, in closure
step_output = self._step_fn()
File "/root/anaconda3/envs/ImageDream/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 317, in _training_step
training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values())
File "/root/anaconda3/envs/ImageDream/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 311, in _call_strategy_hook
output = fn(*args, **kwargs)
File "/root/anaconda3/envs/ImageDream/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 390, in training_step
return self.lightning_module.training_step(*args, **kwargs)
File "/root/paddlejob/workspace/env_run/threestudio/systems/imagedream.py", line 51, in training_step
out = self(batch)
File "/root/anaconda3/envs/ImageDream/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/paddlejob/workspace/env_run/threestudio/systems/imagedream.py", line 48, in forward
return self.renderer(**batch)
File "/root/anaconda3/envs/ImageDream/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/paddlejob/workspace/env_run/threestudio/models/renderers/nerf_volume_renderer.py", line 96, in forward
ray_indices, t_starts
, t_ends
= self.estimator.sampling(
File "/root/anaconda3/envs/ImageDream/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/anaconda3/envs/ImageDream/lib/python3.10/site-packages/nerfacc/estimators/occ_grid.py", line 164, in sampling
intervals, samples = traverse_grids(
File "/root/anaconda3/envs/ImageDream/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/anaconda3/envs/ImageDream/lib/python3.10/site-packages/nerfacc/grid.py", line 135, in traverse_grids
intervals, samples = _C.traverse_grids(
File "/root/anaconda3/envs/ImageDream/lib/python3.10/site-packages/nerfacc/cuda/__init__.py", line 13, in call_cuda
return getattr(_C, name)(*args, **kwargs)
RuntimeError: CUDA error: no kernel image is available for execution on the device
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
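For context: "no kernel image is available for execution on the device" means the CUDA binary being launched contains neither a cubin nor PTX that this GPU can run. The A800 in the log is an Ampere part with compute capability 8.0 (sm_80). Since the failure is raised from nerfacc's compiled extension (`_C.traverse_grids`) after PyTorch's own kernels had already run, one plausible reading is that nerfacc was built for an architecture list that does not cover sm_80. The sketch below only checks PyTorch's build-time list via `torch.cuda.get_device_capability` and `torch.cuda.get_arch_list`; the helper names `capability_to_sm` and `is_arch_supported` are hypothetical, introduced here for illustration.

```python
# Sketch: does PyTorch ship a kernel image this GPU can run?
# Assumption: PyTorch is installed; nerfacc's own compiled extension is NOT
# inspected here, only torch's build-time architecture list.
from typing import Iterable, Tuple


def capability_to_sm(capability: Tuple[int, int]) -> str:
    """Turn a (major, minor) compute capability such as (8, 0) into 'sm_80'."""
    major, minor = capability
    return f"sm_{major}{minor}"


def is_arch_supported(capability: Tuple[int, int], arch_list: Iterable[str]) -> bool:
    """Return True if some entry in arch_list can run on a GPU of `capability`.

    'sm_XY' cubins run on devices with the same major version and a minor
    version >= Y; 'compute_XY' PTX can be JIT-compiled for any device whose
    capability is >= X.Y. Entries with suffixes (e.g. 'sm_90a') are skipped.
    """
    major, minor = capability
    for arch in arch_list:
        kind, _, digits = arch.partition("_")
        if len(digits) < 2 or not digits.isdigit():
            continue
        a_major, a_minor = int(digits[:-1]), int(digits[-1])
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True
    return False


if __name__ == "__main__":
    import torch

    if not torch.cuda.is_available():
        raise SystemExit("PyTorch cannot see a CUDA device")
    cap = torch.cuda.get_device_capability(0)
    archs = torch.cuda.get_arch_list()
    print(torch.cuda.get_device_name(0), "->", capability_to_sm(cap))
    print("torch was built for:", archs)
    print("torch has a usable kernel image:", is_arch_supported(cap, archs))
```

If PyTorch's list covers sm_80 but nerfacc still fails, one option to try is rebuilding nerfacc from source with TORCH_CUDA_ARCH_LIST="8.0" set in the environment. This assumes nerfacc's build goes through torch.utils.cpp_extension, which reads that variable; it has not been verified against this environment.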

nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

pip list | grep torch
WARNING: Ignoring invalid distribution -formers (/root/anaconda3/envs/ImageDream/lib/python3.10/site-packages)
open-clip-torch 2.7.0
pytorch-lightning 2.3.3
tinycudann 1.7 /root/paddlejob/workspace/env_run/copy_file/tiny-cuda-nn/bindings/torch
torch 2.0.1+cu118
torchmetrics 1.4.0.post0
torchvision 0.15.2+cu118
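The listing shows torch 2.0.1+cu118 next to nvcc release 11.8, so PyTorch's CUDA build and the local toolkit agree on this machine, which presumably rules out a toolkit version mismatch as the cause. A minimal sketch for repeating that comparison elsewhere follows; it assumes nvcc is on PATH, and the helper names `parse_nvcc_release` and `cuda_versions_match` are hypothetical.

```python
# Sketch: compare the CUDA version PyTorch was built with against the nvcc
# toolkit that would compile CUDA extensions such as nerfacc.
# Assumption: nvcc is on PATH and PyTorch is installed when run as a script.
import re
import subprocess
from typing import Optional


def parse_nvcc_release(nvcc_output: str) -> Optional[str]:
    """Extract 'major.minor' from `nvcc -V` output, e.g. 'release 11.8' -> '11.8'."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def cuda_versions_match(torch_cuda: Optional[str], nvcc_release: Optional[str]) -> bool:
    """True when both versions are known and agree on major.minor."""
    if not torch_cuda or not nvcc_release:
        return False
    return torch_cuda.split(".")[:2] == nvcc_release.split(".")[:2]


if __name__ == "__main__":
    import torch

    out = subprocess.run(["nvcc", "-V"], capture_output=True, text=True, check=True).stdout
    nvcc = parse_nvcc_release(out)
    print("torch.version.cuda:", torch.version.cuda, "| nvcc release:", nvcc)
    print("match:", cuda_versions_match(torch.version.cuda, nvcc))
```

torch.version.cuda is None on CPU-only builds, which the helper reports as a mismatch rather than raising.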

How can I solve this CUDA issue?
