Description
Just trying the example from the README page with JuggernautXL, but it fails:
sd.exe -m "G:\AI\Image\stable-diffusion-webui\models\Stable-diffusion\juggernautXL_juggXIByRundiffusion.safetensors" --cfg-scale 7.5 --steps 35 --sampling-method euler -H 1024 -W 1024 --seed 42 --diffusion-fa -p "fantasy medieval village world inside a glass sphere , high detail, fantasy, realistic, light effect, hyper detail, volumetric lighting, cinematic, macro, depth of field, blur, red light and clouds from the back, highly detailed epic cinematic concept art cg render made in maya, blender and photoshop, octane render, excellent composition, dynamic dramatic cinematic lighting, aesthetic, very inspirational, world inside a glass sphere by james gurney by artgerm with james jean, joe fenton and tristan eaton by ross tran, fine details, 4k resolution"
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3070 Ti, compute capability 8.6, VMM: yes
[INFO ] stable-diffusion.cpp:195 - loading model from 'G:\AI\Image\stable-diffusion-webui\models\Stable-diffusion\juggernautXL_juggXIByRundiffusion.safetensors'
[INFO ] model.cpp:888 - load G:\AI\Image\stable-diffusion-webui\models\Stable-diffusion\juggernautXL_juggXIByRundiffusion.safetensors using safetensors format
[INFO ] stable-diffusion.cpp:242 - Version: SDXL
[INFO ] stable-diffusion.cpp:275 - Weight type: f16
[INFO ] stable-diffusion.cpp:276 - Conditioner weight type: f16
[INFO ] stable-diffusion.cpp:277 - Diffusion model weight type: f16
[INFO ] stable-diffusion.cpp:278 - VAE weight type: f32
[WARN ] stable-diffusion.cpp:289 - !!!It looks like you are using SDXL model. If you find that the generated images are completely black, try specifying SDXL VAE FP16 Fix with the --vae parameter. You can find it here: https://huggingface.co/madebyollin/sdxl-vae-fp16-fix/blob/main/sdxl_vae.safetensors
[INFO ] stable-diffusion.cpp:326 - Using flash attention in the diffusion model
|==================================================| 2641/2641 - 333.33it/s
[INFO ] stable-diffusion.cpp:516 - total params memory size = 8113.89MB (VRAM 8113.89MB, RAM 0.00MB): clip 3119.36MB(VRAM), unet 4900.07MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:520 - loading model from 'G:\AI\Image\stable-diffusion-webui\models\Stable-diffusion\juggernautXL_juggXIByRundiffusion.safetensors' completed, taking 6.22s
[INFO ] stable-diffusion.cpp:554 - running in eps-prediction mode
[INFO ] stable-diffusion.cpp:688 - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1241 - apply_loras completed, taking 0.00s
[INFO ] stable-diffusion.cpp:1374 - get_learned_condition completed, taking 1138 ms
[INFO ] stable-diffusion.cpp:1397 - sampling using Euler method
[INFO ] stable-diffusion.cpp:1434 - generating image: 1/1 - seed 42
D:\a\stable-diffusion.cpp\stable-diffusion.cpp\ggml\src\ggml-cuda\template-instances../fattn-wmma-f16.cuh:422: ERROR: CUDA kernel flash_attn_ext_f16 has no device code compatible with CUDA arch 600. ggml-cuda.cu was compiled for: 600
[the same flash_attn_ext_f16 error line repeats about 30 more times]
CUDA error: the function failed to launch on the GPU
current device: 0, in function ggml_cuda_op_mul_mat_cublas at D:\a\stable-diffusion.cpp\stable-diffusion.cpp\ggml\src\ggml-cuda\ggml-cuda.cu:1151
cublasSgemm_v2(ctx.cublas_handle(id), CUBLAS_OP_T, CUBLAS_OP_N, row_diff, src1_ncols, ne10, &alpha, src0_ddf_i, ne00, src1_ddf1_i, ne10, &beta, dst_dd_i, ldc)
D:\a\stable-diffusion.cpp\stable-diffusion.cpp\ggml\src\ggml-cuda\ggml-cuda.cu:70: CUDA error
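For context (not part of the original report): the log says the flash-attention kernel has no device code for the build's target, CUDA arch 600 (Pascal), while the WMMA-based flash-attention kernels require tensor cores (arch 7.0+); the RTX 3070 Ti is arch 8.6. A sketch of two possible workarounds, assuming a from-source CMake build — the `SD_CUDA` option name is an assumption based on typical stable-diffusion.cpp builds, while `CMAKE_CUDA_ARCHITECTURES` is a standard CMake variable:

```shell
# Workaround 1: run without flash attention, i.e. drop --diffusion-fa from the
# sd.exe command line, so the unsupported flash_attn_ext_f16 kernel is never launched.

# Workaround 2: rebuild from source, targeting the GPU's actual compute
# capability (8.6 for the RTX 30-series) so device code is generated for it.
cmake -B build -DSD_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=86   # SD_CUDA name is an assumption
cmake --build build --config Release
```

Prebuilt Windows binaries are often compiled for a single lowest-common-denominator arch, which would explain the "compiled for: 600" message on an 8.6 GPU.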