
Conversation

@ddwkim (Contributor) commented Aug 20, 2025


While implementing Vocos, a neural vocoder, we found that the EXP operation was not supported by the Vulkan backend. This PR adds an exp implementation to the Vulkan backend; with it, our model produces the expected output. The test-backend-ops results on an Apple M4 Max and an NVIDIA RTX A6000 are shown below.
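
For readers unfamiliar with how this op is exercised from the ggml side, here is a minimal, illustrative sketch (not code from this PR) that builds a tiny graph around ggml_exp() and runs it on the Vulkan backend through the public ggml-backend API. The exact helper names and signatures are assumptions based on the current ggml headers; treat it as a sketch rather than a reference implementation.

```cpp
// Sketch only: apply ggml_exp() to a small f32 tensor on the Vulkan backend.
#include "ggml.h"
#include "ggml-alloc.h"
#include "ggml-backend.h"
#include "ggml-vulkan.h"

#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    const int n = 8;

    // Context holding only tensor metadata; data is allocated in a backend buffer.
    ggml_init_params params = {
        /*.mem_size   =*/ ggml_tensor_overhead() * 8 + ggml_graph_overhead(),
        /*.mem_buffer =*/ nullptr,
        /*.no_alloc   =*/ true,
    };
    ggml_context * ctx = ggml_init(params);

    ggml_tensor * a   = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, n);
    ggml_tensor * out = ggml_exp(ctx, a);   // the op this PR enables on Vulkan

    ggml_cgraph * graph = ggml_new_graph(ctx);
    ggml_build_forward_expand(graph, out);

    ggml_backend_t backend = ggml_backend_vk_init(0);   // Vulkan device 0
    ggml_backend_buffer_t buf = ggml_backend_alloc_ctx_tensors(ctx, backend);

    std::vector<float> input(n);
    for (int i = 0; i < n; ++i) input[i] = 0.1f * i;
    ggml_backend_tensor_set(a, input.data(), 0, n * sizeof(float));

    ggml_backend_graph_compute(backend, graph);

    std::vector<float> result(n);
    ggml_backend_tensor_get(out, result.data(), 0, n * sizeof(float));
    for (int i = 0; i < n; ++i) {
        printf("exp(%.2f) = %.5f (reference %.5f)\n", input[i], result[i], std::exp(input[i]));
    }

    ggml_backend_buffer_free(buf);
    ggml_backend_free(backend);
    ggml_free(ctx);
    return 0;
}
```

test-backend-ops exercises the same op directly per backend, which is what the logs below show.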

test-backend-ops results on Apple M4 Max

./build/bin/test-backend-ops -o EXP
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Apple M4 Max (MoltenVK) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 32768 | int dot: 0 | matrix cores: none
Testing 4 devices
...
Backend 1/4: Metal
  Device description: Apple M4 Max
  Device memory: 27648 MB (27642 MB free)

  EXP(type=f16,ne_a=[128,2,2,2],v=0): not supported [Metal]
  EXP(type=f16,ne_a=[5,7,11,13],v=0): not supported [Metal]
  EXP(type=f16,ne_a=[128,2,2,2],v=1): not supported [Metal]
  EXP(type=f16,ne_a=[5,7,11,13],v=1): not supported [Metal]
  EXP(type=f32,ne_a=[128,2,2,2],v=0): OK
  EXP(type=f32,ne_a=[5,7,11,13],v=0): OK
  EXP(type=f32,ne_a=[128,2,2,2],v=1): not supported [Metal]
  EXP(type=f32,ne_a=[5,7,11,13],v=1): not supported [Metal]
  10844/10844 tests passed
  Backend Metal: OK
ggml_metal_free: deallocating
ggml_metal_mem_pool_free: freeing memory pool, num heaps = 0 (total = 0)
ggml_metal_mem_pool_free: freeing memory pool, num heaps = 0 (total = 0)
ggml_metal_mem_pool_free: freeing memory pool, num heaps = 0 (total = 0)
ggml_metal_mem_pool_free: freeing memory pool, num heaps = 0 (total = 0)
ggml_metal_mem_pool_free: freeing memory pool, num heaps = 0 (total = 0)
ggml_metal_mem_pool_free: freeing memory pool, num heaps = 0 (total = 0)
ggml_metal_mem_pool_free: freeing memory pool, num heaps = 0 (total = 0)
ggml_metal_mem_pool_free: freeing memory pool, num heaps = 0 (total = 0)
Backend 2/4: Vulkan0
  Device description: Apple M4 Max
  Device memory: 36864 MB (36864 MB free)

  EXP(type=f16,ne_a=[128,2,2,2],v=0): OK
  EXP(type=f16,ne_a=[5,7,11,13],v=0): OK
  EXP(type=f16,ne_a=[128,2,2,2],v=1): not supported [Vulkan0]
  EXP(type=f16,ne_a=[5,7,11,13],v=1): not supported [Vulkan0]
  EXP(type=f32,ne_a=[128,2,2,2],v=0): OK
  EXP(type=f32,ne_a=[5,7,11,13],v=0): OK
  EXP(type=f32,ne_a=[128,2,2,2],v=1): not supported [Vulkan0]
  EXP(type=f32,ne_a=[5,7,11,13],v=1): not supported [Vulkan0]
  10844/10844 tests passed
  Backend Vulkan0: OK
Backend 3/4: BLAS
  Device description: Accelerate
  Device memory: 0 MB (0 MB free)

  EXP(type=f16,ne_a=[128,2,2,2],v=0): not supported [BLAS]
  EXP(type=f16,ne_a=[5,7,11,13],v=0): not supported [BLAS]
  EXP(type=f16,ne_a=[128,2,2,2],v=1): not supported [BLAS]
  EXP(type=f16,ne_a=[5,7,11,13],v=1): not supported [BLAS]
  EXP(type=f32,ne_a=[128,2,2,2],v=0): not supported [BLAS]
  EXP(type=f32,ne_a=[5,7,11,13],v=0): not supported [BLAS]
  EXP(type=f32,ne_a=[128,2,2,2],v=1): not supported [BLAS]
  EXP(type=f32,ne_a=[5,7,11,13],v=1): not supported [BLAS]
  10844/10844 tests passed
  Backend BLAS: OK
Backend 4/4: CPU
  Skipping CPU backend
4/4 backends passed
OK

test-backend-ops results on NVIDIA RTX A6000

./build/bin/test-backend-ops -o EXP
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA RTX A6000, compute capability 8.6, VMM: yes
register_backend: registered backend CUDA (1 devices)
register_device: registered device CUDA0 (NVIDIA RTX A6000)
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA RTX A6000 (NVIDIA) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
register_backend: registered backend Vulkan (1 devices)
register_device: registered device Vulkan0 (NVIDIA RTX A6000)
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (AMD Ryzen Threadripper PRO 5975WX 32-Cores)
load_backend: failed to find ggml_backend_init in /home/dongwon/prj/llama.cpp/build/bin/libggml-cuda.so
load_backend: failed to find ggml_backend_init in /home/dongwon/prj/llama.cpp/build/bin/libggml-vulkan.so
load_backend: failed to find ggml_backend_init in /home/dongwon/prj/llama.cpp/build/bin/libggml-cpu.so
Testing 3 devices

Backend 1/3: CUDA0
  Device description: NVIDIA RTX A6000
  Device memory: 48539 MB (48223 MB free)

update_cuda_graph_executable: CUDA graph update failed
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to too many consecutive updates
  EXP(type=f16,ne_a=[128,2,2,2],v=0): OK
  EXP(type=f16,ne_a=[5,7,11,13],v=0): OK
  EXP(type=f16,ne_a=[128,2,2,2],v=1): not supported [CUDA0] 
  EXP(type=f16,ne_a=[5,7,11,13],v=1): not supported [CUDA0] 
  EXP(type=f32,ne_a=[128,2,2,2],v=0): OK
  EXP(type=f32,ne_a=[5,7,11,13],v=0): OK
  EXP(type=f32,ne_a=[128,2,2,2],v=1): not supported [CUDA0] 
  EXP(type=f32,ne_a=[5,7,11,13],v=1): not supported [CUDA0] 
  10844/10844 tests passed
  Backend CUDA0: OK
Backend 2/3: Vulkan0
  Device description: NVIDIA RTX A6000
  Device memory: 49140 MB (49140 MB free)

  EXP(type=f16,ne_a=[128,2,2,2],v=0): OK
  EXP(type=f16,ne_a=[5,7,11,13],v=0): OK
  EXP(type=f16,ne_a=[128,2,2,2],v=1): not supported [Vulkan0] 
  EXP(type=f16,ne_a=[5,7,11,13],v=1): not supported [Vulkan0] 
  EXP(type=f32,ne_a=[128,2,2,2],v=0): OK
  EXP(type=f32,ne_a=[5,7,11,13],v=0): OK
  EXP(type=f32,ne_a=[128,2,2,2],v=1): not supported [Vulkan0] 
  EXP(type=f32,ne_a=[5,7,11,13],v=1): not supported [Vulkan0] 
  10844/10844 tests passed
  Backend Vulkan0: OK
Backend 3/3: CPU
  Skipping CPU backend
3/3 backends passed
OK

@ddwkim requested a review from 0cc4m as a code owner on August 20, 2025, 15:38
@github-actions bot added the Vulkan (issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels on Aug 20, 2025
@jeffbolznv (Collaborator) left a comment
LGTM, and I verified the tests pass on my system.

@0cc4m (Collaborator) left a comment
Thank you!

@0cc4m merged commit 20c2dac into ggml-org:master on Aug 21, 2025
47 checks passed
qnixsynapse pushed a commit to menloresearch/llama.cpp that referenced this pull request Aug 22, 2025