Prerequisites
Feature Description
Hi,
The latest Vulkan Linux build:
https://github.com/ggml-org/llama.cpp/releases/download/b4856/llama-b4856-bin-ubuntu-vulkan-x64.zip
doesn't support VK_NV_cooperative_matrix2.
I'm on the NVIDIA Vulkan developer driver, which supports that extension (the 575 release will support it as well).
./llama-bench
shows:
ggml_vulkan: 0 = NVIDIA GeForce RTX 4070 (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 49152 | matrix cores: KHR_coopmat
So VK_NV_cooperative_matrix2 is not being used.
But on Windows:
ggml_vulkan: 0 = NVIDIA GeForce RTX 4070 (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 49152 | matrix cores: NV_coopmat2
So it is enabled there.
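To compare builds without reading the log by eye, the "matrix cores" field can be pulled out of the device line. This is a minimal sketch that assumes the line format is exactly as in the two log lines quoted above; the function names `parse_device_line` and `matrix_core_path` are made up for illustration and are not part of llama.cpp.

```python
import re

# Matches the device summary line quoted in this report, e.g.
# "ggml_vulkan: 0 = NVIDIA ... | uma: 0 | ... | matrix cores: NV_coopmat2".
# Assumption: fields are separated by "|" and written as "key: value".
DEVICE_LINE = re.compile(r"^ggml_vulkan:\s*(\d+)\s*=\s*(.+?)\s*\|(.*)$")


def parse_device_line(line):
    """Return a dict of the fields in one ggml_vulkan device line, or None."""
    match = DEVICE_LINE.match(line.strip())
    if match is None:
        return None
    info = {"index": int(match.group(1)), "name": match.group(2)}
    for field in match.group(3).split("|"):
        key, sep, value = field.partition(":")
        if sep:  # skip fragments that are not "key: value"
            info[key.strip()] = value.strip()
    return info


def matrix_core_path(line):
    """Return the 'matrix cores' value of a device line, or None if absent."""
    info = parse_device_line(line)
    return None if info is None else info.get("matrix cores")


linux_line = ("ggml_vulkan: 0 = NVIDIA GeForce RTX 4070 (NVIDIA) | uma: 0 | fp16: 1 "
              "| warp size: 32 | shared memory: 49152 | matrix cores: KHR_coopmat")
windows_line = ("ggml_vulkan: 0 = NVIDIA GeForce RTX 4070 (NVIDIA) | uma: 0 | fp16: 1 "
                "| warp size: 32 | shared memory: 49152 | matrix cores: NV_coopmat2")

print(matrix_core_path(linux_line))
print(matrix_core_path(windows_line))
```

Feeding it the output of ./llama-bench from each build makes the difference between the two binaries easy to spot in a script or CI check.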
Motivation
NV_coopmat2 improves performance compared to the KHR_coopmat implementation.
Possible Implementation
It's all due to the build script:
https://github.com/ggml-org/llama.cpp/blob/master/.github/workflows/build.yml
On Linux it uses the latest available SDK, 1.4.304, which supports the extension, but the Windows builder uses SDK 1.3.261.1, which does not.
So the fix is a change in the same build.yml,
from:
VULKAN_VERSION: 1.3.261.1
to:
VULKAN_VERSION: 1.4.304.1
located at:
windows-latest-cmake:
  runs-on: windows-latest
  ..
  env:
    VULKAN_VERSION: 1.3.261.1
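The version pin can also be checked mechanically. The sketch below scans workflow text for `VULKAN_VERSION` entries and flags any that are older than a threshold. The threshold of 1.4.304.1 is simply the version proposed above, not a confirmed minimum for VK_NV_cooperative_matrix2, and the helper names are hypothetical. It also assumes the value is written unquoted, as in the snippet above.

```python
import re

# Assumption: threshold taken from this report (the version proposed above).
# The true minimum SDK that supports VK_NV_cooperative_matrix2 may be lower.
ASSUMED_MIN_SDK = (1, 4, 304, 1)

# Assumption: the value is an unquoted dotted number on its own line.
VERSION_LINE = re.compile(
    r"^\s*VULKAN_VERSION:\s*([0-9]+(?:\.[0-9]+)*)\s*$", re.MULTILINE
)


def parse_version(text):
    """Turn '1.3.261.1' into (1, 3, 261, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def find_vulkan_versions(workflow_text):
    """Return every VULKAN_VERSION value found in the workflow text."""
    return VERSION_LINE.findall(workflow_text)


def outdated_versions(workflow_text, minimum=ASSUMED_MIN_SDK):
    """Return the pinned versions that are older than the given minimum."""
    return [v for v in find_vulkan_versions(workflow_text)
            if parse_version(v) < minimum]


sample = """\
windows-latest-cmake:
  runs-on: windows-latest
  env:
    VULKAN_VERSION: 1.3.261.1
"""

print(outdated_versions(sample))
```

Run against the real build.yml, a non-empty result would point at the jobs whose SDK pin still needs to be raised.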