
Conversation

cjsdurj commented Jul 7, 2025

Make sure to read the contributing guidelines before submitting a PR

github-actions bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels on Jul 7, 2025
0cc4m (Collaborator) commented Jul 7, 2025

By default we don't want to select integrated GPUs, because a lot of PCs contain a Vulkan-capable iGPU that is slower than even running a model on CPU. With this change, those would get used alongside a dedicated GPU and slow the whole thing down.

cjsdurj (Author) commented Jul 7, 2025

I have tested on an AMD Ryzen AI 9 and an Intel Core Ultra 7 155H, and the iGPU is faster than the CPU. For the use case where we need to deploy an embedding model on the iGPU and a chat model on the dGPU, this feature is useful.

On some PCs the iGPU does run slower than the CPU; those users can disable the iGPU by setting GGML_VK_VISIBLE_DEVICES.
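
As a rough illustration of how such an environment-variable filter can work (a minimal standalone C++ sketch with assumed parsing details, not the actual ggml-vulkan implementation):

    #include <cstdio>
    #include <cstdlib>
    #include <set>
    #include <sstream>
    #include <string>

    // Illustration only: turn a variable such as GGML_VK_VISIBLE_DEVICES="0,2"
    // into a set of device indices that a backend could consult when deciding
    // which physical devices to expose.
    static std::set<size_t> parse_visible_devices(const char * env_name) {
        std::set<size_t> visible;
        const char * value = std::getenv(env_name);
        if (value == nullptr || *value == '\0') {
            return visible; // empty set: no filtering requested
        }
        std::stringstream ss(value);
        std::string token;
        while (std::getline(ss, token, ',')) {
            if (!token.empty()) {
                visible.insert(static_cast<size_t>(std::stoul(token)));
            }
        }
        return visible;
    }

    int main() {
        for (size_t idx : parse_visible_devices("GGML_VK_VISIBLE_DEVICES")) {
            std::printf("visible device index: %zu\n", idx);
        }
        return 0;
    }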

0cc4m (Collaborator) commented Jul 7, 2025

Yes, you have tested on APUs. Most of these laptop iGPUs are pretty large, but there are still a ton of other iGPUs that would cause trouble.

The idea is that the default case should cover most basic setups well without user intervention, so requiring people with a weak iGPU to set an environment variable to disable it is not good. You can also just enable your iGPU for the embedding model with GGML_VK_VISIBLE_DEVICES and solve your problem that way, can you not?

cjsdurj (Author) commented Jul 7, 2025

Yes, you are right, but the log info is confusing.

The call

    GGML_LOG_DEBUG("ggml_vulkan: Found %zu Vulkan devices:\n", vk_instance.device_indices.size());

outputs:

    ggml_vulkan: Found 1 Vulkan devices:
    ggml_vulkan: 0 = NVIDIA GeForce RTX 4060 Laptop GPU (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 49152 |

It should display all Vulkan-capable GPUs and enable all dGPUs by default, and the user can then select the iGPU to deploy small models.

On Windows there are many other interfaces that expose device IDs, such as DXGI, PDH, ... and different APIs sometimes report devices in a different order, so this log info is useful when the user is selecting devices.

cjsdurj closed this on Jul 7, 2025
cjsdurj deleted the vk_support_igpu branch on July 7, 2025 at 11:59
cjsdurj (Author) commented Jul 7, 2025

Yes, you are right, but the log info is confusing.

The call

    GGML_LOG_DEBUG("ggml_vulkan: Found %zu Vulkan devices:\n", vk_instance.device_indices.size());

outputs:

    ggml_vulkan: Found 1 Vulkan devices:
    ggml_vulkan: 0 = NVIDIA GeForce RTX 4060 Laptop GPU (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 49152 |

It should display all Vulkan-capable GPUs and enable all dGPUs by default, and the user can then select the iGPU to deploy small models.

On Windows there are many other interfaces that expose device IDs, such as DXGI, PDH, ... and different APIs sometimes report devices in a different order, so this log info is useful when the user is selecting devices.

Maybe we can update the log info at ggml-vulkan.cpp:3649 from

    GGML_LOG_DEBUG("ggml_vulkan: %zu = %s (%s) | uma: %d | fp16: %d | warp size: %zu | shared memory: %d | int dot: %d | matrix cores: %s\n",
              idx, device_name.c_str(), driver_props.driverName.data(), uma, fp16, subgroup_size,
              props2.properties.limits.maxComputeSharedMemorySize, integer_dot_product, matrix_cores.c_str());

to

    GGML_LOG_DEBUG("ggml_vulkan: %zu = %s (%s) | uma: %d | fp16: %d | warp size: %zu | shared memory: %d | int dot: %d | matrix cores: %s\n",
              dev_num, device_name.c_str(), driver_props.driverName.data(), uma, fp16, subgroup_size,
              props2.properties.limits.maxComputeSharedMemorySize, integer_dot_product, matrix_cores.c_str());

so that it uses the actual index from enumeratePhysicalDevices.
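
For comparison, a minimal standalone sketch using the Vulkan-Hpp API (an illustration, not the llama.cpp code) that prints every Vulkan-capable device together with its index from enumeratePhysicalDevices:

    #include <cstdio>
    #include <vulkan/vulkan.hpp>

    // Illustration only: list each physical device with the index it receives
    // from enumeratePhysicalDevices, so log output can be matched against
    // other device-selection mechanisms.
    int main() {
        vk::ApplicationInfo app_info("device-list", 1, nullptr, 0, VK_API_VERSION_1_2);
        vk::InstanceCreateInfo create_info({}, &app_info);
        vk::Instance instance = vk::createInstance(create_info);

        std::vector<vk::PhysicalDevice> devices = instance.enumeratePhysicalDevices();
        for (size_t i = 0; i < devices.size(); ++i) {
            vk::PhysicalDeviceProperties props = devices[i].getProperties();
            std::printf("%zu = %s\n", i, props.deviceName.data());
        }

        instance.destroy();
        return 0;
    }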

@0cc4m

0cc4m (Collaborator) commented Jul 8, 2025

In my opinion you should use vulkaninfo --summary for that purpose; the device output exists to show the devices that are being used. At least that is how it was before device selection and split-mode none were added to llama.cpp. Maybe it should be reconsidered to make working with that easier. Can you open an issue about it for discussion?
