
Conversation

@netrunnereve
Collaborator

@netrunnereve netrunnereve commented May 9, 2025

Basically GCN 3 and 4 chips support FP16, but they're unable to process two values at once like GCN 5 can. Since there's apparently no performance benefit, shaderFloat16 is disabled in the drivers, even though the chips fully support it and RADV is able to generate those instructions.

While the actual FMAs won't run any faster, having FP16 means that I can use a quarter of the shared memory for mul mat and save a little bit of memory bandwidth when reading the B matrix. As a result I get a small improvement in prompt processing on my RX 470.
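To make the storage saving concrete, here is a rough, self-contained sketch; the tile dimensions are hypothetical and the real mul_mat shader tiling is not reproduced here. Per element, 16-bit storage halves the cached B tile, and the overall figure above presumably also reflects layout changes in the shader itself:

```cpp
// Rough illustration only: hypothetical tile sizes, not the actual ggml
// mul_mat shader tiling. Shows the shared-memory footprint of a cached
// B-matrix tile at FP32 vs FP16 storage (uint16_t stands in for a half float).
#include <cstdint>
#include <cstdio>

constexpr size_t BN = 64, BK = 16; // hypothetical tile dimensions

int main() {
    size_t tile_fp32 = BN * BK * sizeof(float);    // 4 bytes per element
    size_t tile_fp16 = BN * BK * sizeof(uint16_t); // 2 bytes per element
    printf("B tile: %zu bytes at FP32, %zu bytes at FP16\n", tile_fp32, tile_fp16);
    return 0;
}
```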

PR:

| model | size | params | backend | ngl | test | t/s |
| ------------- | --------: | -----: | ------- | --: | ----- | ------------: |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan | 99 | pp512 | 195.34 ± 0.70 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan | 99 | tg128 | 33.48 ± 0.53 |

Master:

| model | size | params | backend | ngl | test | t/s |
| ------------- | --------: | -----: | ------- | --: | ----- | ------------: |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan | 99 | pp512 | 188.32 ± 0.38 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan | 99 | tg128 | 33.62 ± 0.28 |

I'm leaving this as a draft for now as it's a bit hacky and I'm not sure if the proprietary drivers support this. The good thing, though, is that it lets me work on and test the FP16 shaders using my old card.

@github-actions github-actions bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels on May 9, 2025
Collaborator

@0cc4m 0cc4m left a comment


This can only be merged if it passes validation, which I doubt. If it does not, I would require disabling it by default and hiding it behind an environment variable. The backend has to follow the Vulkan specification.
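A minimal sketch of the kind of opt-in gate described here; the variable name GGML_VK_FORCE_FP16 is invented for illustration and is not an actual backend option:

```cpp
// Hypothetical opt-in gate: the FP16 override stays off unless the user sets
// an environment variable. GGML_VK_FORCE_FP16 is a made-up name for illustration.
#include <cstdlib>

static bool fp16_override_enabled() {
    const char * env = std::getenv("GGML_VK_FORCE_FP16");
    return env != nullptr && env[0] != '\0' && env[0] != '0';
}
```

The GCN 3/4 special case in the diff below would then only take effect when this returns true.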


```diff
-device->fp16 = device->fp16 && vk12_features.shaderFloat16;
+// GCN 3 and 4 chips support FP16 at regular speed, but the drivers don't indicate it
+device->fp16 = device->fp16 && (vk12_features.shaderFloat16 || (device->architecture == AMD_GCN34));
```

It's against the spec to enable this if support is not indicated with the shaderFloat16 feature, which it is not for these GPUs. Did you try running this with validation layers enabled? I'm relatively sure this should throw validation issues.
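For reference, a minimal sketch of the spec-compliant way to check for FP16 shader support on a Vulkan 1.2 device, using only the standard feature-query API:

```cpp
// Query VkPhysicalDeviceVulkan12Features::shaderFloat16 through
// vkGetPhysicalDeviceFeatures2; FP16 shader pipelines are only legal if this
// feature is set (or if VK_AMD_gpu_shader_half_float is available).
#include <vulkan/vulkan.h>

bool device_supports_shader_float16(VkPhysicalDevice physical_device) {
    VkPhysicalDeviceVulkan12Features vk12_features = {};
    vk12_features.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_VULKAN_1_2_FEATURES;

    VkPhysicalDeviceFeatures2 features2 = {};
    features2.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FEATURES_2;
    features2.pNext = &vk12_features;

    vkGetPhysicalDeviceFeatures2(physical_device, &features2);
    return vk12_features.shaderFloat16 == VK_TRUE;
}
```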

@netrunnereve
Collaborator Author

I finally got the validation layers working and it fails with `vkCreateComputePipelines(): pCreateInfos[0].stage SPIR-V Capability Float16 was declared, but one of the following requirements is required (VkPhysicalDeviceVulkan12Features::shaderFloat16 OR VK_AMD_gpu_shader_half_float)`. So yeah, Vulkan isn't happy with how I'm using FP16 when the driver says it doesn't support it.

Considering how small the improvement is, I don't think it's worth having a special environment variable and all that, so I'm just going to close this.

@netrunnereve netrunnereve deleted the fp16 branch May 10, 2025 00:33
