Conversation

rmatif (Collaborator) commented on Aug 6, 2025

Enables mixed-precision F16/F32 addition and fixes the use of LoRAs on sdcpp (leejet/stable-diffusion.cpp#757).

github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and OpenCL (Issues specific to the OpenCL backend) labels on Aug 6, 2025
rmatif requested review from lhez and max-krasnyansky on Aug 6, 2025 at 21:44
lhez (Collaborator) commented on Aug 12, 2025

Apologies for the delay.

The mixed f16/f32 path applies when the dst type is f16; it does not affect the f32 dst path. I did some verification on A830 with language models and everything looks good. There might be a slowdown since there are branches in the kernels (they should be uniform, but may still affect the compiler). We can iterate further if needed.
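For readers following along, here is a minimal sketch of the kind of branching being discussed, assuming an element-wise add where src0 and dst are f16 and src1 is either f16 or f32; the kernel name, argument layout, and the `src1_is_f32` flag are illustrative, not the actual ggml OpenCL kernel:

```c
// Hypothetical sketch only: src0 and dst are f16, src1 may be f16 or f32.
// The branch depends on a kernel argument, so it is uniform across the
// work-group, but it still adds control flow the compiler has to handle.
#pragma OPENCL EXTENSION cl_khr_fp16 : enable

kernel void add_f16_mixed(global const half * src0,
                          global const void * src1,
                          global       half * dst,
                          const int src1_is_f32,
                          const int n) {
    const int i = get_global_id(0);
    if (i >= n) {
        return;
    }

    // Load src1 as f16, converting from f32 when the flag says so.
    half b;
    if (src1_is_f32) {
        b = convert_half(((global const float *) src1)[i]);
    } else {
        b = ((global const half *) src1)[i];
    }

    dst[i] = src0[i] + b;
}
```

Because the condition comes from a kernel argument rather than per-work-item data, there is no divergence; the concern raised above is only that the extra control flow may affect how the compiler optimizes the kernel.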

lhez merged commit 60a7658 into ggml-org:master on Aug 12, 2025 (47 checks passed)
rmatif (Collaborator, Author) commented on Aug 14, 2025

> Apologies for the delay.
>
> The mixed f16/f32 path applies when the dst type is f16; it does not affect the f32 dst path. I did some verification on A830 with language models and everything looks good. There might be a slowdown since there are branches in the kernels (they should be uniform, but may still affect the compiler). We can iterate further if needed.

I thought about the branching, but since it's not a critical op, I think it's fine. I didn't want to duplicate the code and add a new kernel just for this.
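As a rough illustration of that choice (reuse one kernel and pass a flag, rather than compiling a separate kernel for the f32 case), a hypothetical host-side dispatch could look like the following; the helper name and argument order match the sketch above and are assumptions, not the backend's actual code:

```c
#include <CL/cl.h>

// Hypothetical dispatch for the single-kernel approach: the same kernel is
// enqueued for both src1 types, and a flag selects the conversion path.
static void enqueue_add_f16(cl_command_queue queue, cl_kernel kernel,
                            cl_mem src0, cl_mem src1, cl_mem dst,
                            cl_int src1_is_f32, cl_int n) {
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &src0);
    clSetKernelArg(kernel, 1, sizeof(cl_mem), &src1);
    clSetKernelArg(kernel, 2, sizeof(cl_mem), &dst);
    clSetKernelArg(kernel, 3, sizeof(cl_int), &src1_is_f32);
    clSetKernelArg(kernel, 4, sizeof(cl_int), &n);

    const size_t global_size = (size_t) n;
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_size, NULL,
                           0, NULL, NULL);
}
```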
