CUDA: fix build error from ambiguous __half conversions in conv2d #15690
Conversation
Building conv2d with half precision failed because `__half` defines multiple implicit conversion operators (to float, int, short, etc.), causing ambiguous overload resolution when multiplying with float. Introduce a templated `to_float` helper that explicitly converts `__half` via `__half2float`, while passing through float unchanged. Use this helper in conv2d accumulation to ensure unambiguous and correct promotion to float. Fixes some build errors with half-precision kernels on CUDA. ggml-ci
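For illustration, a minimal sketch of the helper the description outlines (assuming C++17 `if constexpr`; the actual patch may be structured differently):

```cpp
#include <cuda_fp16.h>   // __half, __half2float
#include <type_traits>   // std::is_same_v

// Convert to float without relying on __half's implicit conversion
// operators, which are what make `input_val * kernel_val` ambiguous
// when the kernel type is __half.
template <typename T>
__device__ __forceinline__ float to_float(T v) {
    if constexpr (std::is_same_v<T, __half>) {
        return __half2float(v);  // explicit __half -> float
    } else {
        return v;                // float passes through unchanged
    }
}
```

In the accumulation loop this would read `acc += input_val * to_float(kernel_val);`, making the promotion to float explicit.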
Seems CI is failing because of #15434, hence unrelated. The CI scripts need to be updated.
Which build errors? In any case, there is already a `ggml_cuda_cast` template in convert.cuh that could be used for this.
@JohannesGaessler Here is the error log. It suddenly started failing after conv2d got merged.

I see. Lemme check and update it.
I don't think you need to update the template; if it was ambiguous you should already be getting compilation failures at other points in the code.
I see. Fortunately, this change fixed the build in our testing. If you want me to use the template from convert.cuh instead, I can update the PR.

Please just use the template in convert.cuh.
CUDA: Replace custom to_float helper with unified ggml_cuda_cast and add half->float conversion
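As a rough sketch, a unified cast with a `__half` -> float branch could look like the following (hypothetical shape; the real `ggml_cuda_cast` in convert.cuh may be structured differently):

```cpp
#include <cuda_fp16.h>
#include <type_traits>

template <typename dst_t, typename src_t>
__host__ __device__ inline dst_t ggml_cuda_cast(src_t x) {
    if constexpr (std::is_same_v<src_t, __half>) {
        return static_cast<dst_t>(__half2float(x));  // go through float explicitly
    } else {
        return static_cast<dst_t>(x);                // plain cast for other types
    }
}
```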
As I said:

> I don't think you need to update the template; if it was ambiguous you should already be getting compilation failures at other points in the code.

The code did not compile in the first place due to the missing header. So did you actually confirm that the additional branch in the template is necessary?
Not yet. I will let you know when I do that. Should I convert this PR to draft?
I think it doesn't really matter; just request a review when ready.
Okay, I think it is good to go now. It builds without failure and test-backend-ops passed successfully.
ggml/src/ggml-cuda/conv2d.cu (outdated)

```diff
-    const float kernel_val = kernel[Layout::kernel_index(c_out, c_in, ky, kx, P)];
-    acc += (input_val * kernel_val);
+    const T kernel_val = kernel[Layout::kernel_index(c_out, c_in, ky, kx, P)];
+    acc += (input_val * ggml_cuda_cast<float, T>(kernel_val));
```
```diff
-    acc += (input_val * ggml_cuda_cast<float, T>(kernel_val));
+    acc += (input_val * ggml_cuda_cast<float>(kernel_val));
```
The second type can be inferred from the argument; I think this is more legible.
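In other words, only the destination type has to be spelled out; the source type is deduced from the argument (illustrative, assuming the cast template sketched above):

```cpp
__half h = __float2half(0.5f);
float  a = ggml_cuda_cast<float, __half>(h);  // explicit second argument is redundant
float  b = ggml_cuda_cast<float>(h);          // src type deduced from h
```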
CUDA: fix build error from ambiguous __half conversions in conv2d (ggml-org#15690)

* CUDA: fix build error from ambiguous __half conversions in conv2d

  Building conv2d with half precision failed because `__half` defines multiple implicit conversion operators (to float, int, short, etc.), causing ambiguous overload resolution when multiplying with float. Introduce a templated `to_float` helper that explicitly converts `__half` via `__half2float`, while passing through float unchanged. Use this helper in conv2d accumulation to ensure unambiguous and correct promotion to float. Fixes some build errors with half-precision kernels on CUDA.

  ggml-ci

* CUDA: Replace custom to_float helper with unified ggml_cuda_cast and add half->float conversion
* CUDA: Add missing convert.cuh header
* CUDA: remove unnecessary extension in ggml_cuda_cast
* CUDA: Address review comment, remove second type template argument
Building conv2d with half precision failed because `__half` defines multiple implicit conversion operators (to float, int, short, etc.), causing ambiguous overload resolution when multiplying with float. Introduce a templated `to_float` helper that explicitly converts `__half` via `__half2float`, while passing through float unchanged. Use this helper in conv2d accumulation to ensure unambiguous and correct promotion to float. Use `ggml_cuda_cast` from convert.cuh for casting the value to float instead.

Fixes some build errors with half-precision kernels on CUDA.