Conversation

@JohannesGaessler (Collaborator)

Fixes #15680 .

The kernel can simply be made to always use FP32 arithmetic; this also has the advantage of not hurting performance on Pascal. Performance differences (measured on an RTX 4090) are negligible:

| Backend | GGML op | Op parameters | TFLOPS ef47691 | TFLOPS 2721aaf | Speedup |
|---|---|---|---|---|---|
| CUDA0 | CONV_2D | ne_input=[16,16,128,8],ne_kernel=[3,3,128,512],type_kernel=f16,stride0=1,stride1=1,padding0=0,padding1=0,dilation0=1,dilation1=1,cwhn=0 | 1.93 | 1.94 | 1.00 |
| CUDA0 | CONV_2D | ne_input=[16,16,128,8],ne_kernel=[3,3,128,512],type_kernel=f32,stride0=1,stride1=1,padding0=0,padding1=0,dilation0=1,dilation1=1,cwhn=0 | 2.06 | 2.06 | 1.00 |
| CUDA0 | CONV_2D | ne_input=[19,19,256,16],ne_kernel=[4,4,256,4096],type_kernel=f16,stride0=1,stride1=1,padding0=0,padding1=0,dilation0=1,dilation1=1,cwhn=0 | 2.26 | 2.33 | 1.03 |
| CUDA0 | CONV_2D | ne_input=[19,19,256,16],ne_kernel=[4,4,256,4096],type_kernel=f32,stride0=1,stride1=1,padding0=0,padding1=0,dilation0=1,dilation1=1,cwhn=0 | 2.36 | 2.37 | 1.00 |
| CUDA0 | CONV_2D | ne_input=[19,19,4,16],ne_kernel=[2,2,4,4],type_kernel=f16,stride0=1,stride1=1,padding0=0,padding1=0,dilation0=1,dilation1=1,cwhn=0 | 0.16 | 0.16 | 0.99 |
| CUDA0 | CONV_2D | ne_input=[19,19,4,16],ne_kernel=[2,2,4,4],type_kernel=f32,stride0=1,stride1=1,padding0=0,padding1=0,dilation0=1,dilation1=1,cwhn=0 | 0.16 | 0.16 | 1.00 |
| CUDA0 | CONV_2D | ne_input=[19,19,8,16],ne_kernel=[4,4,8,128],type_kernel=f16,stride0=1,stride1=1,padding0=0,padding1=0,dilation0=1,dilation1=1,cwhn=0 | 1.87 | 1.93 | 1.03 |
| CUDA0 | CONV_2D | ne_input=[19,19,8,16],ne_kernel=[4,4,8,128],type_kernel=f32,stride0=1,stride1=1,padding0=0,padding1=0,dilation0=1,dilation1=1,cwhn=0 | 1.94 | 1.94 | 1.00 |
| CUDA0 | CONV_2D | ne_input=[19,19,8,16],ne_kernel=[4,4,8,130],type_kernel=f16,stride0=1,stride1=1,padding0=0,padding1=0,dilation0=1,dilation1=1,cwhn=0 | 1.81 | 1.86 | 1.03 |
| CUDA0 | CONV_2D | ne_input=[19,19,8,16],ne_kernel=[4,4,8,130],type_kernel=f32,stride0=1,stride1=1,padding0=0,padding1=0,dilation0=1,dilation1=1,cwhn=0 | 1.88 | 1.88 | 1.00 |
| CUDA0 | CONV_2D | ne_input=[224,224,1,1],ne_kernel=[2,2,1,8],type_kernel=f16,stride0=1,stride1=1,padding0=0,padding1=0,dilation0=1,dilation1=1,cwhn=0 | 0.26 | 0.26 | 1.00 |
| CUDA0 | CONV_2D | ne_input=[224,224,1,1],ne_kernel=[2,2,1,8],type_kernel=f32,stride0=1,stride1=1,padding0=0,padding1=0,dilation0=1,dilation1=1,cwhn=0 | 0.27 | 0.27 | 1.00 |
| CUDA0 | CONV_2D | ne_input=[224,224,1,8],ne_kernel=[2,2,1,8],type_kernel=f16,stride0=1,stride1=1,padding0=0,padding1=0,dilation0=1,dilation1=1,cwhn=0 | 0.31 | 0.32 | 1.00 |
| CUDA0 | CONV_2D | ne_input=[224,224,1,8],ne_kernel=[2,2,1,8],type_kernel=f32,stride0=1,stride1=1,padding0=0,padding1=0,dilation0=1,dilation1=1,cwhn=0 | 0.32 | 0.32 | 1.00 |
| CUDA0 | CONV_2D | ne_input=[224,224,3,1],ne_kernel=[3,3,3,8],type_kernel=f16,stride0=1,stride1=1,padding0=0,padding1=0,dilation0=1,dilation1=1,cwhn=0 | 1.05 | 1.05 | 1.01 |
| CUDA0 | CONV_2D | ne_input=[224,224,3,1],ne_kernel=[3,3,3,8],type_kernel=f32,stride0=1,stride1=1,padding0=0,padding1=0,dilation0=1,dilation1=1,cwhn=0 | 1.09 | 1.09 | 1.00 |
| CUDA0 | CONV_2D | ne_input=[58,58,32,1],ne_kernel=[3,3,32,64],type_kernel=f16,stride0=1,stride1=1,padding0=0,padding1=0,dilation0=1,dilation1=1,cwhn=0 | 1.49 | 1.49 | 1.00 |
| CUDA0 | CONV_2D | ne_input=[58,58,32,1],ne_kernel=[3,3,32,64],type_kernel=f32,stride0=1,stride1=1,padding0=0,padding1=0,dilation0=1,dilation1=1,cwhn=0 | 1.57 | 1.57 | 1.00 |
| CUDA0 | CONV_2D | ne_input=[58,58,32,8],ne_kernel=[3,3,32,64],type_kernel=f16,stride0=1,stride1=1,padding0=0,padding1=0,dilation0=1,dilation1=1,cwhn=0 | 1.89 | 1.90 | 1.01 |
| CUDA0 | CONV_2D | ne_input=[58,58,32,8],ne_kernel=[3,3,32,64],type_kernel=f32,stride0=1,stride1=1,padding0=0,padding1=0,dilation0=1,dilation1=1,cwhn=0 | 2.00 | 2.00 | 1.00 |

github-actions bot added the "Nvidia GPU" (Issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) labels on Aug 30, 2025
JohannesGaessler merged commit 38ad381 into ggml-org:master on Aug 30, 2025
47 of 48 checks passed
walidbr pushed a commit to walidbr/llama.cpp that referenced this pull request Sep 7, 2025

Labels

- ggml (changes relating to the ggml tensor library for machine learning)
- Nvidia GPU (Issues specific to Nvidia GPUs)

Development

Successfully merging this pull request may close these issues.

Compile bug: error: more than one conversion function from "half" to a built-in type applies
