
Conversation

@CISC (Contributor) commented Apr 2, 2025:

Upstreamed from ikawrakow/ik_llama.cpp#40

@JohannesGaessler (Collaborator) commented:

Did you ask I. Kawrakow for permission to upstream this code? I'm specifically asking because there previously was conflict over attribution.

@CISC (Contributor, Author) commented Apr 3, 2025:

> Did you ask I. Kawrakow for permission to upstream this code? I'm specifically asking because there previously was conflict over attribution.

If so I guess he changed his mind:
ikawrakow/ik_llama.cpp#256 (comment)

@Green-Sky (Contributor) commented:

Even so, attribution is simple in git: just add another author.

@JohannesGaessler (Collaborator) commented on the code:

It would be nice if the FP16 and BF16 code in ggml_cuda_op_mul_mat were deduplicated, but I won't block merging the PR if you don't. In that case, please add a corresponding // TODO comment though.

@CISC (Contributor, Author) replied:

I'm not sure I follow; deduplicated how?

@JohannesGaessler (Collaborator) replied:

You could write a template with a typename that is either half or nv_bfloat16, use it as the type for the memory pool, and conditionally set the parameters for cuBLAS.
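
For illustration, here is a minimal sketch of what such a template could look like. This is an assumption-laden sketch, not the actual ggml_cuda_op_mul_mat code: the helper names, signature, and GEMM shape below are hypothetical; only the cuBLAS calls and type constants are real API.

```cpp
#include <type_traits>
#include <cuda_fp16.h>
#include <cuda_bf16.h>
#include <cublas_v2.h>

// Map the storage type to the matching cuBLAS data type at compile time.
template <typename T>
static constexpr cudaDataType_t to_cublas_type() {
    if constexpr (std::is_same_v<T, half>) {
        return CUDA_R_16F;
    } else {
        static_assert(std::is_same_v<T, nv_bfloat16>, "T must be half or nv_bfloat16");
        return CUDA_R_16BF;
    }
}

// Hypothetical shared GEMM path: the same body serves both FP16 and BF16,
// only the cuBLAS type parameters differ.
template <typename T> // T = half or nv_bfloat16
static void mul_mat_cublas_typed(cublasHandle_t handle, cudaStream_t stream,
        const T * src0_d, const T * src1_d, float * dst_d,
        int m, int n, int k) {
    const float alpha = 1.0f;
    const float beta  = 0.0f;
    cublasSetStream(handle, stream);
    // Weights (src0) are transposed; accumulation happens in FP32.
    cublasGemmEx(handle, CUBLAS_OP_T, CUBLAS_OP_N,
        m, n, k,
        &alpha, src0_d, to_cublas_type<T>(), k,
                src1_d, to_cublas_type<T>(), k,
        &beta,  dst_d,  CUDA_R_32F,          m,
        CUBLAS_COMPUTE_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP);
}
```

With something along these lines, the memory-pool buffer holding the converted src1 data could use the same T, and the FP16 and BF16 branches would collapse into two instantiations of a single function template.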

@JohannesGaessler (Collaborator) commented:

I just noticed that the IK implementation dates back to September of 2024. At that point in time the llama.cpp upstream repository had no CUDA BF16 support whatsoever. In January of 2025 I added BF16 support in ggml-org/llama.cpp#11093. Did you confirm that this PR improves performance vs. the current llama.cpp master branch?

@CISC (Contributor, Author) commented Apr 3, 2025:

> I just noticed that the IK implementation dates back to September of 2024. At that point in time the llama.cpp upstream repository had no CUDA BF16 support whatsoever. In January of 2025 I added BF16 support in ggml-org/llama.cpp#11093. Did you confirm that this PR improves performance vs. the current llama.cpp master branch?

I did not benchmark it, but I can do that tonight.

@CISC (Contributor, Author) commented Apr 3, 2025:

Here are some numbers that speak for themselves (TG is unchanged):

| Model | CPU | GPU | n_batch | test | t/s (master) | t/s (cuda-bf16-support) |
| --- | --- | --- | --- | --- | --- | --- |
| qwen2 1B BF16 | Core i7-9700K | RTX 3090Ti | 128 | pp1024 | 8060.09 ± 11.45 | 20496.05 ± 21.41 |
| qwen2 1B BF16 | Core i7-9700K | RTX 3090Ti | 256 | pp1024 | 13309.36 ± 4.19 | 25874.25 ± 30.06 |
| qwen2 1B BF16 | Core i7-9700K | RTX 3090Ti | 512 | pp1024 | 18651.31 ± 9.07 | 28498.72 ± 74.78 |
| qwen2 1B BF16 | Core i7-9700K | RTX 3090Ti | 1024 | pp1024 | 18848.49 ± 12.21 | 28934.40 ± 34.94 |

@CISC requested a review from JohannesGaessler on April 4, 2025 at 14:20.
@JohannesGaessler (Collaborator) left a comment:

I think the PR would be good to merge as-is. Unless there are more things you still want to add to it.

@CISC (Contributor, Author) commented Apr 4, 2025:

> I think the PR would be good to merge as-is. Unless there are more things you still want to add to it.

That's all for now; I will probably upstream some more in other PRs, though.

@JohannesGaessler merged commit ab9ed73 into ggml-org:master on Apr 4, 2025.
3 checks passed
@CISC deleted the cuda-bf16-support branch on April 4, 2025 at 19:05.
@JohannesGaessler changed the title from "cuda : add bf16 support" to "CUDA: don't convert BF16 weights to FP32" on Apr 4, 2025.
@JohannesGaessler (Collaborator) commented:

I changed the title of the PR and the commit message to better reflect what the changes ended up being.
