Conversation

@CISC (Contributor) commented Apr 4, 2025

@JohannesGaessler (Collaborator) left a comment

The code on master is deterministic in terms of results. What is not deterministic is how the work is distributed to CUDA blocks but I don't think that matters.

@CISC CISC requested a review from JohannesGaessler April 5, 2025 13:30
@CISC (Contributor, Author) commented Apr 5, 2025

A quick benchmark after synchronization changes to show that it's still faster:

| Model | CPU | GPU | n_ubatch | test | t/s master | t/s deterministic_k_copy_src1 |
| --- | --- | --- | --- | --- | --- | --- |
| bailingmoe 16B Q8_0 | Core i7-9700K | RTX 3090Ti | 2048 | pp1024 | 3230.35 ± 22.55 | 3962.19 ± 37.60 |
| bailingmoe 16B Q8_0 | Core i7-9700K | RTX 3090Ti | 2048 | pp2048 | 3912.80 ± 24.77 | 4992.81 ± 43.04 |
| bailingmoe 16B Q8_0 | Core i7-9700K | RTX 3090Ti | 2048 | pp4096 | 3685.56 ± 21.49 | 4631.87 ± 26.74 |

@ggerganov (Member) commented:

@CISC Please preserve the authorship of these patches as it was already suggested (#1174 (comment)).

@CISC (Contributor, Author) commented Apr 5, 2025

> @CISC Please preserve the authorship of these patches as it was already suggested (#1174 (comment)).

It feels wrong to fake authorship of the actual commits. Attribution in the PR itself (which cannot be a direct upstream of the code anyway) should be sufficient?

Anyway, maybe it's better to ask @ikawrakow directly what his feelings about this are?

Edit: I see it's possible to have different author and committer, but it seems the best the GitHub interface (which I have to use for most of my commits) can do is co-authorship.
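For reference, on the command line git does let a patch carry a different author and committer. A minimal sketch in a throwaway repository (placeholder names and commit message, not the actual commits from this PR):

```shell
set -e
# Work in a temporary repository so nothing real is touched.
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
git config user.name "Committer"
git config user.email "committer@example.com"

echo "change" > file.txt
git add file.txt

# --author records the original author; the committer stays whoever applies it.
git commit -q --author="Original Author <author@example.com>" -m "port kernel"

git log -1 --format='author=%an committer=%cn'
# prints: author=Original Author committer=Committer
```

The GitHub web editor cannot set a separate author field, which is why the Co-authored-by trailer in the commit message is the fallback there.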

@ikawrakow (Contributor) commented:

> Anyway, maybe it's better to ask @ikawrakow directly what his feelings about this are?

In general I don't mind people copying my code, else I wouldn't be publishing it under a MIT license. But llama.cpp is special in this regard, so I think it is best to do the right thing. But as IANAL, and as I would like to avoid that someone comes along and starts teaching me that because of the blah blah convention blah blah blah, I leave it up to you and the project owner to determine what the right thing is.

@ikawrakow (Contributor) commented:

> The code on master is deterministic in terms of results. What is not deterministic is how the work is distributed to CUDA blocks but I don't think that matters.

That is not my observation. I get a different result each time I run the code on llama.cpp master (on the same hardware, using the same parameters). The reason is that src1 rows end up in a different order in the contiguous copy, which then influences the result because floating point operations are not associative. It is not wrong, but it is still not deterministic. If in doubt, run a few perplexity calculations and observe how the values differ in each run on master, but are always the same with this version of the kernel.

@CISC (Contributor, Author) commented Apr 5, 2025

> In general I don't mind people copying my code, else I wouldn't be publishing it under a MIT license. But llama.cpp is special in this regard, so I think it is best to do the right thing. But as IANAL, and as I would like to avoid that someone comes along and starts teaching me that because of the blah blah convention blah blah blah, I leave it up to you and the project owner to determine what the right thing is.

What address would you like attributed, actual or the one in AUTHORS?

@JohannesGaessler (Collaborator) commented:

> It is not my observation. I get a different result each time I run the code on llama.cpp master (on the same hardware, using the same parameters). The reason for this is that src1 rows end up in a different order in the contiguous copy, which then influences the result as floating point operations are not associative. It is not wrong, but it is still not deterministic. If in doubt, run a few perplexity calculations and observe how the values are different in each run on master, but are always the same with this version of the kernel.

You're right, I misremembered how the code works.

@CISC (Contributor, Author) commented Apr 5, 2025

I'm reopening a new PR with co-authorship, the address in AUTHORS seems to link correctly.

@CISC CISC closed this Apr 5, 2025
@CISC CISC deleted the deterministic_k_copy_src1 branch April 5, 2025 19:14
@ikawrakow (Contributor) commented:

> I'm reopening a new PR with co-authorship, the address in AUTHORS seems to link correctly.

Interesting approach. Because at some point in the past I contributed to this repository, from there on any original work I produce can be freely copied into here, and the attribution is covered by me being included in a list of a few hundred authors with a no-reply address?

@CISC (Contributor, Author) commented Apr 6, 2025

> Interesting approach. Because at some point in the past I contributed to this repository, from there on any original work I produce can be freely copied into here, and the attribution is covered by me being included in a list of a few hundred authors with a no-reply address?

Only work published under a compatible license, of course. The AUTHORS list is generated from git history, which I guess is the only manageable way in a project with so many contributors.

Anyway, I'm more than happy to make any additional attributions you wish, it is your work, and I'm very grateful for it!
