Conversation

@CISC (Contributor) commented Apr 4, 2025

@JohannesGaessler (Collaborator) left a comment

The code on master is deterministic in terms of results. What is not deterministic is how the work is distributed to CUDA blocks but I don't think that matters.

@CISC CISC requested a review from JohannesGaessler April 5, 2025 13:30
@CISC (Contributor, Author) commented Apr 5, 2025

A quick benchmark after synchronization changes to show that it's still faster:

| Model | CPU | GPU | n_ubatch | test | t/s master | t/s deterministic_k_copy_src1 |
| --- | --- | --- | --- | --- | --- | --- |
| bailingmoe 16B Q8_0 | Core i7-9700K | RTX 3090Ti | 2048 | pp1024 | 3230.35 ± 22.55 | 3962.19 ± 37.60 |
| bailingmoe 16B Q8_0 | Core i7-9700K | RTX 3090Ti | 2048 | pp2048 | 3912.80 ± 24.77 | 4992.81 ± 43.04 |
| bailingmoe 16B Q8_0 | Core i7-9700K | RTX 3090Ti | 2048 | pp4096 | 3685.56 ± 21.49 | 4631.87 ± 26.74 |

@ggerganov (Member) commented:

@CISC Please preserve the authorship of these patches as it was already suggested (#1174 (comment)).

@CISC (Contributor, Author) commented Apr 5, 2025

> @CISC Please preserve the authorship of these patches as it was already suggested (#1174 (comment)).

It feels wrong to fake authorship of the actual commits. Attribution in the PR itself (which cannot be a direct upstream of the code anyway) should be sufficient?

Anyway, maybe it's better to ask @ikawrakow directly what his feelings about this are?

Edit: I see it's possible to have different author and committer, but it seems the best the GitHub interface (which I have to use for most of my commits) can do is co-authorship.
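For reference, on the command line git does let a patch carry a different author and committer. A minimal sketch in a throwaway repository (placeholder names and commit message, not the actual commits from this PR):

```shell
set -e
# Work in a temporary repository so nothing real is touched.
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
git config user.name "Committer"
git config user.email "committer@example.com"

echo "change" > file.txt
git add file.txt

# --author records the original author; the committer stays whoever applies it.
git commit -q --author="Original Author <author@example.com>" -m "port kernel"

git log -1 --format='author=%an committer=%cn'
# prints: author=Original Author committer=Committer
```

The GitHub web editor cannot set a separate author field, which is why the Co-authored-by trailer in the commit message is the fallback there.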

@ikawrakow (Contributor) commented:

> Anyway, maybe it's better to ask @ikawrakow directly what his feelings about this are?

In general I don't mind people copying my code, else I wouldn't be publishing it under a MIT license. But llama.cpp is special in this regard, so I think it is best to do the right thing. But as IANAL, and as I would like to avoid that someone comes along and starts teaching me that because of the blah blah convention blah blah blah, I leave it up to you and the project owner to determine what the right thing is.

@ikawrakow (Contributor) commented:

> The code on master is deterministic in terms of results. What is not deterministic is how the work is distributed to CUDA blocks but I don't think that matters.

That is not my observation. I get a different result each time I run the code on llama.cpp master (on the same hardware, using the same parameters). The reason is that src1 rows end up in a different order in the contiguous copy, which then influences the result because floating point operations are not associative. It is not wrong, but it is still not deterministic. If in doubt, run a few perplexity calculations and observe how the values differ in each run on master, but are always the same with this version of the kernel.

@CISC (Contributor, Author) commented Apr 5, 2025

> In general I don't mind people copying my code, else I wouldn't be publishing it under a MIT license. But llama.cpp is special in this regard, so I think it is best to do the right thing. But as IANAL, and as I would like to avoid that someone comes along and starts teaching me that because of the blah blah convention blah blah blah, I leave it up to you and the project owner to determine what the right thing is.

What address would you like attributed, actual or the one in AUTHORS?

@JohannesGaessler (Collaborator) commented:

> It is not my observation. I get a different result each time I run the code on llama.cpp master (on the same hardware, using the same parameters). The reason for this is that src1 rows end up in a different order in the contiguous copy, which then influences the result as floating point operations are not associative. It is not wrong, but it is still not deterministic. If in doubt, run a few perplexity calculations and observe how the values are different in each run on master, but are always the same with this version of the kernel.

You're right, I misremembered how the code works.

@CISC (Contributor, Author) commented Apr 5, 2025

I'm reopening a new PR with co-authorship, the address in AUTHORS seems to link correctly.

@CISC CISC closed this Apr 5, 2025
@CISC CISC deleted the deterministic_k_copy_src1 branch April 5, 2025 19:14
@ikawrakow (Contributor) commented:

> I'm reopening a new PR with co-authorship, the address in AUTHORS seems to link correctly.

Interesting approach. Because at some point in the past I contributed to this repository, from there on any original work I produce can be freely copied into here, and the attribution is covered by me being included in a list of a few hundred authors with a no-reply address?

@CISC (Contributor, Author) commented Apr 6, 2025

> Interesting approach. Because at some point in the past I contributed to this repository, from there on any original work I produce can be freely copied into here, and the attribution is covered by me being included in a list of a few hundred authors with a no-reply address?

Only work published under a compatible license, of course. The AUTHORS list is generated from git history, which I guess is the only manageable way in a project with so many contributors.

Anyway, I'm more than happy to make any additional attributions you wish, it is your work, and I'm very grateful for it!
