@JohannesGaessler
Collaborator

On master, the backward pass for matrix multiplication does not work correctly when GQA broadcasting is involved. This goes undetected because all of the relevant gradient tests are skipped for speed. This PR fixes the backward pass and adds CUDA support. To make the backward pass work, I am adding an extra parameter to ggml_repeat_back because GQA broadcasting differs from the broadcasting in, e.g., ggml_repeat.
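To illustrate the difference between the two broadcasting patterns, here is a minimal pure-Python sketch. It assumes that ggml_repeat tiles the whole source along a dimension, while the GQA broadcast in matrix multiplication lets adjacent destination entries share one source entry. The helper names `repeat_tiled` and `repeat_adjacent` are hypothetical and not part of ggml; flat lists stand in for a single tensor dimension.

```python
def repeat_tiled(src, nrep):
    """Tile the whole source nrep times: dst[i] = src[i % len(src)]."""
    n = len(src)
    return [src[i % n] for i in range(n * nrep)]


def repeat_adjacent(src, nrep):
    """Repeat each entry nrep times in a row: dst[i] = src[i // nrep]."""
    return [src[i // nrep] for i in range(len(src) * nrep)]


kv_heads = ["kv0", "kv1"]  # 2 KV heads shared by 4 query heads

print(repeat_tiled(kv_heads, 2))     # ['kv0', 'kv1', 'kv0', 'kv1']
print(repeat_adjacent(kv_heads, 2))  # ['kv0', 'kv0', 'kv1', 'kv1']
```

Both results have the same length, so the shapes alone do not reveal which pattern was used; only the mapping from destination index to source index differs.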

This PR also adds minor fixes to other backward passes. After this PR, it should not be necessary to make further changes to ggml ops for #10544.

@github-actions (bot) added the labels testing, Nvidia GPU, and ggml on Jan 23, 2025
@ggerganov
Member

Can the adjacent logic be performed automatically, without explicitly passing the argument to ggml_repeat_back()? Not 100% sure, but maybe check whether the repeat operation requires broadcast (i.e. nr1 > 1 || nr2 > 1) and, if so, use the adjacent == true branch? I could be missing something though.

@JohannesGaessler
Collaborator Author

No, the problem is that the shape is the same but different values need to be iterated over. Although, now that I'm writing this, I'm realizing that you could get the same result by inserting a call to ggml_view and adding CUDA support for noncontiguous inputs. I'll do that instead.
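The point about identical shapes but different values can be sketched in pure Python. In the backward pass, each source entry accumulates the gradients of every destination entry that read it, so the two patterns sum over different index sets. The function names below are hypothetical, and `permuted_view` only emulates a noncontiguous view by copying; it is an assumption about the general idea, not the actual ggml_view call used in the PR.

```python
def repeat_back_tiled(grad, n):
    """Backward of a tiled repeat: sum entries that are n apart."""
    nrep = len(grad) // n
    return [sum(grad[j * n + k] for j in range(nrep)) for k in range(n)]


def repeat_back_adjacent(grad, n):
    """Backward of an adjacent (GQA-style) repeat: sum each run of nrep entries."""
    nrep = len(grad) // n
    return [sum(grad[k * nrep + j] for j in range(nrep)) for k in range(n)]


def permuted_view(grad, n):
    """Emulate a strided view: logical index j*n + k reads offset k*nrep + j."""
    nrep = len(grad) // n
    return [grad[k * nrep + j] for j in range(nrep) for k in range(n)]


grad = [1.0, 2.0, 3.0, 4.0]  # gradients for 4 query heads, 2 KV heads

tiled = repeat_back_tiled(grad, 2)        # [1+3, 2+4]
adjacent = repeat_back_adjacent(grad, 2)  # [1+2, 3+4]
via_view = repeat_back_tiled(permuted_view(grad, 2), 2)

print(tiled)     # [4.0, 6.0]
print(adjacent)  # [3.0, 7.0]
print(via_view)  # [3.0, 7.0]
```

Under these assumptions, feeding the tiled reduction a reordered view of the gradient reproduces the adjacent reduction, which is why an extra parameter is not strictly needed as long as the reduction accepts noncontiguous inputs.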

@ggerganov
Member

Yup, sounds like a better alternative.

@JohannesGaessler
Collaborator Author

I found and fixed another bug in the CUDA code for OUT_PROD related to dimension 1 not being contiguous.

@JohannesGaessler JohannesGaessler merged commit 8137b4b into ggml-org:master Jan 24, 2025
45 checks passed