
@ArshM17-NV

Summary:

CUDA Graphs (CG) were being disabled for the Embedding Gemma model by one of the graph-compatibility heuristics. This PR addresses that case to enable CG usage and improve performance.

Fixes:

  • Add-op heuristic:
    • Location: ggml-cuda.cu#L2831-L2849
    • Issue: The heuristic incorrectly triggered for Embedding Gemma, which disabled CUDA Graphs.
    • Fix: Skip the heuristic for the specific nodes that erroneously trigger it (see the sketch after this list). Verified that the selected node names are unique and do not appear in other architectures.
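A minimal sketch of the kind of name-based exemption described above, assuming the heuristic's existing shape check on GGML_OP_ADD nodes; the node name "example_norm" and the helper name are hypothetical placeholders, not the actual names from the patch:

```cpp
// Sketch only, not the actual patch. The add-op heuristic disables CUDA
// graphs when an ADD node looks like batched processing; the exemption
// lets specifically named nodes through.
#include <cstring>
#include "ggml.h"

static bool add_heuristic_disables_cuda_graphs(const ggml_tensor * node) {
    // Existing heuristic: an ADD whose second source has ne[1] > 1 is
    // treated as a sign of batched work that is unsafe to capture.
    const bool looks_batched = node->op == GGML_OP_ADD &&
                               node->src[1] != nullptr &&
                               node->src[1]->ne[1] > 1;
    if (!looks_batched) {
        return false;
    }
    // Exemption: node names verified to be unique to the affected
    // architecture stay graph-compatible ("example_norm" is hypothetical).
    if (strstr(node->name, "example_norm") != nullptr) {
        return false;
    }
    return true;
}
```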

Results:

Performance before: [benchmark screenshot]

Performance after: [benchmark screenshot]

@github-actions bot added the labels "Nvidia GPU" (Issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) on Oct 29, 2025
@ArshM17-NV (Author)

@CISC @slaren please review this PR when you get a chance. Thanks!

@slaren (Member) commented Oct 31, 2025

The problem I see with this approach is that it will only work if every sequence is the same length. If the sequence length changes, the CUDA graph will need to be rebuilt. I am not sure how common that is in practice.
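For illustration, a minimal sketch of the rebuild cost being described, assuming a hypothetical launch_model_kernels() helper whose launch dimensions depend on the sequence length; a captured graph bakes those dimensions in, so a new length forces re-capture and re-instantiation:

```cpp
// Sketch: replaying a captured CUDA graph is cheap, but any change in
// sequence length invalidates the capture and forces a full rebuild.
#include <cuda_runtime.h>

void launch_model_kernels(cudaStream_t stream, int seq_len); // hypothetical

void run_with_graph(cudaStream_t stream, int seq_len) {
    static cudaGraphExec_t graph_exec = nullptr;
    static int captured_seq_len = -1;

    if (graph_exec == nullptr || captured_seq_len != seq_len) {
        // Sequence length changed: destroy the old instance, re-capture,
        // and re-instantiate. This cost is paid on every length change.
        if (graph_exec != nullptr) {
            cudaGraphExecDestroy(graph_exec);
        }
        cudaGraph_t graph;
        cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
        launch_model_kernels(stream, seq_len);
        cudaStreamEndCapture(stream, &graph);
        cudaGraphInstantiate(&graph_exec, graph, 0); // CUDA 12 signature
        cudaGraphDestroy(graph);
        captured_seq_len = seq_len;
    }
    cudaGraphLaunch(graph_exec, stream); // cheap replay when the shape matches
}
```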

@ggerganov (Member)

> I am not sure how common that is in practice.

Likely it is going to occur very often: my impression is that most user code does not pad the input embeddings, so they typically have varying lengths.

We can add logic in llama_encode to do the padding automatically. It could be made optional through a llama_context_param parameter.
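A minimal sketch of the padding idea, assuming a hypothetical pad granularity field (no such llama_context_param exists yet); rounding the token count up to a fixed multiple keeps the graph's shapes stable across calls:

```cpp
// Sketch of optional automatic padding in llama_encode (hypothetical;
// no such parameter exists yet). Rounding the batch up to a multiple
// of pad_to keeps tensor shapes, and thus the captured CUDA graph, reusable.
#include <cstdint>

static int32_t padded_n_tokens(int32_t n_tokens, int32_t pad_to) {
    if (pad_to <= 1) {
        return n_tokens;                     // padding disabled
    }
    return ((n_tokens + pad_to - 1) / pad_to) * pad_to;
}
// The extra positions would be filled with padding embeddings and
// masked out so they do not affect the pooled output.
```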
