CUDA: fix GET_ROWS for large tensors #15882

JohannesGaessler · 2025-09-08T20:58:46Z

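Fixes GET_ROWS for CUDA when ne11*ne12 > 65535 by adding one extra loop dimension to the kernel. The y and z dimensions of a CUDA grid are limited to 65535 blocks, so the kernel now loops over the indices that do not fit into the grid. The code could perhaps be made faster by dynamically choosing which tensor dimension is mapped to the x dimension of the CUDA grid (the only one with a 2^31 - 1 block limit), but I don't think that would be worth the opportunity cost. (This kernel would, however, also be a candidate for fastdiv.)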
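A host-side C++ sketch of the general idea, not the PR's actual kernel: instead of launching one block per flattened (i11, i12) index along z, the launch caps the z extent at 65535 and each block loops over the flattened index with a stride equal to the grid size. The helpers make_fastdiv, fastdiv and visit_rows, and the choice of z for the flattened index, are hypothetical. The fastdiv part is a Granlund-Montgomery style multiply-and-shift division, shown only as one way the per-iteration division could be made cheaper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed limit: CUDA caps gridDim.y and gridDim.z at 65535 blocks.
constexpr int64_t kMaxGridYZ = 65535;

// Hypothetical fastdiv helper: precompute mp and L so that
// n / d == (hi(n * mp) + n) >> L for every 32-bit n,
// where hi() is the upper 32 bits of the 64-bit product.
struct FastDiv {
    uint32_t mp;
    uint32_t L;
    uint32_t d;
};

FastDiv make_fastdiv(uint32_t d) {
    assert(d != 0);  // division by zero has no fastdiv representation
    uint32_t L = 0;
    while (L < 32 && (uint64_t{1} << L) < d) {
        ++L;  // L = ceil(log2(d))
    }
    const uint64_t mp = (uint64_t{1} << 32) * ((uint64_t{1} << L) - d) / d + 1;
    return FastDiv{static_cast<uint32_t>(mp), L, d};
}

uint32_t fastdiv(uint32_t n, const FastDiv &f) {
    const uint64_t hi = (uint64_t{n} * f.mp) >> 32;  // emulates __umulhi(n, mp)
    return static_cast<uint32_t>((hi + n) >> f.L);   // 64-bit sum cannot overflow
}

// Emulates a launch with gridDim.z = min(ne11*ne12, 65535): each block starts
// at its own index and strides by gridDim.z, so every flattened row index is
// visited exactly once even when ne11*ne12 > 65535.
// Assumes ne11 > 0 and ne11*ne12 < 2^32 (the flattened index must fit in 32 bits).
std::vector<int> visit_rows(uint32_t ne11, uint32_t ne12) {
    const int64_t n_rows = int64_t{ne11} * ne12;
    const int64_t grid_z = std::min(n_rows, kMaxGridYZ);
    const FastDiv div_ne11 = make_fastdiv(ne11);
    std::vector<int> visits(static_cast<std::size_t>(n_rows), 0);

    for (int64_t block_z = 0; block_z < grid_z; ++block_z) {     // one iteration per launched block
        for (int64_t iz = block_z; iz < n_rows; iz += grid_z) {  // the extra loop inside the kernel
            const uint32_t i12 = fastdiv(static_cast<uint32_t>(iz), div_ne11);
            const uint32_t i11 = static_cast<uint32_t>(iz) - i12 * ne11;
            visits[static_cast<std::size_t>(i12) * ne11 + i11] += 1;
        }
    }
    return visits;
}
```

With ne11 = ne12 = 300 there are 90000 rows, more than a single z dimension can launch directly, yet every row is still visited exactly once. Looping inside the kernel keeps the launch configuration simple at the cost of a division per iteration, which is where a precomputed fastdiv could help.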

Commit 73f262a: CUDA: fix GET_ROWS for large tensors

github-actions bot added the labels "Nvidia GPU" (Issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) on Sep 8, 2025

ggerganov approved these changes Sep 9, 2025


JohannesGaessler merged commit 550cf72 into ggml-org:master Sep 9, 2025
48 checks passed
