Conversation

pt13762104
Contributor

Add a warning for special GTX series cards that have low performance due to MMA kernels.

@github-actions bot added the labels "Nvidia GPU" (Issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) on Aug 25, 2025
Collaborator

@JohannesGaessler left a comment


Mostly just minor changes. I've revised the recommended compilation options since it's still possible to use GTX 16XX in conjunction with Ampere or newer.
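To make the revised recommendation concrete, here is a hypothetical build invocation (the exact architecture list is an assumption for the example, not taken from the PR): `GGML_CUDA` and `CMAKE_CUDA_ARCHITECTURES` are the standard llama.cpp CMake options for enabling CUDA and selecting which compute capabilities get kernels compiled in.

```shell
# Illustrative build: compile CUDA kernels for both Turing (7.5,
# which covers GTX 16XX) and Ampere (8.6), so a mixed GTX 16XX +
# Ampere system has real code paths for both cards.
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="75;86"
cmake --build build --config Release
```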

Co-authored-by: Johannes Gäßler <[email protected]>
Collaborator

@JohannesGaessler left a comment


For the renaming of the vector, you can't just accept the suggested change on GitHub because it needs to be applied consistently. If you fix that I'll merge this PR.

@pt13762104
Contributor Author

pt13762104 commented Aug 25, 2025

It now compiles and runs successfully.

@JohannesGaessler merged commit 4c37636 into ggml-org:master on Aug 26, 2025
48 checks passed
@CISC
Collaborator

CISC commented Aug 26, 2025

This causes an abort unless you have compiled in Turing support, see #15584

@pt13762104
Contributor Author

pt13762104 commented Aug 26, 2025

I would guess that's due to 7d3e9fd#diff-cb2761994492f7839320b81765a6fc2a23c180e3ddfd4a51d6b0f618f8d76a69R277? A fix might be along the lines of replacing ggml_cuda_highest_compiled_arch with ggml_cuda_has_arch, something like pt13762104/llama.cpp@f671559. Could anyone test if this works?

@XZVB12

XZVB12 commented Aug 26, 2025

> I would guess that's due to 7d3e9fd#diff-cb2761994492f7839320b81765a6fc2a23c180e3ddfd4a51d6b0f618f8d76a69R277? A fix might be along the lines of replacing ggml_cuda_highest_compiled_arch with ggml_cuda_has_arch, something like pt13762104@f671559. Could anyone test if this works?

Seems to work; at least the model loaded successfully and an answer was generated.

@blueyred

This commit seems to break llama-server on my system:
#15593

@JohannesGaessler
Collaborator

Should be fixed with #15587.

Minh141120 pushed a commit to menloresearch/llama.cpp that referenced this pull request Aug 27, 2025
* Add warning

* Print the devices names

* Add newlines

* Apply suggestions from code review

Co-authored-by: Johannes Gäßler <[email protected]>

* Fix vector names

---------

Co-authored-by: Johannes Gäßler <[email protected]>
Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Oct 6, 2025