The CUDA backend also does not support BF16, so most of the model is running on the CPU. Try an F16 model instead.
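If you only have a BF16 GGUF, one way to get an F16 one is to reconvert from the original Hugging Face weights with an F16 output type. A minimal sketch, assuming a local model directory and llama.cpp's convert_hf_to_gguf.py script (older checkouts ship this as convert.py); the paths are placeholders:

```sh
# Convert the original HF weights to an F16 GGUF so the CUDA
# backend can run the layers on the GPU (paths are placeholders).
python convert_hf_to_gguf.py ./my-model-dir \
    --outtype f16 \
    --outfile my-model-f16.gguf
```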

Also note that -ngl -1 does not work the way you might expect; no layers will be offloaded that way. Use a large number to offload the entire model instead, e.g. -ngl 99.
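For example, assuming a recent build where the binary is named llama-cli (older builds call it main), and using the F16 model from above:

```sh
# -ngl 99 exceeds the layer count of any current model, so every
# layer gets offloaded to the GPU; -ngl -1 would offload nothing.
./llama-cli -m my-model-f16.gguf -ngl 99 -p "Hello"
```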
