Generally, prompt processing is offloaded to the GPU, copying the weights if necessary, but generation is done on the device that holds the weights. If you want a detailed list of operations and the devices they are run on, run llama.cpp with the environment variable GGML_SCHED_DEBUG=2 and enable debug output with -v.
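As a sketch, an invocation along these lines enables the scheduler debug output described above (the binary name `llama-cli` assumes a recent llama.cpp build; the model path and `-ngl` value are placeholders for your own setup):

```shell
# GGML_SCHED_DEBUG=2 makes the ggml scheduler print which backend/device
# each operation is assigned to; -v enables the verbose log output.
GGML_SCHED_DEBUG=2 ./llama-cli \
  -m models/your-model.gguf \   # hypothetical model path
  -ngl 20 \                     # example: offload 20 layers to the GPU
  -v \
  -p "Hello"
```

With this, the log should show, per operation, the device it is scheduled on, letting you verify where prompt processing versus generation actually runs.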
