Model is offloaded to GPU RAM but execution happens in CPU cores - CLBlast #2652
-
Hi, I was able to compile llama.cpp with CLBlast. However, when I run inference, the model layers do get loaded into GPU memory (confirmed by memory utilization), but the computation still happens on the CPU cores rather than on the GPU execution units. I've seen similar issues reported with other GPUs but wasn't able to find a solution that works for me. Command used:
Replies: 1 comment
-
GPU acceleration for generation is currently not supported with CLBlast. The CLBlast backend only uses the GPU for prompt processing (the large batched matrix multiplications), so even with layers offloaded to GPU memory, token-by-token generation still runs on the CPU.
As noted in #1319 (comment): "GPU acceleration for generation is currently not supported with CLBlast"
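For reference, a typical CLBlast build and partial-offload invocation looked like the sketch below (flags as of mid-2023 llama.cpp; the model path and layer count are illustrative, not taken from this thread):

```shell
# Build llama.cpp with the CLBlast backend
# (assumes CLBlast and the OpenCL headers are installed)
make clean && make LLAMA_CLBLAST=1

# Offload 32 layers' weights to GPU memory with -ngl.
# With CLBlast this accelerates prompt processing only;
# per-token generation still executes on the CPU, which
# matches the behavior described in this discussion.
./main -m ./models/7B/ggml-model-q4_0.bin -ngl 32 -p "Hello world"
```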