Model is offloaded to GPU RAM but execution happens in CPU cores - CLBlast #2652
-
Hi, I was able to compile llama.cpp with CLBlast. However, when I run inference, the model layers do get loaded into GPU memory (confirmed by memory utilization), but the computation still happens on the CPU cores rather than on the GPU execution units. I've seen similar issues reported with other GPUs but wasn't able to find a solution that works for me. Command used:
Replies: 1 comment
-
GPU acceleration for generation is currently not supported with CLBlast. The CLBlast backend only uses the GPU for prompt processing (the large batched matrix multiplications), so even with layers offloaded to GPU memory, token-by-token generation still runs on the CPU.
As noted in #1319 (comment): "GPU acceleration for generation is currently not supported with CLBlast"
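For reference, a typical CLBlast build and partial-offload invocation looked like the sketch below (flags as of mid-2023 llama.cpp; the model path and layer count are illustrative, not taken from this thread):

```shell
# Build llama.cpp with the CLBlast backend
# (assumes CLBlast and the OpenCL headers are installed)
make clean && make LLAMA_CLBLAST=1

# Offload 32 layers' weights to GPU memory with -ngl.
# With CLBlast this accelerates prompt processing only;
# per-token generation still executes on the CPU, which
# matches the behavior described in this discussion.
./main -m ./models/7B/ggml-model-q4_0.bin -ngl 32 -p "Hello world"
```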