Unable to inference on Quantized 70B Model using llama.cpp #2575
vatsarishabh22 asked this question in Q&A
I got an error when trying to run inference. I am using Ubuntu Linux.
Answered by klosax, Aug 10, 2023
Use the parameter `-gqa 8` for 70B models to work.
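For context, `-gqa` sets the grouped-query attention factor that llama.cpp's pre-GGUF (GGML) loaders could not read from 70B model files, so it had to be passed explicitly. A minimal invocation sketch, assuming a local quantized GGML model file (the path and prompt below are hypothetical examples, not from the original thread):

```shell
# Hypothetical model path; adjust to your own download.
# -gqa 8 tells llama.cpp the grouped-query attention factor
# required by LLaMA-2 70B models in the old GGML format.
./main -m ./models/llama-2-70b.ggmlv3.q4_0.bin -gqa 8 -p "Hello"
```

Note that later llama.cpp releases moved to the GGUF format, which stores this value in the model file itself, so the flag is only needed with older GGML files.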