llama : allow other bufts when overriding to CPU, add --no-repack option #14990

slaren · 2025-07-31T13:40:28Z

When --override-tensor is used to override tensors to the CPU, other buffer types are now considered as well. In practice, this means that the host buffer types will be used, which may improve performance when prompt processing is offloaded. Note that mmap needs to be disabled to use host buffers.
Adds --no-repack (-nr) option to disable weight repacking.

llama-bench -m Qwen3-30B-A3B-Q4_0.gguf -ot exps=CPU -n 0 -p 32,64,128,256,512,1024 -ub 1024 -mmp 0:

Model	Test	t/s master	t/s sl/ot-repacking	Speedup
qwen3moe 30B.A3B Q4_0	pp32	15.03	22.62	1.50
qwen3moe 30B.A3B Q4_0	pp64	28.87	45.04	1.56
qwen3moe 30B.A3B Q4_0	pp128	61.06	89.35	1.46
qwen3moe 30B.A3B Q4_0	pp256	121.44	173.97	1.43
qwen3moe 30B.A3B Q4_0	pp512	227.41	309.59	1.36
qwen3moe 30B.A3B Q4_0	pp1024	421.50	594.32	1.41
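
For reference, the Speedup column is the ratio of the branch throughput to the master throughput. A minimal sketch that recomputes it from the numbers in the table above (the variable and function names here are illustrative and not part of llama-bench):

```python
# Throughput in tokens/s, copied from the table above:
# (test, t/s on master, t/s on sl/ot-repacking)
results = [
    ("pp32", 15.03, 22.62),
    ("pp64", 28.87, 45.04),
    ("pp128", 61.06, 89.35),
    ("pp256", 121.44, 173.97),
    ("pp512", 227.41, 309.59),
    ("pp1024", 421.50, 594.32),
]


def speedup(baseline: float, candidate: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    return candidate / baseline


# Round to two decimals to match the precision used in the table.
speedups = {test: round(speedup(master, branch), 2) for test, master, branch in results}

for test, value in speedups.items():
    print(f"{test}: {value:.2f}x")
```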

llama : allow other bufts when overriding to CPU, add --no-repack option

f725b6b

ggerganov approved these changes Jul 31, 2025

slaren merged commit d6818d0 into master Jul 31, 2025
47 checks passed

slaren deleted the sl/ot-repacking branch July 31, 2025 16:11

samteezy mentioned this pull request Jul 31, 2025

Eval bug: Getting memory critical errors when using --no-mmap with MoE models #14999

Closed

Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request Aug 1, 2025

llama : allow other bufts when overriding to CPU, add --no-repack opt…

8b86d0e

…ion (ggml-org#14990)
