
Add llama_memory_breakdown_print support #963

Open
kusaanko wants to merge 2 commits into utilityai:main from kusaanko:feat/memory-breakdown

Conversation

@kusaanko (Contributor) commented Mar 17, 2026

We can estimate how much memory a model will allocate by setting the model parameter no_alloc to true and then printing the memory breakdown, without actually allocating the memory. With a normally loaded model, the same call instead shows detailed memory usage.

Because the breakdown is emitted through the logger, users can hook it and capture the figures somewhere other than stdout.

You can estimate using this code:

let param = LlamaModelParams::default().with_no_alloc(true);
// ...create LlamaContext
ctx.print_memory_breakdown();

which logs output like:

2026-03-17T08:56:23.843326Z  INFO llama-cpp-2: | memory breakdown [MiB] | total   free    self   model   context   compute    unaccounted | module="llama.cpp::llama_memory_breakdown_print"
2026-03-17T08:56:23.843385Z  INFO llama-cpp-2: |   - Host               |                 1426 =  1067 +      56 +     302                | module="llama.cpp::llama_memory_breakdown_print"
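The breakdown is plain text routed through the logger, so if you want the MiB figures programmatically you would have to parse the logged lines yourself. A minimal sketch, assuming the table layout shown above (the `parse_breakdown_mib` helper is hypothetical and not part of this PR or the crate):

```rust
// Hypothetical helper: pull the MiB figures out of one logged
// `llama_memory_breakdown_print` table row. This is a sketch that
// assumes the row contains only the numeric MiB columns, as in the
// "Host" line above; it is not part of llama-cpp-2.
fn parse_breakdown_mib(line: &str) -> Vec<u64> {
    line.split(|c: char| !c.is_ascii_digit()) // split on every non-digit run
        .filter(|s| !s.is_empty())            // drop the empty fragments
        .filter_map(|s| s.parse().ok())       // keep the numbers that parse
        .collect()
}

fn main() {
    // The "Host" row from the sample output (log prefix stripped).
    let row = "|   - Host               |                 1426 =  1067 +      56 +     302                |";
    let mib = parse_breakdown_mib(row);
    assert_eq!(mib, vec![1426, 1067, 56, 302]);
    println!("total = {} MiB, parts = {:?}", mib[0], &mib[1..]);
}
```

Note that the log prefix (timestamp, level) also contains digits, so a real consumer would strip it or match on the `module` field before parsing the row.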
