Most prompts use less than 16k of context, but some models now support 256k, e.g. Qwen/Qwen3-30B-A3B-Instruct-2507.
Could the llama.cpp server add another flag alongside the existing one:
-c, --ctx-size N
-c-min, --ctx-size-min N (proposed)
for example: -c 262144 -c-min 16384
The server would then allocate only 16384 tokens of context at startup. If a prompt exceeds 16384 but stays below 262144, it would grow the context to the smallest multiple of 2048 that covers the required size (2048 × n ≥ required context).
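To make the rounding rule concrete, here is a minimal C++ sketch; `pick_ctx_size` and its parameter names are hypothetical illustrations, not existing llama.cpp code:

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical sketch of the proposed sizing rule: grow the KV cache in
// 2048-token blocks, never below ctx_min and never above ctx_max.
static uint32_t pick_ctx_size(uint32_t n_prompt, uint32_t ctx_min, uint32_t ctx_max) {
    const uint32_t block = 2048;
    // smallest multiple of 2048 that covers the prompt
    const uint32_t needed = ((n_prompt + block - 1) / block) * block;
    return std::clamp(needed, ctx_min, ctx_max);
}

int main() {
    // e.g. a 20000-token prompt with -c 262144 -c-min 16384:
    // 20000 rounds up to 20480, so the server would grow from 16384 to 20480
    return pick_ctx_size(20000, 16384, 262144) == 20480 ? 0 : 1;
}
```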