Most prompts use less than 16k of context, but some models now support 256k, e.g. Qwen/Qwen3-30B-A3B-Instruct-2507.
Could the llama.cpp server add another flag alongside the existing one:
-c, --ctx-size N
-c-min, --ctx-size-min N (proposed)
for example: -c 262144 -c-min 16384
The server would then allocate only 16384 tokens of context at startup. If a prompt exceeds 16384 but stays below 262144, it would grow the context to the smallest multiple of 2048 that covers the required size (2048 × n ≥ required context).
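To make the rounding rule concrete, here is a minimal C++ sketch; `pick_ctx_size` and its parameter names are hypothetical illustrations, not existing llama.cpp code:

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical sketch of the proposed sizing rule: grow the KV cache in
// 2048-token blocks, never below ctx_min and never above ctx_max.
static uint32_t pick_ctx_size(uint32_t n_prompt, uint32_t ctx_min, uint32_t ctx_max) {
    const uint32_t block = 2048;
    // smallest multiple of 2048 that covers the prompt
    const uint32_t needed = ((n_prompt + block - 1) / block) * block;
    return std::clamp(needed, ctx_min, ctx_max);
}

int main() {
    // e.g. a 20000-token prompt with -c 262144 -c-min 16384:
    // 20000 rounds up to 20480, so the server would grow from 16384 to 20480
    return pick_ctx_size(20000, 16384, 262144) == 20480 ? 0 : 1;
}
```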