Tips for improving prompt processing speed. #1699
mercurial-moon started this conversation in General
Replies: 1 comment
-
Yes, prompts get cached if they have the same prefix. If you send two prompts in sequence, and the second one starts with the same text as the first, then only the new part at the end needs to be processed. Also, for faster speeds, try running it in CUDA mode with 0 layers offloaded.
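To make the prefix-caching behavior concrete, here is a minimal sketch against the KoboldAI-compatible `/api/v1/generate` endpoint that koboldcpp exposes. It assumes a local instance on the default port 5001; the prompt strings and the `STATIC_PREFIX`/`generate` names are illustrative, not part of koboldcpp itself:

```python
import requests

# Assumed: local koboldcpp instance on its default port, exposing the
# standard KoboldAI-compatible generate endpoint.
API_URL = "http://localhost:5001/api/v1/generate"

# Illustrative shared prefix; in practice this is your long static text.
STATIC_PREFIX = "Long instructions and reference material that never change...\n"

def generate(prompt, max_length=100):
    # koboldcpp keeps the processed prompt cached, so a later request whose
    # prompt starts with the same text only re-evaluates the new tail.
    resp = requests.post(API_URL, json={"prompt": prompt, "max_length": max_length})
    resp.raise_for_status()
    return resp.json()["results"][0]["text"]

# First request: the full prompt (prefix + question) is processed and cached.
print(generate(STATIC_PREFIX + "Question 1: ...\n"))

# Second request: the shared prefix is reused from the cache, so only
# "Question 2: ..." has to be processed before generation starts.
print(generate(STATIC_PREFIX + "Question 2: ...\n"))
```

On the CUDA point: if I have the flags right, launching koboldcpp with `--usecublas --gpulayers 0` keeps all the weights in system RAM (so generation speed is unchanged) but uses the GPU for the batched matrix multiplies during prompt processing, which is where most of the time goes on long prompts.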
-
Hi, I have a system with an Nvidia GPU with low VRAM (2GB), 24GB of DDR4 system RAM, and a decent Intel 11th Gen i5 CPU.
I tested with a large initial prompt of around 8000 tokens, and it takes around 40 minutes to process the prompt before generating the reply at around 2-3 tok/sec. The model used was Gemma 3 27B, 4-bit quantized (IQuants).
The kobold settings were: CPU mode, no offload, no flash attention, context shift on, high priority off, KV cache not quantized, and max context set to around 14000. I was running the kobold.exe binary (the CUDA one), and the prompt was sent via the koboldcpp API.
Is there any way to improve prompt processing time? Would the next prompt take less time than this due to context shift?
Is context shift another name for prompt caching?
My 8000-token prompt is mostly static: around 7500 tokens stay the same and only about 500 tokens change across requests. Can I preprocess those 7500 tokens so they aren't re-evaluated on every request?
I'm thinking of something along these lines: I send the 7500-token static part to kobold using some special API command, and kobold processes and caches it. Then I send the dynamic portion of the prompt, kobold processes the remaining 500 tokens, and it starts generating the reply.
On subsequent queries I send a new dynamic portion, but kobold still reuses the precalculated static part.
Is something like this possible? (A sketch of one way to approximate this follows.)
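Assuming the prefix caching described in the reply above, something close to this is possible without any special API command: warm the cache by sending the static part once with a tiny generation length, then always append the dynamic part after it. This is a hedged sketch, not confirmed koboldcpp behavior; `STATIC_PART`, `warm_cache`, and `query` are made-up names, and whether a `max_length=1` call populates the cache is an assumption:

```python
import requests

API_URL = "http://localhost:5001/api/v1/generate"  # assumed local instance

STATIC_PART = "...7500 tokens of fixed context..."  # hypothetical placeholder

def warm_cache():
    # Process the static prefix once; max_length=1 keeps generation trivial.
    # If prefix caching works as described, the static tokens are now cached.
    requests.post(API_URL, json={"prompt": STATIC_PART, "max_length": 1})

def query(dynamic_part, max_length=200):
    # Append the changing ~500 tokens AFTER the static part so the cached
    # prefix still matches; only the dynamic tail gets re-evaluated.
    resp = requests.post(API_URL, json={
        "prompt": STATIC_PART + dynamic_part,
        "max_length": max_length,
    })
    resp.raise_for_status()
    return resp.json()["results"][0]["text"]

warm_cache()
print(query("\nUser: first request\nAssistant:"))
print(query("\nUser: second request\nAssistant:"))
```

The key constraint is that the changing text must sit at the end of the prompt: if anything near the start changes between requests, the cached prefix no longer matches and the whole prompt is reprocessed.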