Protip: Context compression works best at 25% on long context models (100k+) #726
devlux76 started this conversation in 4. Show and tell
Replies: 1 comment
-
Thanks so much for sharing, @devlux76. I think that's in line with what folks have been seeing: LLMs get awful once they are above 50-60% context fill, so even context compression at that point is compromised.
-
I was having an issue where context compression was redacting too much information.
The underlying problem is that for large-context models (110k+), compression produces at most about 15k tokens of output, and that's even when using Gemini with a 1M-token context window to perform the compression.
As it turns out, triggering compression at 25% context fill instead of 100% keeps most of the relevant context.
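To make the threshold concrete, here's a minimal sketch of the kind of check involved. This is purely illustrative, not Kilocode's actual implementation; the function and parameter names (`should_condense`, `trigger_pct`) are assumptions.

```python
def should_condense(tokens_used: int, context_window: int, trigger_pct: float = 25.0) -> bool:
    """Return True once usage crosses trigger_pct of the context window.

    Firing at 25% condenses while the conversation is still small,
    instead of waiting until the window is nearly full.
    """
    return tokens_used >= context_window * (trigger_pct / 100.0)

# With a 110k-token model, condensing fires around 27.5k tokens:
print(should_condense(27_500, 110_000))   # True
print(should_condense(20_000, 110_000))   # False
```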
The results, for me at least, are night and day. I'm using GPT-4.1 and Claude exclusively for the heavy lifting (code/debug), Gemini for Architect, and Devstral for Orchestration. These are all large-context models, but they work best when you keep the context fill under 50%, so triggering at 25% means roughly half of that usable budget holds accumulated context while the other half stays available for current work.
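A rough worked example of that budget, using round numbers from the post (the 110k window and the ~50% usable-fill rule of thumb are the only inputs; the rest is simple arithmetic):

```python
context_window = 110_000                      # large-context model
usable_budget = int(0.50 * context_window)    # ~55k tokens before quality degrades
trigger_point = int(0.25 * context_window)    # ~27.5k tokens of accumulated context

# Triggering at 25% leaves roughly half of the usable budget for current work,
# and condensing the accumulated context (to ~15k in practice) frees even more.
room_for_current_work = usable_budget - trigger_point
print(trigger_point, room_for_current_work)   # 27500 27500
```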
Thanks to the Kilocode team for such an awesome product!