Protip: Context compression works best at 25% on long context models (100k+) #726
devlux76 started this conversation in 4. Show and tell
Replies: 1 comment
-
Thanks so much for sharing, @devlux76. I think that's in line with what folks have been seeing: LLMs get awful once they are above 50-60% context fill, so even context compression at that point is compromised.
-
I was having an issue where context compression was redacting too much information.
The underlying problem is that for large-context models (110k+), compression produces at most about 15k tokens of output, and that's even when using Gemini with a 1M-token context window to perform the compression.
As it turns out, triggering compression at 25% context fill instead of 100% keeps most of the relevant context.
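To make the threshold concrete, here's a minimal sketch of the kind of check involved. This is purely illustrative, not Kilocode's actual implementation; the function and parameter names (`should_condense`, `trigger_pct`) are assumptions.

```python
def should_condense(tokens_used: int, context_window: int, trigger_pct: float = 25.0) -> bool:
    """Return True once usage crosses trigger_pct of the context window.

    Firing at 25% condenses while the conversation is still small,
    instead of waiting until the window is nearly full.
    """
    return tokens_used >= context_window * (trigger_pct / 100.0)

# With a 110k-token model, condensing fires around 27.5k tokens:
print(should_condense(27_500, 110_000))   # True
print(should_condense(20_000, 110_000))   # False
```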
The results, for me at least, are night and day. I'm using GPT-4.1 and Claude exclusively for the heavy lifting (code/debug), Gemini for Architect, and Devstral for Orchestration. These are all large-context models, but they work best when you keep the context fill under 50%, so triggering at 25% means roughly half of that usable budget holds accumulated context while the other half stays available for current work.
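A rough worked example of that budget, using round numbers from the post (the 110k window and the ~50% usable-fill rule of thumb are the only inputs; the rest is simple arithmetic):

```python
context_window = 110_000                      # large-context model
usable_budget = int(0.50 * context_window)    # ~55k tokens before quality degrades
trigger_point = int(0.25 * context_window)    # ~27.5k tokens of accumulated context

# Triggering at 25% leaves roughly half of the usable budget for current work,
# and condensing the accumulated context (to ~15k in practice) frees even more.
room_for_current_work = usable_budget - trigger_point
print(trigger_point, room_for_current_work)   # 27500 27500
```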
Thanks to the Kilocode team for such an awesome product!