Significant degradation after just using 20% of context? #5269
Replies: 5 comments 4 replies
-
Hello, any update on this yet? :)
-
Some more reports on this on Reddit: https://www.reddit.com/r/GeminiCLI/comments/1mewkku/any_tips_to_avoid_brain_farts/
-
@NTaylorMullen could you have a look?
-
I know I sound like a broken record, but this is exactly the nextSpeakerChecker problem. Instead of getting rid of it, they switched it to Flash Lite; now it is marginally faster but still goes insane. I removed it from our downstream fork, https://github.com/acoliver/llxprt-code, and performance improved, and so far it no longer loops on larger contexts. I can't stress enough what a bad solution it is to send the entire context to Flash multiple times per user interaction just to ask whether Pro should continue. Fixing this makes the tool infinitely more usable. The other thing that helps is not letting the context get so huge, e.g. tightening things like the tool calls...
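To put rough numbers on the overhead: every check re-sends the whole conversation to the checker model. Here's a back-of-the-envelope sketch (the numbers and the function are hypothetical, and it treats the context size as constant even though it actually grows over the session):

```typescript
// Rough token-overhead estimate for the next-speaker check.
// Purely illustrative; not taken from the gemini-cli codebase.
function extraCheckTokens(
  contextTokens: number, // current conversation size in tokens
  checksPerTurn: number, // how often the checker fires per user turn
  turns: number, // user interactions in the session
): number {
  // Each check re-sends the full context to the checker model.
  return contextTokens * checksPerTurn * turns;
}

// e.g. a 100k-token context checked twice per turn over 10 turns re-sends
// ~2,000,000 extra tokens on top of the actual work.
console.log(extraCheckTokens(100_000, 2, 10)); // 2000000
```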
-
Multiple times, especially if you are using Pro. See https://github.com/google-gemini/gemini-cli/blob/main/packages/core/src/utils/nextSpeakerChecker.ts. The flow: you send a chat, tool calls happen, the next speaker checker sends the whole conversation to Flash (now Flash Lite) and asks whether Pro should continue. Flash Lite tells Pro to continue. Pro responds or edits something. Then nextSpeakerChecker runs again... In LLxprt Code we removed it and fixed some async bugs the removal exposed. It works with Pro (paid and free) as well as other models. I could be mistaken, but I've yet to see Pro sit there wondering whether it should continue; it seems perfectly capable without this check. The token burn is certainly lower, but more importantly my patience is tried less.
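For anyone who hasn't read the source, the pattern is roughly the following. This is a minimal sketch, not the actual gemini-cli code; every name in it (checkNextSpeaker, callCheapModel, CHECK_PROMPT, chatTurn) is made up for illustration, so see the linked nextSpeakerChecker.ts for the real thing:

```typescript
type Role = 'user' | 'model';
interface Message { role: Role; text: string; }

const CHECK_PROMPT =
  'Given the conversation so far, who should speak next? ' +
  'Respond with JSON: {"next_speaker": "user" | "model"}';

// Ask a cheaper model whether the main model should keep going. Note that
// the *entire* history is serialized and re-sent on every single check.
async function checkNextSpeaker(
  history: Message[],
  callCheapModel: (prompt: string) => Promise<string>,
): Promise<'user' | 'model'> {
  const transcript = history.map((m) => `${m.role}: ${m.text}`).join('\n');
  const raw = await callCheapModel(`${transcript}\n\n${CHECK_PROMPT}`);
  try {
    const parsed = JSON.parse(raw) as { next_speaker?: string };
    return parsed.next_speaker === 'model' ? 'model' : 'user';
  } catch {
    return 'user'; // On a malformed reply, hand control back to the user.
  }
}

// The loop described above: after every model turn, the checker can tell
// the main model to keep going, which triggers another turn, which
// triggers another full-context check...
async function chatTurn(
  history: Message[],
  callMainModel: (history: Message[]) => Promise<Message>,
  callCheapModel: (prompt: string) => Promise<string>,
): Promise<void> {
  let next: 'user' | 'model' = 'model';
  while (next === 'model') {
    history.push(await callMainModel(history));
    next = await checkNextSpeaker(history, callCheapModel);
  }
}
```

The failure mode is visible right in that loop: as long as the cheap model keeps answering "model", Pro keeps going, and every iteration pays the full-context check again. Removing the check means the turn simply ends when the model stops.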
-
So, I've been using Gemini CLI for a couple of weeks now.
One thing I've noticed consistently is that performance degrades significantly after I've used around 15-20% of the context, i.e. once I see "80% context left" or less. I understand there's a term for this, context rot, but I wouldn't expect it to kick in this early in the chat. Because of this, I always have to quit and start a new chat at that point; I've never reached 50% of the available context.
To fellow users, have you also observed a similar pattern?
To the Googlers working on this: have you internally tested specifically for context rot once more than 20% of the context has been used?
Next time it happens, I can provide a more detailed example. But it's usually of the form that the model stops honoring some of the conditions in the context file, OR starts going in loops, OR starts taking longer on tasks, OR stops giving me good enough solutions. And it doesn't do any of that in a fresh chat: it's quite good from the start of a fresh chat up to the point where I've used about 10% of the context.