Auto-completion is silently draining Pro/Preview model quotas - This logic is fundamentally broken. #17278
Unanswered
chun79 asked this question in Community Support
Replies: 1 comment
From my point of view, this is one of the clearest cases where an internal low-value feature should be decoupled from the user-selected high-value model path. Autocomplete has very different latency and cost requirements from real reasoning turns, so tying both to the same model is a routing mistake, not just a quota problem. A dedicated completion model, or an explicit completionModel setting, feels like the technically correct direction.
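A minimal sketch of what that decoupling could look like. Every name here (`pickModelForTask`, `Settings`, the task kinds, the default model constant) is hypothetical and illustrative only, not the actual Gemini CLI API:

```typescript
// Hypothetical routing helper: completion requests never touch the
// user-selected chat model unless no cheaper option is configured.
type TaskKind = "chat" | "completion";

interface Settings {
  chatModel: string;        // the user's selected high-tier model
  completionModel?: string; // optional override for autocomplete
}

// Assumed lightweight default; the real default would be a product decision.
const DEFAULT_COMPLETION_MODEL = "gemini-2.0-flash";

function pickModelForTask(task: TaskKind, settings: Settings): string {
  if (task === "completion") {
    // Latency-sensitive, low-value work goes to a lightweight model.
    return settings.completionModel ?? DEFAULT_COMPLETION_MODEL;
  }
  // Real reasoning turns keep the user's high-tier choice.
  return settings.chatModel;
}
```

With routing like this, selecting `gemini-3-pro-preview` for chat would no longer let autocomplete keystrokes eat into the pro-tier quota.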
To the Gemini CLI Development Team,
I am writing to report a severe design oversight in the current CLI implementation that is actively ruining the user experience for high-tier models.
The Problem:
The "Prompt Completion" feature currently uses the active chat model for prediction. This is a catastrophic decision when a user selects a rate-limited model like gemini-3-pro-preview.
Why this is unacceptable:
This implementation is illogical. High-reasoning models should NEVER be used for latency-sensitive, low-value tasks like text completion.
Required Fix:
Decouple the completion model immediately.
1. Default the completion feature to a lightweight model (e.g. gemini-2.0-flash), regardless of the user's selected chat model.
2. Expose a completionModel setting in settings.json so we can configure this manually.
Please prioritize this fix. The current behavior penalizes users for trying to use your advanced models.
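If such a setting were added, the settings.json fragment might look like the sketch below. Note that completionModel is the key this post is requesting, not an existing Gemini CLI option, and the model names are only examples:

```json
{
  "model": "gemini-3-pro-preview",
  "completionModel": "gemini-2.0-flash"
}
```

Here the chat path keeps the high-tier model the user picked, while autocomplete is pinned to a fast, cheap model that does not share the pro/preview quota.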
Regards,