Auto-completion is silently draining Pro/Preview model quotas - This logic is fundamentally broken. #17278
Unanswered
chun79 asked this question in Community Support
Replies: 1 comment
From my point of view, this is one of the clearest cases where an internal low-value feature should be decoupled from the user-selected high-value model path. Autocomplete has very different latency and cost requirements from real reasoning turns, so tying both to the same model is a routing mistake, not just a quota problem. A dedicated completion model, or an explicit completionModel setting, feels like the technically correct direction.
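A minimal sketch of what that decoupling could look like. Every name here (`pickModelForTask`, `Settings`, the task kinds, the default model constant) is hypothetical and illustrative only, not the actual Gemini CLI API:

```typescript
// Hypothetical routing helper: completion requests never touch the
// user-selected chat model unless no cheaper option is configured.
type TaskKind = "chat" | "completion";

interface Settings {
  chatModel: string;        // the user's selected high-tier model
  completionModel?: string; // optional override for autocomplete
}

// Assumed lightweight default; the real default would be a product decision.
const DEFAULT_COMPLETION_MODEL = "gemini-2.0-flash";

function pickModelForTask(task: TaskKind, settings: Settings): string {
  if (task === "completion") {
    // Latency-sensitive, low-value work goes to a lightweight model.
    return settings.completionModel ?? DEFAULT_COMPLETION_MODEL;
  }
  // Real reasoning turns keep the user's high-tier choice.
  return settings.chatModel;
}
```

With routing like this, selecting `gemini-3-pro-preview` for chat would no longer let autocomplete keystrokes eat into the pro-tier quota.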
To the Gemini CLI Development Team,
I am writing to report a severe design oversight in the current CLI implementation that is actively ruining the user experience for high-tier models.
The Problem:
The "Prompt Completion" feature currently uses the active chat model for prediction. This is a catastrophic decision when a user selects a rate-limited model like gemini-3-pro-preview.
Why this is unacceptable:
This implementation is illogical. High-reasoning models should NEVER be used for latency-sensitive, low-value tasks like text completion.
Required Fix:
Decouple the completion model immediately.
1. Default the completion feature to a lightweight model (e.g. gemini-2.0-flash), regardless of the user's selected chat model.
2. Expose a completionModel setting in settings.json so we can configure this manually.
Please prioritize this fix. The current behavior penalizes users for trying to use your advanced models.
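If such a setting were added, the settings.json fragment might look like the sketch below. Note that completionModel is the key this post is requesting, not an existing Gemini CLI option, and the model names are only examples:

```json
{
  "model": "gemini-3-pro-preview",
  "completionModel": "gemini-2.0-flash"
}
```

Here the chat path keeps the high-tier model the user picked, while autocomplete is pinned to a fast, cheap model that does not share the pro/preview quota.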
Regards,