So, how exactly does Gemini CLI decide between Pro and Flash? #3064
ryanjsalva announced in Announcements
It’s a good question. For those of you logging in with a Google account, our goal is to deliver the best possible experience at the keyboard – ideally, one where you never have to stop work because you hit a limit. To do that, we have to balance model choice with capacity. Thus, Gemini CLI uses a blend of Gemini 2.5 Pro and Flash when you use a Google login rather than an API key.
Our near-term goal is to use intelligent routing to make the best use of each model. For example, Pro is overkill for a lot of really simple steps that are better routed to Flash (e.g. "start the npm server"). Pro is better suited for big, complex tasks that require reasoning (e.g. "write integration tests for x and y microservices"). Today, we might use Flash to consider next steps after a failed tool call, or to confirm that a model response fully satisfied your prompt.
We also fall back from Pro to Flash when there are two or more slow responses. When Gemini CLI falls back, it stays with Flash for the remainder of the session. Because of the (frankly, overwhelming 🤗) developer response in the first week of availability, our service has returned more slow response times than we'd like. But we're working to add capacity quickly.
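To make the fallback rule concrete, here is a minimal Python sketch of a "sticky" fallback: start on Pro, count slow responses, and switch to Flash for the rest of the session once two have been seen. This is an illustration only, not Gemini CLI's actual code. The class name `SessionModelSelector`, its methods, and the 10-second latency threshold are all hypothetical; the post does not say how "slow" is measured or whether the slow responses must be consecutive (this sketch counts them across the whole session).

```python
from dataclasses import dataclass

# Labels for the two models; used here only as illustrative strings.
PRO = "gemini-2.5-pro"
FLASH = "gemini-2.5-flash"


@dataclass
class SessionModelSelector:
    """Hypothetical per-session state for a sticky Pro-to-Flash fallback."""

    # Assumption: the post does not define what counts as a "slow" response.
    slow_threshold_s: float = 10.0
    # From the post: fall back after "two or more slow responses".
    max_slow_responses: int = 2
    slow_count: int = 0
    fell_back: bool = False

    def current_model(self) -> str:
        """Return the model the next request in this session would use."""
        return FLASH if self.fell_back else PRO

    def record_response(self, latency_s: float) -> None:
        """Update the session state after a response arrives."""
        if self.fell_back:
            return  # sticky: once on Flash, stay on Flash for the session
        if latency_s >= self.slow_threshold_s:
            self.slow_count += 1
        if self.slow_count >= self.max_slow_responses:
            self.fell_back = True


selector = SessionModelSelector()
for latency in (2.0, 14.0, 3.0, 12.5, 0.8):
    print(f"{selector.current_model()} answered in {latency}s")
    selector.record_response(latency)
```

In this sketch the first four requests go to Pro and the fifth goes to Flash, because the second slow response (12.5s) trips the fallback. A fast response afterwards does not switch the session back, which matches the "stays with Flash for the remainder of the session" behavior described above.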
While I know many of you have asked, we do not have any plans to offer model choice within the free tier. As always, if you want to use a specific model, you can use an API key.
We’re at the beginning of our release journey. There are still a lot of improvements we can make to planning and orchestration. If we get it right, you won’t have to think about which model is being used. Thanks, and please keep the feedback, questions, and contributions coming. 🙏 Happy coding!