🚀 Feature Launch: Intelligent Model Routing arrives in v0.11.2 #12375
Replies: 3 comments 3 replies
works quite well, thanks! For the automatic setting, it might be nice to see a model indicator in the UI showing which model produced each response. Knowing this, I could more readily gauge how much trust to put in a response, based on my experience with those three models.
Awful feature, I suggest disabling it for good. It switches between different models and may ruin your project by removing important code lines.
@abhipatel12 The router should be smart enough not to try a model that is out of tokens when others are not. Plus, it's not clear what is happening: in the `/model` settings I'm on Auto for 3, yet it complains about being unable to use 2.5 without telling me that I could switch to another model. For a while I actually thought I was all out and it was trying the weakest one as a last resort; it was only yesterday that I switched to manual 3 Pro and realized it worked just fine...
Hey everyone!
We're excited to announce that Intelligent Model Routing has graduated from our experimental channel post and is now standard in version 0.11.2! A huge thank you to everyone who tested this in preview and provided the feedback needed to stabilize this feature.
Why Model Routing? 🤔
Until now, Gemini CLI used one primary model (like Gemini 2.5 Pro) for your entire session. While fantastic for complex tasks, using such a heavy-duty model for simpler requests often increased latency and burned through your Pro quota faster than necessary.
Intelligent Model Routing solves this by dynamically directing your requests to the model best suited for the job. Simple queries are handled instantly by faster, lighter models (like Flash), while complex analytical or creative tasks continue to leverage the full power of Pro.
By upgrading, you gain lower latency on simple requests and less unnecessary drain on your Pro quota.
How to get it 📦
Routing is now enabled by default in our latest stable release. No configuration required.
Just update to the latest version:
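If you installed the CLI via npm (an assumption; if you installed another way, use that channel's update mechanism), the update looks like this:

```shell
# Update the globally installed Gemini CLI to the latest release
# (assumes the npm package name @google/gemini-cli)
npm install -g @google/gemini-cli@latest

# Confirm you are on 0.11.2 or newer
gemini --version
```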
Taking Control: The new `/model` slash command 🛠️
While the router is great for 95% of cases, sometimes you need guaranteed raw power, or perhaps you specifically want to iterate quickly on small tasks with a lighter model.
We've introduced `/model`, allowing you to instantly switch routing behavior mid-session without restarting Gemini CLI. It opens an interactive dialog to help you switch between routing modes and models mid-session!
(You can also still use the `-m` startup flag when starting with `gemini`.) Here's what it looks like!
Verifying it's active ⚙️
You can verify routing is active by running the standard `/stats` command during a session. You will see an increase in requests to lighter models (like `gemini-2.5-flash`) alongside your usual Pro requests. You'll also see a new, small stream of `gemini-2.5-flash-lite` model requests that we use to help route your requests!
Feedback 💭
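In practice, the check is a single slash command inside an active session (the exact rendering of the stats output varies by version, so none is shown here):

```shell
# Inside an active Gemini CLI session, type:
/stats
# Then look for per-model request counts: routing is active if you see
# lighter models (e.g. gemini-2.5-flash and gemini-2.5-flash-lite)
# listed alongside your usual gemini-2.5-pro requests.
```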
While this feature is now stable, we are always looking to refine the routing logic. Please continue to share your experiences in this thread, especially if you encounter cases where the router consistently underperforms on complex tasks.
Have questions? If you're curious about the mechanics of how the router makes decisions, or how this fits into your specific workflows, please ask away in this thread! We're happy to dive into the details.
Thanks for helping us build Gemini CLI! ❤️