model routing appears to be very good at protecting 2.5-pro quota. What's the catch? #12056
timrichardson
started this conversation in
General
A few hours ago I enabled model routing in my stable release, and it seems to be an exceptionally good feature. In my current session it shows 69 requests to 2.5-pro, 21 to 2.5-flash, and 563 to 2.5-flash-lite. It is definitely protecting my quota of pro queries, and I haven't noticed any decline in usefulness on a fairly large Python codebase.
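Just to illustrate the behavior described above: a router like this presumably sends each request to the cheapest model that can handle it and reserves pro for hard cases while its quota lasts. This is only a hypothetical sketch; the names (route_request, QUOTAS, the complexity heuristic) are my assumptions, not the CLI's actual internals.

```python
# Hypothetical sketch of quota-aware model routing. All names and the
# complexity heuristic are illustrative assumptions; the real routing
# logic is not shown in this discussion.

QUOTAS = {"2.5-pro": 200}  # the post suggests roughly 200 pro queries

usage = {"2.5-pro": 0, "2.5-flash": 0, "2.5-flash-lite": 0}

def classify_complexity(prompt: str) -> str:
    """Crude stand-in for whatever heuristic decides request difficulty."""
    if len(prompt) > 2000 or "refactor" in prompt.lower():
        return "hard"
    if len(prompt) > 500:
        return "medium"
    return "easy"

def route_request(prompt: str) -> str:
    """Pick the cheapest plausible model, spending pro quota only on
    hard requests while quota remains."""
    tier = classify_complexity(prompt)
    if tier == "hard" and usage["2.5-pro"] < QUOTAS["2.5-pro"]:
        model = "2.5-pro"
    elif tier in ("hard", "medium"):
        model = "2.5-flash"
    else:
        model = "2.5-flash-lite"
    usage[model] += 1
    return model
```

Under a scheme like this, short routine requests never touch the pro quota at all, which would explain the flash-lite count dominating the session stats.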
/stats model reports 347 errors for the 563 flash-lite requests, but I have seen no evidence of those errors in practice.
After a few hours, this looks like a transformative feature: without it, by now I would have been well past my AI Pro quota, which I think is 200 2.5-pro queries.
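To make the quota point concrete, here is the arithmetic on the session counts reported above (a quick check, not a CLI feature):

```python
# Per-model request counts from this session, as reported above.
requests = {"2.5-pro": 69, "2.5-flash": 21, "2.5-flash-lite": 563}

total = sum(requests.values())
print(total)                 # 653 requests in total this session
print(requests["2.5-pro"])   # only 69 of them count against the ~200 pro quota
```

So if every one of those 653 requests had gone to 2.5-pro, the quota would have been exhausted several times over.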
I am surprised this feature is not on by default. Either that, or I have failed to detect some downside of it.