-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Description
Which version of the app are you using?
v3.3.14
Which API Provider are you using?
Google Gemini
Which Model are you using?
gemini-2.0-flash
What happened?
I'm not sure what was the reason behind introducing exponentialDelay for API requests retry but it goes out of control for Gemini models. Gemini API has 2 cases when API request fail: either because user exceeded the quota per minute or uncontrollable shared quota. The shared quota is applied randomly to everyone when API is "busy" and is randomly removed every few seconds. When such case happens the delay grows from 5 to 40-80 seconds and it ends up just waiting for over a minute without even trying to call API. I don't think there's any API that requires this cooldown period anyways. This feature should either be optional, or removed imho, or at least have the cap of like 30-45 seconds max before the next retry.
Steps to reproduce
- Use free tier gemini models from Google AI Studio
- Try to develop features for a while until it happens
Relevant API REQUEST output
Additional context
No response