fix: properly handle exponential backoff for rate limiting in embedders #7030
Conversation
- Fixed issue where multiple concurrent requests hitting rate limits would all log retry messages simultaneously
- Improved global rate limit state management to coordinate retry delays across parallel requests
- Added proper delay calculation that considers both global consecutive errors and per-request attempt numbers
- Added success callback to reset the consecutive error count when requests succeed
- Ensures exponential backoff delays are applied sequentially rather than all at once

Fixes #7029
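The coordination described above can be sketched as follows. The constants and field names (`INITIAL_DELAY_MS`, `rateLimitResetTime`) follow the diff excerpts later in this PR, but the function shapes and the 5-minute cap are illustrative assumptions, not the actual implementation:

```typescript
// Illustrative constants; the PR's actual values may differ.
const INITIAL_DELAY_MS = 1000
const MAX_DELAY_MS = 5 * 60 * 1000 // the 5-minute cap mentioned in the PR

interface RateLimitState {
	consecutiveErrors: number
	rateLimitResetTime: number // epoch ms when the current backoff window ends
}

const state: RateLimitState = { consecutiveErrors: 0, rateLimitResetTime: 0 }

// On a 429: compute the delay, but if another request already opened a
// backoff window, reuse its deadline instead of stacking a new delay.
function onRateLimitError(attemptNumber: number, now: number): number {
	state.consecutiveErrors++
	if (state.rateLimitResetTime > now) {
		return state.rateLimitResetTime - now
	}
	const globalDelay = Math.min(INITIAL_DELAY_MS * Math.pow(2, state.consecutiveErrors - 1), MAX_DELAY_MS)
	const attemptDelay = Math.min(INITIAL_DELAY_MS * Math.pow(2, attemptNumber), MAX_DELAY_MS)
	const delay = Math.max(globalDelay, attemptDelay)
	state.rateLimitResetTime = now + delay
	return delay
}

// On success: clear the shared state so the next failure starts small again.
function onSuccess(): void {
	state.consecutiveErrors = 0
	state.rateLimitResetTime = 0
}
```

With this shape, two concurrent requests that fail at the same moment both wait on the same deadline instead of each logging and scheduling its own retry.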
Reviewed my own code. Found it suspiciously working. Must have been an accident.
```diff
@@ -294,6 +294,9 @@ export class OpenAICompatibleEmbedder implements IEmbedder {
 		const embeddings = response.data.map((item) => item.embedding as number[])

+		// Reset consecutive errors on success
+		await this.resetGlobalRateLimitOnSuccess()
```
Good addition to reset the error count on success! However, I notice there's no explicit test case verifying this behavior. Could we add a test that confirms consecutive errors are properly reset after a successful request?
```diff
@@ -452,16 +456,47 @@ export class OpenAICompatibleEmbedder implements IEmbedder {
 		state.lastRateLimitError = now

-		// Calculate exponential backoff based on consecutive errors
+		// Calculate exponential backoff based on consecutive errors AND attempt number
+		// Use the maximum of the two to ensure proper backoff
```
Could we improve this comment to explain why we take the maximum? For example: 'Use the maximum of the two to ensure proper backoff: Global delay ensures all requests respect the rate limit across the system, while attempt delay ensures individual requests back off appropriately. Taking the maximum prevents requests from bypassing the global rate limit.'
```diff
+		// Check if we're already in a rate limit period
+		if (state.isRateLimited && state.rateLimitResetTime > now) {
+			// Return the remaining wait time
+			return state.rateLimitResetTime - now
+		}
```
Consider adding a debug log here to help track when requests are reusing existing rate limit periods. This would be helpful for debugging rate limit issues in production.
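The suggested debug log could look like this sketch. The state shape mirrors the diff above, but the function wrapper and the message format are assumptions for illustration:

```typescript
interface RateLimitState {
	isRateLimited: boolean
	rateLimitResetTime: number // epoch ms when the current window ends
}

// Returns the remaining wait when a rate limit window is already active,
// logging it so reuse of an existing window is visible in production traces;
// returns null when no window applies.
function remainingRateLimitDelay(state: RateLimitState, now: number): number | null {
	if (state.isRateLimited && state.rateLimitResetTime > now) {
		const remainingMs = state.rateLimitResetTime - now
		console.debug(`Reusing existing rate limit window, ${remainingMs}ms remaining`)
		return remainingMs
	}
	return null
}
```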
```diff
+		const attemptExponentialDelay = Math.min(INITIAL_DELAY_MS * Math.pow(2, attemptNumber), maxDelay)
+
+		// Use the larger of the two delays
+		const exponentialDelay = Math.max(globalExponentialDelay, attemptExponentialDelay)
```
Good implementation of the delay calculation! Taking the maximum of global and attempt-based delays ensures proper coordination across concurrent requests.
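Plugging sample numbers into the two formulas makes the interaction concrete. The constants here are assumed (`INITIAL_DELAY_MS = 1000`), and whether the global delay uses the raw consecutive-error count as the exponent is also an assumption:

```typescript
const INITIAL_DELAY_MS = 1000 // assumed base delay
const maxDelay = 5 * 60 * 1000

// Say the system has seen 3 consecutive rate limit errors globally,
// while this particular request is only on its 2nd attempt (attemptNumber = 1).
const consecutiveErrors = 3
const attemptNumber = 1

const globalExponentialDelay = Math.min(INITIAL_DELAY_MS * Math.pow(2, consecutiveErrors), maxDelay) // 8000
const attemptExponentialDelay = Math.min(INITIAL_DELAY_MS * Math.pow(2, attemptNumber), maxDelay) // 2000

// The global backoff wins, so this request cannot undercut the
// system-wide rate limit with its own shorter per-attempt delay.
const exponentialDelay = Math.max(globalExponentialDelay, attemptExponentialDelay)
```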
```diff
-		// Trigger the updateGlobalRateLimitState method
-		await (embedder as any).updateGlobalRateLimitState(rateLimitError)
+		// Trigger the updateGlobalRateLimitState method with attempt number
+		await (embedder as any).updateGlobalRateLimitState(rateLimitError, 0)
```
Good update to pass the attempt number. Could we add another test case that explicitly verifies the resetGlobalRateLimitOnSuccess behavior to ensure consecutive errors are properly reset after successful requests?
@roomote-agent For what it's worth, I've installed this PR's branch in my local VS Code, and it does not appear to have fully fixed the issue. Now there doesn't appear to be any attempt to exponentially back off at all. Here's the Developer Tools console log:

```
console.ts:137 [Extension Host] Rate limit hit, retrying in 5000ms (attempt 1/3)
```
Summary
This PR fixes the exponential backoff issue with Mistral AI (and other OpenAI-compatible) embeddings where rate limit retries were not properly respecting exponential delays. Instead of waiting progressively longer between retries, all retry messages would appear simultaneously and then wait for the maximum delay (5 minutes).
Problem
When using Mistral AI's codestral embeddings with large repositories (>5200 chunks), the rate limiter would log every retry message at once and then wait for the maximum delay, rather than backing off progressively between retries.
Solution
Changes
- Updated `OpenAICompatibleEmbedder` to properly handle rate limit state and calculate delays
- Updated `updateGlobalRateLimitState` to return the calculated delay and consider both global and per-attempt delays
- Added `resetGlobalRateLimitOnSuccess` to reset error counts on successful requests

Testing
Fixes #7029
Important

Fixes exponential backoff handling for rate limiting in `OpenAICompatibleEmbedder`, ensuring sequential retry delays and improved global state management.

- `OpenAICompatibleEmbedder`: rate limiting now applies sequential retry delays.
- `updateGlobalRateLimitState`: calculates and returns the delay based on global and per-attempt errors.
- `resetGlobalRateLimitOnSuccess`: resets error counts on success.
- `openai-compatible-rate-limit.spec.ts`: updated to match the new method signatures and behavior.