-
Notifications
You must be signed in to change notification settings - Fork 2.6k
feat: honor provider Retry-After headers on 429 responses in embedders #8105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Parse and honor Retry-After header from providers on rate limit errors - Support multiple header formats: Retry-After, X-RateLimit-Reset-After, X-RateLimit-Reset - Add support for Gemini structured retry info in error response body - Update global rate limit state to prefer provider-specified delays over exponential backoff - Add comprehensive tests for Retry-After header handling - Improve rate limit handling to reduce unnecessary delays and quota exhaustion Fixes #8101
| * @param retryAfter The Retry-After header value | ||
| * @returns The number of milliseconds to wait, or undefined if not parseable | ||
| */ | ||
| private parseRetryAfter(retryAfter: string | null): number | undefined { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar header-parsing logic is duplicated across both OpenAICompatibleEmbedder and OpenAiEmbedder (in openai.ts). Consider extracting common functions (e.g. parseRetryAfter and rate limit delay extraction) into a shared utility module to reduce duplication and ensure consistency.
This comment was generated because it violated a code review rule: irule_tTqpIuNs8DV0QFGj.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote this code five minutes ago and I'm already questioning my life choices.
Review Summary
This PR successfully implements support for honoring provider-specific Retry-After headers when embedding requests hit rate limits. The implementation is solid with good test coverage.
Strengths:
✅ Comprehensive support for multiple header formats (Retry-After, X-RateLimit-Reset-After, X-RateLimit-Reset)
✅ Smart fallback strategy maintaining backward compatibility
✅ Excellent test coverage with 6 new test cases
✅ Proper handling of Gemini-specific structured retry info
✅ Safety caps to prevent excessive delays
Suggestions for future improvements:
-
Mutex handling in waitForGlobalRateLimit(): The mutex is released before the wait, which could theoretically allow race conditions. Consider documenting why this approach is safe or using a different synchronization pattern.
-
Silent failures in parseDurationString(): The method returns undefined for invalid formats without logging. Consider adding debug logging for unexpected duration formats from providers.
-
Code duplication: The parseRetryAfter() logic is duplicated between openai.ts and openai-compatible.ts. A shared utility function could reduce duplication.
-
Magic numbers: The 1000ms buffer and 300000ms cap could be extracted as named constants for better maintainability.
-
Additional test coverage: Consider adding tests for concurrent requests hitting rate limits and invalid Gemini duration formats.
Overall, this is a well-implemented enhancement that should significantly improve embedding throughput and efficiency when dealing with rate limits.
Summary
This PR implements support for honoring provider-specific Retry-After headers when embedding requests hit rate limits (429 responses). This enhancement improves throughput and avoids unnecessary delays or quota exhaustion by respecting the exact retry guidance from providers.
Problem Solved
Previously, the embedders used fixed/exponential delays and a global wait when hitting rate limits, which could:
Implementation Details
Core Changes
Files Modified
src/services/code-index/embedders/openai-compatible.ts: Main implementation for OpenAI-compatible providerssrc/services/code-index/embedders/openai.ts: Implementation for native OpenAI SDKsrc/services/code-index/embedders/__tests__/openai-compatible.spec.ts: Comprehensive test coverageTesting
✅ All existing tests pass
✅ Added 6 new test cases covering:
Impact
Fixes #8101
Important
Enhance embedders to respect provider-specific Retry-After headers for 429 responses, improving rate limit handling.
openai-compatible.tsandopenai.tsfor 429 responses.Retry-After,X-RateLimit-Reset-After, andX-RateLimit-Resetheaders.parseRetryAfter()andextractRateLimitInfo()inopenai-compatible.ts.extractRateLimitDelay()inopenai.ts._embedBatchWithRetries()to use provider delays or exponential backoff.openai-compatible.spec.tsfor various Retry-After scenarios and fallback logic.This description was created by
for 7b31ee7. You can customize this summary. It will automatically update as commits are pushed.