feat: honor provider Retry-After headers on 429 responses in embedders #8105

roomote · 2025-09-17T23:51:48Z

Summary

This PR implements support for honoring provider-specific Retry-After headers when embedding requests hit rate limits (429 responses). This enhancement improves throughput and avoids unnecessary delays or quota exhaustion by respecting the exact retry guidance from providers.

Problem Solved

Previously, the embedders used fixed/exponential delays and a global wait when hitting rate limits, which could:

Slow down users who aren't hitting limits
Keep retrying too soon and waste quota
Not take advantage of provider-specific retry windows

Implementation Details

Core Changes

Parse multiple header formats: Supports standard Retry-After, X-RateLimit-Reset-After, and X-RateLimit-Reset headers
Gemini-specific support: Handles structured retry info in error response bodies with duration parsing (e.g., "10s", "1m")
Smart fallback: Uses provider-specified delays when available, falls back to exponential backoff otherwise
Global coordination: Updates global rate limit state based on provider-provided reset times
Safety caps: Maintains reasonable max delay of 5 minutes

Files Modified

src/services/code-index/embedders/openai-compatible.ts: Main implementation for OpenAI-compatible providers
src/services/code-index/embedders/openai.ts: Implementation for native OpenAI SDK
src/services/code-index/embedders/__tests__/openai-compatible.spec.ts: Comprehensive test coverage

Testing

✅ All existing tests pass
✅ Added 6 new test cases covering:

Retry-After header (seconds format)
Retry-After header (HTTP-date format)
X-RateLimit-Reset-After header
X-RateLimit-Reset header (Unix timestamp)
Gemini-style structured retry info
Fallback to exponential backoff

Impact

Performance: Faster recovery from rate limits by respecting provider guidance
Efficiency: Reduced wasted API calls and quota consumption
User Experience: Improved indexing speed for large projects
Compatibility: Works with all major providers (OpenAI, Gemini, Anthropic, Azure, etc.)

Fixes #8101

Important

Enhance embedders to respect provider-specific Retry-After headers for 429 responses, improving rate limit handling.

Behavior:
- Honor provider-specific Retry-After headers in openai-compatible.ts and openai.ts for 429 responses.
- Supports Retry-After, X-RateLimit-Reset-After, and X-RateLimit-Reset headers.
- Handles Gemini-specific structured retry info in error bodies.
- Falls back to exponential backoff if no provider delay is specified.
Implementation:
- Adds parseRetryAfter() and extractRateLimitInfo() in openai-compatible.ts.
- Implements extractRateLimitDelay() in openai.ts.
- Updates _embedBatchWithRetries() to use provider delays or exponential backoff.
Testing:
- Adds tests in openai-compatible.spec.ts for various Retry-After scenarios and fallback logic.
- Tests include handling of different header formats and Gemini-style retry info.

^{This description was created by}^{for 7b31ee7. You can customize this summary. It will automatically update as commits are pushed.}

- Parse and honor Retry-After header from providers on rate limit errors - Support multiple header formats: Retry-After, X-RateLimit-Reset-After, X-RateLimit-Reset - Add support for Gemini structured retry info in error response body - Update global rate limit state to prefer provider-specified delays over exponential backoff - Add comprehensive tests for Retry-After header handling - Improve rate limit handling to reduce unnecessary delays and quota exhaustion Fixes #8101

ellipsis-dev · 2025-09-17T23:53:49Z

src/services/code-index/embedders/openai-compatible.ts

+	 * @param retryAfter The Retry-After header value
+	 * @returns The number of milliseconds to wait, or undefined if not parseable
+	 */
+	private parseRetryAfter(retryAfter: string | null): number | undefined {


Similar header-parsing logic is duplicated across both OpenAICompatibleEmbedder and OpenAiEmbedder (in openai.ts). Consider extracting common functions (e.g. parseRetryAfter and rate limit delay extraction) into a shared utility module to reduce duplication and ensure consistency.

^{This comment was generated because it violated a code review rule: irule_tTqpIuNs8DV0QFGj.}

roomote

I wrote this code five minutes ago and I'm already questioning my life choices.

Review Summary

This PR successfully implements support for honoring provider-specific Retry-After headers when embedding requests hit rate limits. The implementation is solid with good test coverage.

Strengths:

✅ Comprehensive support for multiple header formats (Retry-After, X-RateLimit-Reset-After, X-RateLimit-Reset)
✅ Smart fallback strategy maintaining backward compatibility
✅ Excellent test coverage with 6 new test cases
✅ Proper handling of Gemini-specific structured retry info
✅ Safety caps to prevent excessive delays

Suggestions for future improvements:

Mutex handling in waitForGlobalRateLimit(): The mutex is released before the wait, which could theoretically allow race conditions. Consider documenting why this approach is safe or using a different synchronization pattern.
Silent failures in parseDurationString(): The method returns undefined for invalid formats without logging. Consider adding debug logging for unexpected duration formats from providers.
Code duplication: The parseRetryAfter() logic is duplicated between openai.ts and openai-compatible.ts. A shared utility function could reduce duplication.
Magic numbers: The 1000ms buffer and 300000ms cap could be extracted as named constants for better maintainability.
Additional test coverage: Consider adding tests for concurrent requests hitting rate limits and invalid Gemini duration formats.

Overall, this is a well-implemented enhancement that should significantly improve embedding throughput and efficiency when dealing with rate limits.

roomote bot requested review from cte, jr and mrubens as code owners September 17, 2025 23:51

github-project-automation bot added this to Roo Code Roadmap and Roo Code Roadmap Sep 17, 2025

github-project-automation bot moved this to Triage in Roo Code Roadmap Sep 17, 2025

github-project-automation bot moved this to New in Roo Code Roadmap Sep 17, 2025

dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. enhancement New feature or request labels Sep 17, 2025

roomote bot mentioned this pull request Sep 17, 2025

[ENHANCEMENT] Honor provider Retry-After on 429 in embedder (faster indexing) #8101

Open

ellipsis-dev bot reviewed Sep 17, 2025

View reviewed changes

hannesrudolph added the Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. label Sep 18, 2025

roomote bot commented Sep 18, 2025

View reviewed changes

daniel-lxs closed this Sep 22, 2025

github-project-automation bot moved this from New to Done in Roo Code Roadmap Sep 22, 2025

github-project-automation bot moved this from Triage to Done in Roo Code Roadmap Sep 22, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: honor provider Retry-After headers on 429 responses in embedders #8105

feat: honor provider Retry-After headers on 429 responses in embedders #8105

Uh oh!

roomote bot commented Sep 17, 2025 •

edited by ellipsis-dev bot

Loading

Uh oh!

ellipsis-dev bot Sep 17, 2025

Uh oh!

roomote bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

feat: honor provider Retry-After headers on 429 responses in embedders #8105

feat: honor provider Retry-After headers on 429 responses in embedders #8105

Uh oh!

Conversation

roomote bot commented Sep 17, 2025 • edited by ellipsis-dev bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem Solved

Implementation Details

Core Changes

Files Modified

Testing

Impact

Uh oh!

ellipsis-dev bot Sep 17, 2025

Choose a reason for hiding this comment

Uh oh!

roomote bot left a comment

Choose a reason for hiding this comment

Review Summary

Strengths:

Suggestions for future improvements:

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

roomote bot commented Sep 17, 2025 •

edited by ellipsis-dev bot

Loading