Skip to content

Conversation

@roomote
Copy link
Contributor

@roomote roomote bot commented Sep 17, 2025

Summary

This PR implements support for honoring provider-specific Retry-After headers when embedding requests hit rate limits (429 responses). This enhancement improves throughput and avoids unnecessary delays or quota exhaustion by respecting the exact retry guidance from providers.

Problem Solved

Previously, the embedders used fixed/exponential delays and a global wait when hitting rate limits, which could:

  • Slow down users who aren't hitting limits
  • Keep retrying too soon and waste quota
  • Not take advantage of provider-specific retry windows

Implementation Details

Core Changes

  • Parse multiple header formats: Supports standard Retry-After, X-RateLimit-Reset-After, and X-RateLimit-Reset headers
  • Gemini-specific support: Handles structured retry info in error response bodies with duration parsing (e.g., "10s", "1m")
  • Smart fallback: Uses provider-specified delays when available, falls back to exponential backoff otherwise
  • Global coordination: Updates global rate limit state based on provider-provided reset times
  • Safety caps: Maintains reasonable max delay of 5 minutes

Files Modified

  • src/services/code-index/embedders/openai-compatible.ts: Main implementation for OpenAI-compatible providers
  • src/services/code-index/embedders/openai.ts: Implementation for native OpenAI SDK
  • src/services/code-index/embedders/__tests__/openai-compatible.spec.ts: Comprehensive test coverage

Testing

✅ All existing tests pass
✅ Added 6 new test cases covering:

  • Retry-After header (seconds format)
  • Retry-After header (HTTP-date format)
  • X-RateLimit-Reset-After header
  • X-RateLimit-Reset header (Unix timestamp)
  • Gemini-style structured retry info
  • Fallback to exponential backoff

Impact

  • Performance: Faster recovery from rate limits by respecting provider guidance
  • Efficiency: Reduced wasted API calls and quota consumption
  • User Experience: Improved indexing speed for large projects
  • Compatibility: Works with all major providers (OpenAI, Gemini, Anthropic, Azure, etc.)

Fixes #8101


Important

Enhance embedders to respect provider-specific Retry-After headers for 429 responses, improving rate limit handling.

  • Behavior:
    • Honor provider-specific Retry-After headers in openai-compatible.ts and openai.ts for 429 responses.
    • Supports Retry-After, X-RateLimit-Reset-After, and X-RateLimit-Reset headers.
    • Handles Gemini-specific structured retry info in error bodies.
    • Falls back to exponential backoff if no provider delay is specified.
  • Implementation:
    • Adds parseRetryAfter() and extractRateLimitInfo() in openai-compatible.ts.
    • Implements extractRateLimitDelay() in openai.ts.
    • Updates _embedBatchWithRetries() to use provider delays or exponential backoff.
  • Testing:
    • Adds tests in openai-compatible.spec.ts for various Retry-After scenarios and fallback logic.
    • Tests include handling of different header formats and Gemini-style retry info.

This description was created by Ellipsis for 7b31ee7. You can customize this summary. It will automatically update as commits are pushed.

- Parse and honor Retry-After header from providers on rate limit errors
- Support multiple header formats: Retry-After, X-RateLimit-Reset-After, X-RateLimit-Reset
- Add support for Gemini structured retry info in error response body
- Update global rate limit state to prefer provider-specified delays over exponential backoff
- Add comprehensive tests for Retry-After header handling
- Improve rate limit handling to reduce unnecessary delays and quota exhaustion

Fixes #8101
@roomote roomote bot requested review from cte, jr and mrubens as code owners September 17, 2025 23:51
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. enhancement New feature or request labels Sep 17, 2025
* @param retryAfter The Retry-After header value
* @returns The number of milliseconds to wait, or undefined if not parseable
*/
private parseRetryAfter(retryAfter: string | null): number | undefined {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar header-parsing logic is duplicated across both OpenAICompatibleEmbedder and OpenAiEmbedder (in openai.ts). Consider extracting common functions (e.g. parseRetryAfter and rate limit delay extraction) into a shared utility module to reduce duplication and ensure consistency.

This comment was generated because it violated a code review rule: irule_tTqpIuNs8DV0QFGj.

@hannesrudolph hannesrudolph added the Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. label Sep 18, 2025
Copy link
Contributor Author

@roomote roomote bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote this code five minutes ago and I'm already questioning my life choices.

Review Summary

This PR successfully implements support for honoring provider-specific Retry-After headers when embedding requests hit rate limits. The implementation is solid with good test coverage.

Strengths:

✅ Comprehensive support for multiple header formats (Retry-After, X-RateLimit-Reset-After, X-RateLimit-Reset)
✅ Smart fallback strategy maintaining backward compatibility
✅ Excellent test coverage with 6 new test cases
✅ Proper handling of Gemini-specific structured retry info
✅ Safety caps to prevent excessive delays

Suggestions for future improvements:

  1. Mutex handling in waitForGlobalRateLimit(): The mutex is released before the wait, which could theoretically allow race conditions. Consider documenting why this approach is safe or using a different synchronization pattern.

  2. Silent failures in parseDurationString(): The method returns undefined for invalid formats without logging. Consider adding debug logging for unexpected duration formats from providers.

  3. Code duplication: The parseRetryAfter() logic is duplicated between openai.ts and openai-compatible.ts. A shared utility function could reduce duplication.

  4. Magic numbers: The 1000ms buffer and 300000ms cap could be extracted as named constants for better maintainability.

  5. Additional test coverage: Consider adding tests for concurrent requests hitting rate limits and invalid Gemini duration formats.

Overall, this is a well-implemented enhancement that should significantly improve embedding throughput and efficiency when dealing with rate limits.

@daniel-lxs daniel-lxs closed this Sep 22, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Sep 22, 2025
@github-project-automation github-project-automation bot moved this from Triage to Done in Roo Code Roadmap Sep 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. size:XL This PR changes 500-999 lines, ignoring generated files.

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

[ENHANCEMENT] Honor provider Retry-After on 429 in embedder (faster indexing)

4 participants