
@roomote (roomote bot, Contributor) commented Jul 15, 2025

Fixes #5713

Problem

During indexing, the gemini-embedding-001 model hit quota limits (HTTP 429) because the per-minute limit on "Batch Embed Content API requests" was exceeded, while text-embedding-004 worked fine with the same API key.

Root Cause

Both Gemini models were using the same batching configuration:

  • Same batch token limit (100,000 tokens)
  • Same retry delays (500ms initial)
  • No model-specific rate limiting

However, gemini-embedding-001 has stricter per-minute batch embedding limits than text-embedding-004.

Solution

Implemented model-specific rate limiting for gemini-embedding-001:

Changes Made

  1. Added Gemini-specific constants:

    • GEMINI_EMBEDDING_001_MAX_BATCH_TOKENS = 20,000 (reduced from 100,000)
    • GEMINI_EMBEDDING_001_RETRY_DELAY_MS = 2,000 (increased from 500ms)
    • GEMINI_EMBEDDING_001_MAX_BATCH_SIZE = 10 (new batch size limit)
  2. Enhanced OpenAICompatibleEmbedder:

    • Added configurable batch token limits
    • Added configurable retry delays
    • Added configurable batch size limits
    • Added inter-batch delays for models with stricter rate limits
  3. Updated GeminiEmbedder:

    • Detects gemini-embedding-001 model and applies specific configuration
    • Uses default configuration for other models like text-embedding-004
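A minimal TypeScript sketch of the per-model selection described above, assuming the three GEMINI_EMBEDDING_001_* values from this PR and the shared defaults named in the Root Cause section (100,000 tokens, 500 ms). The `RateLimitConfig` interface and `rateLimitConfigFor` function are hypothetical, not the actual GeminiEmbedder code; the PR itself passes these values to OpenAICompatibleEmbedder through new constructor parameters.

```typescript
// Hypothetical sketch: names mirror the PR description, but the real
// GeminiEmbedder / OpenAICompatibleEmbedder code may be structured differently.

// Shared defaults (assumed from the Root Cause section).
const MAX_BATCH_TOKENS = 100_000;
const INITIAL_RETRY_DELAY_MS = 500;

// gemini-embedding-001 overrides, values as listed in this PR.
const GEMINI_EMBEDDING_001_MAX_BATCH_TOKENS = 20_000;
const GEMINI_EMBEDDING_001_RETRY_DELAY_MS = 2_000;
const GEMINI_EMBEDDING_001_MAX_BATCH_SIZE = 10;

interface RateLimitConfig {
  maxBatchTokens: number;
  retryDelayMs: number;
  maxBatchSize?: number; // undefined means no per-batch item cap
}

function rateLimitConfigFor(modelId: string): RateLimitConfig {
  if (modelId === "gemini-embedding-001") {
    // Stricter limits only for the model that hit HTTP 429.
    return {
      maxBatchTokens: GEMINI_EMBEDDING_001_MAX_BATCH_TOKENS,
      retryDelayMs: GEMINI_EMBEDDING_001_RETRY_DELAY_MS,
      maxBatchSize: GEMINI_EMBEDDING_001_MAX_BATCH_SIZE,
    };
  }
  // text-embedding-004 and any other model keep the defaults unchanged.
  return { maxBatchTokens: MAX_BATCH_TOKENS, retryDelayMs: INITIAL_RETRY_DELAY_MS };
}
```

Keeping the override in one lookup means text-embedding-004 takes exactly the same path as before, which is the backward-compatibility property claimed in Benefits.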

Benefits

  • Prevents quota exhaustion: Smaller batches and longer delays respect API limits
  • Maintains performance: Only applies restrictions to gemini-embedding-001
  • Backward compatible: No changes to existing text-embedding-004 behavior
  • Configurable: Easy to adjust limits if Google changes quotas

Testing

  • ✅ All existing tests pass
  • ✅ Updated test expectations for new constructor signature
  • ✅ Type checking passes
  • ✅ Linting passes

This should resolve the indexing failures for users with gemini-embedding-001 while maintaining optimal performance for text-embedding-004.


Important

Implements model-specific rate limiting for gemini-embedding-001 in GeminiEmbedder and OpenAICompatibleEmbedder with new constants and tests.

  • Behavior:
    • Implements model-specific rate limiting for gemini-embedding-001 in GeminiEmbedder and OpenAICompatibleEmbedder.
    • Adds inter-batch delays for models with stricter rate limits in OpenAICompatibleEmbedder.
  • Constants:
    • Adds GEMINI_EMBEDDING_001_MAX_BATCH_TOKENS, GEMINI_EMBEDDING_001_RETRY_DELAY_MS, and GEMINI_EMBEDDING_001_MAX_BATCH_SIZE to constants/index.ts.
  • Testing:
    • Updates gemini.spec.ts to test new constructor parameters and behavior for GeminiEmbedder.

This description was created by Ellipsis for 5f27712.

- Add Gemini-specific constants for gemini-embedding-001:
  - Reduced batch token limit (20,000 vs 100,000)
  - Longer retry delays (2000ms vs 500ms)
  - Smaller batch size limit (10 items)
- Update OpenAICompatibleEmbedder to accept configurable rate limiting parameters
- Add inter-batch delays for models with stricter rate limits
- Update GeminiEmbedder to use model-specific configuration
- Fix test expectations to match new constructor signature

Fixes #5713: gemini-embedding-001 quota limit issues during indexing
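To illustrate how configurable batch limits and an inter-batch delay can work together, here is a hypothetical TypeScript sketch. `makeBatches`, `embedAll`, the `BatchLimits` shape, and the characters-divided-by-four token estimate are assumptions for illustration, not OpenAICompatibleEmbedder's actual implementation; `embedBatch` stands in for whatever call sends one batch to the embeddings endpoint.

```typescript
// Hypothetical sketch, not the PR's code: greedy batching bounded by a token
// budget and an optional item cap, with a fixed pause between batches.
interface BatchLimits {
  maxBatchTokens: number;
  maxBatchSize?: number; // optional per-batch item cap
  interBatchDelayMs: number;
}

// Rough token estimate (assumption): about 4 characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function makeBatches(texts: string[], limits: BatchLimits): string[][] {
  const batches: string[][] = [];
  let current: string[] = [];
  let currentTokens = 0;
  for (const text of texts) {
    const tokens = estimateTokens(text);
    const tooManyTokens = currentTokens + tokens > limits.maxBatchTokens;
    const tooManyItems =
      limits.maxBatchSize !== undefined && current.length >= limits.maxBatchSize;
    // Close the current batch before it would exceed either limit.
    // A single text larger than maxBatchTokens still goes out alone here.
    if (current.length > 0 && (tooManyTokens || tooManyItems)) {
      batches.push(current);
      current = [];
      currentTokens = 0;
    }
    current.push(text);
    currentTokens += tokens;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function embedAll(
  texts: string[],
  limits: BatchLimits,
  embedBatch: (batch: string[]) => Promise<number[][]>,
): Promise<number[][]> {
  const results: number[][] = [];
  const batches = makeBatches(texts, limits);
  for (let i = 0; i < batches.length; i++) {
    // Pause only between batches, never before the first one.
    if (i > 0 && limits.interBatchDelayMs > 0) await sleep(limits.interBatchDelayMs);
    results.push(...(await embedBatch(batches[i])));
  }
  return results;
}
```

Because the pause sits between batches rather than before each request, a run that fits in one batch incurs no extra latency. Retrying on HTTP 429 with the longer retry delay is deliberately left out of this sketch.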
@roomote (bot) requested review from cte, jr, and mrubens as code owners July 15, 2025 00:42
@dosubot (bot) added the size:M and bug labels Jul 15, 2025
@hannesrudolph added the Issue/PR - Triage label Jul 15, 2025
@daniel-lxs moved this from Triage to renovate BOT in Roo Code Roadmap Jul 15, 2025
@SatoshiReport

Code Review Summary

I've reviewed the implementation for issue #5713, and the changes look excellent! Here's my analysis:

Strengths

  1. Targeted Solution: The fix specifically addresses gemini-embedding-001 rate limits without affecting text-embedding-004 performance
  2. Clean Architecture: Model-specific configuration is handled elegantly through constructor parameters
  3. Proper Constants: All magic numbers are extracted to named constants with clear documentation
  4. Backward Compatibility: Existing behavior for text-embedding-004 remains unchanged
  5. Test Coverage: Tests are updated to cover both model configurations
  6. Inter-batch Delays: Smart addition of delays between batches for rate-limited models

🎯 Key Implementation Details

  • Batch Token Limit Reduction: 20K tokens vs 100K (5x smaller)
  • Retry Delay Increase: 2000ms vs 500ms (4x longer delays)
  • Batch Count Limit: Max 10 items per batch for gemini-embedding-001
  • Smart Detection: Automatic model detection and configuration application
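The two gemini-embedding-001 caps interact, which a little arithmetic makes visible. Assuming batches are packed greedily and every chunk has roughly the same token count, a batch holds min(10, floor(20,000 / tokens per chunk)) items, so for chunks of 2,000 tokens or fewer it is the 10-item cap, not the 20K token limit, that bounds each batch. The helper names below are hypothetical.

```typescript
// Hypothetical helpers: estimate batch sizes under the gemini-embedding-001 caps,
// assuming greedy packing and a uniform token count per chunk.
const MAX_BATCH_TOKENS_001 = 20_000;
const MAX_BATCH_SIZE_001 = 10;

function itemsPerBatch(avgTokensPerItem: number): number {
  const byTokens = Math.floor(MAX_BATCH_TOKENS_001 / avgTokensPerItem);
  // An oversized chunk is assumed to be sent alone, so never return 0.
  return Math.max(1, Math.min(MAX_BATCH_SIZE_001, byTokens));
}

function batchCount(totalItems: number, avgTokensPerItem: number): number {
  return Math.ceil(totalItems / itemsPerBatch(avgTokensPerItem));
}
```

For example, 1,000 chunks of about 500 tokens each would need `batchCount(1000, 500)` = 100 batch requests, each capped at 10 items.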

📋 Code Quality

  • ✅ Follows fail-fast principles
  • ✅ Clear, unambiguous naming conventions
  • ✅ No magic numbers or hardcoded values
  • ✅ Proper error handling maintained
  • ✅ Test coverage updated appropriately

🚀 Ready for Merge

This implementation should resolve the quota exhaustion issues for gemini-embedding-001 users while maintaining optimal performance for other models. The solution is well-architected, properly tested, and follows the project's coding standards.

Recommendation: ✅ Approve and merge

@SatoshiReport

Code Review: ✅ APPROVED

Excellent implementation of model-specific rate limiting for gemini-embedding-001!

🎯 Strengths:

  • Targeted solution: Only affects the problematic model
  • Clean architecture: Configurable parameters through constructor
  • Proper constants: All magic numbers extracted with clear names
  • Backward compatible: No breaking changes to existing functionality
  • Test coverage: Updated tests for new behavior

🚀 Key improvements:

  • Batch token limit: 20K tokens (vs 100K), a 5x reduction
  • Retry delays: 2000ms (vs 500ms), a 4x increase
  • Batch limits: Max 10 items per batch
  • Inter-batch delays: Smart rate limiting for gemini-embedding-001

📋 Code Quality Assessment:

  • ✅ Follows fail-fast principles with clear error handling
  • ✅ Uses explicit, unambiguous naming conventions
  • ✅ Implements targeted solution without scope creep
  • ✅ Maintains clean architecture with single responsibility
  • ✅ Has proper test coverage

This should resolve the quota exhaustion issues (HTTP 429 errors) while maintaining optimal performance for text-embedding-004.

Ready to merge! 🚀


Labels

bug, Issue/PR - Triage, size:M

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Gemini-embedding-001 not respecting free quota limits

5 participants