fix: implement model-specific rate limiting for gemini-embedding-001 #5714

roomote · 2025-07-15T00:42:47Z

Problem

During indexing, the gemini-embedding-001 model hit quota limits (HTTP 429): the per-minute limit for "Batch Embed Content API requests" was exceeded. text-embedding-004 worked fine with the same API key.

Root Cause

Both Gemini models were using the same batching configuration:

- Same batch token limit (100,000 tokens)
- Same retry delays (500ms initial)
- No model-specific rate limiting

However, gemini-embedding-001 has stricter per-minute batch embedding limits compared to text-embedding-004.
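
To illustrate why the shared retry settings may not help against a per-minute quota, the sketch below lists the waits produced by a 500ms initial delay. The doubling (exponential backoff) policy, the retry count of 3, and the helper name `backoffDelaysMs` are assumptions made for illustration; the PR description only states the 500ms initial delay.

```typescript
// Hypothetical helper: lists the wait before each retry, assuming the delay
// doubles after every failed attempt (the doubling policy is an assumption).
function backoffDelaysMs(initialDelayMs: number, maxRetries: number): number[] {
	const delays: number[] = []
	for (let attempt = 0; attempt < maxRetries; attempt++) {
		delays.push(initialDelayMs * 2 ** attempt)
	}
	return delays
}

// Shared configuration from the description: 500ms initial delay.
// Three retries is an assumed count, not a value from the PR.
const sharedSchedule = backoffDelaysMs(500, 3)
const totalWaitMs = sharedSchedule.reduce((sum, delay) => sum + delay, 0)
```

Under these assumptions the waits are 500ms, 1000ms, and 2000ms, or 3.5 seconds in total, so every retry would likely fall inside the same one-minute quota window as the request that failed.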

Solution

Implemented model-specific rate limiting for gemini-embedding-001:

Changes Made

Added Gemini-specific constants:
- GEMINI_EMBEDDING_001_MAX_BATCH_TOKENS = 20,000 (reduced from 100,000)
- GEMINI_EMBEDDING_001_RETRY_DELAY_MS = 2,000 (increased from 500ms)
- GEMINI_EMBEDDING_001_MAX_BATCH_SIZE = 10 (new batch size limit)
Enhanced OpenAICompatibleEmbedder:
- Added configurable batch token limits
- Added configurable retry delays
- Added configurable batch size limits
- Added inter-batch delays for models with stricter rate limits
Updated GeminiEmbedder:
- Detects gemini-embedding-001 model and applies specific configuration
- Uses default configuration for other models like text-embedding-004
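
The model detection described above could look roughly like the following. This is a minimal sketch, not the code from the PR: the three `GEMINI_EMBEDDING_001_*` constant names and all values come from the description, while `DEFAULT_MAX_BATCH_TOKENS`, `DEFAULT_RETRY_DELAY_MS`, the `RateLimitConfig` interface, and `rateLimitConfigFor` are hypothetical names introduced for illustration.

```typescript
// Default values shared by other models (constant names are hypothetical).
const DEFAULT_MAX_BATCH_TOKENS = 100000
const DEFAULT_RETRY_DELAY_MS = 500

// Constant names and values as listed in the PR description.
const GEMINI_EMBEDDING_001_MAX_BATCH_TOKENS = 20000
const GEMINI_EMBEDDING_001_RETRY_DELAY_MS = 2000
const GEMINI_EMBEDDING_001_MAX_BATCH_SIZE = 10

// Hypothetical shape for the configurable limits.
interface RateLimitConfig {
	maxBatchTokens: number
	retryDelayMs: number
	maxBatchSize?: number // undefined means no item-count limit
}

// Hypothetical helper: returns stricter limits only for gemini-embedding-001.
function rateLimitConfigFor(modelId: string): RateLimitConfig {
	if (modelId === "gemini-embedding-001") {
		return {
			maxBatchTokens: GEMINI_EMBEDDING_001_MAX_BATCH_TOKENS,
			retryDelayMs: GEMINI_EMBEDDING_001_RETRY_DELAY_MS,
			maxBatchSize: GEMINI_EMBEDDING_001_MAX_BATCH_SIZE,
		}
	}
	// Every other model, including text-embedding-004, keeps the defaults.
	return {
		maxBatchTokens: DEFAULT_MAX_BATCH_TOKENS,
		retryDelayMs: DEFAULT_RETRY_DELAY_MS,
	}
}
```

Keeping the lookup in one function means a quota change on Google's side would only require editing the constants.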

Benefits

Prevents quota exhaustion: Smaller batches and longer delays respect API limits
Maintains performance: Only applies restrictions to gemini-embedding-001
Backward compatible: No changes to existing text-embedding-004 behavior
Configurable: Easy to adjust limits if Google changes quotas
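
As a rough illustration of how smaller batches and inter-batch delays might fit together, the sketch below splits texts by a token limit and an optional item limit, then pauses between batches. Everything here is an assumption made for the example rather than the PR's code: the names `BatchLimits`, `estimateTokens`, `createBatches`, and `embedAll`, the 4-characters-per-token estimate, and the injected `embedBatch` and `sleep` callbacks.

```typescript
// Hypothetical limits object; the PR makes these values configurable.
interface BatchLimits {
	maxBatchTokens: number
	maxBatchSize?: number // undefined means no item-count limit
}

// Rough estimate of about 4 characters per token (an assumption for this sketch).
function estimateTokens(text: string): number {
	return Math.ceil(text.length / 4)
}

// Greedily packs texts into batches that respect both limits.
// A single text larger than maxBatchTokens still gets a batch of its own.
function createBatches(texts: string[], limits: BatchLimits): string[][] {
	const batches: string[][] = []
	let current: string[] = []
	let currentTokens = 0
	for (const text of texts) {
		const tokens = estimateTokens(text)
		const overTokens = currentTokens + tokens > limits.maxBatchTokens
		const overSize = limits.maxBatchSize !== undefined && current.length >= limits.maxBatchSize
		if (current.length > 0 && (overTokens || overSize)) {
			batches.push(current)
			current = []
			currentTokens = 0
		}
		current.push(text)
		currentTokens += tokens
	}
	if (current.length > 0) batches.push(current)
	return batches
}

// Sends batches one at a time, waiting interBatchDelayMs between them.
// embedBatch stands in for the real API call; sleep is injectable for tests.
async function embedAll(
	texts: string[],
	limits: BatchLimits,
	embedBatch: (batch: string[]) => Promise<number[][]>,
	interBatchDelayMs: number,
	sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<number[][]> {
	const results: number[][] = []
	const batches = createBatches(texts, limits)
	for (let i = 0; i < batches.length; i++) {
		// No delay before the first batch, only between consecutive batches.
		if (i > 0 && interBatchDelayMs > 0) await sleep(interBatchDelayMs)
		results.push(...(await embedBatch(batches[i])))
	}
	return results
}
```

With a limit of 10 items per batch, 25 short texts would be sent as three requests of 10, 10, and 5 items, with a pause between each request.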

Testing

✅ All existing tests pass
✅ Updated test expectations for new constructor signature
✅ Type checking passes
✅ Linting passes

This should resolve the indexing failures for users with gemini-embedding-001 while maintaining optimal performance for text-embedding-004.

Important

Implements model-specific rate limiting for gemini-embedding-001 in GeminiEmbedder and OpenAICompatibleEmbedder with new constants and tests.

Behavior:
- Implements model-specific rate limiting for gemini-embedding-001 in GeminiEmbedder and OpenAICompatibleEmbedder.
- Adds inter-batch delays for models with stricter rate limits in OpenAICompatibleEmbedder.
Constants:
- Adds GEMINI_EMBEDDING_001_MAX_BATCH_TOKENS, GEMINI_EMBEDDING_001_RETRY_DELAY_MS, and GEMINI_EMBEDDING_001_MAX_BATCH_SIZE to constants/index.ts.
Testing:
- Updates gemini.spec.ts to test new constructor parameters and behavior for GeminiEmbedder.

This description was created by the ellipsis-dev bot for 5f27712.

- Add Gemini-specific constants for gemini-embedding-001:
  - Reduced batch token limit (20,000 vs 100,000)
  - Longer retry delays (2000ms vs 500ms)
  - Smaller batch size limit (10 items)
- Update OpenAICompatibleEmbedder to accept configurable rate limiting parameters
- Add inter-batch delays for models with stricter rate limits
- Update GeminiEmbedder to use model-specific configuration
- Fix test expectations to match new constructor signature

Fixes #5713: gemini-embedding-001 quota limit issues during indexing

SatoshiReport · 2025-07-15T02:08:18Z

Code Review Summary

I've reviewed the implementation in PR #5714 and the changes look excellent! Here's my analysis:

✅ Strengths

Targeted Solution: The fix specifically addresses gemini-embedding-001 rate limits without affecting text-embedding-004 performance
Clean Architecture: Model-specific configuration is handled elegantly through constructor parameters
Proper Constants: All magic numbers are extracted to named constants with clear documentation
Backward Compatibility: Existing behavior for text-embedding-004 remains unchanged
Test Coverage: Tests are updated to cover both model configurations
Inter-batch Delays: Smart addition of delays between batches for rate-limited models

🎯 Key Implementation Details

Batch Size Reduction: 20K tokens vs 100K (5x smaller batches)
Retry Delay Increase: 2000ms vs 500ms (4x longer delays)
Batch Count Limit: Max 10 items per batch for gemini-embedding-001
Smart Detection: Automatic model detection and configuration application

📋 Code Quality

✅ Follows fail-fast principles
✅ Clear, unambiguous naming conventions
✅ No magic numbers or hardcoded values
✅ Proper error handling maintained
✅ Test coverage updated appropriately

🚀 Ready for Merge

This implementation should resolve the quota exhaustion issues for gemini-embedding-001 users while maintaining optimal performance for other models. The solution is well-architected, properly tested, and follows the project's coding standards.

Recommendation: ✅ Approve and merge

SatoshiReport · 2025-07-15T02:09:55Z

Code Review: ✅ APPROVED

Excellent implementation of model-specific rate limiting for gemini-embedding-001!

🎯 Strengths:

Targeted solution: Only affects the problematic model
Clean architecture: Configurable parameters through constructor
Proper constants: All magic numbers extracted with clear names
Backward compatible: No breaking changes to existing functionality
Test coverage: Updated tests for new behavior

🚀 Key improvements:

Batch size: 20K tokens (vs 100K) - 5x reduction
Retry delays: 2000ms (vs 500ms) - 4x increase
Batch limits: Max 10 items per batch
Inter-batch delays: Smart rate limiting for gemini-embedding-001

📋 Code Quality Assessment:

✅ Follows fail-fast principles with clear error handling
✅ Uses explicit, unambiguous naming conventions
✅ Implements targeted solution without scope creep
✅ Maintains clean architecture with single responsibility
✅ Has proper test coverage

This should resolve the quota exhaustion issues (HTTP 429 errors) while maintaining optimal performance for text-embedding-004.

Ready to merge! 🚀

roomote bot requested review from cte, jr and mrubens as code owners July 15, 2025 00:42

github-project-automation bot added this to Roo Code Roadmap and Roo Code Roadmap Jul 15, 2025

github-project-automation bot moved this to New in Roo Code Roadmap Jul 15, 2025

github-project-automation bot moved this to Triage in Roo Code Roadmap Jul 15, 2025

dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. bug Something isn't working labels Jul 15, 2025

hannesrudolph added the Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. label Jul 15, 2025

daniel-lxs moved this from Triage to renovate BOT in Roo Code Roadmap Jul 15, 2025

SatoshiReport mentioned this pull request Jul 15, 2025

feat: add embedding rate limiting to prevent API quota exhaustion For Issue #5713 #5715

Closed

7 tasks

daniel-lxs closed this Sep 16, 2025

github-project-automation bot moved this from New to Done in Roo Code Roadmap Sep 16, 2025

github-project-automation bot moved this from Renovate BOT to Done in Roo Code Roadmap Sep 16, 2025
