fix: implement model-specific rate limiting for gemini-embedding-001 #5714
+55
−4
Fixes #5713
Problem
The gemini-embedding-001 model was hitting quota limits (HTTP 429) during indexing because the per-minute "Batch Embed Content API requests" limit was exceeded, while text-embedding-004 worked fine with the same API key.
Root Cause
Both Gemini models were using the same batching configuration:
However, gemini-embedding-001 has stricter per-minute batch embedding limits than text-embedding-004.
Solution
Implemented model-specific rate limiting for gemini-embedding-001:
Changes Made
Added Gemini-specific constants:
GEMINI_EMBEDDING_001_MAX_BATCH_TOKENS = 20,000 (reduced from 100,000)
GEMINI_EMBEDDING_001_RETRY_DELAY_MS = 2,000 (increased from 500ms)
GEMINI_EMBEDDING_001_MAX_BATCH_SIZE = 10 (new batch size limit)
Enhanced OpenAICompatibleEmbedder:
Updated GeminiEmbedder:
gemini-embedding-001 model and applies specific configuration
text-embedding-004
Benefits
gemini-embedding-001
text-embedding-004 behavior
Testing
This should resolve the indexing failures for users with gemini-embedding-001 while maintaining optimal performance for text-embedding-004.
Important
Implements model-specific rate limiting for gemini-embedding-001 in GeminiEmbedder and OpenAICompatibleEmbedder with new constants and tests.
gemini-embedding-001 in GeminiEmbedder and OpenAICompatibleEmbedder.
OpenAICompatibleEmbedder.
GEMINI_EMBEDDING_001_MAX_BATCH_TOKENS, GEMINI_EMBEDDING_001_RETRY_DELAY_MS, and GEMINI_EMBEDDING_001_MAX_BATCH_SIZE to constants/index.ts.
gemini.spec.ts to test new constructor parameters and behavior for GeminiEmbedder.
This description was generated for commit 5f27712.
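
For readers who want to see how the three constants might interact, here is a minimal sketch, not the code from this PR. The three GEMINI_EMBEDDING_001_* values come from the description above, and the 100,000 and 500 defaults come from its "reduced from" and "increased from" notes. The names DEFAULT_MAX_BATCH_TOKENS, DEFAULT_RETRY_DELAY_MS, BatchConfig, configForModel, estimateTokens, and splitIntoBatches are hypothetical, and the four-characters-per-token estimate is an assumption made only for illustration.

```typescript
// Values quoted in the PR description.
const GEMINI_EMBEDDING_001_MAX_BATCH_TOKENS = 20_000
const GEMINI_EMBEDDING_001_RETRY_DELAY_MS = 2_000
const GEMINI_EMBEDDING_001_MAX_BATCH_SIZE = 10

// Previous shared values, per the "reduced from" / "increased from" notes.
// These two constant names are hypothetical.
const DEFAULT_MAX_BATCH_TOKENS = 100_000
const DEFAULT_RETRY_DELAY_MS = 500

// Hypothetical config shape; the real embedder's parameters may differ.
interface BatchConfig {
  maxBatchTokens: number
  retryDelayMs: number
  maxBatchSize?: number // undefined means no cap on items per batch
}

// Pick stricter limits for gemini-embedding-001, defaults for everything else.
function configForModel(modelId: string): BatchConfig {
  if (modelId === "gemini-embedding-001") {
    return {
      maxBatchTokens: GEMINI_EMBEDDING_001_MAX_BATCH_TOKENS,
      retryDelayMs: GEMINI_EMBEDDING_001_RETRY_DELAY_MS,
      maxBatchSize: GEMINI_EMBEDDING_001_MAX_BATCH_SIZE,
    }
  }
  return { maxBatchTokens: DEFAULT_MAX_BATCH_TOKENS, retryDelayMs: DEFAULT_RETRY_DELAY_MS }
}

// Rough token estimate (about 4 characters per token); an assumption.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

// Greedily pack texts into batches, closing a batch when adding the next
// text would exceed the token budget or the batch already holds the
// maximum number of items. A single oversized text still gets its own batch.
function splitIntoBatches(texts: string[], config: BatchConfig): string[][] {
  const batches: string[][] = []
  let current: string[] = []
  let currentTokens = 0
  for (const text of texts) {
    const tokens = estimateTokens(text)
    const overTokens = currentTokens + tokens > config.maxBatchTokens
    const overSize = config.maxBatchSize !== undefined && current.length >= config.maxBatchSize
    if (current.length > 0 && (overTokens || overSize)) {
      batches.push(current)
      current = []
      currentTokens = 0
    }
    current.push(text)
    currentTokens += tokens
  }
  if (current.length > 0) batches.push(current)
  return batches
}
```

Under these assumptions, 25 short chunks would be sent as three requests for gemini-embedding-001 (10, 10, and 5 items) and as a single request for text-embedding-004, which trades some throughput for staying under the stricter per-minute limit.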