feat: add Jina as embedding provider for code indexing #6416
- Add Jina to EmbedderProvider type and model profiles
- Implement JinaEmbedder class with multi-vector embeddings support
- Configure jina-embeddings-v4 model with code.query downstream task
- Add UI components for Jina provider selection and API key input
- Include proper error handling and rate limiting
- Add localization support for Jina-related messages

- Add jinaConfigMissing translation to all backend embeddings.json files
- Add jinaProvider, jinaApiKeyLabel, jinaApiKeyPlaceholder, and jinaApiKeyRequired translations to all frontend settings.json files
- Ensures complete internationalization support for Jina embedding provider feature
- All translations verified with check-translations script
Thanks for adding Jina as an embedding provider! The implementation follows the existing patterns nicely and includes proper internationalization. However, the critical issue from the previous review (missing secret status handling) is still unresolved, and a few other items need attention. Could you take a look?
| codebaseIndexOpenAiCompatibleApiKey?: string | ||
| codebaseIndexGeminiApiKey?: string | ||
| codebaseIndexMistralApiKey?: string | ||
| codebaseIndexJinaApiKey?: string |
Critical Issue: The updateWithSecrets function is missing handling for codebaseIndexJinaApiKey. This prevents the UI from properly showing placeholder values for existing Jina API keys. You need to add this handling similar to other providers around line 320.
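A minimal sketch of the kind of handling meant here, assuming the function receives flags saying which secrets are already stored and substitutes a fixed placeholder string for display. The names `SECRET_PLACEHOLDER`, `SecretStatus`, `IndexSettings`, `hasJinaApiKey`, and `applySecretPlaceholders` are hypothetical and are not taken from the actual `updateWithSecrets` code; only the `codebaseIndex*ApiKey` field names come from the diff above.

```typescript
// Hypothetical placeholder shown in the UI instead of the real key.
const SECRET_PLACEHOLDER = "********"

// Hypothetical status flags; the real secret-status shape may differ.
interface SecretStatus {
	hasGeminiApiKey?: boolean
	hasMistralApiKey?: boolean
	hasJinaApiKey?: boolean
}

// Field names follow the settings type shown in the diff.
interface IndexSettings {
	codebaseIndexGeminiApiKey?: string
	codebaseIndexMistralApiKey?: string
	codebaseIndexJinaApiKey?: string
}

function applySecretPlaceholders(settings: IndexSettings, status: SecretStatus): IndexSettings {
	const next: IndexSettings = { ...settings }
	// Fill in a placeholder only when a key is stored and the user has not typed a new one.
	if (status.hasGeminiApiKey && !next.codebaseIndexGeminiApiKey) {
		next.codebaseIndexGeminiApiKey = SECRET_PLACEHOLDER
	}
	if (status.hasMistralApiKey && !next.codebaseIndexMistralApiKey) {
		next.codebaseIndexMistralApiKey = SECRET_PLACEHOLDER
	}
	// The Jina branch is the one this comment says is missing.
	if (status.hasJinaApiKey && !next.codebaseIndexJinaApiKey) {
		next.codebaseIndexJinaApiKey = SECRET_PLACEHOLDER
	}
	return next
}
```

Without the last branch, a stored Jina key would leave the input field empty, which is the symptom described above.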
| const data = (await response.json()) as JinaEmbeddingResponse | ||
| // Capture telemetry | ||
| // Log telemetry for successful embedding creation |
Missing telemetry implementation. Other embedders (OpenAI, Mistral, Gemini) use TelemetryService.instance.captureEvent for error tracking. Could you add actual telemetry here instead of just comments?
Example:
| // Capture telemetry for the failed embedding attempt | |
| TelemetryService.instance.captureEvent(TelemetryEventName.CODE_INDEX_ERROR, { | |
| error: lastError instanceof Error ? lastError.message : String(lastError), | |
| stack: lastError instanceof Error ? lastError.stack : undefined, | |
| location: "JinaEmbedder:createEmbeddings", | |
| attempt: attempt | |
| }); |
| input: texts, | ||
| encoding_type: "float", | ||
| // Use code.query task for code search embeddings | ||
| task: "code.query", |
Is the hardcoded task: "code.query" intentional? While it makes sense for code-specific embeddings, would it be beneficial to make this configurable for future flexibility, perhaps as an optional parameter or class property?
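One way this could look, as a rough sketch rather than a proposal for the exact API: the task becomes an optional constructor argument that defaults to `"code.query"`. The names `JinaRequestBuilder`, `JinaRequestBody`, and `DEFAULT_JINA_TASK` are hypothetical, and the `model` field is an assumption; only `input`, `encoding_type`, and `task` appear in the diff.

```typescript
// Request fields; `model` is assumed, the other three appear in the diff above.
interface JinaRequestBody {
	model: string
	input: string[]
	encoding_type: "float"
	task: string
}

// Keeps the current behavior when no task is supplied.
const DEFAULT_JINA_TASK = "code.query"

// Hypothetical helper; in the PR this logic would live inside JinaEmbedder.
class JinaRequestBuilder {
	private readonly model: string
	private readonly task: string

	constructor(model: string, task: string = DEFAULT_JINA_TASK) {
		this.model = model
		this.task = task
	}

	// Builds the JSON body for one batch of texts.
	build(texts: string[]): JinaRequestBody {
		return {
			model: this.model,
			input: texts,
			encoding_type: "float",
			task: this.task,
		}
	}
}

// Default task, same as the hardcoded value today.
const body = new JinaRequestBuilder("jina-embeddings-v4").build(["function add(a, b) {}"])
```

Existing callers would not need to change, since the default matches the value that is hardcoded now.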
| * Validates the embedder configuration by testing connectivity and credentials | ||
| * @returns Promise resolving to validation result | ||
| */ | ||
| async validateConfiguration(): Promise<{ valid: boolean; error?: string }> { |
The error handling pattern here differs from other embedders. While validateConfiguration uses withValidationErrorHandling, the createEmbeddings method implements its own retry logic. Consider aligning with the established error handling approach used in other embedders for consistency. Would it make sense to extract the retry logic into a shared helper similar to OpenAI's _embedBatchWithRetries?
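To make the suggestion concrete, here is a rough sketch of a shared retry helper with exponential backoff. It is not the existing `_embedBatchWithRetries` implementation; `withRetries`, `backoffDelayMs`, and `RetryOptions` are hypothetical names, and for brevity the sketch retries on every error, whereas real code would likely retry only on rate-limit responses.

```typescript
// Hypothetical options; names and defaults are illustrative only.
interface RetryOptions {
	maxAttempts: number
	initialDelayMs: number
	// Injectable for tests; defaults to a real timer.
	sleep?: (ms: number) => Promise<void>
}

// Delay doubles with each attempt: initial, 2x, 4x, ...
function backoffDelayMs(attempt: number, initialDelayMs: number): number {
	return initialDelayMs * Math.pow(2, attempt)
}

async function withRetries<T>(
	operation: (attempt: number) => Promise<T>,
	options: RetryOptions,
): Promise<T> {
	if (options.maxAttempts < 1) {
		throw new RangeError("maxAttempts must be at least 1")
	}
	const sleep =
		options.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)))
	let lastError: unknown
	for (let attempt = 0; attempt < options.maxAttempts; attempt++) {
		try {
			return await operation(attempt)
		} catch (error) {
			lastError = error
			// Wait before the next attempt, but not after the final one.
			if (attempt < options.maxAttempts - 1) {
				await sleep(backoffDelayMs(attempt, options.initialDelayMs))
			}
		}
	}
	// All attempts failed; surface the last error to the caller.
	throw lastError
}
```

With a helper like this, `createEmbeddings` would wrap its request in `withRetries(...)` and the telemetry call could sit in one place, in the caller's catch block.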
Summary
This PR adds Jina AI as a new embedding provider option for the code indexing feature in Roo Code.
Changes
- EmbedderProvider type across the codebase
- JinaEmbedder class with full support for:
  - jina-embeddings-v4 model
  - code.query downstream task for code-specific embeddings
Implementation Details
- https://api.jina.ai/v1/embeddings
- jina-embeddings-v4, jina-embeddings-v3, and jina-clip-v2
Testing
Usage
- jina-embeddings-v4)
- code.query as the downstream task
This allows users to leverage Jina's powerful code understanding capabilities for better code search and indexing.
Important
Add Jina as an embedding provider for code indexing, including configuration, UI updates, and validation support.
- EmbedderProvider type.
- JinaEmbedder class for multi-vector embeddings using jina-embeddings-v4 model.
- codebaseIndexConfigSchema and codebaseIndexModelsSchema in codebase-index.ts.
- global-settings.ts and webviewMessageHandler.ts.
- embeddings.json.
- settings.json.
- service-factory.ts to create JinaEmbedder instance.
- embeddingModels.ts.
This description was created automatically for b56695e.