@Onnson Onnson commented Jul 30, 2025

Summary

This PR adds Jina AI as a new embedding provider option for the code indexing feature in Roo Code.

Changes

  • Added Jina to the EmbedderProvider type across the codebase
  • Implemented JinaEmbedder class with full support for:
    • Multi-vector embeddings using the jina-embeddings-v4 model
    • code.query downstream task for code-specific embeddings
    • Proper batching and rate limiting
    • Error handling and validation
  • Updated UI components to allow selecting Jina as provider and entering API key
  • Added localization support for Jina-related messages
  • Updated type definitions and schemas

Implementation Details

  • Uses Jina's REST API endpoint at https://api.jina.ai/v1/embeddings
  • Supports models: jina-embeddings-v4, jina-embeddings-v3, and jina-clip-v2
  • Configured with appropriate dimensions and score thresholds for each model
  • Follows the same pattern as other embedding providers for consistency
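
A minimal sketch of how a request to this endpoint might be assembled and batched. The `input`, `encoding_type`, and `task` fields mirror the request payload shown in the PR diff; the `model` field, the Bearer authorization header, the batch size, and the helper names `toBatches` and `buildRequest` are assumptions for illustration, not the PR's actual code.

```typescript
// Hypothetical sketch; not the PR's actual JinaEmbedder implementation.
const JINA_EMBEDDINGS_URL = "https://api.jina.ai/v1/embeddings"

interface JinaRequestBody {
	model: string
	input: string[]
	encoding_type: "float"
	task: string
}

// Split items into batches of at most `size` entries (the limit is an assumed value).
function toBatches<T>(items: T[], size: number): T[][] {
	if (size < 1) throw new Error("batch size must be at least 1")
	const batches: T[][] = []
	for (let i = 0; i < items.length; i += size) {
		batches.push(items.slice(i, i + size))
	}
	return batches
}

// Build the URL and fetch-style init object for one batch of texts.
function buildRequest(apiKey: string, texts: string[], model = "jina-embeddings-v4") {
	const body: JinaRequestBody = {
		model,
		input: texts,
		encoding_type: "float",
		task: "code.query",
	}
	return {
		url: JINA_EMBEDDINGS_URL,
		init: {
			method: "POST",
			headers: {
				"Content-Type": "application/json",
				Authorization: `Bearer ${apiKey}`,
			},
			body: JSON.stringify(body),
		},
	}
}
```

Each batch returned by `toBatches` would be passed to `buildRequest` and sent with `fetch(request.url, request.init)`; rate limiting and retries would wrap that call.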

Testing

  • Successfully built and packaged the extension
  • Installed and tested in Windsurf Next
  • All TypeScript type checks pass
  • All linting checks pass

Usage

  1. Select "Jina" as the embedding provider in code indexing settings
  2. Enter your Jina API key
  3. Choose the model (defaults to jina-embeddings-v4)
  4. The embedder will automatically use code.query as the downstream task

This lets users index and search their code with Jina's code-oriented embedding models.


Important

Add Jina as an embedding provider for code indexing, including configuration, UI updates, and validation support.

  • Behavior:
    • Add Jina as an embedding provider in EmbedderProvider type.
    • Implement JinaEmbedder class for multi-vector embeddings using the jina-embeddings-v4 model.
    • Update UI to select Jina as provider and enter API key.
  • Configuration:
    • Add Jina to codebaseIndexConfigSchema and codebaseIndexModelsSchema in codebase-index.ts.
    • Add Jina API key handling in global-settings.ts and webviewMessageHandler.ts.
  • Validation and Localization:
    • Add Jina-related validation messages in embeddings.json.
    • Add localization support for Jina in settings.json.
  • Misc:
    • Update service-factory.ts to create JinaEmbedder instance.
    • Add Jina model profiles to embeddingModels.ts.

This description was created by Ellipsis for b56695e.

- Add Jina to EmbedderProvider type and model profiles
- Implement JinaEmbedder class with multi-vector embeddings support
- Configure jina-embeddings-v4 model with code.query downstream task
- Add UI components for Jina provider selection and API key input
- Include proper error handling and rate limiting
- Add localization support for Jina-related messages
@Onnson Onnson requested review from cte, jr and mrubens as code owners July 30, 2025 06:26
@dosubot dosubot bot added the size:L and enhancement labels Jul 30, 2025
roomote[bot]

This comment was marked as outdated.

@hannesrudolph hannesrudolph added the Issue/PR - Triage label Jul 30, 2025
roomote bot pushed a commit that referenced this pull request Jul 30, 2025
- Add jinaConfigMissing translation to all backend embeddings.json files
- Add jinaProvider, jinaApiKeyLabel, jinaApiKeyPlaceholder, and jinaApiKeyRequired translations to all frontend settings.json files
- Ensures complete internationalization support for Jina embedding provider feature
- All translations verified with check-translations script
@daniel-lxs daniel-lxs moved this from Triage to PR [Needs Prelim Review] in Roo Code Roadmap Aug 1, 2025
@hannesrudolph hannesrudolph added the PR - Needs Preliminary Review label and removed the Issue/PR - Triage label Aug 1, 2025
@daniel-lxs daniel-lxs left a comment

Thanks a bunch for adding Jina as an embedding provider! The implementation follows the existing patterns nicely and includes proper internationalization. However, the critical issue from the previous review, the missing secret status handling, is still unresolved, and a few other things need attention. Can you take a look?

codebaseIndexOpenAiCompatibleApiKey?: string
codebaseIndexGeminiApiKey?: string
codebaseIndexMistralApiKey?: string
codebaseIndexJinaApiKey?: string

Critical Issue: The updateWithSecrets function is missing handling for codebaseIndexJinaApiKey. This prevents the UI from properly showing placeholder values for existing Jina API keys. You need to add this handling similar to other providers around line 320.
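
A rough sketch of the kind of handling being asked for, assuming the settings UI receives a per-key "is a secret stored?" status and shows a placeholder for stored keys. The four key names come from the diff above; the function name `applySecretPlaceholders`, the placeholder string, and the status shape are hypothetical and do not reflect the actual `updateWithSecrets` code.

```typescript
// Hypothetical sketch; the real updateWithSecrets function may look quite different.
const SECRET_PLACEHOLDER = "****************"

const SECRET_KEYS = [
	"codebaseIndexOpenAiCompatibleApiKey",
	"codebaseIndexGeminiApiKey",
	"codebaseIndexMistralApiKey",
	"codebaseIndexJinaApiKey", // the key this review comment says is missing
] as const

type SecretKey = (typeof SECRET_KEYS)[number]
type SecretStatus = Partial<Record<SecretKey, boolean>>
type SecretFields = Partial<Record<SecretKey, string>>

// Return a copy of the form fields where stored-but-untyped secrets show a placeholder
// and stale placeholders for secrets that are no longer stored are cleared.
function applySecretPlaceholders(current: SecretFields, status: SecretStatus): SecretFields {
	const next: SecretFields = { ...current }
	for (const key of SECRET_KEYS) {
		const stored = status[key] === true
		const typed = next[key]
		if (stored && (typed === undefined || typed === "")) {
			next[key] = SECRET_PLACEHOLDER
		} else if (!stored && typed === SECRET_PLACEHOLDER) {
			next[key] = ""
		}
	}
	return next
}
```

Driving the loop from a single key list means a new provider only needs one added entry, which makes omissions like this one harder to repeat.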

const data = (await response.json()) as JinaEmbeddingResponse

// Capture telemetry
// Log telemetry for successful embedding creation

Missing telemetry implementation. Other embedders (OpenAI, Mistral, Gemini) use TelemetryService.instance.captureEvent for error tracking. Could you add actual telemetry here instead of just comments?

Example:

Suggested change
// Log telemetry for successful embedding creation
TelemetryService.instance.captureEvent(TelemetryEventName.CODE_INDEX_ERROR, {
	error: lastError instanceof Error ? lastError.message : String(lastError),
	stack: lastError instanceof Error ? lastError.stack : undefined,
	location: "JinaEmbedder:createEmbeddings",
	attempt: attempt,
});

input: texts,
encoding_type: "float",
// Use code.query task for code search embeddings
task: "code.query",

Is the hardcoded task: "code.query" intentional? While it makes sense for code-specific embeddings, would it be beneficial to make this configurable for future flexibility, perhaps as an optional parameter or class property?
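
One way this could look, as a sketch only: the task becomes an optional constructor option that defaults to `code.query`. The class name `ConfigurableJinaEmbedder`, the options shape, and `buildPayload` are hypothetical, and `code.passage` is used here as an assumed alternative task value that should be checked against Jina's API documentation.

```typescript
// Hypothetical sketch of a configurable task; not the PR's JinaEmbedder class.
interface JinaEmbedderOptions {
	modelId?: string
	task?: string
}

class ConfigurableJinaEmbedder {
	readonly modelId: string
	readonly task: string

	constructor(options: JinaEmbedderOptions = {}) {
		this.modelId = options.modelId ?? "jina-embeddings-v4"
		// Default matches the value currently hardcoded in the PR.
		this.task = options.task ?? "code.query"
	}

	// Build the JSON payload for one batch of texts.
	buildPayload(texts: string[]) {
		return {
			model: this.modelId,
			input: texts,
			encoding_type: "float" as const,
			task: this.task,
		}
	}
}

// Existing callers keep the current behavior; new callers can opt in to another task.
const defaultEmbedder = new ConfigurableJinaEmbedder()
const passageEmbedder = new ConfigurableJinaEmbedder({ task: "code.passage" }) // assumed task name
```

Because the default is unchanged, this would not alter behavior for anyone who does not pass `task`.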

* Validates the embedder configuration by testing connectivity and credentials
* @returns Promise resolving to validation result
*/
async validateConfiguration(): Promise<{ valid: boolean; error?: string }> {

The error handling pattern here differs from other embedders. While validateConfiguration uses withValidationErrorHandling, the createEmbeddings method implements its own retry logic. Consider aligning with the established error handling approach used in other embedders for consistency. Would it make sense to extract the retry logic into a shared helper similar to OpenAI's _embedBatchWithRetries?
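
To make the suggestion concrete, here is a sketch of what a shared retry helper could look like. It is not OpenAI's `_embedBatchWithRetries`; the name `withRetries`, the option names, and the exponential backoff policy are all assumptions for illustration.

```typescript
// Hypothetical shared helper; names and backoff policy are assumptions.
interface RetryOptions {
	maxAttempts: number
	initialDelayMs: number
	// Decide whether an error is worth retrying (e.g. rate limiting); defaults to always.
	isRetryable?: (error: unknown) => boolean
	// Injectable so tests can avoid real delays.
	sleep?: (ms: number) => Promise<void>
}

const defaultSleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

async function withRetries<T>(
	operation: (attempt: number) => Promise<T>,
	options: RetryOptions,
): Promise<T> {
	const { maxAttempts, initialDelayMs, isRetryable = () => true, sleep = defaultSleep } = options
	if (maxAttempts < 1) throw new Error("maxAttempts must be at least 1")
	let lastError: unknown
	for (let attempt = 1; attempt <= maxAttempts; attempt++) {
		try {
			return await operation(attempt)
		} catch (error) {
			lastError = error
			// Stop on the final attempt or on errors that retrying cannot fix.
			if (attempt === maxAttempts || !isRetryable(error)) break
			// Exponential backoff: initialDelayMs, then 2x, 4x, ...
			await sleep(initialDelayMs * Math.pow(2, attempt - 1))
		}
	}
	throw lastError
}
```

Each embedder's `createEmbeddings` could then wrap its single-batch request in `withRetries`, keeping telemetry capture and provider-specific `isRetryable` checks at the call site.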

@daniel-lxs daniel-lxs moved this from PR [Needs Prelim Review] to PR [Changes Requested] in Roo Code Roadmap Aug 11, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Sep 23, 2025
@github-project-automation github-project-automation bot moved this from PR [Changes Requested] to Done in Roo Code Roadmap Sep 23, 2025
Labels

  • enhancement (New feature or request)
  • PR - Changes Requested
  • size:L (This PR changes 100-499 lines, ignoring generated files.)

Projects

Archived in project

3 participants