Conversation


@DanielusG DanielusG commented May 25, 2025

Closes #3959

Description:

This pull request introduces the ability for users to configure custom embedding models (name, URL, dimension) for the Codebase Indexing feature, as proposed in Issue #3959.

Changes Introduced:

  • Adds settings UI fields for:
    • Base URL (for Ollama endpoint or OpenAI-compatible proxy/endpoint)
    • Model ID (allowing custom model names)
    • Custom Model Dimension (optional, for unrecognized models)
  • Updates backend configuration (config-manager.ts, types, schemas) to store and use these settings.
  • Modifies OpenAiEmbedder to accept baseURL and optional dimensions.
  • Refines CodeIndexOllamaEmbedder for standard API interaction.
  • Updates service-factory.ts to determine vector dimensions using built-in profiles first, then falling back to the user-provided Custom Model Dimension.
  • Adds corresponding VS Code settings definitions in package.json.
  • Updates settings UI (CodeIndexSettings.tsx) with the new input fields and logic.
  • Adds relevant i18n strings (settings.json).
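
The dimension-resolution order described above (built-in profiles first, then the user-supplied value) could be sketched roughly like this. The function name and the profile table below are illustrative, not the actual Roo Code API:

```typescript
// Illustrative subset of known embedding-model profiles (not the real table).
const MODEL_DIMENSIONS: Record<string, number> = {
  "text-embedding-3-small": 1536,
  "text-embedding-3-large": 3072,
  "nomic-embed-text": 768,
};

function resolveVectorDimension(
  modelId: string,
  customDimension?: number,
): number {
  // 1. A built-in profile wins when the model is recognized.
  const builtIn = MODEL_DIMENSIONS[modelId];
  if (builtIn !== undefined) return builtIn;

  // 2. Otherwise fall back to the user-provided Custom Model Dimension.
  if (customDimension !== undefined && customDimension > 0) {
    return customDimension;
  }

  // 3. Unknown model with no dimension configured: fail early with a clear message.
  throw new Error(
    `Unknown embedding model "${modelId}"; set a custom model dimension in settings.`,
  );
}
```

Failing early here matters because the vector store must be created with a fixed dimension before any embeddings are written.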

Motivation & Roadmap Alignment:

  • Provides flexibility for users to use local models (Ollama), specialized code models, or custom endpoints (proxies, Azure OpenAI).
  • Directly supports the project roadmap goal: "Expand robust support for a wide variety of AI providers and models."

Testing:

  • Successfully tested manually using Ollama (http://localhost:11434) with the model jina-embeddings-v2-base-code:latest and specifying the dimension 768. Indexing used the custom settings correctly.
  • Limitation: Only manually tested with the Ollama setup described. Interactions with OpenAI custom endpoints/models or specific error scenarios may require further testing.

Screenshots/Videos:
(screenshot)

Development Notes:

  • Implementation assisted by Cursor (Claude 4), followed by manual review, refactoring for consistency, and testing.
  • As I am learning TypeScript/React, I welcome feedback on code style, potential improvements, and adherence to Roo Code best practices. Happy to make necessary revisions.

Documentation Impact:

  • Yes, this change requires updates to the user documentation (to explain the new settings for custom embedding configuration).
  • No documentation changes needed.

Important

This PR adds support for configuring custom embedding models for codebase indexing, including UI, backend, and schema updates.

  • Behavior:
    • Adds settings UI fields for Base URL, Model ID, and Custom Model Dimension in CodeIndexSettings.tsx.
    • Updates config-manager.ts to store and use custom model settings.
    • Modifies OpenAiEmbedder and CodeIndexOllamaEmbedder to accept custom baseURL and dimensions.
    • Updates service-factory.ts to handle custom model dimensions.
  • Schemas and Types:
    • Adds codebaseIndexEmbedderDimension to CodebaseIndexConfig in schemas/index.ts and types.ts.
    • Updates ProviderSettings and GlobalSettings to include custom model configuration.
  • Misc:
    • Adds i18n strings for new settings in settings.json.
    • Updates package.json to include new settings definitions.

This description was created by Ellipsis for 9acabb3.

DanielusG added 3 commits May 25, 2025 12:30
- Add custom embedding dimension configuration for unknown models
- Fix Ollama API implementation to use correct prompt-based structure
- Enhance OpenAI embedder with base URL and dimension support
- Improve UI with provider-specific settings and better model management
- Add comprehensive VSCode settings for codebase indexing feature

**Core Changes:**
- Add `codebaseIndexEmbedderDimension` field to schemas and config
- Implement proper dimension validation and fallback logic
- Support manual dimension override for custom/unknown models

**API Fixes:**
- Fix Ollama embedder to use individual `prompt` requests instead of batch `input`
- Correct response parsing to use `embedding` (singular) field
- Add OpenAI base URL support for proxies and Azure endpoints
- Implement dimension parameter for OpenAI embedding requests
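
The Ollama fix in the first two bullets could look roughly like this: one POST per text with a `prompt` field, parsing the singular `embedding` field from the response, matching Ollama's `/api/embeddings` contract. The helper names are illustrative, not the PR's actual code:

```typescript
interface OllamaEmbedRequest {
  model: string;
  prompt: string; // Ollama's /api/embeddings takes one "prompt", not a batch "input".
}

function buildEmbedRequest(model: string, text: string): OllamaEmbedRequest {
  return { model, prompt: text };
}

function parseEmbedResponse(json: { embedding: number[] }): number[] {
  return json.embedding; // singular "embedding", not "embeddings"
}

// Illustrative embedder loop; note the explicit POST (the reported bug was a
// GET, which Ollama answers with 405 Method Not Allowed).
async function embedTexts(
  baseUrl: string,
  model: string,
  texts: string[],
): Promise<number[][]> {
  const results: number[][] = [];
  for (const text of texts) {
    const res = await fetch(`${baseUrl}/api/embeddings`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(buildEmbedRequest(model, text)),
    });
    results.push(parseEmbedResponse(await res.json()));
  }
  return results;
}
```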

**UI Improvements:**
- Reorganize settings with provider-specific sections
- Replace model dropdown with flexible text input
- Add dimension input field with validation
- Improve placeholder text and help descriptions
- Add OpenAI base URL configuration field

**Configuration:**
- Add VSCode settings for all codebase indexing options
- Enhance config manager with proper dimension handling
- Improve service factory with better error messages
- Add comprehensive validation for custom models
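
The OpenAI-side changes (base URL plus optional `dimensions`) amount to threading extra options through to the client. A rough sketch of the parameter handling, with `dimensions` included only when configured, since some older models reject it; the builder function is hypothetical:

```typescript
interface EmbedParams {
  model: string;
  input: string[];
  dimensions?: number;
}

// Build embeddings.create(...) parameters, omitting "dimensions" when unset.
function buildOpenAiEmbedParams(
  model: string,
  input: string[],
  dimensions?: number,
): EmbedParams {
  const params: EmbedParams = { model, input };
  if (dimensions !== undefined) params.dimensions = dimensions;
  return params;
}

// With the openai-node v4 SDK this would be used roughly as:
//   const client = new OpenAI({ apiKey, baseURL }); // baseURL -> proxy / Azure-style endpoint
//   const res = await client.embeddings.create(buildOpenAiEmbedParams(model, texts, dim));
```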
Contributor review comment:

Consider adding inline validation feedback for non-numeric or non-positive input in the 'Custom Model Dimension' UI field to improve user experience.

@hannesrudolph hannesrudolph added the Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. label May 27, 2025
@SannidhyaSah
Collaborator

Hi @DanielusG, thanks for your work on this PR—expanding support for custom embedding models is a great feature!

Feedback:
I noticed that instead of creating a new, generic embedding provider class for custom models/endpoints, the implementation modifies the existing OpenAiEmbedder and CodeIndexOllamaEmbedder classes to accept custom parameters (like baseURL, modelId, and dimension). While this works, it does make the provider-specific classes more complex and tightly coupled to multiple use cases.

Suggestion:
Would it make sense to consider extracting the custom embedding logic into a separate, more generic “CustomEmbedder” class or abstraction? This could help keep the provider-specific classes focused and easier to maintain, and would make it simpler to add support for additional providers or models in the future.

Overall, nice job on the UI/settings and backend integration! Let me know your thoughts on the above, and I'm happy to discuss further if needed. Also, note that you'll need to update all of the translation files whenever you change the settings UI.
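
For concreteness, the suggested abstraction could look something like this. This is purely illustrative; `CustomEmbedder`, `Embedder`, and `EmbedderConfig` are hypothetical names, not existing Roo Code classes:

```typescript
interface EmbedderConfig {
  baseUrl: string;
  modelId: string;
  dimension?: number;
}

interface Embedder {
  embed(texts: string[]): Promise<number[][]>;
}

// All custom endpoint/model/dimension concerns live here, so the
// provider-specific classes stay thin.
class CustomEmbedder implements Embedder {
  constructor(private readonly config: EmbedderConfig) {}

  async embed(texts: string[]): Promise<number[][]> {
    // A real implementation would POST to this.config.baseUrl here;
    // this stub just shows where the custom settings flow in.
    void this.config;
    return texts.map(() => []);
  }
}

// service-factory.ts could then hand out an Embedder without each provider
// class needing to understand every custom-configuration knob.
function createEmbedder(config: EmbedderConfig): Embedder {
  return new CustomEmbedder(config);
}
```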

@DanielusG
Author

> Would it make sense to consider extracting the custom embedding logic into a separate, more generic "CustomEmbedder" class or abstraction?

Alright, after work I'll take a look at making the changes a bit more generic and improving the code's maintainability. I'll also try using an LLM's help to generate all the missing translations and resolve any conflicts with the main branch :)

I’ll share any updates or questions regarding the changes here.

@hannesrudolph hannesrudolph moved this from Triage to PR [Needs Preliminary Review] in Roo Code Roadmap May 28, 2025
@hannesrudolph hannesrudolph added PR - Needs Preliminary Review and removed Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. labels May 28, 2025
@daniel-lxs daniel-lxs moved this from PR [Needs Prelim Review] to PR [Draft / In Progress] in Roo Code Roadmap Jun 2, 2025
@daniel-lxs daniel-lxs marked this pull request as draft June 3, 2025 23:36

VoxBG commented Jun 7, 2025

I think this PR is important: it fixes the use of the Ollama embedding endpoint. Currently a GET request is sent instead of POST, which results in a 405 Method Not Allowed response from Ollama.

@hannesrudolph
Collaborator

@DanielusG What do we need to do to get this across the finish line?

@DanielusG
Author

DanielusG commented Jun 10, 2025

Hi there, sorry, but I'm currently extremely busy with university exams. I made these changes in my free time and currently have none left to dedicate. Additionally, I noticed that the OpenAI custom provider for indexing already implements defining the model name and embedding size, making this PR largely redundant. There have also been significant changes to the main branch, requiring a fairly complex merge beyond my current skill level.
I don't know what to say except sorry for the inconvenience 😅.
If there's a way to make my code editable by others in case someone wants to finish the implementation, please let me know.

@daniel-lxs
Member

Closing as stale. Let me know if you want to keep working on this at some point, although we already allow custom embedding model configuration through the OpenAI-compatible provider.

@daniel-lxs daniel-lxs closed this Jun 30, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Jun 30, 2025
@github-project-automation github-project-automation bot moved this from PR [Draft / In Progress] to Done in Roo Code Roadmap Jun 30, 2025

Development

Successfully merging this pull request may close these issues.

Feature Proposal: Allow Custom Embedding Model Configuration for Codebase Indexing

5 participants