feat: Allow custom embedding model configuration for Codebase Indexing #3974
Conversation
- Add custom embedding dimension configuration for unknown models
- Fix Ollama API implementation to use the correct prompt-based structure
- Enhance OpenAI embedder with base URL and dimension support
- Improve UI with provider-specific settings and better model management
- Add comprehensive VSCode settings for the codebase indexing feature

**Core Changes:**
- Add `codebaseIndexEmbedderDimension` field to schemas and config
- Implement proper dimension validation and fallback logic
- Support manual dimension override for custom/unknown models

**API Fixes:**
- Fix Ollama embedder to use individual `prompt` requests instead of batch `input`
- Correct response parsing to use the `embedding` (singular) field
- Add OpenAI base URL support for proxies and Azure endpoints
- Implement the dimension parameter for OpenAI embedding requests

**UI Improvements:**
- Reorganize settings into provider-specific sections
- Replace the model dropdown with a flexible text input
- Add a dimension input field with validation
- Improve placeholder text and help descriptions
- Add an OpenAI base URL configuration field

**Configuration:**
- Add VSCode settings for all codebase indexing options
- Enhance the config manager with proper dimension handling
- Improve the service factory with better error messages
- Add comprehensive validation for custom models
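The "dimension validation and fallback logic" mentioned above could look roughly like the sketch below: check built-in model profiles first, then fall back to the user-supplied dimension. The function name and profile table are illustrative, not the PR's actual code; the two dimensions shown are the published sizes for those models.

```typescript
// Illustrative fallback order: built-in profiles first, then the
// user-supplied custom dimension, else a descriptive error.
const KNOWN_MODEL_DIMENSIONS: Record<string, number> = {
  "text-embedding-3-small": 1536, // published OpenAI dimension
  "nomic-embed-text": 768, // published Ollama model dimension
};

function resolveEmbeddingDimension(
  modelId: string,
  customDimension?: number,
): number {
  const known = KNOWN_MODEL_DIMENSIONS[modelId];
  if (known !== undefined) {
    return known;
  }
  if (
    customDimension !== undefined &&
    Number.isInteger(customDimension) &&
    customDimension > 0
  ) {
    return customDimension;
  }
  throw new Error(
    `Unknown embedding model "${modelId}"; set a custom model dimension in settings`,
  );
}
```

Resolving the dimension up front keeps the vector-store setup independent of which embedder provider is selected.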
Consider adding inline validation feedback for non-numeric or non-positive input in the 'Custom Model Dimension' UI field to improve user experience.
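One way to implement that suggestion is a small validator over the raw text-field value that returns a message the UI can render inline. This is a hypothetical helper, not code from this PR:

```typescript
// Hypothetical validator for the optional "Custom Model Dimension" field:
// a blank value is allowed (the field is optional); otherwise the value
// must parse to a positive integer. Returns an error message for the UI,
// or undefined when the input is acceptable.
function validateCustomDimension(raw: string): string | undefined {
  if (raw.trim() === "") {
    return undefined; // optional field left blank
  }
  const n = Number(raw);
  if (!Number.isInteger(n) || n <= 0) {
    return "Custom model dimension must be a positive integer (e.g. 768)";
  }
  return undefined;
}
```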
Hi @DanielusG, thanks for your work on this PR; expanding support for custom embedding models is a great feature! Overall, nice job on the UI/settings and backend integration. Let me know your thoughts on the above, and I'm happy to discuss further if needed. Also, you will need to complete all the translation files whenever you make a change in the settings UI.
Alright, after work I'll take a look at making the changes a bit more generic and improving the code's maintainability. I'll also try using an LLM's help to generate all the missing translations and resolve any conflicts with the main branch :) I'll share any updates or questions regarding the changes here.
I think this PR is important: it fixes the use of the Ollama embedding endpoint, because currently a GET method is used instead of POST, which results in an error.
@DanielusG What do we need to do to get this across the finish line?
Hi there, sorry, but I'm currently extremely busy with university exams. I made these changes in my free time and currently have none left to dedicate. Additionally, I noticed that the OpenAI Custom Provider for Indexing has already implemented the ability to define the model name and embedding size, making this PR largely redundant. There have also been significant changes to the main branch, requiring a fairly complex merge beyond my current skill level.
Closing as stale; let me know if you want to keep working on this at some point, although we already allow custom embedding model configuration through the OpenAI-compatible provider.
Closes #3959
Description:
This Pull Request introduces the ability for users to configure custom embedding models (name, URL, dimension) for the Codebase Indexing feature, as proposed in Issue #3959.
Changes Introduced:
- New settings UI fields: `Base URL` (for Ollama endpoint or OpenAI-compatible proxy/endpoint), `Model ID` (allowing custom model names), and `Custom Model Dimension` (optional, for unrecognized models).
- Updated configuration (`config-manager.ts`, types, schemas) to store and use these settings.
- Updated `OpenAiEmbedder` to accept `baseURL` and optional `dimensions`.
- Fixed `CodeIndexOllamaEmbedder` for standard API interaction.
- Updated `service-factory.ts` to determine vector dimensions using built-in profiles first, then falling back to the user-provided `Custom Model Dimension`.
- Added the new settings to `package.json`.
- Updated the settings UI (`CodeIndexSettings.tsx`) with the new input fields and logic.
- Registered the options in VSCode settings (`settings.json`).

Motivation & Roadmap Alignment:
Testing:
- Tested with a local Ollama endpoint (`http://localhost:11434`) with the model `jina-embeddings-v2-base-code:latest`, specifying the dimension `768`. Indexing used the custom settings correctly.

Screenshots/Videos:

Development Notes:
Documentation Impact:
Important
This PR adds support for configuring custom embedding models for codebase indexing, including UI, backend, and schema updates.
- Adds UI fields for `Base URL`, `Model ID`, and `Custom Model Dimension` in `CodeIndexSettings.tsx`.
- Updates `config-manager.ts` to store and use custom model settings.
- Updates `OpenAiEmbedder` and `CodeIndexOllamaEmbedder` to accept custom `baseURL` and `dimensions`.
- Updates `service-factory.ts` to handle custom model dimensions.
- Adds `codebaseIndexEmbedderDimension` to `CodebaseIndexConfig` in `schemas/index.ts` and `types.ts`.
- Updates `ProviderSettings` and `GlobalSettings` to include custom model configuration.
- Adds the options to VSCode settings (`settings.json`).
- Updates `package.json` to include the new settings definitions.

This description was created by
for 9acabb3. You can customize this summary. It will automatically update as commits are pushed.
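As a rough illustration of the `codebaseIndexEmbedderDimension` addition described in this PR, a plain-TypeScript shape with a runtime guard might look like the following. Only the dimension field comes from the PR description; the surrounding field names and the guard are assumptions, not the actual schema:

```typescript
// Assumed shape of the indexing config; only `codebaseIndexEmbedderDimension`
// is taken from the PR description, the other fields are illustrative.
interface CodebaseIndexConfig {
  codebaseIndexEmbedderBaseUrl?: string;
  codebaseIndexEmbedderModelId?: string;
  codebaseIndexEmbedderDimension?: number; // new: manual override for unknown models
}

// Runtime guard mirroring the validation the PR describes: the dimension,
// when present, must be a positive integer.
function isValidIndexConfig(config: CodebaseIndexConfig): boolean {
  const d = config.codebaseIndexEmbedderDimension;
  return d === undefined || (Number.isInteger(d) && d > 0);
}
```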