fix: add nomic-embed-code support with model-specific score thresholds and query prefixes (#5027) #5036
…s and query prefixes (#5027)
- Add scoreThreshold and queryPrefix properties to embedding model profiles
- Implement nomic-embed-code model with 0.15 threshold and required query prefix
- Update config manager to use model-specific score thresholds dynamically
- Modify all embedders to apply query prefixes when required
- Maintain backward compatibility for existing models
- Fix search functionality for nomic-embed-code embeddings
Pull Request Overview
This PR introduces model-specific configurations for semantic search, enabling per-model score thresholds and query prefixes—primarily adding support for the nomic-embed-code model.
- Extended EmbeddingModelProfile with scoreThreshold and queryPrefix, added nomic-embed-code entries
- Added getModelScoreThreshold/getModelQueryPrefix utilities and updated embedders to apply prefixes
- Updated CodeIndexConfigManager to use dynamic search minimum score via currentSearchMinScore
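The per-model lookup summarized above can be sketched roughly as follows. This is a minimal illustration and not the PR's actual code: the table name EMBEDDING_MODEL_PROFILES, its trimmed-down contents, and the signature of getModelQueryPrefix are assumptions. Only the signature of getModelScoreThreshold and the profile values are taken from the diff on this page.

```typescript
// Assumed provider union; the real EmbedderProvider type may differ.
type EmbedderProvider = "openai" | "ollama" | "openai-compatible"

interface EmbeddingModelProfile {
    dimension: number
    scoreThreshold?: number // optional, so older profiles stay valid
    queryPrefix?: string // only set for models that require a prefix
}

// Hypothetical, shortened profile table for illustration only.
const EMBEDDING_MODEL_PROFILES: Partial<Record<EmbedderProvider, Record<string, EmbeddingModelProfile>>> = {
    openai: {
        "text-embedding-3-small": { dimension: 1536, scoreThreshold: 0.4 },
    },
    ollama: {
        "nomic-embed-text": { dimension: 768, scoreThreshold: 0.4 },
        "nomic-embed-code": {
            dimension: 3584,
            scoreThreshold: 0.15,
            queryPrefix: "Represent this query for searching relevant code: ",
        },
    },
}

// Returns the model-specific threshold, or undefined for unknown models.
function getModelScoreThreshold(provider: EmbedderProvider, modelId: string): number | undefined {
    return EMBEDDING_MODEL_PROFILES[provider]?.[modelId]?.scoreThreshold
}

// Returns the model-specific query prefix, or undefined when none is required.
function getModelQueryPrefix(provider: EmbedderProvider, modelId: string): string | undefined {
    return EMBEDDING_MODEL_PROFILES[provider]?.[modelId]?.queryPrefix
}
```

Returning undefined for unknown providers or models leaves the fallback decision to the caller, which is one way to keep existing models working unchanged.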
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/shared/embeddingModels.ts | Added scoreThreshold/queryPrefix, configured nomic-embed-code, and utility functions |
| src/services/code-index/interfaces/config.ts | Made searchMinScore required |
| src/services/code-index/embedders/openai.ts | Imported and applied getModelQueryPrefix |
| src/services/code-index/embedders/openai-compatible.ts | Imported and applied getModelQueryPrefix |
| src/services/code-index/embedders/ollama.ts | Imported and applied getModelQueryPrefix |
| src/services/code-index/config-manager.ts | Switched to dynamic currentSearchMinScore via getModelScoreThreshold |
Comments suppressed due to low confidence (2)
src/shared/embeddingModels.ts:80
- Add unit tests for getModelScoreThreshold and getModelQueryPrefix to validate behavior across providers and models, including edge cases and the new nomic-embed-code configuration.
export function getModelScoreThreshold(provider: EmbedderProvider, modelId: string): number | undefined {
src/services/code-index/embedders/openai.ts:42
- Add tests for createEmbeddings to ensure that the queryPrefix is correctly prepended when returned by getModelQueryPrefix, and that texts remain unmodified when no prefix is provided.
const queryPrefix = getModelQueryPrefix("openai", modelToUse)
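The behavior these suggested tests describe could look roughly like the helper below. It is a sketch under assumptions: applyQueryPrefix is a hypothetical name, and the PR may apply the prefix inline inside createEmbeddings instead.

```typescript
// Hypothetical helper: prepend the model's query prefix to every text.
// When no prefix is configured, the same array is returned untouched.
function applyQueryPrefix(texts: string[], queryPrefix?: string): string[] {
    if (!queryPrefix) {
        return texts
    }
    return texts.map((text) => `${queryPrefix}${text}`)
}

// Example: a model with a prefix versus a model without one.
const prefixed = applyQueryPrefix(["find the config loader"], "Represent this query for searching relevant code: ")
const unchanged = applyQueryPrefix(["find the config loader"], undefined)
```

Keeping the prefix logic in one small function would make both cases named in the review comment (prefix present, prefix absent) straightforward to test in isolation.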
+       "text-embedding-3-small": { dimension: 1536, scoreThreshold: 0.4 },
+       "text-embedding-3-large": { dimension: 3072, scoreThreshold: 0.4 },
+       "text-embedding-ada-002": { dimension: 1536, scoreThreshold: 0.4 },
    },
    ollama: {
-       "nomic-embed-text": { dimension: 768 },
-       "mxbai-embed-large": { dimension: 1024 },
-       "all-minilm": { dimension: 384 },
+       "nomic-embed-text": { dimension: 768, scoreThreshold: 0.4 },
+       "nomic-embed-code": {
+           dimension: 3584,
+           scoreThreshold: 0.15,
+           queryPrefix: "Represent this query for searching relevant code: ",
+       },
+       "mxbai-embed-large": { dimension: 1024, scoreThreshold: 0.4 },
+       "all-minilm": { dimension: 384, scoreThreshold: 0.4 },
        // Add default Ollama model if applicable, e.g.:
        // 'default': { dimension: 768 } // Assuming a default dimension
    },
    "openai-compatible": {
-       "text-embedding-3-small": { dimension: 1536 },
-       "text-embedding-3-large": { dimension: 3072 },
-       "text-embedding-ada-002": { dimension: 1536 },
+       "text-embedding-3-small": { dimension: 1536, scoreThreshold: 0.4 },
+       "text-embedding-3-large": { dimension: 3072, scoreThreshold: 0.4 },
+       "text-embedding-ada-002": { dimension: 1536, scoreThreshold: 0.4 },
Copilot
AI
Jun 23, 2025
Consider extracting the repeated default scoreThreshold value (0.4) into a named constant (e.g., DEFAULT_SCORE_THRESHOLD) to avoid duplication and ease future updates.
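A rough sketch of what this suggestion might look like in practice, assuming the constant name DEFAULT_SCORE_THRESHOLD proposed in the comment. The OLLAMA_PROFILES table name is hypothetical and covers only the Ollama entries from the diff.

```typescript
// Shared default, as proposed in the review comment.
const DEFAULT_SCORE_THRESHOLD = 0.4

interface EmbeddingModelProfile {
    dimension: number
    scoreThreshold?: number
    queryPrefix?: string
}

// Hypothetical table name; values mirror the Ollama entries in the diff.
const OLLAMA_PROFILES: Record<string, EmbeddingModelProfile> = {
    "nomic-embed-text": { dimension: 768, scoreThreshold: DEFAULT_SCORE_THRESHOLD },
    "mxbai-embed-large": { dimension: 1024, scoreThreshold: DEFAULT_SCORE_THRESHOLD },
    "all-minilm": { dimension: 384, scoreThreshold: DEFAULT_SCORE_THRESHOLD },
    // Only models that deviate from the default spell out their own value.
    "nomic-embed-code": {
        dimension: 3584,
        scoreThreshold: 0.15,
        queryPrefix: "Represent this query for searching relevant code: ",
    },
}
```

With the constant in place, changing the default touches one line, while model-specific values such as 0.15 remain visible as deliberate exceptions.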
+       "nomic-embed-code": {
+           dimension: 3584,
+           scoreThreshold: 0.15,
+           queryPrefix: "Represent this query for searching relevant code: ",
Copilot
AI
Jun 23, 2025
Extract this query prefix literal into a constant (e.g., NOMIC_EMBED_CODE_PREFIX) to reduce duplication and improve readability.
-           queryPrefix: "Represent this query for searching relevant code: ",
+           queryPrefix: NOMIC_EMBED_CODE_PREFIX,
I think a better solution for the issue would be to expose the score threshold so that users can modify it, rather than adding a default score to each model. That would require some UI changes, but I think it is worth it and easier to maintain in the long run. The issue also mentions adding text for the … I think we can split the issue into 2 PRs:
I'll be closing this PR, but it can be used as a base for the 2 required PRs to solve the issue.
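One way to picture the user-configurable threshold suggested in this comment is a small resolution function. Everything below is hypothetical: the function name, the fallback constant, and the precedence order (user setting first, then a model-specific value, then a global default) are assumptions for illustration, not something this PR or the follow-up PRs are confirmed to implement.

```typescript
// Hypothetical global fallback when neither the user nor the model specifies a value.
const DEFAULT_SEARCH_MIN_SCORE = 0.4

// Resolve the effective minimum score for search results.
// Assumed precedence: valid user setting > model-specific threshold > global default.
function resolveSearchMinScore(userSetting: number | undefined, modelThreshold: number | undefined): number {
    // Ignore out-of-range or NaN user input and fall through to the next source.
    if (userSetting !== undefined && userSetting >= 0 && userSetting <= 1) {
        return userSetting
    }
    return modelThreshold ?? DEFAULT_SEARCH_MIN_SCORE
}

// Example: a user who lowers the threshold for a code-specific model.
const effective = resolveSearchMinScore(0.1, 0.15)
```

Under this sketch the user's value always wins when it is valid, which matches the comment's goal of letting users tune the threshold without maintaining a default per model.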
Description
Fixes #5027
This PR adds proper support for the nomic-embed-code model in the semantic code indexing feature by implementing model-specific configurations for score thresholds and query prefixes.
Changes Made
- Extended the EmbeddingModelProfile interface with optional scoreThreshold and queryPrefix properties
- Added the nomic-embed-code model configuration with dimension 3584, score threshold 0.15, and required query prefix
- Updated config-manager.ts to dynamically retrieve model-specific score thresholds via getModelScoreThreshold()
- Modified all embedders (openai.ts, ollama.ts, openai-compatible.ts) to apply query prefixes when required
- Made searchMinScore required in the config interface for consistency
- Added getModelScoreThreshold() and getModelQueryPrefix() for dynamic configuration lookup
Testing
Verification of Acceptance Criteria
Checklist
Technical Details
The solution implements a per-model configuration system that allows each embedding model to specify:
This approach ensures optimal search performance while maintaining full backward compatibility with existing models that don't require these configurations.
Important
Adds model-specific configurations for nomic-embed-code with score thresholds and query prefixes, updating embedders and configuration management.
- Adds the nomic-embed-code model with dimension 3584, score threshold 0.15, and query prefix in embeddingModels.ts.
- Updates config-manager.ts to use getModelScoreThreshold() for dynamic score threshold retrieval.
- Updates openai.ts, ollama.ts, and openai-compatible.ts to apply query prefixes.
- Extends EmbeddingModelProfile with scoreThreshold and queryPrefix.
- Makes searchMinScore required in CodeIndexConfig.
- Adds getModelScoreThreshold() and getModelQueryPrefix() in embeddingModels.ts.
This description was created by
for e03c03e. You can customize this summary. It will automatically update as commits are pushed.