Feature/instruction aware embeddings #8969
Conversation
Update branch to latest - 31 Oct 2025
Added support for DeepInfra-hosted embedding models and fixed a critical bug where
the 'type' field index was missing in Qdrant, causing "Bad Request" errors
during code search operations.
Changes:
- Added DeepInfra provider detection in OpenAICompatibleEmbedder
* Detect DeepInfra URLs (deepinfra.com)
* Use 'float' encoding format for DeepInfra, 'base64' for other standard
providers
* Handle both float array and base64 string embedding responses
* Added validation for embedding values (NaN/Infinity checking)
- Fixed missing Qdrant payload index for 'type' field
* The missing `type` field index caused "Bad Request" errors during `codebase_search`
tool invocation
* Created a keyword index for the 'type' field to support metadata filtering
* Resolves the "Index required but not found for 'type' field" error
- Added 7 DeepInfra embedding model profiles:
* Qwen/Qwen3-Embedding-0.6B (1024 dims)
* Qwen/Qwen3-Embedding-4B (2560 dims)
* Qwen/Qwen3-Embedding-8B (4096 dims)
* intfloat/multilingual-e5-large-instruct (1024 dims)
* google/embeddinggemma-300m (768 dims)
* BAAI/bge-m3 (1024 dims)
* BAAI/bge-large-en-v1.5 (1024 dims)
- Added some test coverage for DeepInfra
* Provider validation
* Encoding format tests
* Float array and base64 response handling tests
* Configuration validation tests
Tested with: embeddinggemma-300m, text-embedding-004, multilingual-e5-large
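The provider detection, encoding-format selection, and response handling described in this commit could be sketched roughly as follows. The function names (`isDeepInfraUrl`, `encodingFormatFor`, `decodeEmbedding`) are illustrative, not the PR's actual identifiers:

```typescript
// Sketch of the DeepInfra-specific handling; names are illustrative.

/** Detect DeepInfra-hosted endpoints by hostname. */
function isDeepInfraUrl(baseUrl: string): boolean {
  try {
    return new URL(baseUrl).hostname.endsWith("deepinfra.com")
  } catch {
    return false
  }
}

/** DeepInfra expects 'float'; other OpenAI-compatible providers accept 'base64'. */
function encodingFormatFor(baseUrl: string): "float" | "base64" {
  return isDeepInfraUrl(baseUrl) ? "float" : "base64"
}

/**
 * Normalize a response item that may be a float array or a base64-packed
 * Float32 string, rejecting NaN/Infinity as the commit's validation does.
 */
function decodeEmbedding(raw: number[] | string): number[] {
  let values: number[]
  if (typeof raw === "string") {
    const buf = Buffer.from(raw, "base64")
    values = Array.from(new Float32Array(buf.buffer, buf.byteOffset, buf.byteLength / 4))
  } else {
    values = raw
  }
  if (values.some((v) => !Number.isFinite(v))) {
    throw new Error("Embedding contains non-finite values")
  }
  return values
}
```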
feat: add DeepInfra embedding support and fix missing Qdrant `type` index
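The index fix named in this commit title could look roughly like the following; `buildTypeIndexRequest` is a hypothetical helper, and the request shape follows Qdrant's create-payload-index endpoint:

```typescript
// Hypothetical sketch of the 'type' keyword index creation; not the PR's exact code.

interface PayloadIndexRequest {
  field_name: string
  field_schema: "keyword"
}

// Without this index, filtering on 'type' during codebase_search fails with
// "Index required but not found for 'type' field".
function buildTypeIndexRequest(): PayloadIndexRequest {
  return { field_name: "type", field_schema: "keyword" }
}

// With Qdrant's JS client this would be applied roughly as:
// await client.createPayloadIndex(collectionName, buildTypeIndexRequest())
```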
…dels

- Add queryPrefix support for Qwen3-Embedding models (0.6B, 4B, 8B)
- Add queryPrefix for intfloat/multilingual-e5-large-instruct
- Add queryPrefix for google/embeddinggemma-300m
- Add queryPrefix for BAAI/bge-large-en-v1.5
- Reduce MAX_ITEM_TOKENS from 8191 to 512 for compatibility with models that have 512-token limits (e5-large, bge-large-en-v1.5)

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
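A minimal sketch of how a query prefix might interact with the reduced token budget; `applyQueryPrefix` and the 4-characters-per-token estimate are assumptions for illustration, not the commit's actual code:

```typescript
// Illustrative only: prefix application under the 512-token item limit.
const MAX_ITEM_TOKENS = 512

function applyQueryPrefix(query: string, prefix?: string): string {
  if (!prefix || query.startsWith(prefix)) return query // avoid double-prefixing
  const prefixed = prefix + query
  // Rough token estimate: ~4 characters per token.
  return Math.ceil(prefixed.length / 4) <= MAX_ITEM_TOKENS ? prefixed : query
}
```

Texts that would blow the budget after prefixing fall back to the unprefixed query here; the real implementation may instead truncate or reject them.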
Found 1 issue that needs to be addressed:
The query prefix functionality added in this PR (lines 114-137) lacks test coverage. While DeepInfra provider detection and encoding format are tested, there are no tests verifying that getModelQueryPrefix() prefixes are actually being applied to queries. Consider adding tests that verify: (1) prefixes are correctly added for models that require them, (2) double-prefixing is prevented, and (3) texts that would exceed MAX_ITEM_TOKENS after prefixing are handled appropriately.
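The coverage the reviewer asks for could be sketched as below; `getModelQueryPrefix` is stubbed with a placeholder mapping, since the real table lives in the PR's embeddingModels.ts:

```typescript
// Stub standing in for the real getModelQueryPrefix; the mapping is a placeholder.
function getModelQueryPrefix(modelId: string): string | undefined {
  const prefixes: Record<string, string> = {
    "hypothetical/instruct-model": "query: ",
  }
  return prefixes[modelId]
}

function prefixQuery(modelId: string, query: string): string {
  const prefix = getModelQueryPrefix(modelId)
  if (!prefix || query.startsWith(prefix)) return query
  return prefix + query
}

// Cases to assert, per the review comment:
// (1) prefix applied for models that require one
// (2) double-prefixing prevented
// (3) models without a prefix left untouched
```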
Description
Adds instruction-aware query prefixes for DeepInfra embedding models to
improve semantic search accuracy. Changes include:

- Query prefixes for supported models, following each model's recommended search instruction format
- MAX_ITEM_TOKENS reduced from 8191 to 512 for compatibility with models that have 512-token limits
Test Procedure
Pre-Submission Checklist
Screenshots / Videos
Documentation Updates
Additional Notes
The query prefixes are model-specific and follow the recommended format
from each model's documentation. This change is backward compatible
and won't affect existing implementations that don't use these specific models.
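For illustration, two of these documented formats might look like the following; the exact strings are assumptions drawn from each model's public documentation, not copied from this PR:

```typescript
// Assumed examples of model-specific query prefixes; verify against each model's docs.
const exampleQueryPrefixes: Record<string, string> = {
  // E5 instruct models document an "Instruct: ...\nQuery: ..." format.
  "intfloat/multilingual-e5-large-instruct":
    "Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: ",
  // EmbeddingGemma documents a task-prefixed retrieval format.
  "google/embeddinggemma-300m": "task: search result | query: ",
}
```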
Get in Touch
Discord: badgambit
Important
Adds instruction-aware query prefixes for embedding models and adjusts token limits for compatibility.
- Adds query prefixes for `Qwen3-Embedding` models (0.6B, 4B, 8B), `intfloat/multilingual-e5-large-instruct`, `google/embeddinggemma-300m`, and `BAAI/bge-large-en-v1.5` in `embeddingModels.ts`.
- Reduces `MAX_ITEM_TOKENS` from 8191 to 511 in `index.ts` for model compatibility.
- Adds tests in `openai-compatible.spec.ts` for `DeepInfra` provider detection and handling, including encoding format and response processing.
- Verifies the `getModelQueryPrefix()` function returns correct prefixes for each model.
- Updates `OpenAICompatibleEmbedder` in `openai-compatible.ts` to handle different encoding formats based on provider type.

This description was created by
for e9f2e0c.