-
Notifications
You must be signed in to change notification settings - Fork 2.6k
fix: prioritize user-configured embedding dimension over model default #8106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewing my own code feels like debugging in a mirror - everything's backwards but the bugs are still mine.
| /** | ||
| * Gets the current model dimension being used for embeddings. | ||
| * Returns the model's built-in dimension if available, otherwise falls back to custom dimension. | ||
| * Returns the user-configured custom dimension if set, otherwise falls back to model's built-in dimension. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice fix! The logic correctly prioritizes user-configured dimensions now. Though I wonder if the JSDoc comment could be even clearer by explicitly stating "User-configured dimensions override any model defaults" to make the priority crystal clear?
| // Only use manual dimension if model doesn't have a built-in dimension | ||
| if (!vectorSize && config.modelDimension && config.modelDimension > 0) { | ||
| // Prioritize user-configured custom dimension if explicitly set | ||
| if (config.modelDimension && config.modelDimension > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good consistency with the config-manager implementation. The if-else structure makes the priority clear.
| }) | ||
|
|
||
| it("should return model's built-in dimension when available", async () => { | ||
| it("should prioritize user-configured dimension over model's built-in dimension", async () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent test coverage! This test name could be even more descriptive though. How about "should always use user-configured dimension when set, ignoring model defaults" to really emphasize the override behavior?
| expect(configManager.currentModelDimension).toBe(1536) | ||
| expect(mockedGetModelDimension).toHaveBeenCalledWith("openai", "text-embedding-3-small") | ||
| }) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great addition! This test specifically addresses the issue #8102 scenario where user sets 1536 but model default is 1024. This ensures the bug won't regress.
This PR attempts to address Issue #8102. Feedback and guidance are welcome.
Problem
When a user sets an embedding dimension (e.g., 1536), the system was still producing vectors of a different size (e.g., 1024) while the vector database creates a collection with the configured size. This mismatch caused a vector dimension error and blocked indexing and search.
Solution
Updated the dimension priority logic to respect user-configured dimensions as the single source of truth:
config-manager.ts: Now checks user-configured dimension first before falling back to model defaultsservice-factory.ts: Uses the same priority logic for consistencyTesting
Fixes #8102
Important
Prioritize user-configured embedding dimension over model default to prevent dimension mismatch errors.
config-manager.tsandservice-factory.ts.config-manager.spec.tsto reflect new priority logic.This description was created by
for 0c35f22. You can customize this summary. It will automatically update as commits are pushed.