fix: add support for float encoding format in OpenAI Compatible embedder #7181
Conversation
- Add `encodingFormat` configuration option to interfaces
- Update `OpenAICompatibleEmbedder` to support both base64 and float formats
- Default to base64 for backward compatibility
- Add comprehensive tests for encoding format functionality
- Update service factory to pass encoding format parameter

Fixes #7180
Reviewing my own code is like debugging in a mirror - everything looks backward but the bugs are still mine.
```diff
 				input: batchTexts,
 				model: model,
-				encoding_format: "base64",
+				encoding_format: this.encodingFormat,
```
I notice that in the direct HTTP request path, we always send encoding_format regardless of whether it's "float" or "base64" (line 220). However, in the SDK path (lines 285-287), we only set it for base64. Should we align this behavior for consistency? Some providers might be strict about not accepting the encoding_format parameter when expecting float arrays.
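One way to align the two paths is to build the request body once and omit `encoding_format` entirely when the format is `"float"`. This is a minimal sketch, not the PR's actual code; `buildEmbeddingRequest` is a hypothetical helper:

```typescript
type EncodingFormat = "base64" | "float"

// Hypothetical helper: constructs the embeddings request body so that the
// SDK path and the direct HTTP path send the same parameters.
function buildEmbeddingRequest(
	input: string[],
	model: string,
	encodingFormat: EncodingFormat,
): Record<string, unknown> {
	const body: Record<string, unknown> = { input, model }
	// Only send encoding_format for base64; strict providers that expect
	// float arrays may reject the parameter when it is present at all.
	if (encodingFormat === "base64") {
		body.encoding_format = "base64"
	}
	return body
}
```

With this shape, both code paths consume the same object, so the float/base64 behavior cannot drift between them.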
```diff
 		const openAiCompatibleApiKey = this.contextProxy?.getSecret("codebaseIndexOpenAiCompatibleApiKey") ?? ""
+		// Default to base64 for backward compatibility, but allow float format
+		const openAiCompatibleEncodingFormat =
+			(codebaseIndexConfig as any).codebaseIndexOpenAiCompatibleEncodingFormat ?? "base64"
```
Using (codebaseIndexConfig as any) here bypasses TypeScript's type checking. Consider properly extending the config interface to include codebaseIndexOpenAiCompatibleEncodingFormat for better type safety. This would help catch potential issues at compile time.
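A typed version could look like the sketch below. The property name comes from the diff; the exact shape of `CodeIndexConfig` is an assumption for illustration:

```typescript
// Assumed interface extension -- the real CodeIndexConfig in the repo has
// more fields; only the property relevant to this PR is shown.
interface CodeIndexConfig {
	codebaseIndexOpenAiCompatibleEncodingFormat?: "base64" | "float"
}

// With the property declared on the interface, no `as any` cast is needed
// and a typo in the property name becomes a compile-time error.
function getEncodingFormat(config: CodeIndexConfig): "base64" | "float" {
	// Default to base64 for backward compatibility
	return config.codebaseIndexOpenAiCompatibleEncodingFormat ?? "base64"
}
```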
```diff
 		this.isFullUrl = this.isFullEndpointUrl(baseUrl)
 		this.maxItemTokens = maxItemTokens || MAX_ITEM_TOKENS
+		// Default to base64 for backward compatibility, but allow float format
+		this.encodingFormat = encodingFormat || "base64"
```
The constructor accepts the encodingFormat parameter but doesn't validate it. Consider adding validation to ensure only "base64" or "float" are accepted:
```suggestion
-		this.encodingFormat = encodingFormat || "base64"
+		// Default to base64 for backward compatibility, but allow float format
+		const validFormats = ["base64", "float"] as const
+		this.encodingFormat = encodingFormat || "base64"
+		if (!validFormats.includes(this.encodingFormat)) {
+			throw new Error(`Invalid encoding format: ${encodingFormat}. Must be 'base64' or 'float'`)
+		}
```
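As a runnable variant of that validation, a type predicate makes the narrowing explicit. Names here are illustrative, not the PR's actual code:

```typescript
const VALID_FORMATS = ["base64", "float"] as const
type EncodingFormat = (typeof VALID_FORMATS)[number]

// Type guard: after a true result, TypeScript narrows `value` to EncodingFormat.
function isEncodingFormat(value: string): value is EncodingFormat {
	return (VALID_FORMATS as readonly string[]).includes(value)
}

// Hypothetical constructor helper: applies the base64 default, then rejects
// anything outside the allowed set instead of silently accepting it.
function resolveEncodingFormat(value?: string): EncodingFormat {
	const format = value ?? "base64" // default for backward compatibility
	if (!isEncodingFormat(format)) {
		throw new Error(`Invalid encoding format: ${format}. Must be 'base64' or 'float'`)
	}
	return format
}
```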
```typescript
					encoding_format: "base64",
				})
			})
		})
```
Great test coverage! Consider adding a few edge cases:
- What happens when an invalid encoding format like "invalid" is provided?
- How do we handle mixed responses (some base64, some float) from providers?
- Error handling when base64 decoding fails due to corrupted data?
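The corrupted-data case in particular is easy to exercise against a small decoder. This is a sketch of the kind of check such a test could target; `decodeBase64Embedding` is a hypothetical helper, not code from the PR:

```typescript
// Decodes a base64 embedding payload (packed little-endian float32 values)
// into a plain number array, rejecting payloads whose byte length cannot
// represent whole float32 values.
function decodeBase64Embedding(data: string): number[] {
	const buf = Buffer.from(data, "base64")
	if (buf.length % 4 !== 0) {
		throw new Error("Corrupted base64 embedding: byte length is not a multiple of 4")
	}
	// Copy into a fresh Uint8Array so the Float32Array view starts at an
	// aligned offset (Node may pool small Buffers at arbitrary offsets).
	const bytes = Uint8Array.from(buf)
	return Array.from(new Float32Array(bytes.buffer))
}
```

A test can round-trip a known vector through base64 and also assert that a truncated payload throws, covering the corruption path the review asks about.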
This PR fixes issue #7180 by adding support for the float encoding format in the OpenAI Compatible embedder.
Problem
Some OpenAI-compatible providers (like certain litellm configurations) only support `encoding_format: "float"` and not `"base64"`. The current implementation was hardcoded to use `"base64"`, causing errors with these providers.

Solution

- Added an `encodingFormat` parameter to the `OpenAICompatibleEmbedder` constructor
- Defaults to `"base64"` for backward compatibility
- When `encodingFormat` is set to `"float"`, the embedder:
  - Omits the `encoding_format` parameter from the OpenAI SDK call (letting it default to float)
  - Uses `"float"` in direct HTTP requests

Testing

- Added `openai-compatible-encoding.spec.ts` with tests for both formats

Breaking Changes
None - the change is backward compatible with existing configurations.
Fixes #7180
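The float-vs-base64 response handling described above can be sketched as follows (illustrative names, not the PR's actual code):

```typescript
// With encoding_format "float" the provider returns number arrays directly;
// with "base64" it returns the float32 values packed little-endian in a
// base64 string, which must be decoded before use.
function normalizeEmbedding(data: string | number[]): number[] {
	if (Array.isArray(data)) {
		return data // "float" format: already numbers
	}
	// "base64" format: copy into a fresh buffer so the view is aligned,
	// then reinterpret the bytes as float32 values.
	const bytes = Uint8Array.from(Buffer.from(data, "base64"))
	return Array.from(new Float32Array(bytes.buffer))
}
```

Either way the caller ends up with a plain `number[]`, which is why the change is transparent to the rest of the indexing pipeline.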
Important
Adds support for 'float' encoding format in `OpenAICompatibleEmbedder`, with backward compatibility for 'base64'.

- Added `encodingFormat` parameter to the `OpenAICompatibleEmbedder` constructor, defaulting to `"base64"`.
- Supports `"float"` encoding by omitting `encoding_format` in SDK calls and using it in HTTP requests.
- Updated `CodeIndexConfigManager` to include `encodingFormat` in `openAiCompatibleOptions`.
- Updated the `CodeIndexConfig` interface to support `encodingFormat`.
- Added `openai-compatible-encoding.spec.ts` with tests for both encoding formats.
- Defaults to `"base64"` when `encodingFormat` is not specified.
- Updated `service-factory.ts` to pass `encodingFormat` to `OpenAICompatibleEmbedder`.

This description was created automatically for e8df38a.