@roomote roomote bot commented Aug 18, 2025

This PR fixes issue #7180 by adding support for the float encoding format in the OpenAI Compatible embedder.

Problem

Some OpenAI-compatible providers (for example, certain LiteLLM configurations) support only encoding_format: "float", not "base64". The embedder hardcoded "base64", which caused errors with these providers.

Solution

  • Added an optional encodingFormat parameter to the OpenAICompatibleEmbedder constructor
  • Default to "base64" for backward compatibility
  • When encodingFormat is set to "float", the embedder:
    • Omits the encoding_format parameter from the OpenAI SDK call (letting it default to float)
    • Uses "float" in direct HTTP requests
    • Handles float array responses without base64 decoding
  • Added configuration support in interfaces and config manager
  • Added comprehensive tests for both encoding formats
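The response handling described above (float arrays pass through, base64 strings are decoded) can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `normalizeEmbedding` and the injected `decodeBase64` parameter are hypothetical names, and the sketch assumes that base64 payloads carry Float32 values in the platform's native byte order.

```typescript
// Hypothetical decoder signature; in Node.js one could pass (s) => Buffer.from(s, "base64").
type Base64Decoder = (encoded: string) => Uint8Array

// Convert one embedding from an API response into a plain number[].
function normalizeEmbedding(raw: string | number[], decodeBase64: Base64Decoder): number[] {
  // Float format: the provider already returned a numeric array, so no decoding is needed.
  if (Array.isArray(raw)) {
    return raw
  }
  // Base64 format: decode to bytes, then reinterpret every 4 bytes as one Float32 value.
  const bytes = decodeBase64(raw)
  if (bytes.byteLength % 4 !== 0) {
    throw new Error(`Corrupted base64 embedding: ${bytes.byteLength} bytes is not a multiple of 4`)
  }
  // Copy into a fresh buffer so the Float32 view starts at an aligned offset.
  const aligned = new Uint8Array(bytes)
  return Array.from(new Float32Array(aligned.buffer))
}
```

Because the check is per item, a response that mixes float arrays and base64 strings would still normalize to number[] under these assumptions.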

Testing

  • Added new test file openai-compatible-encoding.spec.ts with tests for both formats
  • All existing tests pass
  • Verified backward compatibility with default base64 format

Breaking Changes

None - the change is backward compatible with existing configurations.

Fixes #7180


Important

Adds support for 'float' encoding format in OpenAICompatibleEmbedder, with backward compatibility for 'base64'.

  • Behavior:
    • Adds encodingFormat parameter to OpenAICompatibleEmbedder constructor, defaulting to "base64".
    • Supports "float" encoding by omitting encoding_format in SDK calls and using it in HTTP requests.
    • Handles float array responses without base64 decoding.
  • Configuration:
    • Updates CodeIndexConfigManager to include encodingFormat in openAiCompatibleOptions.
    • Modifies CodeIndexConfig interface to support encodingFormat.
  • Testing:
    • Adds openai-compatible-encoding.spec.ts with tests for both encoding formats.
    • Ensures backward compatibility by defaulting to "base64" when encodingFormat is not specified.
  • Misc:
    • Updates service-factory.ts to pass encodingFormat to OpenAICompatibleEmbedder.

This description was created by Ellipsis for e8df38a.

- Add encodingFormat configuration option to interfaces
- Update OpenAICompatibleEmbedder to support both base64 and float formats
- Default to base64 for backward compatibility
- Add comprehensive tests for encoding format functionality
- Update service factory to pass encoding format parameter

Fixes #7180
@roomote roomote bot requested review from cte, jr and mrubens as code owners August 18, 2025 13:22
@dosubot dosubot bot added the size:L and enhancement labels Aug 18, 2025
@roomote roomote bot left a comment

Reviewing my own code is like debugging in a mirror - everything looks backward but the bugs are still mine.

  input: batchTexts,
  model: model,
- encoding_format: "base64",
+ encoding_format: this.encodingFormat,

I notice that in the direct HTTP request path, we always send encoding_format regardless of whether it's "float" or "base64" (line 220). However, in the SDK path (lines 285-287), we only set it for base64. Should we align this behavior for consistency? Some providers might be strict about not accepting the encoding_format parameter when expecting float arrays.
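One possible way to align the two paths is to build the request payload in a single place and reuse it for both the SDK call and the direct HTTP request. The sketch below is an illustration only: `buildEmbeddingRequest` and `EmbeddingRequestBody` are hypothetical names, and omitting the field for "float" rests on the assumption that the provider returns float arrays when no encoding_format is sent.

```typescript
type EncodingFormat = "base64" | "float"

interface EmbeddingRequestBody {
  input: string[]
  model: string
  encoding_format?: EncodingFormat
}

// Hypothetical shared helper so the SDK path and the HTTP path send identical payloads.
function buildEmbeddingRequest(
  input: string[],
  model: string,
  encodingFormat: EncodingFormat,
): EmbeddingRequestBody {
  const body: EmbeddingRequestBody = { input, model }
  // Send encoding_format only for base64; for "float" the field is left out entirely
  // (assumption: the provider defaults to float arrays when the field is absent).
  if (encodingFormat === "base64") {
    body.encoding_format = "base64"
  }
  return body
}
```

Whether to omit the field or always send it explicitly is a design choice; the point of the helper is only that both code paths make the same choice.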

const openAiCompatibleApiKey = this.contextProxy?.getSecret("codebaseIndexOpenAiCompatibleApiKey") ?? ""
// Default to base64 for backward compatibility, but allow float format
const openAiCompatibleEncodingFormat =
(codebaseIndexConfig as any).codebaseIndexOpenAiCompatibleEncodingFormat ?? "base64"

Using (codebaseIndexConfig as any) here bypasses TypeScript's type checking. Consider properly extending the config interface to include codebaseIndexOpenAiCompatibleEncodingFormat for better type safety. This would help catch potential issues at compile time.
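A typed alternative to the cast might look like the sketch below. `CodebaseIndexConfigSlice` and `resolveEncodingFormat` are hypothetical names used for illustration; the real config interface contains many more fields, and only the one discussed here is shown.

```typescript
type EncodingFormat = "base64" | "float"

// Hypothetical slice of the config shape; only the field under discussion is modeled.
interface CodebaseIndexConfigSlice {
  codebaseIndexOpenAiCompatibleEncodingFormat?: EncodingFormat
}

function resolveEncodingFormat(config: CodebaseIndexConfigSlice | undefined): EncodingFormat {
  // Default to base64 for backward compatibility when the option is unset.
  return config?.codebaseIndexOpenAiCompatibleEncodingFormat ?? "base64"
}
```

With the field declared on the interface, a misspelled property name or an unsupported literal such as "hex" would be rejected at compile time instead of slipping through `as any`.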

this.isFullUrl = this.isFullEndpointUrl(baseUrl)
this.maxItemTokens = maxItemTokens || MAX_ITEM_TOKENS
// Default to base64 for backward compatibility, but allow float format
this.encodingFormat = encodingFormat || "base64"

The constructor accepts the encodingFormat parameter but doesn't validate it. Consider adding validation to ensure only "base64" or "float" are accepted:

Suggested change
- this.encodingFormat = encodingFormat || "base64"
+ // Default to base64 for backward compatibility, but allow float format
+ const validFormats = ["base64", "float"] as const
+ this.encodingFormat = encodingFormat || "base64"
+ if (!validFormats.includes(this.encodingFormat)) {
+   throw new Error(`Invalid encoding format: ${encodingFormat}. Must be 'base64' or 'float'`)
+ }
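If the value can arrive from untyped settings, a type guard is one way to do the same runtime check while keeping the compiler satisfied, since calling `includes` on an `as const` tuple with a plain string argument does not type-check. The names `isEncodingFormat` and `parseEncodingFormat` below are hypothetical, and treating an empty string as "unset" is an assumption that mirrors the `||` fallback above.

```typescript
const VALID_ENCODING_FORMATS = ["base64", "float"] as const
type EncodingFormat = (typeof VALID_ENCODING_FORMATS)[number]

// Runtime check that also narrows the static type.
function isEncodingFormat(value: unknown): value is EncodingFormat {
  return typeof value === "string" && (VALID_ENCODING_FORMATS as readonly string[]).includes(value)
}

// Hypothetical parser: unset values fall back to base64, anything else must be valid.
function parseEncodingFormat(value: unknown): EncodingFormat {
  if (value === undefined || value === null || value === "") {
    return "base64"
  }
  if (!isEncodingFormat(value)) {
    throw new Error(`Invalid encoding format: ${String(value)}. Must be 'base64' or 'float'`)
  }
  return value
}
```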

encoding_format: "base64",
})
})
})

Great test coverage! Consider adding a few edge cases:

  • What happens when an invalid encoding format like "invalid" is provided?
  • How do we handle mixed responses (some base64, some float) from providers?
  • How should errors be handled when base64 decoding fails because of corrupted data?

@hannesrudolph hannesrudolph added the Issue/PR - Triage label Aug 18, 2025
@daniel-lxs daniel-lxs moved this from Triage to PR [Needs Prelim Review] in Roo Code Roadmap Aug 19, 2025
@hannesrudolph hannesrudolph added the PR - Needs Preliminary Review label and removed the Issue/PR - Triage label Aug 19, 2025
@daniel-lxs
Member

Duplicate

@daniel-lxs daniel-lxs closed this Aug 19, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Aug 19, 2025
@github-project-automation github-project-automation bot moved this from PR [Needs Prelim Review] to Done in Roo Code Roadmap Aug 19, 2025
Labels

enhancement, PR - Needs Preliminary Review, size:L

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Code-index: Not work with encoding_format "float"

4 participants