-
Notifications
You must be signed in to change notification settings - Fork 2.5k
fix: sanitize non-ASCII characters in API keys for HTTP headers #7960
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Added sanitizeForHeader() method to replace non-ASCII characters with ? - Added isAsciiOnly() method to check if string contains only ASCII - Added warning when API key contains non-ASCII characters - Added comprehensive tests for API key sanitization - Fixed ESLint warnings by using charCodeAt instead of regex with control chars Fixes #7959 - ByteString conversion error with Unicode characters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewing my own code is like debugging in a mirror - everything looks backward but the bugs are still mine.
|
|
||
| it("should handle API keys with emoji and special Unicode characters", async () => { | ||
| const apiKeyWithEmoji = "key-😀-test-™-api" | ||
| // Emoji (😀) is multi-byte and gets replaced with ?? (one for each byte) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this comment accurate? The emoji 😀 is a 4-byte UTF-8 sequence, but the test expects only "??" (2 question marks) rather than "????" (4 question marks). The current implementation replaces each character (UTF-16 code unit) with "?", not each UTF-8 byte. Should we clarify this comment or adjust the implementation to be byte-based?
| model: string, | ||
| ): Promise<OpenAIEmbeddingResponse> { | ||
| // Sanitize the API key to ensure it only contains ASCII characters | ||
| const sanitizedApiKey = OpenAICompatibleEmbedder.sanitizeForHeader(this.apiKey) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential security consideration: Multiple different API keys with non-ASCII characters could sanitize to the same string (e.g., "key-•" and "key-§" both become "key-?"). While unlikely to cause issues in practice, would it be worth logging a hash of the original key for debugging purposes when we detect non-ASCII characters?
| * @param value The string to sanitize | ||
| * @returns The sanitized string containing only ASCII characters | ||
| */ | ||
| private static sanitizeForHeader(value: string): string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor suggestion: The method name sanitizeForHeader() could be more specific like sanitizeToAscii() since it's specifically about ASCII compliance rather than general header sanitization. This would make the intent clearer for future maintainers.
| mutex: new Mutex(), | ||
| } | ||
|
|
||
| /** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation enhancement: Consider adding a JSDoc example showing the transformation of common non-ASCII characters:
| /** | |
| /** | |
| * Sanitizes a string to ensure it only contains ASCII characters suitable for HTTP headers. | |
| * Non-ASCII characters are replaced with '?' to maintain the string structure. | |
| * @param value The string to sanitize | |
| * @returns The sanitized string containing only ASCII characters | |
| * @example | |
| * sanitizeForHeader("key-•-test") // returns "key-?-test" | |
| * sanitizeForHeader("api-😀-key") // returns "api-??-key" | |
| * sanitizeForHeader("test-™-api") // returns "test-?-api" | |
| */ |
| model: string, | ||
| ): Promise<OpenAIEmbeddingResponse> { | ||
| // Sanitize the API key to ensure it only contains ASCII characters | ||
| const sanitizedApiKey = OpenAICompatibleEmbedder.sanitizeForHeader(this.apiKey) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good implementation! The sanitization is applied consistently to both the direct HTTP request path here and presumably to the SDK path as well. This ensures the fix works regardless of how the embedder is configured.
|
I'd like this fix to be more general |
This PR attempts to address Issue #7959. Feedback and guidance are welcome.
Problem
Users were experiencing a "Cannot convert argument to a ByteString" error when using the OpenAI-compatible API with non-ASCII characters (like Unicode bullet points •) in their API keys. This occurred because HTTP headers must contain only ASCII characters.
Solution
sanitizeForHeader()method to replace non-ASCII characters with?before using API keys in HTTP headersisAsciiOnly()method to check if strings contain only ASCII characterscharCodeAt()instead of regex with control characters to avoid ESLint warningsTesting
Impact
This fix ensures that API keys with non-ASCII characters will work (though with reduced functionality) rather than causing a complete failure. Users are warned when their API keys contain problematic characters.
Fixes #7959
Important
Sanitizes non-ASCII characters in API keys for HTTP headers, adds validation, and tests for Unicode handling.
sanitizeForHeader()inopenai-compatible.tsto replace non-ASCII characters in API keys with?for HTTP headers.isAsciiOnly()to check if strings contain only ASCII characters.openai-compatible.spec.tsfor API key sanitization, covering Unicode characters, emojis, and multi-byte characters.This description was created by
for 5645a52. You can customize this summary. It will automatically update as commits are pushed.