This repository was archived by the owner on Sep 11, 2025. It is now read-only.

Conversation

@mattjohnsonpint
Contributor

Sanitizes HTTP responses from models or API calls by stripping any invalid UTF-8 sequences and null bytes when the expected content is a string or an object serialized as a JSON string.

Fixes #903

@mattjohnsonpint mattjohnsonpint requested review from a team and Copilot June 23, 2025 20:34
@linear
linear bot commented Jun 23, 2025

Contributor

Copilot AI left a comment

Pull Request Overview

This PR sanitizes HTTP response content so that it contains only valid UTF-8 byte sequences. Key changes include:

  • Introducing Test_SanitizeUTF8 in strings_test.go with various edge cases.
  • Adding the SanitizeUTF8 function in strings.go to remove invalid UTF-8 and null bytes.
  • Sanitizing HTTP response content in http.go before processing it.
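
For readers without the diff at hand, here is a minimal sketch of what a function like this could look like. It is not the merged code: the name `sanitizeUTF8` and its structure are assumptions for illustration. It walks the input with `utf8.DecodeRune` and drops invalid bytes and null bytes.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// sanitizeUTF8 is a hypothetical sketch, not the function from this PR.
// It returns a copy of data with invalid UTF-8 bytes and null bytes dropped.
func sanitizeUTF8(data []byte) []byte {
	out := make([]byte, 0, len(data))
	for len(data) > 0 {
		r, size := utf8.DecodeRune(data)
		switch {
		case r == utf8.RuneError && size == 1:
			// A width of 1 with RuneError marks an invalid byte: skip it.
		case r == 0:
			// Null byte: valid UTF-8, but dropped as well.
		default:
			// Valid rune (including a correctly encoded U+FFFD): keep it.
			out = append(out, data[:size]...)
		}
		data = data[size:]
	}
	return out
}

func main() {
	in := []byte("ok\xf8\x00!")
	fmt.Printf("%q\n", sanitizeUTF8(in)) // prints "ok!"
}
```

Checking `size == 1` alongside `utf8.RuneError` matters here, because a legitimately encoded U+FFFD decodes to the same rune value with a width of 3 and should be kept.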

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| runtime/utils/strings_test.go | Added tests to verify UTF-8 sanitization logic. |
| runtime/utils/strings.go | Implemented SanitizeUTF8 to filter out invalid UTF-8 and null bytes. |
| runtime/utils/http.go | Integrated UTF-8 sanitization in HTTP response handling. |
| CHANGELOG.md | Updated changelog to reflect the fix for invalid UTF-8 responses. |

Comments suppressed due to low confidence (1)

runtime/utils/strings.go:49

  • It may be helpful to update the documentation comment to clarify that invalid multi-byte sequences and null bytes are skipped rather than replaced. This can help avoid any potential confusion about the function's behavior.
// SanitizeUTF8 removes invalid UTF-8 sequences from a byte slice.
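
To illustrate the distinction this suggestion draws, the sketch below contrasts skipping with replacing, using the standard library's `bytes.ToValidUTF8` and `bytes.ReplaceAll`. The helper names `skipInvalid` and `replaceInvalid` are hypothetical and do not appear in the PR.

```go
package main

import (
	"bytes"
	"fmt"
)

// skipInvalid (hypothetical) drops invalid UTF-8 bytes and null bytes.
func skipInvalid(data []byte) []byte {
	// A nil replacement removes each run of invalid bytes.
	cleaned := bytes.ToValidUTF8(data, nil)
	// Null bytes are valid UTF-8, so they need a separate pass.
	return bytes.ReplaceAll(cleaned, []byte{0}, nil)
}

// replaceInvalid (hypothetical) substitutes U+FFFD instead of dropping.
func replaceInvalid(data []byte) []byte {
	marker := []byte("\uFFFD")
	// Each run of invalid bytes becomes a single replacement character.
	cleaned := bytes.ToValidUTF8(data, marker)
	return bytes.ReplaceAll(cleaned, []byte{0}, marker)
}

func main() {
	in := []byte("a\xf8b")
	fmt.Printf("%q\n", skipInvalid(in)) // prints "ab"
	fmt.Printf("%+q\n", replaceInvalid(in))
}
```

With skipping, the output gives no hint that bytes were removed, which is why spelling out the behavior in the doc comment can help callers.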

@mattjohnsonpint mattjohnsonpint enabled auto-merge (squash) June 23, 2025 20:37
@mattjohnsonpint mattjohnsonpint merged commit 5818398 into main Jun 23, 2025
33 checks passed
@mattjohnsonpint mattjohnsonpint deleted the mjp/mod-5-error-invalid-byte-sequence-for-encoding-utf8-0xf8-sqlstate branch June 23, 2025 20:42

Development

Successfully merging this pull request may close these issues.

ERROR: invalid byte sequence for encoding "UTF8": 0xf8 (SQLSTATE 22021)

3 participants