Fix broken unicode escape sequence in JSX text #1754
Co-authored-by: jakebailey <[email protected]>
Shockingly I think it successfully got there.
Pull Request Overview
This PR fixes a bug where unicode characters on standalone lines in JSX text were being corrupted and output as \uFFFD
(the Unicode replacement character) instead of their correct escape sequences.
- Fixed string slicing logic in `fixupWhitespaceAndDecodeEntities` to properly handle multi-byte UTF-8 characters
- Changed byte boundary tracking to use end position instead of start position for non-whitespace characters
- Added comprehensive test coverage for unicode characters in JSX text content
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
File | Description |
---|---|
testdata/tests/cases/compiler/jsxUnicodeEscapeSequence.tsx | Added test cases covering inline, standalone, and multiple unicode characters in JSX |
testdata/baselines/reference/compiler/jsxUnicodeEscapeSequence.types | Reference baseline for type checking output |
testdata/baselines/reference/compiler/jsxUnicodeEscapeSequence.symbols | Reference baseline for symbol resolution output |
testdata/baselines/reference/compiler/jsxUnicodeEscapeSequence.js | Reference baseline showing correctly encoded unicode escape sequences |
testdata/baselines/reference/compiler/jsxUnicodeEscapeSequence.errors.txt | Reference baseline for expected JSX type errors |
internal/transformers/jsxtransforms/jsx.go | Fixed unicode character handling by tracking end byte position instead of start position |
Otherwise, looks good.
Fixes a bug where unicode characters on standalone lines in JSX text were being corrupted and output as `\uFFFD` (the Unicode replacement character) instead of their correct escape sequences.
Problem
When JSX contained unicode characters on separate lines, they were being incorrectly encoded:
Before (broken):
After (fixed):
Root Cause
The issue was in the `fixupWhitespaceAndDecodeEntities` function in `internal/transformers/jsxtransforms/jsx.go`. The code recorded the starting byte index of each non-whitespace character and used that index as the slice boundary, so a multi-byte UTF-8 character at the boundary was truncated.
For the 3-byte UTF-8 sequence ⚠ (`E2 9A A0`), the slice included only the first byte (`E2`), producing an invalid UTF-8 sequence that gets replaced with `\uFFFD`.
Solution
Fixed the string slicing logic to handle multi-byte UTF-8 characters by tracking the end byte position (`i + size - 1`) instead of the start position (`i`) of non-whitespace characters. This ensures that slicing the string includes complete UTF-8 character sequences.
The fix is minimal: it only changes the variable name and assignment so that the byte boundary is handled correctly.
Testing
Added comprehensive test coverage in `jsxUnicodeEscapeSequence.tsx`.
All existing tests continue to pass, confirming no regressions.