Fix broken unicode escape sequence in JSX text #1754

Copilot · 2025-09-26T15:01:45Z

Fixes a bug where unicode characters on standalone lines in JSX text were being corrupted and output as \uFFFD (the Unicode replacement character) instead of their correct escape sequences.

Problem

When JSX contained unicode characters on separate lines, they were being incorrectly encoded:

export const Component = () => {
    return (<div><span>⚠</span>
        ⚠
    </div>)
}

Before (broken):

return (jsx_runtime_1.jsxs("div", { children: [jsx_runtime_1.jsx("span", { children: "\u26A0" }), "\uFFFD"] }));

After (fixed):

return (jsx_runtime_1.jsxs("div", { children: [jsx_runtime_1.jsx("span", { children: "\u26A0" }), "\u26A0"] }));

Root Cause

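The issue was in the fixupWhitespaceAndDecodeEntities function in internal/transformers/jsxtransforms/jsx.go. The code recorded the byte index at which the last non-whitespace character starts, then sliced the string as though that index were the character's last byte. For single-byte characters the two coincide, but for multi-byte UTF-8 characters the slice ended partway through the sequence and corrupted it.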

For the 3-byte UTF-8 sequence ⚠ (E2 9A A0), the code was only including the first byte (E2) when slicing, resulting in an invalid UTF-8 sequence that gets replaced with \uFFFD.

Solution

Fixed the string slicing logic to properly handle multi-byte UTF-8 characters by tracking the end byte position (i + size - 1) instead of just the start position (i) of non-whitespace characters. This ensures that when slicing the string, we include complete UTF-8 character sequences.
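The difference between tracking the start byte and the end byte can be sketched as follows. This is an illustration under assumptions, not the actual body of fixupWhitespaceAndDecodeEntities: the function names trimTrailingSpaceBuggy and trimTrailingSpaceFixed are hypothetical, and the sketch only trims trailing whitespace, whereas the real function also handles line breaks and entity decoding.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// trimTrailingSpaceBuggy (hypothetical) records the START byte of the last
// non-whitespace character and slices one byte past it. That is only
// correct when the character is a single byte long.
func trimTrailingSpaceBuggy(s string) string {
	lastNonSpace := -1
	for i := 0; i < len(s); {
		r, size := utf8.DecodeRuneInString(s[i:])
		if !unicode.IsSpace(r) {
			lastNonSpace = i
		}
		i += size
	}
	return s[:lastNonSpace+1]
}

// trimTrailingSpaceFixed (hypothetical) records the END byte of the last
// non-whitespace character, i + size - 1, so the slice always stops on a
// complete UTF-8 sequence.
func trimTrailingSpaceFixed(s string) string {
	lastNonSpaceEnd := -1
	for i := 0; i < len(s); {
		r, size := utf8.DecodeRuneInString(s[i:])
		if !unicode.IsSpace(r) {
			lastNonSpaceEnd = i + size - 1
		}
		i += size
	}
	return s[:lastNonSpaceEnd+1]
}

func main() {
	line := "⚠  " // a multi-byte character followed by trailing spaces
	fmt.Printf("buggy: %q\n", trimTrailingSpaceBuggy(line))
	fmt.Printf("fixed: %q\n", trimTrailingSpaceFixed(line))
}
```

For ASCII-only input the two functions agree, which is consistent with the bug going unnoticed until a multi-byte character ended a line.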

The fix is minimal: it changes only the tracking variable's name and its assignment so that the slice ends on a complete character.

Testing

Added comprehensive test coverage in jsxUnicodeEscapeSequence.tsx that verifies:

Inline unicode characters (already worked)
Standalone unicode characters on separate lines (the bug case)
Multiple unicode characters (also affected)

All existing tests continue to pass, confirming no regressions.

Original prompt

This section details the original issue you should resolve

<issue_title>broken unicode escape sequence in jsx</issue_title>
<issue_description>## Steps to reproduce

tsconfig.json
{
    "compilerOptions": {
        "outDir": "./dist",
        "declaration": true,
        "moduleResolution": "node",
        "strict": true,
        "skipLibCheck": true,
        "jsx": "react-jsx"
    }
}
repro.tsx
export const Repro = () => {
    return (<div><span>⚠</span>
        ⚠
    </div>)
}
Behavior with [email protected]
> tsc
> grep children dist/repro.js
    return ((0, jsx_runtime_1.jsxs)("div", { children: [(0, jsx_runtime_1.jsx)("span", { children: "\u26A0" }), "\u26A0"] }));
Behavior with tsgo
> tsgo
> grep children dist/repro.js
    return (jsx_runtime_1.jsxs("div", { children: [jsx_runtime_1.jsx("span", { children: "\u26A0" }), "\uFFFD"] }));
It's worth noting that the issue happens with the unicode character on a line by itself but not when there is other stuff on the same line.</issue_description>

Comments on the Issue (you are @copilot in this section)
@jakebailey Probably related to #1701 but, why have you done this? 😭
@jakebailey I see, you're saying it's literally https://www.compart.com/en/unicode/U+26A0.
@jakebailey Yeah, it's just a bug then, this is a valid character so not sure where it's gone wrong.

Fixes #1715

Co-authored-by: jakebailey <[email protected]>

jakebailey

Shockingly I think it successfully got there.

Copilot

Pull Request Overview

This PR fixes a bug where unicode characters on standalone lines in JSX text were being corrupted and output as \uFFFD (the Unicode replacement character) instead of their correct escape sequences.

Fixed string slicing logic in fixupWhitespaceAndDecodeEntities to properly handle multi-byte UTF-8 characters
Changed byte boundary tracking to use end position instead of start position for non-whitespace characters
Added comprehensive test coverage for unicode characters in JSX text content

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

Show a summary per file

| File | Description |
| --- | --- |
| `testdata/tests/cases/compiler/jsxUnicodeEscapeSequence.tsx` | Added test cases covering inline, standalone, and multiple unicode characters in JSX |
| `testdata/baselines/reference/compiler/jsxUnicodeEscapeSequence.types` | Reference baseline for type checking output |
| `testdata/baselines/reference/compiler/jsxUnicodeEscapeSequence.symbols` | Reference baseline for symbol resolution output |
| `testdata/baselines/reference/compiler/jsxUnicodeEscapeSequence.js` | Reference baseline showing correctly encoded unicode escape sequences |
| `testdata/baselines/reference/compiler/jsxUnicodeEscapeSequence.errors.txt` | Reference baseline for expected JSX type errors |
| `internal/transformers/jsxtransforms/jsx.go` | Fixed unicode character handling by tracking end byte position instead of start position |

weswigham

Otherwise, looks good.

testdata/tests/cases/compiler/jsxUnicodeEscapeSequence.tsx

Initial plan

bfd4ad2

Copilot AI assigned Copilot and jakebailey Sep 26, 2025

Copilot started work on behalf of jakebailey September 26, 2025 15:01

Fix unicode escape sequence issue in JSX text

7b25565

Co-authored-by: jakebailey <[email protected]>

Copilot AI changed the title ~~[WIP] broken unicode escape sequence in jsx~~ Fix broken unicode escape sequence in JSX text Sep 26, 2025

Copilot finished work on behalf of jakebailey September 26, 2025 15:23

Copilot AI requested a review from jakebailey September 26, 2025 15:23

jakebailey approved these changes Sep 30, 2025

jakebailey marked this pull request as ready for review September 30, 2025 20:13

Copilot AI review requested due to automatic review settings September 30, 2025 20:13

Copilot AI reviewed Sep 30, 2025

jakebailey requested review from RyanCavanaugh, sandersn and weswigham September 30, 2025 20:14

weswigham requested changes Sep 30, 2025

testdata/tests/cases/compiler/jsxUnicodeEscapeSequence.tsx

Update test to have react types

11ed8a7

weswigham approved these changes Sep 30, 2025

jakebailey enabled auto-merge September 30, 2025 20:24

jakebailey added this pull request to the merge queue Sep 30, 2025

Merged via the queue into main with commit 96d7293 Sep 30, 2025
22 checks passed

jakebailey deleted the copilot/fix-401ef11a-03e3-494a-ab79-3dd7dd5acd07 branch September 30, 2025 20:55

3 participants
