Fix type soundness for lone-surrogate string literals by Copilot · Pull Request #3912 · microsoft/typescript-go

Copilot · 2026-05-16T16:33:33Z

Go's string(rune) replaces surrogate code points (U+D800–U+DFFF) with U+FFFD, so all lone surrogates like "\uD800" and "\uDC00" collapsed to the same internal string value, making the type checker treat them as identical types.

const x: "\uD800" = "\uDC00"; // should error, was accepted

Scanner (scanner.go): Use encodeSurrogate() (CESU-8) for all lone surrogates in scanEscapeSequence, including both fixed-width \uXXXX and extended \u{XXXX} escape syntax. Different surrogates now produce distinct byte sequences.
Printer (utilities.go): Add decodeCESU8OrUTF8 to recognize CESU-8 surrogates during string escaping, and force surrogates to always emit as \uXXXX regardless of NeverAsciiEscape.
String comparison (compare.go): Add decodeCESU8OrUTF8 to CompareStringsCaseInsensitive so CESU-8 encoded surrogates are properly decoded and compared instead of being seen as individual invalid UTF-8 bytes.
Test: New loneSurrogateStringLiterals.ts covering cross-surrogate assignment errors, same-surrogate identity, extended Unicode escapes (\u{D800} vs \u{DC00}), and mixed syntax equivalence.

Agent-Logs-Url: https://github.com/microsoft/typescript-go/sessions/408ccedd-6e7b-4112-befe-36f10f374c56 Co-authored-by: jakebailey <5341706+jakebailey@users.noreply.github.com>

Use CESU-8 encoding for lone surrogates in string literal scanning so different surrogates remain distinct. Add CESU-8 surrogate detection in the printer's escapeStringWorker to properly render them as \uXXXX escape sequences in output. Agent-Logs-Url: https://github.com/microsoft/typescript-go/sessions/408ccedd-6e7b-4112-befe-36f10f374c56 Co-authored-by: jakebailey <5341706+jakebailey@users.noreply.github.com>

DanielRosenwasser · 2026-05-18T19:06:55Z

There's a few places that rely on text from strings that we may want adjusted as well. For example, CompareStringsCaseInsensitive calls utf8.DecodeRuneInString, but CompareStringsCaseInsensitive is called by compareDeclarationInfos.

Copilot

Pull request overview

This PR fixes handling of lone-surrogate string literal values so distinct surrogate code units no longer collapse to U+FFFD in type checking and declaration/type output.

Changes:

Encodes lone \uXXXX surrogate escapes as CESU-8 sentinel bytes in the scanner.
Updates printer escaping to recognize CESU-8 surrogate sentinels and emit \uXXXX.
Adds a compiler regression test and updates affected baselines.

Show a summary per file

File	Description
`internal/scanner/scanner.go`	Preserves lone surrogate escapes as distinct internal values.
`internal/printer/utilities.go`	Prints CESU-8 surrogate sentinels as UTF-16 escape sequences.
`testdata/tests/cases/compiler/loneSurrogateStringLiterals.ts`	Adds regression coverage for surrogate literal assignability.
`testdata/baselines/reference/compiler/loneSurrogateStringLiterals.errors.txt`	Adds expected assignability diagnostics.
`testdata/baselines/reference/compiler/loneSurrogateStringLiterals.js`	Adds JS emit baseline for the new test.
`testdata/baselines/reference/compiler/loneSurrogateStringLiterals.symbols`	Adds symbol baseline for the new test.
`testdata/baselines/reference/compiler/loneSurrogateStringLiterals.types`	Adds type baseline for the new test.
`testdata/baselines/reference/compiler/unicodeSurrogatesInStringLiterals.js`	Updates declaration emit baseline for lone surrogates.
`testdata/baselines/reference/compiler/unicodeSurrogatesInStringLiterals.types`	Updates type display baseline for lone surrogates.

Copilot's findings

Files reviewed: 9/9 changed files
Comments generated: 2

+		} else if codePointIsHighSurrogate(codePoint) || codePointIsLowSurrogate(codePoint) {
+			// Lone surrogate: encode as CESU-8 so that distinct surrogates remain
+			// distinguishable. Go's string(rune) would replace all surrogates with
+			// U+FFFD, collapsing e.g. "\uD800" and "\uDC00" into the same value.
 			return encodeSurrogate(codePoint)


+const highSurrogate: "\uD800" = "\uD800"; // ok
+const lowSurrogate: "\uDC00" = "\uDC00"; // ok


jakebailey · 2026-05-18T19:18:04Z

@copilot apply changes based on the comments in this thread

…pareStringsCaseInsensitive - Fix extended escape branch in scanEscapeSequence to use encodeSurrogate for surrogate code points in \u{XXXX} syntax - Add decodeCESU8OrUTF8 to CompareStringsCaseInsensitive so CESU-8 encoded surrogates are properly compared - Add test cases for \u{D800} vs \u{DC00} and mixed syntax Agent-Logs-Url: https://github.com/microsoft/typescript-go/sessions/a677b991-fe22-4da6-a336-875eaefb4eff Co-authored-by: jakebailey <5341706+jakebailey@users.noreply.github.com>

Copilot · 2026-05-18T19:38:11Z

There's a few places that rely on text from strings that we may want adjusted as well. For example, CompareStringsCaseInsensitive calls utf8.DecodeRuneInString, but CompareStringsCaseInsensitive...

Good catch. Fixed in ef2a825 — CompareStringsCaseInsensitive now uses a decodeCESU8OrUTF8 helper that recognizes the CESU-8 surrogate byte sequences, so different surrogates won't collapse to the same RuneError during comparison.

@copilot apply changes based on the comments in this thread

Done in ef2a825 — the extended escape branch (\u{D800}) now uses encodeSurrogate for surrogate code points, and the test covers \u{D800} vs \u{DC00} as well as mixed syntax (\uD800 vs \u{DC00}).

jakebailey · 2026-05-18T19:40:45Z

oh no, this is getting spooky

I feel like this all really does say that we need to be consistent about using WTF-8 or something everywhere

Initial plan

25349cc

Copilot AI assigned Copilot and jakebailey May 16, 2026

Copilot started work on behalf of jakebailey May 16, 2026 16:33 View session

Copilot AI linked an issue May 16, 2026 that may be closed by this pull request

tsgo breaks type soundness for lone-surrogate string literals #3899

Open

Copilot AI and others added 2 commits May 16, 2026 16:40

Add test case for lone surrogate string literals (pre-fix)

ba75b02

Agent-Logs-Url: https://github.com/microsoft/typescript-go/sessions/408ccedd-6e7b-4112-befe-36f10f374c56 Co-authored-by: jakebailey <5341706+jakebailey@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Fix tsgo breaking type soundness for lone-surrogate string literals~~ Fix type soundness for lone-surrogate string literals May 16, 2026

Copilot finished work on behalf of jakebailey May 16, 2026 17:02

Copilot AI requested a review from jakebailey May 16, 2026 17:02

DanielRosenwasser marked this pull request as ready for review May 18, 2026 19:07

Copilot AI review requested due to automatic review settings May 18, 2026 19:07

Copilot started reviewing on behalf of DanielRosenwasser May 18, 2026 19:10 View session

Copilot AI reviewed May 18, 2026

View reviewed changes

Copilot started work on behalf of jakebailey May 18, 2026 19:18 View session

Copilot finished work on behalf of jakebailey May 18, 2026 19:39

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix type soundness for lone-surrogate string literals#3912

Fix type soundness for lone-surrogate string literals#3912
Copilot wants to merge 4 commits into
mainfrom
copilot/fix-tsgo-lone-surrogate-issue

Copilot AI commented May 16, 2026 •

edited

Loading

Uh oh!

DanielRosenwasser commented May 18, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

jakebailey commented May 18, 2026

Uh oh!

Copilot AI commented May 18, 2026 •

edited

Loading

Uh oh!

jakebailey commented May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

		const highSurrogate: "\uD800" = "\uD800"; // ok
		const lowSurrogate: "\uDC00" = "\uDC00"; // ok

Conversation

Copilot AI commented May 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DanielRosenwasser commented May 18, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Copilot's findings

Uh oh!

jakebailey commented May 18, 2026

Uh oh!

Copilot AI commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jakebailey commented May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Copilot AI commented May 16, 2026 •

edited

Loading

Copilot AI commented May 18, 2026 •

edited

Loading