Skip to content

fix(es/minifier): Prevent convert_tpl_to_str when there's emoji under es5#11529

Merged
kdy1 merged 3 commits intomainfrom
02-04-fix/convert_tpl_to_str
Feb 5, 2026
Merged

fix(es/minifier): Prevent convert_tpl_to_str when there's emoji under es5#11529
kdy1 merged 3 commits intomainfrom
02-04-fix/convert_tpl_to_str

Conversation

@CPunisher
Copy link
Member

Description:
When the target environment is below ES2015, and non-BMP characters (like emojis) are encountered, we stop the conversion.

This is because:

  1. Tpl: 🦀 (may output directly in source code or require minimal escaping) -> shorter
  2. Str (in ES5 mode): \uD83E\uDD80 (escape sequence for surrogate pair) -> extremely long

Related issue (if exists):

#11415

@CPunisher CPunisher requested a review from a team as a code owner February 4, 2026 09:26
Copilot AI review requested due to automatic review settings February 4, 2026 09:26
@changeset-bot
Copy link

changeset-bot bot commented Feb 4, 2026

⚠️ No Changeset found

Latest commit: 1d9f078

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@claude

This comment has been minimized.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adjusts Pure::convert_tpl_to_str to avoid converting template literals into string literals in ES5/ES3 targets when non-BMP characters would cause string codegen to expand into long surrogate-pair escapes.

Changes:

  • Add an early-exit in convert_tpl_to_str for targets < ES2015 when encountering non-BMP characters.
  • Document the rationale (template output can be shorter than ES5 string escape output).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@CPunisher CPunisher changed the title fix: convert_tpl_to_str should consider the codegen of escaped string fix(es/minifier): prevent convert_tpl_to_str when there's emoji under es5 Feb 4, 2026
@github-actions
Copy link
Contributor

github-actions bot commented Feb 4, 2026

Binary Sizes

File Size
swc.linux-x64-gnu.node 28M (28691784 bytes)

Commit: 2cee340

@codspeed-hq
Copy link

codspeed-hq bot commented Feb 4, 2026

Merging this PR will not alter performance

✅ 184 untouched benchmarks


Comparing 02-04-fix/convert_tpl_to_str (1d9f078) with main (3e63627)1

Open in CodSpeed

Footnotes

  1. No successful run was found on main (f523e4b) during the generation of this report, so 3e63627 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

@kdy1
Copy link
Member

kdy1 commented Feb 4, 2026

@claude Add enormous amount of unit tests in the swc_ecma_minifier that actually invokes swc_ecma_codegen to verify all possibly problematic input cases

@kdy1 kdy1 added this to the Planned milestone Feb 4, 2026
@claude

This comment has been minimized.

@kdy1
Copy link
Member

kdy1 commented Feb 4, 2026

This PR looks good to me, but I'd prefer to have more e2e tests for this.

@kdy1 kdy1 changed the title fix(es/minifier): prevent convert_tpl_to_str when there's emoji under es5 fix(es/minifier): Prevent convert_tpl_to_str when there's emoji under es5 Feb 4, 2026
Add 36 unit tests that verify template-to-string conversion behavior
with various character types, specifically testing the fix for non-BMP
characters in ES5/ES3 targets.

Test categories:
- Emoji/non-BMP characters (ES5 vs ES2015)
- Unicode escape sequences (\u{1F980})
- Mathematical/musical symbols (U+1D400-U+1D7FF)
- CJK Extension B characters (U+20000+)
- BMP characters (Chinese, Japanese, Korean)
- Edge cases (empty templates, boundary chars, newlines)
- Compound emojis (flags, skin tones, ZWJ sequences)
- Ancient scripts (hieroglyphs, cuneiform)
- Private Use Area characters

Co-authored-by: Donny/강동윤 <kdy1@users.noreply.github.com>
@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ CPunisher
❌ github-actions[bot]
You have signed the CLA already but the status is still pending? Let us recheck it.

@claude

This comment has been minimized.

@kdy1
Copy link
Member

kdy1 commented Feb 5, 2026

@claude fix ci

@claude

This comment has been minimized.

Inline format string variables in assert! macros to fix clippy::uninlined_format_args errors.

Co-authored-by: Donny/강동윤 <kdy1@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 5, 2026 08:54
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@claude
Copy link
Contributor

claude bot commented Feb 5, 2026

PR Review: fix(es/minifier): Prevent convert_tpl_to_str when there's emoji under ES5

Summary

This PR adds a check to prevent template literal to string conversion when targeting ES5 (or earlier) and the template contains non-BMP characters (code points > U+FFFF, such as emojis). This is a correctness fix that prevents output size regression.


✅ What's Good

  1. Correct logic: The fix properly checks ch > '\u{ffff}' to detect non-BMP characters. This is the exact boundary - characters above U+FFFF require surrogate pairs in ES5 codegen.

  2. Well-documented comment: The comment in the code clearly explains why this check exists, which is helpful for future maintainers.

  3. Comprehensive tests: The PR includes 36 unit tests covering:

    • Various emoji types (single, flag, ZWJ sequences, skin tones)
    • BMP vs non-BMP boundary cases (U+FFFF vs U+10000)
    • Different scripts (CJK Extension B, hieroglyphs, cuneiform, mathematical symbols)
    • Mixed content scenarios
    • ES3/ES5/ES2015+ target comparisons
    • Size comparison validations
  4. Tests pass: All 36 new tests pass correctly.

  5. Code compiles and is properly formatted.


💭 Suggestions for Consideration

  1. Test helper has unused target parameter in some tests:
    The run_test function takes both ecma (compress option) and target (codegen option). While most tests set them to the same value, it might be worth adding a comment in the helper to clarify what each controls. The ecma parameter controls whether the minifier converts templates to strings, while target controls how the codegen outputs the result.

  2. Consider adding an execution test:
    Per crates/swc_ecma_minifier/AGENTS.md, execution tests (./scripts/exec.sh) can be valuable to verify runtime behavior. While the unit tests are comprehensive, an execution test could validate that the actual emitted code runs correctly in an ES5 environment.

  3. Minor: Consider early return placement:
    The non-BMP check happens inside the _ => branch of the character match. If this function processes very long strings with emojis, it might scan all characters before the emoji before returning. This is unlikely to be a performance concern in practice, but worth noting.


🔍 Edge Cases Verified

The tests cover these edge cases well:

  • U+FFFF (last BMP character) - should convert
  • U+10000 (first non-BMP character) - should NOT convert in ES5
  • Lone surrogates (invalid UTF-16) - handled
  • BMP CJK characters (Chinese, Japanese, Korean) - should convert
  • BMP symbols (arrows, etc.) - should convert

📋 Checklist

  • Code compiles without errors
  • Tests pass (36/36)
  • Code is properly formatted
  • Logic is correct (non-BMP check at the right boundary)
  • Comment explains the rationale
  • No security concerns
  • Follows repository conventions

Verdict

LGTM 👍

This is a well-implemented fix for a legitimate minification issue. The logic is sound, the tests are comprehensive, and the code is well-documented. The fix ensures that minifying code with emojis targeting ES5 won't produce longer output than the original.

Copy link
Member

@kdy1 kdy1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@kdy1 kdy1 merged commit ff6cf88 into main Feb 5, 2026
187 of 188 checks passed
@kdy1 kdy1 deleted the 02-04-fix/convert_tpl_to_str branch February 5, 2026 09:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants