✨ Support new SNBT Unicode escapes from 25w09a #1953

Open
Calverin wants to merge 5 commits into SpyglassMC:main from Calverin:complex-unicode-escape-sequences

Calverin (Member) commented Jan 21, 2026

Adds support in SNBT Strings for:

  • 2-digit Unicode sequences - \x0A
  • 8-digit Unicode sequences - \U0001F525
  • Named Unicode characters - \N{Fire} (names are validated against this list; control characters use their secondary name, and any character with parentheses in its name is omitted)
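
As a rough illustration of the fixed-width forms listed above, the sketch below decodes `\xHH`, `\uHHHH`, and `\UHHHHHHHH` in a plain string. `decodeHexEscapes` and `HEX_LENGTHS` are hypothetical names made up for this example; this is not the Spyglass string parser, and named `\N{...}` escapes are deliberately left out.

```typescript
// Hypothetical helper, NOT the Spyglass parser: decodes only the fixed-width
// hex escapes (\xHH, \uHHHH, \UHHHHHHHH) and leaves everything else untouched.
const HEX_LENGTHS = new Map<string, number>([
  ['x', 2],
  ['u', 4],
  ['U', 8],
])

function decodeHexEscapes(input: string): string {
  let out = ''
  let i = 0
  while (i < input.length) {
    const ch = input.charAt(i)
    const marker = input.charAt(i + 1)
    const len = ch === '\\' ? HEX_LENGTHS.get(marker) : undefined
    if (len === undefined) {
      // Not one of the hex escapes: copy the character through as-is.
      out += ch
      i += 1
      continue
    }
    const digits = input.slice(i + 2, i + 2 + len)
    if (digits.length !== len || !/^[0-9a-fA-F]+$/.test(digits)) {
      throw new Error(`Invalid \\${marker} escape at index ${i}`)
    }
    const codePoint = parseInt(digits, 16)
    if (codePoint > 0x10ffff) {
      throw new Error(`Code point out of range at index ${i}`)
    }
    out += String.fromCodePoint(codePoint)
    i += 2 + len
  }
  return out
}
```

With this sketch, `decodeHexEscapes('\\x0A')` yields a line feed, and `decodeHexEscapes('\\U0001F525')` yields the single code point U+1F525.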

Checks a box in #1771

2 digit escapes with \x## and 8 digit escapes with \U########
For example, `\N{Snowman}`

Does not currently check if it's an actual character name.
Calverin requested review from MulverineX and misode January 21, 2026 21:22
Calverin self-assigned this Jan 21, 2026
Added a bundled Unicode lookup table for validating names, originally fetched from https://unicode.org/Public/UNIDATA/UnicodeData.txt, and added a separate extendedUnicode option for the string parser
Calverin requested review from SPGoding and misode January 27, 2026 19:40
Calverin changed the title from "✨ Support new Unicode escapes from 25w09a" to "✨ Support new SNBT Unicode escapes from 25w09a" Jan 27, 2026

SPGoding (Member) left a comment

Can we add some test cases for these new escape sequences? A few cases off the top of my head:

  1. Using them when extendedUnicode is disabled
  2. Using them when extendedUnicode is enabled
  3. Testing various syntax errors you can have with them

And would it be possible to commit the script you used to generate the lookup table as well, so we can re-run it in the future? I would prefer to put it somewhere like packages/core/scripts to keep it separate from the production source code. If the script is in TypeScript, though, you might need to update packages/core/tsconfig.json to add a project reference to it alongside src and test, and create a new packages/core/scripts/tsconfig.json (just putting { "extends": "../../tsconfig-type-strip" } as its content should be fine).
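
For illustration only, a generator script of the kind requested here might look roughly like the sketch below. It is not the script used in this PR. It assumes the semicolon-separated UnicodeData.txt layout (code point in field 0, name in field 1, legacy Unicode 1.0 name in field 10) and applies the rules from the PR description: control characters take their secondary name, and names containing parentheses are skipped. `buildNameTable` is a hypothetical name, and the lowercase keys are an assumption based on the `name.toLowerCase()` lookup in the diff.

```typescript
// Hypothetical generator sketch, NOT the script used in this PR.
// Takes the text of UnicodeData.txt and returns lowercase name -> code point.
function buildNameTable(unicodeData: string): Map<string, number> {
  const table = new Map<string, number>()
  for (const line of unicodeData.split('\n')) {
    const fields = line.trim().split(';')
    if (fields.length < 11) continue // blank or malformed line
    const code = fields[0] ?? ''
    const primary = fields[1] ?? ''
    const legacy = fields[10] ?? ''
    // Control characters are listed as "<control>"; fall back to the legacy name.
    const name = primary === '<control>' ? legacy : primary
    // Skip empty names, range markers such as "<..., First>", and names with parentheses.
    if (name === '' || name.startsWith('<') || /[()]/.test(name)) continue
    table.set(name.toLowerCase(), parseInt(code, 16))
  }
  return table
}

// A few sample lines in the UnicodeData.txt format.
const sample = [
  '0000;<control>;Cc;0;BN;;;;;N;NULL;;;;',
  '000A;<control>;Cc;0;B;;;;;N;LINE FEED (LF);;;;',
  '2603;SNOWMAN;So;0;ON;;;;;N;;;;;',
  '1F525;FIRE;So;0;ON;;;;;N;;;;;',
].join('\n')

const sampleTable = buildNameTable(sample)
```

A real script would additionally fetch or read the data file and serialize the map to JSON. Whether the resulting table matches what the vanilla game accepts is a separate question that this sketch does not answer.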


misode (Member) left a comment

Given some of the discussion on Discord, are we now sure that this JSON file 100% matches what the vanilla game accepts?

options.escapable.unicode
&& (c2 === 'u' || (options.escapable.extendedUnicode && UnicodeEscapeChar.is(c2)))
) {
const sequenceLength = UnicodeEscapeLengths.get(c2) || 4

It's more common to use ?? 4 for default values.
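
To make the difference concrete: `||` falls back on any falsy value (including `0`), while `??` falls back only on `null` or `undefined`. The map below is a hypothetical stand-in for `UnicodeEscapeLengths`, and the `'z'` entry with length `0` exists purely to show where the two operators diverge; with only non-zero lengths in the map, both behave the same.

```typescript
// Hypothetical stand-in for UnicodeEscapeLengths; the 'z' entry is contrived
// so that the map contains a falsy value.
const escapeLengths = new Map<string, number>([
  ['x', 2],
  ['U', 8],
  ['z', 0],
])

function lengthWithOr(c: string): number {
  // Falls back to 4 for missing keys AND for a stored 0.
  return escapeLengths.get(c) || 4
}

function lengthWithNullish(c: string): number {
  // Falls back to 4 only when the key is missing (get() returned undefined).
  return escapeLengths.get(c) ?? 4
}
```

Here `lengthWithOr('z')` returns 4 while `lengthWithNullish('z')` returns 0; for `'x'`, `'U'`, and any missing key, the two agree.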

cStart = src.cursor
continue
}
const name = src.peekUntil('}')

Minecraft actually allows whitespace between the curly braces and the name. So you might need to trim the name or add some src.skipSpace calls.
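
One way to picture the trimming approach is the standalone sketch below, which works on a plain string rather than Spyglass's `src` object. `readEscapeName` is a hypothetical helper, and using `String.prototype.trim` assumes that its notion of whitespace is close enough to what the game accepts, which this thread does not confirm.

```typescript
// Hypothetical helper working on a plain string, NOT Spyglass's Source API.
// `openBrace` is the index of the '{' that follows "\N".
function readEscapeName(
  input: string,
  openBrace: number,
): { name: string; end: number } | undefined {
  if (input.charAt(openBrace) !== '{') return undefined
  const close = input.indexOf('}', openBrace + 1)
  if (close === -1) return undefined // unterminated escape
  return {
    // Trim so that "{ Fire }" and "{Fire}" yield the same name.
    name: input.slice(openBrace + 1, close).trim(),
    end: close + 1, // index just past the closing brace
  }
}
```

With this sketch, `readEscapeName('{ Fire }', 0)` and `readEscapeName('{Fire}', 0)` both produce the name `Fire`.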

})
ans.value += c2
} else if (
/^[-a-zA-Z0-9 ]+$/.test(name) && UnicodeLookupTable.has(name.toLowerCase())

Instead of doing a has() and then get()! separately, I would use a single get() and check if the result is not undefined.
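
A minimal sketch of the single-lookup pattern, using a small hypothetical map in place of `UnicodeLookupTable` (its real value type is not shown in this thread, so code point numbers are assumed here):

```typescript
// Hypothetical stand-in for UnicodeLookupTable; values are assumed to be code points.
const nameLookup = new Map<string, number>([
  ['null', 0x0000],
  ['snowman', 0x2603],
  ['fire', 0x1f525],
])

function resolveName(name: string): string | undefined {
  if (!/^[-a-zA-Z0-9 ]+$/.test(name)) return undefined
  // One get() replaces the has() + get()! pair.
  const codePoint = nameLookup.get(name.toLowerCase())
  // Compare against undefined explicitly: code point 0 is falsy but valid.
  return codePoint === undefined ? undefined : String.fromCodePoint(codePoint)
}
```

Besides saving a second lookup, this removes the need for the non-null assertion, since the type narrows after the `undefined` check.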

4 participants