Create AsyncByteBufferEscapedSurrogateInFieldName1581Test.java#1583
Open
pjfanning wants to merge 1 commit intoFasterXML:2.21from
Open
Create AsyncByteBufferEscapedSurrogateInFieldName1581Test.java#1583pjfanning wants to merge 1 commit intoFasterXML:2.21from
pjfanning wants to merge 1 commit intoFasterXML:2.21from
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
backported #1582
issue #1581
Copilot suggests:
Now let me look at the actual source of the bug — the
_addNamemethod inNonBlockingJsonParserBaseon the 2.21 branch, and the corresponding fix in the 3.x branch.Here's a thorough analysis of why the test in PR #1583 works on the3.xbranch but fails on2.21, and what fix is needed.Root Cause
The
3.xbranch introduced a dedicated field_pendingSurrogateInNameinNonBlockingJsonParserBase(since 3.1), along with a_finishPropertyWithEscape()method that explicitly handles surrogate pairs split across chunk boundaries:The
2.21branch has none of this. Its_finishFieldWithEscape()inNonBlockingUtf8JsonParserBasedecodes the escape character and directly re-encodes it as UTF-8 bytes into the quad buffer — but it has no concept of a pending high surrogate. When a\uD834\uDD1esurrogate pair appears in a field name, the first escape (\uD834) is decoded and the code attempts to re-encode it as a 3-byte UTF-8 sequence. This hits the surrogate check in_addName()at line 773:...throwing
Invalid UTF-8: Illegal surrogate character.Suggested Fix for 2.21
You need to backport the same surrogate-pair handling logic from 3.x into
NonBlockingUtf8JsonParserBaseon the2.21branch. The key changes are:1. Add
_pendingSurrogateInNamefield toNonBlockingJsonParserBase2. Fix
_finishFieldWithEscape()inNonBlockingUtf8JsonParserBaseThe existing
_finishFieldWithEscape()(called from theMINOR_FIELD_NAME_ESCAPEstate for bothNonBlockingByteArrayParserandNonBlockingByteBufferParser) must be updated to:0xD800–0xDBFF): save it in_pendingSurrogateInName, set_quotedDigits = -2as a signal that the next thing to read is a\for the low surrogate, suspend withNOT_AVAILABLE._pendingSurrogateInName != 0: read/decode the low surrogate escape, then combine the pair using UTF-16 surrogate math into a supplementary code point, then encode that as a 4-byte UTF-8 sequence into the quad buffer.Here's the essential structure of the fix (mirroring the
3.x_finishPropertyWithEscape()):3. Extend the quad UTF-8 encoding to handle supplementary characters
In
2.21's_finishFieldWithEscape(), the existing UTF-8 re-encoding only handles up to 3-byte sequences (i.e., BMP characters). You need to add the 4-byte case for supplementary code points (>= 0x10000):Also make sure
_parseEscapedName()(the inline fast path) has the same surrogate-pair detection — check if the decoded escape char is a high surrogate, save it in_pendingSurrogateInName, and suspend — mirroring what_finishFieldWithEscape()does.Summary
3.x2.21(current)2.21(after fix)_pendingSurrogateInNamefield_finishPropertyWithEscapehandles surrogate pairs_parseEscapedNameinline surrogate detectionThe test in PR #1583 will pass on
2.21once these production-code changes are in place. The test itself is correct and complete — it just needs the underlying parser fix to land alongside it.