Create AsyncByteBufferEscapedSurrogateInFieldName1581Test.java by pjfanning · Pull Request #1583 · FasterXML/jackson-core

pjfanning · 2026-03-27T01:23:37Z

backported #1582
issue #1581

Copilot suggests:

Now let me look at the actual source of the bug (the _addName method in NonBlockingJsonParserBase on the 2.21 branch) and the corresponding fix in the 3.x branch. Here's a thorough analysis of why the test in PR #1583 works on the 3.x branch but fails on 2.21, and what fix is needed.

Root Cause

The 3.x branch introduced a dedicated field _pendingSurrogateInName in NonBlockingJsonParserBase (since 3.1), along with a _finishPropertyWithEscape() method that explicitly handles surrogate pairs split across chunk boundaries:

/**
 * High surrogate code point awaiting matching low surrogate during
 * property name parsing, or 0 if none pending.
 *
 * @since 3.1
 */
protected int _pendingSurrogateInName;

protected final JsonToken _finishPropertyWithEscape() throws JacksonException
{
    int ch;

    // [jackson-core#1541]: Check if we have a pending high surrogate
    if (_pendingSurrogateInName != 0) {
        // We have a high surrogate saved, now need to decode the low surrogate escape
        if (_quotedDigits == -2) {
            // Need to read the backslash first
            if (_inputPtr >= _inputEnd) {
                return JsonToken.NOT_AVAILABLE;
            }
            int b = getNextUnsignedByteFromBuffer();
            if (b != INT_BACKSLASH) {
                _reportError("...");
            }
            _quotedDigits = -1;
            _quoted32 = 0;
        }
        ch = _decodeSplitEscaped(_quoted32, _quotedDigits);
        if (ch < 0) {
            _minorState = MINOR_PROPERTY_NAME_ESCAPE;
            return JsonToken.NOT_AVAILABLE;
        }
        ch = _decodeSurrogate(_pendingSurrogateInName, ch);
        _pendingSurrogateInName = 0;
    } else {
        ch = _decodeSplitEscaped(_quoted32, _quotedDigits);
        if (ch < 0) {
            _minorState = MINOR_PROPERTY_NAME_ESCAPE;
            return JsonToken.NOT_AVAILABLE;
        }
        // [jackson-core#1541]: Check if decoded value is a high surrogate
        if (ch >= 0xD800 && ch <= 0xDBFF) {
            _pendingSurrogateInName = ch;
            _quoted32 = 0;
            _quotedDigits = -2;
            _minorState = MINOR_PROPERTY_NAME_ESCAPE;
            return _finishPropertyWithEscape();
        } else if (ch >= 0xDC00 && ch <= 0xDFFF) {
            _reportUnexpectedLowSurrogate(ch);
        }
    }
    // ... encode ch into quads and continue name parsing

The 2.21 branch has none of this. Its _finishFieldWithEscape() in NonBlockingUtf8JsonParserBase decodes the escape and re-encodes it directly as UTF-8 bytes into the quad buffer, with no concept of a pending high surrogate. When the surrogate pair \uD834\uDD1E appears in a field name, the first escape (\uD834) is decoded on its own and the code attempts to re-encode it as a 3-byte UTF-8 sequence. This hits the surrogate check in _addName() at line 773:

// [jackson-core#363]: Surrogates (0xD800 - 0xDFFF) are illegal in UTF-8 for 3-byte sequences
if (needed == 2) {
    if (ch >= 0xD800 && ch <= 0xDFFF) {
        _reportInvalidUTF8Surrogate(ch);
    }

...throwing Invalid UTF-8: Illegal surrogate character.
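To make the failure mode concrete, here is a small standalone sketch. The class and method names are hypothetical and not part of jackson-core; only the range test is copied from the check quoted above. It shows that the first escape value on its own falls in the surrogate range, and that the JDK's own UTF-8 encoder also refuses to encode a lone surrogate (it substitutes a replacement byte), while the full pair encodes to four bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration; not jackson-core code.
final class LoneSurrogateDemo {

    // Same range test as the _addName() check: true for any UTF-16 surrogate value.
    static boolean isSurrogate(int ch) {
        return ch >= 0xD800 && ch <= 0xDFFF;
    }

    // Encodes the given UTF-16 code units with the JDK's UTF-8 encoder.
    // String.getBytes(Charset) replaces unencodable input (such as a lone
    // surrogate) with the charset's replacement byte instead of throwing.
    static byte[] jdkEncode(char... units) {
        return new String(units).getBytes(StandardCharsets.UTF_8);
    }
}
```

Calling `LoneSurrogateDemo.isSurrogate(0xD834)` returns true, which is the condition that triggers the error on 2.21 when the high surrogate is processed before its partner has been read.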

Suggested Fix for 2.21

You need to backport the same surrogate-pair handling logic from 3.x into NonBlockingUtf8JsonParserBase on the 2.21 branch. The key changes are:

1. Add `_pendingSurrogateInName` field to `NonBlockingJsonParserBase`

// After _quotedDigits:
protected int _quotedDigits;

// ADD THIS:
/**
 * High surrogate code point awaiting matching low surrogate during
 * field name parsing, or 0 if none pending.
 * [jackson-core#1581]
 */
protected int _pendingSurrogateInName;

2. Fix `_finishFieldWithEscape()` in `NonBlockingUtf8JsonParserBase`

The existing _finishFieldWithEscape() (called from the MINOR_FIELD_NAME_ESCAPE state for both NonBlockingByteArrayParser and NonBlockingByteBufferParser) must be updated to:

- When a decoded char is a high surrogate (0xD800–0xDBFF): save it in _pendingSurrogateInName, set _quotedDigits = -2 as a signal that the next thing to read is the \ that starts the low surrogate escape, then try to read that escape immediately, returning NOT_AVAILABLE only if the input runs out first.
- When resuming with _pendingSurrogateInName != 0: read and decode the low surrogate escape, combine the pair into a supplementary code point using UTF-16 surrogate math, then encode that code point as a 4-byte UTF-8 sequence into the quad buffer.
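The surrogate math mentioned above can be checked in isolation. The following is a minimal sketch with a hypothetical helper class (the 3.x code uses its own _decodeSurrogate() method, whose exact body is not shown here). For the pair from the test, 0xD834 - 0xD800 = 0x34, shifted left by 10 gives 0xD000, and 0xDD1E - 0xDC00 = 0x11E, so the result is 0x10000 + 0xD000 + 0x11E = 0x1D11E.

```java
// Hypothetical helper for illustration; not part of jackson-core.
final class SurrogateMath {

    // Combines a UTF-16 high/low surrogate pair into one supplementary code point.
    static int combine(int high, int low) {
        if (high < 0xD800 || high > 0xDBFF) {
            throw new IllegalArgumentException(
                    "Not a high surrogate: 0x" + Integer.toHexString(high));
        }
        if (low < 0xDC00 || low > 0xDFFF) {
            throw new IllegalArgumentException(
                    "Not a low surrogate: 0x" + Integer.toHexString(low));
        }
        // The 10 payload bits of the high surrogate become the upper bits,
        // the 10 payload bits of the low surrogate the lower bits,
        // and the result is offset past the BMP.
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }
}
```

The result agrees with the JDK's `Character.toCodePoint(char, char)` for the same pair, which is a convenient cross-check when writing tests for the backport.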

Here's the essential structure of the fix (mirroring the 3.x _finishPropertyWithEscape()):

protected final JsonToken _finishFieldWithEscape() throws IOException
{
    int ch;

    // [jackson-core#1581]: Handle pending high surrogate
    if (_pendingSurrogateInName != 0) {
        if (_quotedDigits == -2) {
            // Waiting for the backslash of the low surrogate escape
            if (_inputPtr >= _inputEnd) {
                return JsonToken.NOT_AVAILABLE;
            }
            int b = getNextUnsignedByteFromBuffer();
            if (b != INT_BACKSLASH) {
                _reportError("Broken surrogate pair in field name: expected '\\' to start low surrogate, got 0x"
                        + Integer.toHexString(b));
            }
            _quotedDigits = -1;
            _quoted32 = 0;
        }
        ch = _decodeSplitEscaped(_quoted32, _quotedDigits);
        if (ch < 0) {
            _minorState = MINOR_FIELD_NAME_ESCAPE;
            return JsonToken.NOT_AVAILABLE;
        }
        // Combine high + low surrogate into supplementary code point
        int highSurrogate = _pendingSurrogateInName;
        _pendingSurrogateInName = 0;
        if (ch < 0xDC00 || ch > 0xDFFF) {
            _reportError("Broken surrogate pair in field name: high surrogate 0x"
                    + Integer.toHexString(highSurrogate)
                    + " not followed by valid low surrogate, got 0x"
                    + Integer.toHexString(ch));
        }
        ch = 0x10000 + ((highSurrogate - 0xD800) << 10) + (ch - 0xDC00);
    } else {
        ch = _decodeSplitEscaped(_quoted32, _quotedDigits);
        if (ch < 0) {
            _minorState = MINOR_FIELD_NAME_ESCAPE;
            return JsonToken.NOT_AVAILABLE;
        }
        // [jackson-core#1581]: high surrogate — save and wait for low surrogate
        if (ch >= 0xD800 && ch <= 0xDBFF) {
            _pendingSurrogateInName = ch;
            _quoted32 = 0;
            _quotedDigits = -2;
            _minorState = MINOR_FIELD_NAME_ESCAPE;
            return _finishFieldWithEscape(); // recurse to attempt reading low surrogate
        } else if (ch >= 0xDC00 && ch <= 0xDFFF) {
            _reportError("Unexpected low surrogate character (0x"
                    + Integer.toHexString(ch) + ") in field name");
        }
    }

    // Now encode `ch` into quads as UTF-8
    // ... (existing quad-encoding logic, extended to handle 4-byte / supplementary code points)
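
The method above depends on parser internals (_quoted32, _quotedDigits, _decodeSplitEscaped) and cannot be run on its own. As a rough model of the same idea, the sketch below keeps only the pending-surrogate state: it accepts already-decoded escape values one at a time, so the high and low halves may arrive from different input chunks. All names are hypothetical, and it appends to a StringBuilder instead of a quad buffer.

```java
// Toy model of the pending-surrogate state; hypothetical, not jackson-core code.
final class PendingSurrogateSketch {
    private int pendingHigh; // 0 means no high surrogate is waiting
    private final StringBuilder name = new StringBuilder();

    // Accepts one decoded escape value; calls may be spread across input chunks.
    void accept(int ch) {
        if (pendingHigh != 0) {
            int high = pendingHigh;
            pendingHigh = 0;
            if (ch < 0xDC00 || ch > 0xDFFF) {
                throw new IllegalStateException("High surrogate 0x"
                        + Integer.toHexString(high) + " not followed by low surrogate");
            }
            name.appendCodePoint(0x10000 + ((high - 0xD800) << 10) + (ch - 0xDC00));
        } else if (ch >= 0xD800 && ch <= 0xDBFF) {
            pendingHigh = ch; // remember it and wait for the low half
        } else if (ch >= 0xDC00 && ch <= 0xDFFF) {
            throw new IllegalStateException("Unexpected low surrogate 0x"
                    + Integer.toHexString(ch));
        } else {
            name.appendCodePoint(ch);
        }
    }

    boolean isPending() { return pendingHigh != 0; }

    String name() { return name.toString(); }
}
```

Feeding 0xD834 leaves the sketch in the pending state with nothing appended; feeding 0xDD1E afterwards appends the single code point U+1D11E. This is the behavior the backported method needs to reproduce when the two escapes are split across buffers.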

3. Extend the quad UTF-8 encoding to handle supplementary characters

In 2.21's _finishFieldWithEscape(), the existing UTF-8 re-encoding only handles up to 3-byte sequences (i.e., BMP characters). You need to add the 4-byte case for supplementary code points (>= 0x10000):

if (ch > 127) {
    if (currQuadBytes >= 4) {
        quads[qlen++] = currQuad;
        currQuad = 0;
        currQuadBytes = 0;
    }
    if (ch < 0x800) { // 2-byte
        currQuad = (currQuad << 8) | (0xc0 | (ch >> 6));
        ++currQuadBytes;
    } else if (ch < 0x10000) { // 3 bytes (BMP, non-surrogate)
        currQuad = (currQuad << 8) | (0xe0 | (ch >> 12));
        ++currQuadBytes;
        if (currQuadBytes >= 4) { quads[qlen++] = currQuad; currQuad = 0; currQuadBytes = 0; }
        currQuad = (currQuad << 8) | (0x80 | ((ch >> 6) & 0x3f));
        ++currQuadBytes;
    } else { // 4 bytes (supplementary, from surrogate pair)
        currQuad = (currQuad << 8) | (0xf0 | (ch >> 18));
        ++currQuadBytes;
        if (currQuadBytes >= 4) { quads[qlen++] = currQuad; currQuad = 0; currQuadBytes = 0; }
        currQuad = (currQuad << 8) | (0x80 | ((ch >> 12) & 0x3f));
        ++currQuadBytes;
        if (currQuadBytes >= 4) { quads[qlen++] = currQuad; currQuad = 0; currQuadBytes = 0; }
        currQuad = (currQuad << 8) | (0x80 | ((ch >> 6) & 0x3f));
        ++currQuadBytes;
    }
    ch = 0x80 | (ch & 0x3f); // last byte, falls through to common output below
}
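
The byte layout used by the branches above can be verified independently of the quad buffer. This is a minimal sketch, assuming a hypothetical helper that writes to a plain byte array rather than packing bytes into int quads as the parser does. It uses the same lead-byte and continuation-byte arithmetic and can be compared against the JDK's UTF-8 encoder.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper for illustration; not part of jackson-core.
final class Utf8EncodeSketch {

    // Encodes one code point (assumed valid and not a surrogate) as UTF-8.
    static byte[] encode(int cp) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(4);
        if (cp < 0x80) {
            out.write(cp); // single byte, ASCII
        } else {
            if (cp < 0x800) { // 2 bytes: lead byte only here
                out.write(0xC0 | (cp >> 6));
            } else if (cp < 0x10000) { // 3 bytes: lead + one continuation
                out.write(0xE0 | (cp >> 12));
                out.write(0x80 | ((cp >> 6) & 0x3F));
            } else { // 4 bytes: lead + two continuations
                out.write(0xF0 | (cp >> 18));
                out.write(0x80 | ((cp >> 12) & 0x3F));
                out.write(0x80 | ((cp >> 6) & 0x3F));
            }
            out.write(0x80 | (cp & 0x3F)); // last continuation byte, shared by all cases
        }
        return out.toByteArray();
    }
}
```

For U+1D11E this yields the bytes F0 9D 84 9E, which is what the quad buffer should contain for the field name in the test once the 4-byte branch is in place.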

Also make sure _parseEscapedName() (the inline fast path) has the same surrogate-pair detection — check if the decoded escape char is a high surrogate, save it in _pendingSurrogateInName, and suspend — mirroring what _finishFieldWithEscape() does.

Summary

| Feature | `3.x` | `2.21` (current) | `2.21` (after fix) |
| --- | --- | --- | --- |
| `_pendingSurrogateInName` field | ✅ | ❌ | ✅ add it |
| Surrogate-pair handling in `_finishPropertyWithEscape` (3.x) / `_finishFieldWithEscape` (2.21) | ✅ | ❌ | ✅ backport logic |
| 4-byte UTF-8 encoding in quad buffer | ✅ | ❌ | ✅ add 4-byte case |
| `_parseEscapedName` inline surrogate detection | ✅ | ❌ | ✅ add guard |

The test in PR #1583 will pass on 2.21 once these production-code changes are in place. The test itself is correct and complete — it just needs the underlying parser fix to land alongside it.

Create AsyncByteBufferEscapedSurrogateInFieldName1581Test.java

c214e1a

pjfanning mentioned this pull request Mar 27, 2026

Add NonBlockingByteBufferParser tests for JSON-escaped surrogate pairs in field names (#1581) #1582

Merged

pjfanning wants to merge 1 commit into FasterXML:2.21 from pjfanning:non-blocking-test
