Can UTF8StreamJsonParser#finishString be made faster with VarHandles? #929

@jpountz

I was looking at UTF8StreamJsonParser#finishString, which mostly consists of scanning for the closing quote, with an optimized code path for ASCII strings. That ASCII path could be optimized further by using VarHandles to read and check 8 bytes at a time instead of one.

The current code looks like this:

        while (ptr < max) {
            int c = (int) inputBuffer[ptr] & 0xFF;
            if (codes[c] != 0) {
                if (c == INT_QUOTE) {
                    _inputPtr = ptr+1;
                    _textBuffer.setCurrentLength(outPtr);
                    return;
                }
                break;
            }
            ++ptr;
            outBuf[outPtr++] = (char) c;
        }

and could become something like:

private static final VarHandle VH_LE_LONG =
      MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

[...]

        while (ptr <= max - Long.BYTES) {
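            // VarHandle.get is signature-polymorphic: without the cast it is typed as Object and does not compile
            long next8Bytes = (long) VH_LE_LONG.get(inputBuffer, ptr);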
            if ((next8Bytes & 0x8080808080808080L) != 0) {
                // At least one of the bytes has the higher bit set, this is not a pure ASCII string
                break;
            }
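            // hasValue and hasLess are helpers still to be implemented, based on
            // https://graphics.stanford.edu/~seander/bithacks.html#ValueInWord
            if (hasValue(next8Bytes, INT_QUOTE) || hasValue(next8Bytes, INT_BACKSLASH)
                    || hasLess(next8Bytes, 0x20)) {
                // One of the next 8 bytes is a quote, a backslash or a control character,
                // i.e. a byte for which codes[c] != 0: leave it to the byte-by-byte loop below
                break;
            }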
            // Maybe this loop can become unnecessary via https://github.com/FasterXML/jackson-core/issues/910
            for (int i = 0; i < Long.BYTES; ++i) {
              outBuf[outPtr + i] = (char) inputBuffer[ptr + i];
            }
            ptr += Long.BYTES;
            outPtr += Long.BYTES;
        }
        while (ptr < max) {
            int c = (int) inputBuffer[ptr] & 0xFF;
            if (codes[c] != 0) {
                if (c == INT_QUOTE) {
                    _inputPtr = ptr+1;
                    _textBuffer.setCurrentLength(outPtr);
                    return;
                }
                break;
            }
            ++ptr;
            outBuf[outPtr++] = (char) c;
        }
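
For reference, here is a minimal sketch of what the word-at-a-time checks could look like, following the "has zero byte" / "has value" / "has less" tricks from the bithacks page linked above. The class name `SwarScan` and the helper `skipPlainAscii` are made up for illustration and are not part of jackson-core; `hasLess` is included on the assumption that control characters (below 0x20) and backslashes must also fall back to the slow path, since the `codes` table flags them too.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class SwarScan {
    private static final VarHandle VH_LE_LONG =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    private static final long ONES = 0x0101010101010101L;  // 0x01 in every byte
    private static final long HIGHS = 0x8080808080808080L; // 0x80 in every byte

    /** True if at least one of the 8 bytes of word is 0x00. */
    static boolean hasZero(long word) {
        return ((word - ONES) & ~word & HIGHS) != 0;
    }

    /** True if at least one of the 8 bytes of word equals value (0..255). */
    static boolean hasValue(long word, int value) {
        // XOR turns every byte equal to value into 0x00
        return hasZero(word ^ (ONES * value));
    }

    /** True if at least one of the 8 bytes of word is below n, for 0 <= n <= 128. */
    static boolean hasLess(long word, int n) {
        return ((word - ONES * n) & ~word & HIGHS) != 0;
    }

    /**
     * Advances ptr in steps of 8 while the next 8 bytes are plain ASCII with no
     * quote, backslash or control character. The remaining bytes are left to a
     * byte-by-byte loop.
     */
    static int skipPlainAscii(byte[] buf, int ptr, int max) {
        while (ptr <= max - Long.BYTES) {
            long w = (long) VH_LE_LONG.get(buf, ptr);
            if ((w & HIGHS) != 0 || hasValue(w, '"') || hasValue(w, '\\') || hasLess(w, 0x20)) {
                break;
            }
            ptr += Long.BYTES;
        }
        return ptr;
    }

    public static void main(String[] args) {
        byte[] buf = "hello world, this is ascii\" tail".getBytes(StandardCharsets.US_ASCII);
        // The quote sits at index 26, so the scan stops at the start of its 8-byte block
        System.out.println(skipPlainAscii(buf, 0, buf.length)); // prints 24
    }
}
```

Because the fast loop only moves in whole 8-byte steps, it never needs to know which byte triggered the break; the existing per-byte loop picks up from the returned position.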

As VarHandles were introduced in Java 9, this would require either releasing a multi-release (MR) JAR or bumping the minimum required Java version.
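
To illustrate the MR JAR option: the base (Java 8) classes could contain a plain-Java version of the 8-byte read, while a copy of the same class under `META-INF/versions/9/` uses the VarHandle. The sketch below shows what such a Java 8 fallback might look like; the class name `LongReaderJava8` and method `readLongLE` are hypothetical, and whether this fallback is fast enough to be worth using at all would need measuring.

```java
// Hypothetical Java 8 fallback, for illustration only.
final class LongReaderJava8 {
    /** Assembles 8 bytes starting at off into a long, least significant byte first. */
    static long readLongLE(byte[] buf, int off) {
        return (buf[off] & 0xFFL)
                | (buf[off + 1] & 0xFFL) << 8
                | (buf[off + 2] & 0xFFL) << 16
                | (buf[off + 3] & 0xFFL) << 24
                | (buf[off + 4] & 0xFFL) << 32
                | (buf[off + 5] & 0xFFL) << 40
                | (buf[off + 6] & 0xFFL) << 48
                | (buf[off + 7] & 0xFFL) << 56;
    }

    public static void main(String[] args) {
        byte[] buf = {1, 2, 3, 4, 5, 6, 7, 8};
        System.out.println(Long.toHexString(readLongLE(buf, 0))); // prints 807060504030201
    }
}
```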

I haven't had a chance to measure whether this makes a significant difference, but I wanted to log the idea in case it catches someone's attention.

Labels: 3.x (Issues to be only tackled for Jackson 3.x, not 2.x)
