@BGamboa13

Summary of Changes

This PR fixes a logic issue in AbstractJsonLexer where Unicode escape sequences (\uXXXX) could be parsed greedily. Previously, if a valid 4-digit unicode sequence was followed immediately by a character in the range [a-fA-F0-9], the lexer would incorrectly attempt to consume it as part of the hex sequence, leading to corruption or parsing errors.

Example of failure:

  • Input: "\u00f3n" (intended: "ón"). Previous behavior: parsed correctly, since 'n' is not a hex digit.
  • Input: "\u00f3a" (intended: "óa"). Previous behavior: the lexer consumed 'a' as a fifth hex digit.
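
To make the effect of an extra digit concrete, here is a small arithmetic illustration in Java. It is not the lexer's former code (which is not shown in this PR); the class and method names are made up for the example. It only shows that reading five hex digits instead of four yields a different code point.

```java
final class GreedyHexIllustration {
    // Hypothetical helper: interprets a run of hex digits as one number.
    static int parseHex(String digits) {
        return Integer.parseInt(digits, 16);
    }

    public static void main(String[] args) {
        // Four digits, as the JSON grammar requires: 0x00F3 is the code point of 'ó'.
        int strict = parseHex("00f3");
        // Five digits, as a greedy reader would take them: 0x0F3A, a different code point.
        int greedy = parseHex("00f3a");
        System.out.println(Integer.toHexString(strict) + " vs " + Integer.toHexString(greedy));
    }
}
```

With the fifth digit included, the trailing 'a' disappears from the output and the decoded character is no longer 'ó'.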

Technical Details

  1. RFC 8259 Compliance: The parser now enforces a strict limit of exactly 4 hex digits following \u, as required by the JSON specification.
  2. Performance Optimization: Replaced the previous character conversion logic with a precomputed static HEX_TABLE.
    • This allows for O(1) lookups and removes branching/conditional overhead inside the loop.
  3. Safety: Added an explicit bounds check (currentPosition + 4 >= source.length) to prevent IndexOutOfBoundsException on malformed inputs ending abruptly.
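
The three points above can be sketched roughly as follows. This is a minimal standalone illustration in Java, not the code from this PR: the class name, the appendHex signature, the table layout, and the exception type are all assumptions. The bounds check here uses `>` rather than the `>=` quoted above, because this fragment decodes a bare digit run with no closing quote after it.

```java
import java.util.Arrays;

final class UnicodeEscapeSketch {
    // Assumed table layout: index is the char code, value is the hex digit or -1.
    private static final int[] HEX_TABLE = new int[128];

    static {
        Arrays.fill(HEX_TABLE, -1);
        for (int i = 0; i <= 9; i++) {
            HEX_TABLE['0' + i] = i;
        }
        for (int i = 0; i < 6; i++) {
            HEX_TABLE['a' + i] = 10 + i;
            HEX_TABLE['A' + i] = 10 + i;
        }
    }

    /**
     * Decodes exactly four hex digits starting at pos, appends the resulting
     * char to out, and returns the index just past the fourth digit.
     */
    static int appendHex(CharSequence source, int pos, StringBuilder out) {
        // Bounds check first, so truncated input fails with a clear error.
        if (pos + 4 > source.length()) {
            throw new IllegalArgumentException("Unexpected end of input in unicode escape");
        }
        int code = 0;
        for (int i = 0; i < 4; i++) {
            char c = source.charAt(pos + i);
            int digit = c < HEX_TABLE.length ? HEX_TABLE[c] : -1;
            if (digit < 0) {
                throw new IllegalArgumentException("Invalid hex digit '" + c + "' at index " + (pos + i));
            }
            code = (code << 4) | digit;
        }
        out.append((char) code);
        return pos + 4; // never advances past the fourth digit
    }

    public static void main(String[] args) {
        String digitsAndRest = "00f3a"; // the part after the backslash and the 'u'
        StringBuilder out = new StringBuilder();
        int next = appendHex(digitsAndRest, 0, out);
        // Whatever follows the four digits is ordinary string content.
        out.append(digitsAndRest, next, digitsAndRest.length());
        System.out.println(out);
    }
}
```

Because the loop bound is the constant 4, a following 'a' is left for the caller to handle as a normal character, which is the behavior testUnicodeEscapeWithFollowingHex is described as checking.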

Tests

  • Added testUnicodeEscapeWithFollowingHex to verify that \u00f3a is correctly parsed as the character ó followed by the character a.
  • Verified existing tests pass with the new optimized lookup table.

…greedy parsing

The JSON lexer was incorrectly parsing unicode escapes by consuming more
characters than the standard 4 hex digits specified in RFC 8259. This
could lead to incorrect parsing when valid hex characters followed a
unicode escape sequence.

Changes:
- Refactored appendHex() to strictly parse exactly 4 hex digits
- Replaced fromHexChar() with a fast O(1) lookup table (HEX_TABLE)
- Added explicit validation that fails on invalid hex digits
- Added test case to verify correct parsing of \u00f3a as 'óa'