Lexer String Optimization
Optimize parsing of Unicode and hexadecimal escape sequences in the lexer so that it no longer allocates intermediate heap buffers (Vec<u32> and String).
Proposed Changes
[Lexer]
[MODIFY] string.rs
Refactor the following methods to parse hex digits directly into a u32 accumulator:
- take_unicode_escape_sequence:
  - For the \u{X..X} case:
    - Remove code_point_buf: Vec<u32>.
    - Remove s: String.
    - Iterate with cursor.next_char() until } or a non-hex digit is found.
    - Accumulate the value in a u32 with value = (value << 4) | digit.
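    - Limit the number of significant digits to 6 (the code point limit 0x10FFFF fits in 6 hex digits) and reject any value above 0x10FFFF. If the lexer accepts leading zeros, as ECMAScript does for \u{...}, they should not count toward this limit.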
  - For the \uXXXX case:
    - Remove s: String.
    - Parse the 4 hex digits directly into a u32.
- take_hex_escape_sequence:
  - Remove s: String.
  - Parse the 2 hex digits directly into a u32.
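The accumulator approach described above can be sketched roughly as follows. This is a minimal illustration, not the actual string.rs code: the function names parse_braced_code_point and parse_fixed_hex are hypothetical, and a plain char iterator stands in for the lexer's cursor and its next_char() method. The braced variant bounds the accumulated value rather than counting digits, which is a choice made for this sketch and keeps the u32 from overflowing.

```rust
/// Largest valid Unicode code point.
const MAX_CODE_POINT: u32 = 0x10FFFF;

/// Parses the body of a `\u{X..X}` escape, starting just after the `{`.
/// Hypothetical helper: a char iterator stands in for the lexer cursor.
/// Returns `None` for an empty body, a non-hex character, a missing `}`,
/// or a value above `MAX_CODE_POINT`.
fn parse_braced_code_point(chars: &mut impl Iterator<Item = char>) -> Option<u32> {
    let mut value: u32 = 0;
    let mut digits = 0usize;
    loop {
        match chars.next()? {
            // A closing brace is only valid after at least one digit.
            '}' if digits > 0 => return Some(value),
            c => {
                // `to_digit(16)` yields `None` for non-hex characters,
                // including a `}` that arrives before any digit.
                let digit = c.to_digit(16)?;
                value = (value << 4) | digit;
                // Checking after every digit means `value` never exceeds
                // 0x10FFFF before the next shift, so the shift cannot overflow.
                if value > MAX_CODE_POINT {
                    return None;
                }
                digits += 1;
            }
        }
    }
}

/// Parses exactly `count` hex digits (4 for `\uXXXX`, 2 for `\xXX`)
/// into a `u32` without building an intermediate `String`.
/// Intended for `count <= 8`, so the result fits in 32 bits.
fn parse_fixed_hex(chars: &mut impl Iterator<Item = char>, count: usize) -> Option<u32> {
    let mut value: u32 = 0;
    for _ in 0..count {
        let digit = chars.next()?.to_digit(16)?;
        value = (value << 4) | digit;
    }
    Some(value)
}
```

With these helpers, parse_braced_code_point(&mut "1F600}".chars()) yields Some(0x1F600), and parse_fixed_hex(&mut "0041".chars(), 4) yields Some(0x41). Neither path allocates on the heap.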
Verification Plan
Automated Tests
- Run the lexer tests in core/parser/src/lexer/tests.rs with cargo test to ensure all existing lexer tests pass.
- Add a new test case if needed to cover edge cases (e.g., max code point 0x10FFFF, empty \u{}).
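The edge cases listed above could be captured in a small table-driven test along these lines. This is only a sketch: braced_escape_value is a hypothetical stand-in that takes the text between the braces, so that the block compiles on its own. In the real test module, the cases would be fed through the lexer itself and checked against its tokens or errors.

```rust
/// Hypothetical stand-in for the lexer's `\u{X..X}` handling.
/// `body` is the text between `{` and `}`.
fn braced_escape_value(body: &str) -> Option<u32> {
    if body.is_empty() {
        return None; // `\u{}` carries no digits and is rejected
    }
    body.chars().try_fold(0u32, |value, c| {
        let next = (value << 4) | c.to_digit(16)?;
        // Reject anything past the last valid code point.
        (next <= 0x10FFFF).then_some(next)
    })
}

/// Edge cases: (escape body, expected value).
const CASES: &[(&str, Option<u32>)] = &[
    ("10FFFF", Some(0x10FFFF)), // max code point
    ("110000", None),           // one past the max
    ("", None),                 // empty \u{}
    ("0", Some(0)),             // smallest value
    ("12G", None),              // non-hex digit
];

#[test]
fn braced_escape_edge_cases() {
    for &(body, expected) in CASES {
        assert_eq!(braced_escape_value(body), expected, "body: {body:?}");
    }
}
```

Keeping the cases in one table makes it cheap to add further inputs later without writing a new test function for each.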
Manual Verification
- None required beyond automated tests.