@KristofferC

Explanation of bug written below by Claude :robot:. Fixes JuliaLang/julia#60084


When parsing certain incomplete keywords like "do", the parser would crash
with InexactError: trunc(UInt32, -1) instead of returning a ParseError.

Root cause:

The peek_behind_pos function walks backwards through the flat output array,
subtracting byte_spans to calculate byte positions. However, when a non-terminal
trivia node contains child nodes, both the parent and children exist in the
output array with overlapping byte_spans. The old code would subtract the
byte_span of both the parent and its children, causing byte_idx to go negative.

Example with input "do":

The parser creates this output structure:
  [1] TOMBSTONE (byte_span=0)
  [2] "do" token (byte_span=2, is_trivia=true, covers bytes 1-2)
  [3] error node (byte_span=2, is_trivia=true, node_span=1, covers bytes 1-2)
      └── contains node [2] as a child

When peek_behind_pos walks backwards from next_byte=3:
Old behavior:
- Start: byte_idx = 3, node_idx = 3
- Skip error node [3]: byte_idx = 3 - 2 = 1, node_idx = 2
- Skip "do" node [2]: byte_idx = 1 - 2 = -1 ❌ (tries to convert to UInt32)

The problem: Both nodes cover the same bytes (1-2), so subtracting both
spans double-counts the same 2 bytes.
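
The double-counting can be reproduced with a small stand-alone model. This is only a sketch: the named tuples, the `old_walk` function, and the loop condition are hypothetical stand-ins chosen for illustration, not JuliaSyntax's actual types or the real `peek_behind_pos` code. Only the field names and the numbers come from the example above.

```julia
# Hypothetical, simplified model of the flat output array for the input "do".
# Field names follow the description above; the real parser uses its own types.
output = [
    (byte_span = 0, node_span = 0, is_trivia = false),  # [1] TOMBSTONE
    (byte_span = 2, node_span = 0, is_trivia = true),   # [2] "do" token
    (byte_span = 2, node_span = 1, is_trivia = true),   # [3] error node wrapping [2]
]

# Old walk (sketch): step back one node at a time and subtract every
# trivia node's byte_span, including children already covered by a parent.
function old_walk(output, next_byte)
    byte_idx = next_byte
    node_idx = length(output)
    while node_idx >= 1 && output[node_idx].is_trivia
        byte_idx -= output[node_idx].byte_span
        node_idx -= 1
    end
    return byte_idx, node_idx
end

byte_idx, node_idx = old_walk(output, 3)

# The negative position cannot be represented as a UInt32,
# so the conversion throws an InexactError.
threw = try
    UInt32(byte_idx)
    false
catch err
    err isa InexactError
end
```

In this model the walk ends with `byte_idx == -1`, and the `UInt32` conversion is what turns the bad arithmetic into a crash.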

Solution:

When skipping a non-terminal node, also skip all its children without
subtracting their byte_spans, since the parent's byte_span already includes
them:

    if is_non_terminal(node)
        node_idx -= (1 + node.node_span)  # Skip the node + all its children
    else
        node_idx -= 1
    end

New behavior:

  • Start: byte_idx = 3, node_idx = 3
  • Skip error node [3]: byte_idx = 3 - 2 = 1, node_idx = 3 - (1 + 1) = 1
  • Stop at TOMBSTONE [1] (not trivia)
  • Return: byte_idx = 1, node_idx = 1 ✓
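
The corrected walk can be sketched against the same kind of stand-alone model. As before, the named tuples, `new_walk`, and the definition of `is_non_terminal` as `node_span > 0` are assumptions made for this illustration, not the parser's actual implementation. Only the branch on `is_non_terminal` mirrors the snippet shown in the solution.

```julia
# Hypothetical, simplified model of the flat output array for the input "do".
output = [
    (byte_span = 0, node_span = 0, is_trivia = false),  # [1] TOMBSTONE
    (byte_span = 2, node_span = 0, is_trivia = true),   # [2] "do" token
    (byte_span = 2, node_span = 1, is_trivia = true),   # [3] error node wrapping [2]
]

# Assumption for this sketch: a node with children is a non-terminal.
is_non_terminal(node) = node.node_span > 0

# New walk (sketch): subtract the parent's byte_span once, then jump over
# the parent and all of its children without subtracting their spans.
function new_walk(output, next_byte)
    byte_idx = next_byte
    node_idx = length(output)
    while node_idx >= 1 && output[node_idx].is_trivia
        node = output[node_idx]
        byte_idx -= node.byte_span
        if is_non_terminal(node)
            node_idx -= (1 + node.node_span)  # Skip the node + all its children
        else
            node_idx -= 1
        end
    end
    return byte_idx, node_idx
end

byte_idx, node_idx = new_walk(output, 3)
position = UInt32(byte_idx)  # conversion succeeds because byte_idx stays positive
```

Here the walk stops at the TOMBSTONE with `byte_idx == 1` and `node_idx == 1`, matching the trace above.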

Why "do" was unique:

The "do" keyword is the only Julia keyword that:

  1. Gets marked as TRIVIA when appearing standalone (invalid syntax)
  2. Has an error node emitted that wraps it (also marked as TRIVIA)
  3. Continues parsing afterward, eventually calling peek_behind

Other incomplete keywords either parse successfully or fail earlier before
reaching the code path that calls peek_behind.


@Keno Keno left a comment

The fix seems non-controversial, although I'm not sure that non-terminal trivia are that good an idea, but that's a bit of a separate question. Given that we have them right now, this is the right fix.

@KristofferC KristofferC merged commit e78a222 into main Nov 9, 2025
36 checks passed
@KristofferC KristofferC deleted the kc/do_parse branch November 9, 2025 19:45
Development

Successfully merging this pull request may close these issues.

typing do in the repl causes Error: Error in the keymap