Description
The Reader class in tokenizer.ts uses single integer variables (this.paren and this.curly) to track the index of the most recent opening token. This logic is used by isRegexStart() to determine whether a forward slash (/) is a Regular Expression literal or a Division operator by inspecting the token preceding the current block.
However, because these variables are simply overwritten when a nested group is encountered, and never restored when that group closes, the tokenizer loses track of the outer context.
The Issue:
- When an outer `(` is encountered, `this.paren` is set to its index.
- When a nested `(` is encountered, `this.paren` is overwritten with the new index. The reference to the outer `(` is lost.
- When the nested group closes `)`, no state restoration occurs.
- When the outer group closes `)`, `this.paren` still refers to the inner start index.
Reproduction:
Consider the following valid JavaScript. The / following the if condition should be parsed as the start of a Regex literal.
    // The condition includes a nested grouping (function call)
    if (isValid(x)) /abc/.test(x);

Expected Behavior:
The tokenizer sees the ) closing the if statement. It looks back at the matching (. It sees the if keyword preceding it. It determines that /abc/ is a Regex.
Actual Behavior:
- `this.paren` is initially set to the index of the `(` after `if`.
- `this.paren` is overwritten by the index of the `(` after `isValid`.
- When the tokenizer reaches the `/`, it looks back using the current value of `this.paren` (the inner parenthesis).
- It checks the token preceding that inner index: `isValid` (an Identifier).
- Standard grammar rules suggest that an identifier followed by a parenthesized group implies a function call, and a slash following that implies division (e.g. `fn() / 2`).
- The tokenizer incorrectly identifies `/abc/` as a series of division operators and identifiers, likely causing a parse error later.
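The difference between the two tracking strategies can be sketched in isolation. The snippet below is a simplified, hypothetical model and not the actual Reader code: tokens are plain strings (the real Reader stores only punctuator and keyword values), and the helper names `precedingTokenSingleVar` and `precedingTokenWithStack` are invented for illustration.

```typescript
// Hypothetical, simplified model of the paren tracking described above.
// Tokens are plain strings here; this is NOT the real Reader implementation.

// Mirrors the current behaviour: `paren` is overwritten on every "(" and never restored.
function precedingTokenSingleVar(tokens: string[]): string | undefined {
    let paren = -1;
    tokens.forEach((tok, i) => {
        if (tok === '(') {
            paren = i;
        }
    });
    return tokens[paren - 1];
}

// Stack-based variant: each ")" pops the index of its matching "(".
function precedingTokenWithStack(tokens: string[]): string | undefined {
    const stack: number[] = [];
    let paren = -1;
    tokens.forEach((tok, i) => {
        if (tok === '(') {
            stack.push(i);
        } else if (tok === ')') {
            const index = stack.pop();
            paren = index !== undefined ? index : -1;
        }
    });
    return tokens[paren - 1];
}

// Token sequence for: if (isValid(x))
const conditionTokens: string[] = ['if', '(', 'isValid', '(', 'x', ')', ')'];

console.log(precedingTokenSingleVar(conditionTokens)); // prints: isValid
console.log(precedingTokenWithStack(conditionTokens)); // prints: if
```

With the single variable, the lookup lands on the token before the inner parenthesis; with the stack, it lands on the `if` keyword, which is what the regex check needs to see.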
Proposed Fix:
We should add stacks to the Reader class to maintain the history of open delimiters. We can keep this.paren and this.curly as the properties used by isRegexStart, but they should be updated by popping from these stacks.
1. Update Reader properties and constructor:
    class Reader {
        readonly values: ReaderEntry[];
        curly: number;
        paren: number;
        // Add stacks to track nesting history
        curlyStack: number[];
        parenStack: number[];

        constructor() {
            this.values = [];
            this.curly = this.paren = -1;
            this.curlyStack = [];
            this.parenStack = [];
        }
        // ...

2. Update Reader.push to manage the stack:
    push(token): void {
        if (token.type === Token.Punctuator || token.type === Token.Keyword) {
            if (token.value === '{') {
                this.curlyStack.push(this.values.length);
            } else if (token.value === '(') {
                this.parenStack.push(this.values.length);
            } else if (token.value === '}') {
                // Pop the stack to restore context to the matching opener
                const index = this.curlyStack.pop();
                this.curly = (index !== undefined) ? index : -1;
            } else if (token.value === ')') {
                // Pop the stack to restore context to the matching opener
                const index = this.parenStack.pop();
                this.paren = (index !== undefined) ? index : -1;
            }
            this.values.push(token.value);
        } else if (token.type === Token.Template && !token.tail) {
            this.values.push(null);
            // Template head/middle acts as a curly brace opener
            this.curlyStack.push(this.values.length - 1);
        } else {
            this.values.push(null);
        }
    }

This handles two other edge cases not considered by the existing code:
1. Curly Brackets (curlyStack): Object Literals vs. Code Blocks
The fundamental ambiguity the Reader tries to resolve is whether a closing } marks the end of an Object Literal (an expression value) or a Code Block (a statement).
- The Scenario: `fn() { return { a: 1 }; } /regex/`
- The Ambiguity:
  - If the `/` follows an Object Literal (e.g., `x = { a: 1 } / 2`), it is a Division operator.
  - If the `/` follows a Code Block (e.g., `if (x) { ... } /regex/`), it is the start of a Regex literal.
- The Failure:
  - Without a stack, when the inner object literal `{ a: 1 }` closes, the `curly` variable points to the inner `{`.
  - When the function body closes next, the `curly` variable still points to the inner `{` (because it was never restored).
  - The tokenizer looks back from the inner `{`, sees `return` (or `=`), and incorrectly concludes: "This was an Object Literal. The next token `/` must be Division."
  - Result: Syntax Error on valid code.
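To make the curly-brace case concrete, here is a minimal sketch that tracks which `{` the `curly` index refers to once the function body has closed. It is a simplified model under stated assumptions: tokens are plain strings, and `trackCurly` is a hypothetical helper, not part of the tokenizer. It only shows where the lookback starts, not what isRegexStart() decides from there.

```typescript
// Hypothetical, simplified model of curly tracking; tokens are plain strings.
// Returns the index that `curly` holds after all tokens have been consumed.
function trackCurly(tokens: string[], useStack: boolean): number {
    const stack: number[] = [];
    let curly = -1;
    tokens.forEach((tok, i) => {
        if (tok === '{') {
            if (useStack) {
                stack.push(i);
            } else {
                curly = i; // current behaviour: overwrite, never restore
            }
        } else if (tok === '}' && useStack) {
            // proposed behaviour: restore the index of the matching opener
            const index = stack.pop();
            curly = index !== undefined ? index : -1;
        }
    });
    return curly;
}

// Token sequence for: fn() { return { a: 1 }; }
const fnTokens: string[] = ['fn', '(', ')', '{', 'return', '{', 'a', ':', '1', '}', ';', '}'];

const withoutStack = trackCurly(fnTokens, false);
const withStack = trackCurly(fnTokens, true);

console.log(withoutStack, fnTokens[withoutStack - 1]); // prints: 5 return
console.log(withStack, fnTokens[withStack - 1]);       // prints: 3 )
```

Without the stack, the lookback starts from the object literal's `{` and sees `return`; with the stack, it starts from the function body's `{` and sees the closing `)` of the parameter list.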
2. Template Literals (${): The Implicit Bracket
Template expressions (e.g., `value: ${expr}`) introduce an interpolation scope that behaves syntactically like a parenthesized group or block.
- The Issue: The token sequence `${` acts as an opening delimiter, but it is closed by a standard `}` Punctuator.
- The Failure:
  - If we do not push the `${` position onto the `curlyStack`, the subsequent `}` could blindly pop the parent scope's entry from the stack.
  - This corrupts the state for the remainder of the file. The tokenizer will think it has closed a block that it hasn't, or will underflow the stack.
- Relevance: Inside an interpolation, we effectively restart the expression parser, e.g. a = `${ {a:1} / 2 }`. We must correctly identify that the inner `{` matches the inner `}` so we can determine that `/` is a division operator inside the template.
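The stack corruption described above can be illustrated with a small sketch. It assumes, as this issue does, that the `}` closing an interpolation reaches the stack logic as an ordinary closer; the string `'${'` stands in for a template head or middle token, and `curlyAfterEachClose` is a hypothetical helper written for this example only.

```typescript
// Hypothetical model: '${' stands for a template head/middle token and
// '}' for any closing brace. Returns the index restored at each '}'.
function curlyAfterEachClose(tokens: string[], pushTemplate: boolean): number[] {
    const stack: number[] = [];
    const restored: number[] = [];
    tokens.forEach((tok, i) => {
        if (tok === '{' || (tok === '${' && pushTemplate)) {
            stack.push(i);
        } else if (tok === '}') {
            const index = stack.pop();
            restored.push(index !== undefined ? index : -1); // -1 signals underflow
        }
    });
    return restored;
}

// Token sequence for an outer block containing one interpolation: { `${ x }` }
const templateTokens: string[] = ['{', '${', 'x', '}', '}'];

console.log(curlyAfterEachClose(templateTokens, false)); // prints: [ 0, -1 ]
console.log(curlyAfterEachClose(templateTokens, true));  // prints: [ 1, 0 ]
```

When the `${` is not pushed, the interpolation's `}` consumes the outer block's entry and the real block close underflows the stack. When it is pushed, each `}` is paired with its own opener.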
The second edge case is more a consequence of using a stack for curly bracket management in the first place, but that stack is essential to maintaining correct code state.