
LL(2) parsing error: parser commits to subrule and is not able to get out #2094

@jitsedesmet

Hi! This library has been amazing so far. I've been trying to create a round-tripping parser that includes round tripping in the syntax.
I need a rule that parses everything that is otherwise skipped.

I came across the issue where the following grammar works in ANTLR4, but not in Chevrotain:

grammar Test;

compilationUnit
  : gramB gramD gramF EOF
  ;
gramB
  : gramC ( gramD gramE )*;
gramD
  : 'D'? ;
gramC: 'C' ;
gramE: 'E' ;
gramF: 'F' ;
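
As a sanity check on what this grammar should accept: expanding the rules by hand gives C ( D? E )* D? F, which is a regular language. The snippet below is a minimal sketch that tests a few strings against that regex. The names `language` and `samples` are mine and purely illustrative; neither ANTLR4 nor Chevrotain is involved.

```typescript
// Hand-expanded form of: gramB gramD gramF EOF
//   gramB = C ( D? E )*   gramD = D?   gramF = F
const language = /^C(D?E)*D?F$/;

// 'CDDF' is included as a string that should NOT be in the language.
const samples = ['CF', 'CDF', 'CDEF', 'CDEDF', 'CDDF'];

// Pair each sample with whether the regex accepts it.
const results = samples.map(s => [s, language.test(s)] as const);

for (const [input, ok] of results) {
  console.log(`${input}: ${ok ? 'accepted' : 'rejected'}`);
}
```
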

I expect the following inputs to be in the language (and with ANTLR4 they are): CDF and CDEDF (these do not work in Chevrotain), as well as CF and CDEF (these do work in Chevrotain).
When using Chevrotain, it looks like the parser gets stuck in the gramB rule, forgetting that a parse of the gramD rule also allows it to continue in the compilationUnit rule.
I am fairly sure the grammar above is LL(2).
I wonder whether this is a mistake on my part, or whether this is a genuine bug. I have no issue with attempting a PR myself. Maybe you could give some pointers :D

(Issue is also present when the gramB rule uses gramC ( gramD gramE )?;)
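
To illustrate why I think two tokens of lookahead are enough, here is a small hand-written model of the loop decision in gramB. It is only a sketch of my understanding and not Chevrotain's actual lookahead code: `enterLoop` and `accepts` are hypothetical helpers, and the `k = 1` branch assumes the parser commits to another iteration as soon as it sees a D.

```typescript
type Tok = 'C' | 'D' | 'E' | 'F';

// Hypothetical decision function: should we start another ( gramD gramE )?
function enterLoop(tokens: Tok[], pos: number, k: 1 | 2): boolean {
  const la1 = tokens[pos];
  const la2 = tokens[pos + 1];
  if (la1 === 'E') return true;   // gramD matched nothing, gramE follows
  if (la1 !== 'D') return false;  // neither D nor E: leave the loop
  // One token: a D always looks like the start of an iteration.
  // Two tokens: only "D E" is an iteration; "D F" belongs to the caller.
  return k === 1 ? true : la2 === 'E';
}

// Hand-written recognizer for: C ( D? E )* D? F
function accepts(input: string, k: 1 | 2): boolean {
  const tokens = input.toUpperCase().split('') as Tok[];
  let pos = 0;
  const eat = (t: Tok): boolean => {
    if (tokens[pos] !== t) return false;
    pos++;
    return true;
  };

  if (!eat('C')) return false;          // gramC
  while (enterLoop(tokens, pos, k)) {
    eat('D');                           // gramD inside the loop (optional)
    if (!eat('E')) return false;        // committed to the loop, no way back
  }
  eat('D');                             // gramD in compilationUnit (optional)
  return eat('F') && pos === tokens.length;
}

for (const s of ['CF', 'CDEF', 'CDF', 'CDEDF']) {
  console.log(s, 'k=1:', accepts(s, 1), 'k=2:', accepts(s, 2));
}
```

In this model, CF and CDEF pass with either setting, while CDF and CDEDF only pass with `k = 2`, which matches the behaviour I am seeing.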

Chevrotain code I used:

/* eslint-disable require-unicode-regexp */
import type { ParserMethod } from 'chevrotain';
import { createToken, EmbeddedActionsParser, Lexer } from 'chevrotain';
import { describe, it } from 'vitest';

export const lexC = createToken({ name: 'lexC', pattern: /c/i, label: 'c' });
export const lexD = createToken({ name: 'lexD', pattern: /d/i, label: 'd' });
export const lexE = createToken({ name: 'lexE', pattern: /e/i, label: 'e' });
export const lexF = createToken({ name: 'lexF', pattern: /f/i, label: 'f' });
const allTokens = [ lexC, lexD, lexE, lexF ];

const lexer: Lexer = new Lexer(allTokens, {
  positionTracking: 'onlyStart',
  recoveryEnabled: false,
  safeMode: true,
});

class MyParser extends EmbeddedActionsParser {
  public readonly gramB: ParserMethod<Parameters<() => void>, ReturnType<() => 'gramB'>>;
  public readonly gramD: ParserMethod<Parameters<() => void>, ReturnType<() => 'gramD'>>;
  public readonly gramMain: ParserMethod<Parameters<() => void>, ReturnType<() => 'gramMain'>>;

  public constructor() {
    super(allTokens);

    this.gramD = this.RULE('gramD', () => {
      this.OPTION(() => this.CONSUME(lexD));
      return <const> 'gramD';
    });

    this.gramB = this.RULE('gramB', () => {
      this.CONSUME(lexC);
      this.MANY(() => {
        this.SUBRULE(this.gramD, undefined);
        this.CONSUME(lexE);
      });
      return <const> 'gramB';
    });

    this.gramMain = this.RULE('main', () => {
      this.SUBRULE(this.gramB, undefined);
      this.SUBRULE(this.gramD, undefined);
      this.CONSUME(lexF);
      return <const> 'gramMain';
    });

    this.performSelfAnalysis();
  }
}

describe('bugTest', () => {
  const parser = new MyParser();
  function parse(query: string): string {
    const tokens = lexer.tokenize(query);
    if (tokens.errors.length > 0) {
      throw new Error(tokens.errors[0].message);
    }
    parser.input = tokens.tokens;
    const res = parser.gramMain();
    if (parser.errors.length > 0) {
      console.log(tokens.tokens);
      throw new Error(`Parse error on line ${parser.errors.map(x => x.token.startLine).join(', ')}
${parser.errors.map(x => `${x.token.startLine}: ${x.message}`).join('\n')}
${parser.errors.map(x => x.stack).join('\n')}`);
    }
    return res;
  }

  it('bug recreation', ({ expect }) => {
    // WORKS: 'CF' and 'CDEF'
    // DOESN'T WORK, but should?: 'CDF' 'CDEDF'
    const res = parse(`CDF`);
    expect(res).toEqual('gramMain');
  });
});

Source files: ANTLR4 and Chevrotain.

Using the IntelliJ plugin for ANTLR 4, I get the following out of ANTLR4:
a parse tree, and what I think is a confirmation that it is LL(2).
