[ada2012] Implement disambiguating semantics #4755
Open
kaby76 wants to merge 32 commits into antlr:master from
…pression, lexer character literal handling
- Add missing SEMI terminators to 11 parser rules: subtype_declaration, generic_instantiation (3 alternatives), record_representation_clause, number_declaration, abstract_subprogram_declaration, goto_statement, procedure_call_statement (1st alternative), delay_until_statement, delay_relative_statement, abort_statement, formal_object_declaration (2nd alternative)
- Fix record_representation_clause: remove incorrect optional marker on END RECORD
- Fix subtype_mark to accept dotted names (identifier (DOT identifier)*) instead of just identifier, allowing qualified names like PACK2.ACC1
- Fix qualified_expression: remove extra parentheses around aggregate alternative, matching Ada RM definition
- Fix formal_signed_integer_type_definition: use RANGE_ token instead of range parser rule
- Merge '!' into VL token to support Ada 83 obsolescent replacement character (RM J.2)
- Add lexer predicate IsCharLiteralAllowed() to disambiguate character literals from attribute ticks, using AdaLexerBase superclass to track previous token
- Add AdaLexerBase implementations for all targets: CSharp, Java, Python3, Dart, JavaScript, TypeScript, Antlr4ng, Cpp, Go
- Add transformGrammar.py for Python3, Antlr4ng, Cpp, and Go targets
- Add the ACATS.
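The IsCharLiteralAllowed() predicate decides, from the previously emitted token, whether an apostrophe opens a character literal or is an attribute tick. The PR's AdaLexerBase code is not reproduced here, so the following is a minimal Python sketch under an assumed rule: an apostrophe directly after an identifier, a closing parenthesis, or the keyword ALL is a tick; anywhere else it may start a literal. The token kinds and the tokenize() helper are hypothetical.

```python
# Sketch of a previous-token predicate for Ada character literals vs. ticks.
# The TICK_FOLLOWS set is an assumed rule, not copied from AdaLexerBase.
TICK_FOLLOWS = {"IDENTIFIER", "RPAREN", "ALL"}


def is_char_literal_allowed(prev_kind):
    """True if an apostrophe after a token of kind prev_kind may start 'x'."""
    return prev_kind not in TICK_FOLLOWS


def tokenize(src):
    """Tiny Ada-like tokenizer, just enough to show the predicate at work."""
    tokens = []
    prev_kind = None  # kind of the last emitted (non-whitespace) token
    i = 0
    while i < len(src):
        c = src[i]
        if c.isspace():
            i += 1
            continue
        if c.isalpha():
            j = i
            while j < len(src) and (src[j].isalnum() or src[j] == "_"):
                j += 1
            word = src[i:j]
            kind = "ALL" if word.upper() == "ALL" else "IDENTIFIER"
            tokens.append((kind, word))
            i = j
        elif c == "'":
            # A literal needs both the predicate and the 'x' shape ahead.
            if (is_char_literal_allowed(prev_kind)
                    and i + 2 < len(src) and src[i + 2] == "'"):
                tokens.append(("CHAR_LITERAL", src[i:i + 3]))
                i += 3
            else:
                tokens.append(("TICK", c))
                i += 1
        elif c in "()":
            tokens.append(("LPAREN" if c == "(" else "RPAREN", c))
            i += 1
        else:
            tokens.append(("OTHER", c))
            i += 1
        prev_kind = tokens[-1][0]
    return tokens
```

Without the predicate, a longest-match lexer would read `'('` in `Character'('a')` as a character literal; with it, the input lexes as `Character`, tick, `(`, `'a'`, `)`.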
The Ada RM defines pragma placement rules in prose, not in EBNF productions. Rather than deviating from the RM by inserting pragma into multiple parser rules, this uses a two-pass approach: the lexer tokenizes pragma content onto a separate PRAGMA_CHANNEL via a lexer mode, the main parser ignores them, and a ParsePragmas() action in the compilation rule triggers second-pass parsing of each pragma.
- AdaLexer.g4: Add PRAGMA_CHANNEL, PRAGMA_MODE with token rules
that map back to existing token types via type()
- AdaParser.g4: Add superClass=AdaParserBase, ParsePragmas() action
in compilation, pragmaRule and pragma_argument_association per RM 2.8
- Create AdaParserBase for all 9 targets (CSharp, Java, Python3,
Dart, JavaScript, TypeScript, Antlr4ng, Cpp, Go)
- Update transformGrammar.py for Go (parser this.->p.), Cpp and
Antlr4ng (parser header includes)
- Fixes parse failure on ba/ba1020f1.ada (PRAGMA ELABORATE)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
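The two-pass scheme can be sketched without the ANTLR runtime. In the Python sketch below (all names hypothetical; the channel number is an assumption), lex() plays the role of PRAGMA_MODE by routing everything from `pragma` through the closing `;` onto PRAGMA_CHANNEL, the main parser would only see default-channel tokens, and parse_pragmas() is the second pass. Pragma arguments are simplified to single tokens, whereas RM 2.8 allows named associations and full expressions.

```python
from dataclasses import dataclass

DEFAULT_CHANNEL = 0
PRAGMA_CHANNEL = 2  # hypothetical channel number


@dataclass
class Token:
    text: str
    channel: int


def lex(words):
    """Route 'pragma ... ;' onto PRAGMA_CHANNEL, mimicking a lexer mode."""
    tokens, in_pragma = [], False
    for w in words:
        if not in_pragma and w.lower() == "pragma":
            in_pragma = True  # analogous to entering PRAGMA_MODE
        tokens.append(Token(w, PRAGMA_CHANNEL if in_pragma else DEFAULT_CHANNEL))
        if in_pragma and w == ";":
            in_pragma = False  # analogous to leaving PRAGMA_MODE
    return tokens


def parse_pragmas(tokens):
    """Second pass: parse each 'pragma name [(arg {, arg})] ;' group."""
    pragmas, group = [], []
    for t in tokens:
        if t.channel != PRAGMA_CHANNEL:
            continue  # the first pass already consumed these
        group.append(t.text)
        if t.text == ";":
            name, args = group[1], []
            if len(group) > 3 and group[2] == "(" and group[-2] == ")":
                args = [a for a in group[3:-2] if a != ","]
            pragmas.append((name, args))
            group = []
    return pragmas
```

Keeping pragmas off the default channel is what lets the main grammar follow the RM productions unchanged.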
…on3, TypeScript
Target-specific API fixes for the two-pass pragma parsing:
- Antlr4ng: errorListeners -> getErrorListeners()
- Cpp: getErrorListeners() loop -> addErrorListener(&getErrorListenerDispatch())
- Dart: getTokens() returns nullable List<Token>?, add ?? [] fallback
- Go: no SetLine/SetColumn, no ListTokenSource, no GetChildren() on ErrorListener; replaced with custom pragmaTokenSource embedding BaseLexer, use GetSource() from original tokens, use dispatch proxy
- JavaScript: CommonToken(null,...) crashes, ListTokenSource missing; use t.source from original token, add SimpleTokenSource class
- Python3: CommonToken not in antlr4 wildcard import, add explicit import from antlr4.Token
- TypeScript: fill()/tokens on CommonTokenStream not BufferedTokenStream, ListTokenSource missing, _listeners not in type decls; cast to CommonTokenStream, add SimpleTokenSource class, use as-any casts
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
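Several ports work around the missing ListTokenSource by supplying a small token source that replays the collected pragma tokens to the second-pass parser. A Python sketch of such a replay source follows. The class shape and field names are hypothetical, and the EOF handling (an EOF token placed just after the last real token, then repeated) is a common choice for list-backed token sources, not something taken from a specific port.

```python
from dataclasses import dataclass

EOF = -1  # token type used for end of input in this sketch


@dataclass
class Tok:
    type: int
    text: str
    line: int
    column: int


class SimpleTokenSource:
    """Replay a fixed list of tokens, then return EOF indefinitely."""

    def __init__(self, tokens, source_name="<pragma>"):
        self._tokens = list(tokens)
        self._index = 0
        self.source_name = source_name

    def next_token(self):
        if self._index < len(self._tokens):
            tok = self._tokens[self._index]
            self._index += 1
            return tok
        # Place EOF just past the last real token so that error messages
        # from the second-pass parser still point near the pragma.
        if self._tokens:
            last = self._tokens[-1]
            return Tok(EOF, "<EOF>", last.line, last.column + len(last.text))
        return Tok(EOF, "<EOF>", 1, 0)
```

Reusing the original tokens' line and column values (as the Go and JavaScript notes describe via GetSource() and t.source) keeps diagnostics tied to the real input file.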
caseInsensitive=true already maps lowercase to uppercase, so [A-Za-z] and [0-9A-Fa-f] and [Ee] produce duplicate character warnings. Simplified to [A-Z], [0-9A-F], and [E]. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Support Ada 83 obsolescent lexer forms (RM Annex J.2)
- BASED_LITERAL: accept ':' as alternative delimiter to '#' (e.g. 2:10:)
- CHARACTER_LITERAL_: allow apostrophe and backslash as the enclosed
character (~['\\\r\n] -> ~[\r\n]), matching RM 2.6 graphic_character
- STRING_LITERAL_: accept '%' as alternative string delimiter with
'%%' for embedded percent (e.g. %%%%%345%)
- Apply same CHARACTER_LITERAL_ fix to PRAGMA_CHAR_LITERAL in
PRAGMA_MODE
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
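RM J.2 allows both `#` characters of a based literal to be replaced by `:`, and a string's quotation marks to be replaced by `%` when the string contains no quotation mark, with any embedded `%` doubled. The Python sketch below checks and evaluates these forms for integer based literals only. The function names are hypothetical, and the sketch requires matching delimiters, which the commit message does not say the BASED_LITERAL rule enforces.

```python
import re

# base, delimiter (# or :), digits, the SAME delimiter, optional exponent
_BASED_INT = re.compile(r"(\d+)([#:])([0-9A-F_]+)\2(?:E\+?(\d+))?", re.IGNORECASE)


def based_integer_value(text):
    """Value of an integer based literal such as 16#FF# or 2:10:, else None."""
    m = _BASED_INT.fullmatch(text)
    if m is None:
        return None
    base = int(m.group(1))
    if not 2 <= base <= 16:
        return None
    try:
        value = int(m.group(3).replace("_", ""), base)
    except ValueError:  # digit not valid in this base, or no digits at all
        return None
    return value * base ** int(m.group(4) or 0)


def decode_string_literal(text):
    """Decode "..." or the obsolescent %...% form; a doubled delimiter is one char."""
    if len(text) < 2 or text[0] not in '"%' or text[-1] != text[0]:
        return None
    delim, body = text[0], text[1:-1]
    if delim == "%" and '"' in body:
        return None  # RM J.2: a %-delimited string may not contain a quotation mark
    out, i = [], 0
    while i < len(body):
        if body[i] == delim:
            if i + 1 < len(body) and body[i + 1] == delim:
                out.append(delim)
                i += 2
                continue
            return None  # lone delimiter inside the literal
        out.append(body[i])
        i += 1
    return "".join(out)
```

For the commit's own example, `decode_string_literal("%%%%%345%")` yields `%%345`: the outer `%` characters are the delimiters and each doubled `%%` inside becomes a single `%`.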
Ada is case-insensitive, so T'DELTA, T'delta, and T'Delta are all valid. The lexer matches DELTA/DIGITS/MOD/ACCESS as keyword tokens regardless of case, but attribute_designator only accepted the title-case-only tokens (DELTA__, DIGITS__, etc.). Add the keyword tokens ACCESS, DELTA, DIGITS, MOD as alternatives so attributes work in any casing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ts. Fix block_statement scope underflow.
Add TypeClassification, Symbol, SymbolTable data structures and update AdaParserBase with IsAggregate/IsTypeName/EnterDeclaration/EnterScope/ExitScope/PushExpectedType/PopExpectedType/OutputSymbolTable for the remaining 5 targets. Fix block_statement grammar rule to always call EnterScope before the optional DECLARE clause, preventing scope stack underflow on blocks without DECLARE.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
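The scope handling described above can be modeled as a stack of dictionaries. This Python sketch uses snake_case versions of EnterScope, ExitScope, EnterDeclaration and IsTypeName; the TypeClassification members are hypothetical, and raising on underflow is the sketch's own choice (the PR instead fixes the grammar so that every block enters a scope).

```python
from enum import Enum, auto


class TypeClassification(Enum):
    """Hypothetical members; the PR's actual enum values are not listed."""
    TYPE = auto()
    SUBTYPE = auto()
    OBJECT = auto()
    SUBPROGRAM = auto()
    PACKAGE = auto()


class SymbolTable:
    def __init__(self):
        self._scopes = [{}]  # index 0 is the outermost (library-level) scope

    def enter_scope(self):
        self._scopes.append({})

    def exit_scope(self):
        # An ExitScope without a matching EnterScope would otherwise pop the
        # outermost scope and underflow the stack.
        if len(self._scopes) == 1:
            raise RuntimeError("scope stack underflow")
        self._scopes.pop()

    def enter_declaration(self, name, kind):
        self._scopes[-1][name.upper()] = kind  # Ada names are case-insensitive

    def lookup(self, name):
        """Innermost declaration wins, as with Ada's nested visibility."""
        for scope in reversed(self._scopes):
            kind = scope.get(name.upper())
            if kind is not None:
                return kind
        return None

    def is_type_name(self, name):
        return self.lookup(name) in (TypeClassification.TYPE,
                                     TypeClassification.SUBTYPE)
```

A block_statement action would call enter_scope() unconditionally before the optional DECLARE part and exit_scope() at END, so the calls stay balanced whether or not declarations are present.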
…context, not array
Each alternative of generic_instantiation has at most one defining_program_unit_name and one defining_designator. The ANTLR4 generated accessor returns a single context, not an array. Fixed in CSharp, Java, Cpp, Dart, and Go targets.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Package spec symbols (types, subtypes, etc.) must be visible in the corresponding package body and to users of the package. The EnterScope/ ExitScope in package_specification was discarding these symbols, causing qualified_expression like ARR2'(A1) to fail because ARR2 was no longer resolved as a type name. Fixes a74205e.ada parse failure. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…daParser members
In antlr4ng, context classes are top-level exports from AdaParser.ts, not nested static members of the AdaParser class. Import all 39 context classes directly and reference them without the AdaParser. prefix.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…Source().GetTokenSource()
The Go ANTLR4 runtime's TokenSourceCharStreamPair has unexported fields. Use GetTokenSource() directly on the token to get the source name.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- TypeScript & Antlr4ng: replace instanceof with ruleIndex type guards to avoid tsx module identity issues (matching c grammar pattern)
- JavaScript: add missing Symbol import, remove broken await import
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Python antlr4 runtime's FileStream lacks sourceName, causing AttributeError in _define_symbol. Wrap in try/except. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When parsing a file with `with Foo;`, the parser automatically locates and parses `foo.ads` to import its visible symbols into the current symbol table. Supports --I<path> for search paths, caches parsed packages, and detects cycles. Updated readme with disambiguation docs and CLI options. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add with-clause import support to Java, Go, Python3, TypeScript, JavaScript, Dart, Antlr4ng, and Cpp targets. Each port includes:
- GetExportedSymbols() in SymbolTable
- ImportWithClause() with cache, cycle detection, and auto-detection of current file from token stream
- --I<path> search path parsing (where applicable)
- GNAT file naming convention (dots to hyphens, lowercase, .ads)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
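The import logic reduces to mapping a unit name to a spec file name, probing each search directory, caching results per unit, and tracking units in progress to break cycles. The Python sketch below is hypothetical throughout: WithImporter, its injected load and parse callables, and the case-insensitive cache key are assumptions, and the file-name rule is only the simplified one stated above (GNAT itself uses shortened names for some predefined units, such as a-textio.ads).

```python
import posixpath


def gnat_spec_filename(unit_name):
    """Simplified GNAT naming from the PR: lowercase, dots to hyphens, .ads."""
    return unit_name.lower().replace(".", "-") + ".ads"


def parse_search_paths(argv):
    """Collect --I<path> options (option spelling taken from the PR)."""
    return [arg[3:] for arg in argv if arg.startswith("--I") and len(arg) > 3]


class WithImporter:
    """Resolve 'with Foo;' by locating and parsing foo.ads (hypothetical API).

    load(path) returns file contents or None; parse(text) returns a pair
    (exported_symbols, withed_unit_names). Both stand in for the real
    file system and parser.
    """

    def __init__(self, search_paths, load, parse):
        self.search_paths = search_paths
        self.load = load
        self.parse = parse
        self.cache = {}           # unit name (lowercase) -> exported symbols
        self.in_progress = set()  # units currently being imported

    def import_unit(self, unit_name):
        key = unit_name.lower()
        if key in self.cache:
            return self.cache[key]
        if key in self.in_progress:
            return {}  # cycle detected: stop instead of recursing forever
        self.in_progress.add(key)
        try:
            exports = {}
            for directory in self.search_paths:
                path = posixpath.join(directory, gnat_spec_filename(unit_name))
                text = self.load(path)
                if text is not None:
                    exports, withed = self.parse(text)
                    for dep in withed:  # the spec's own with clauses
                        self.import_unit(dep)
                    break
            self.cache[key] = exports
            return exports
        finally:
            self.in_progress.discard(key)
```

Injecting load and parse keeps the sketch testable with an in-memory dictionary in place of real .ads files.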
The Dart ANTLR4 runtime generates names() (plural) for the list accessor of repeated rule references, not name() (which takes an index parameter). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…etection
GetTokenSource().GetSourceName() returns the lexer grammar name (e.g., "AdaLexer.g4"), not the input file. Use GetInputStream().GetSourceName() instead, which returns the actual file path from the CharStream.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
tokenSource.sourceName is undefined in the JS ANTLR4 runtime. Use tokenSource.inputStream.name instead, which holds the CharStream's name property set by the test harness. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
tokenSource.sourceName raises AttributeError in the Python3 ANTLR4 runtime (FileStream has no sourceName attribute). Use tokenSource.inputStream.name instead, and filter out "<empty>" default. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
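The file auto-detection fallbacks from these commits can be collected into one defensive helper. In this Python sketch, the attribute names inputStream and name are the ones the commit messages cite; the function name is hypothetical, and getattr() is used so that a runtime lacking either attribute yields None instead of raising AttributeError.

```python
def input_file_name(token_source):
    """Best-effort input path for a lexer/token source, or None.

    Prefers token_source.inputStream.name and ignores the "<empty>"
    placeholder, as described in the commits above.
    """
    stream = getattr(token_source, "inputStream", None)
    name = getattr(stream, "name", None)
    if not name or name == "<empty>":
        return None
    return name
```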
Two fixes:
1. Use tokenSource.inputStream.name instead of tokenSource.sourceName
for file auto-detection (sourceName is undefined in antlr4 JS runtime)
2. Replace require("antlr4") with ESM imports (CharStream, CommonTokenStream)
to fix "require is not defined" error in ESM context
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Updated runtime information for examples in README.
This PR fixes #4754, partially addressing the inherent syntactic ambiguity in the grammar. As a first step, it adds disambiguating semantic predicates. To handle `with` clauses in Ada, the parser now reads package specification (.ads) files. Options are provided to specify the search path for Ada packages and to output the symbol table built for disambiguation, for debugging. All ports except PHP are supported; I didn't bother with PHP because it has serious bugs.
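To illustrate what a disambiguating predicate decides: in Ada, `F(X)` can be a type conversion, an indexed component, or a function call, and only the symbol table can tell them apart. The Python sketch below is illustrative only; the classification strings and the function name are hypothetical, and slices and other readings are not modeled.

```python
def classify_name_with_parens(prefix, symbols):
    """Choose an interpretation for 'prefix(...)' using a symbol table.

    symbols maps upper-cased names to 'type', 'object' or 'function'
    (hypothetical classification strings).
    """
    kind = symbols.get(prefix.upper())  # Ada identifiers are case-insensitive
    if kind == "type":
        return "type_conversion"
    if kind == "object":
        return "indexed_component"  # a slice is also possible; not modeled
    if kind == "function":
        return "function_call"
    return "unknown"  # undeclared name: the predicate cannot decide
```

A parser predicate would consult such a lookup to select the matching alternative instead of relying on ANTLR's default choice among ambiguous alternatives.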
A new readme.md gives a full description of the grammar. It is modeled after the one for the C grammar.