[ada2012] Implement disambiguating semantics by kaby76 · Pull Request #4755 · antlr/grammars-v4

kaby76 · 2026-02-16T18:32:16Z

This PR fixes #4754, partially addressing the inherent syntactic ambiguity in the grammar. As a first step, it adds disambiguating predicates. To handle `with` clauses in Ada, the parser now reads package specification (.ads) files. Options are provided to specify the search path for Ada packages and to debug the symbol table constructed for disambiguation.

Almost all ports are supported. PHP is omitted because its target has serious bugs.

A new readme.md gives a full description of the grammar. It is modeled after the one for the C grammar.

…pression, lexer character literal handling

- Add missing SEMI terminators to 11 parser rules: subtype_declaration, generic_instantiation (3 alternatives), record_representation_clause, number_declaration, abstract_subprogram_declaration, goto_statement, procedure_call_statement (1st alternative), delay_until_statement, delay_relative_statement, abort_statement, formal_object_declaration (2nd alternative)
- Fix record_representation_clause: remove incorrect optional marker on END RECORD
- Fix subtype_mark to accept dotted names (identifier (DOT identifier)*) instead of just identifier, allowing qualified names like PACK2.ACC1
- Fix qualified_expression: remove extra parentheses around aggregate alternative, matching Ada RM definition
- Fix formal_signed_integer_type_definition: use RANGE_ token instead of range parser rule
- Merge '!' into VL token to support Ada 83 obsolescent replacement character (RM J.2)
- Add lexer predicate IsCharLiteralAllowed() to disambiguate character literals from attribute ticks, using AdaLexerBase superclass to track previous token
- Add AdaLexerBase implementations for all targets: CSharp, Java, Python3, Dart, JavaScript, TypeScript, Antlr4ng, Cpp, Go
- Add transformGrammar.py for Python3, Antlr4ng, Cpp, and Go targets
- Add the ACATS (Ada Conformity Assessment Test Suite).
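The idea behind IsCharLiteralAllowed() can be illustrated with a small stand-alone Python sketch. It is not the PR's AdaLexerBase: the token kind names and the exact set of previous tokens after which an apostrophe is treated as an attribute tick (an identifier, a closing parenthesis, or `all`) are assumptions for illustration. The point is that the decision depends on the previously emitted token, so `Character'('a')` lexes as a tick followed by the literal `'a'` rather than as the bogus literal `'('`.

```python
import re

# Hypothetical token kinds after which an apostrophe is an attribute tick
# (e.g. T'First, F(X)'Size, P.all'Access). The PR's actual set may differ.
ATTRIBUTE_PREFIX_KINDS = {"IDENTIFIER", "RP", "ALL"}

WORD = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_char_literal_allowed(previous_kind):
    """A character literal may start here unless the previous token can prefix an attribute."""
    return previous_kind not in ATTRIBUTE_PREFIX_KINDS


def tokenize(src):
    """Toy Ada-like scanner that distinguishes only what this example needs."""
    tokens = []
    previous_kind = None  # the "previous token" a lexer base class would track
    i = 0
    while i < len(src):
        c = src[i]
        if c.isspace():
            i += 1
            continue
        word = WORD.match(src, i)
        if word:
            text = word.group(0)
            kind = "ALL" if text.upper() == "ALL" else "IDENTIFIER"
            tokens.append((kind, text))
            i = word.end()
        elif c == "'":
            closes = i + 2 < len(src) and src[i + 2] == "'"
            if closes and is_char_literal_allowed(previous_kind):
                tokens.append(("CHARACTER_LITERAL", src[i:i + 3]))
                i += 3
            else:
                tokens.append(("TICK", c))
                i += 1
        elif c == "(":
            tokens.append(("LP", c))
            i += 1
        elif c == ")":
            tokens.append(("RP", c))
            i += 1
        else:
            tokens.append(("OTHER", c))
            i += 1
        previous_kind = tokens[-1][0]
    return tokens
```

Without the predicate, the scanner would see `'('` right after `Character` and take it as a character literal, which is exactly the ambiguity the commit describes.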

The Ada RM defines pragma placement rules in prose, not in EBNF productions. Rather than deviating from the RM by inserting pragma into multiple parser rules, this uses a two-pass approach: the lexer tokenizes pragma content onto a separate PRAGMA_CHANNEL via a lexer mode, the main parser ignores that channel, and a ParsePragmas() action in the compilation rule triggers a second-pass parse of each pragma.

- AdaLexer.g4: Add PRAGMA_CHANNEL, PRAGMA_MODE with token rules that map back to existing token types via type()
- AdaParser.g4: Add superClass=AdaParserBase, ParsePragmas() action in compilation, pragmaRule and pragma_argument_association per RM 2.8
- Create AdaParserBase for all 9 targets (CSharp, Java, Python3, Dart, JavaScript, TypeScript, Antlr4ng, Cpp, Go)
- Update transformGrammar.py for Go (parser this.->p.), Cpp and Antlr4ng (parser header includes)
- Fixes parse failure on ba/ba1020f1.ada (PRAGMA ELABORATE)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
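The channel split can be pictured with a toy token list. This Python model only illustrates the idea: `Token`, the channel numbers, and grouping pragma tokens up to each `;` are assumptions, not the generated lexer or the AdaParserBase ParsePragmas() code, which re-parses each group with pragmaRule.

```python
from dataclasses import dataclass
from typing import List

DEFAULT_CHANNEL = 0
PRAGMA_CHANNEL = 2  # hypothetical number; the grammar declares its own PRAGMA_CHANNEL


@dataclass
class Token:
    text: str
    channel: int


def main_parser_view(tokens: List[Token]) -> List[str]:
    """First pass: the main parser only sees default-channel tokens."""
    return [t.text for t in tokens if t.channel == DEFAULT_CHANNEL]


def split_pragmas(tokens: List[Token]) -> List[List[str]]:
    """Second pass: group pragma-channel tokens into one list per pragma, ending at ';'."""
    pragmas: List[List[str]] = []
    current: List[str] = []
    for t in tokens:
        if t.channel != PRAGMA_CHANNEL:
            continue
        current.append(t.text)
        if t.text == ";":
            pragmas.append(current)
            current = []
    return pragmas
```

For `pragma Elaborate (Foo); with Foo;`, the main parser sees only `with Foo ;`, while the second pass receives the six pragma tokens as one group. Keeping pragmas off the default channel is what lets the main grammar stay close to the RM's EBNF.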

…on3, TypeScript

Target-specific API fixes for the two-pass pragma parsing:

- Antlr4ng: errorListeners -> getErrorListeners()
- Cpp: getErrorListeners() loop -> addErrorListener(&getErrorListenerDispatch())
- Dart: getTokens() returns nullable List<Token>?, add ?? [] fallback
- Go: no SetLine/SetColumn, no ListTokenSource, no GetChildren() on ErrorListener; replaced with custom pragmaTokenSource embedding BaseLexer, use GetSource() from original tokens, use dispatch proxy
- JavaScript: CommonToken(null,...) crashes, ListTokenSource missing; use t.source from original token, add SimpleTokenSource class
- Python3: CommonToken not in antlr4 wildcard import, add explicit import from antlr4.Token
- TypeScript: fill()/tokens on CommonTokenStream not BufferedTokenStream, ListTokenSource missing, _listeners not in type decls; cast to CommonTokenStream, add SimpleTokenSource class, use as-any casts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

caseInsensitive=true already maps lowercase to uppercase, so [A-Za-z] and [0-9A-Fa-f] and [Ee] produce duplicate character warnings. Simplified to [A-Z], [0-9A-F], and [E]. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Support Ada 83 obsolescent lexer forms (RM Annex J.2)

- BASED_LITERAL: accept ':' as alternative delimiter to '#' (e.g. 2:10:)
- CHARACTER_LITERAL_: allow apostrophe and backslash as the enclosed character (~['\\\r\n] -> ~[\r\n]), matching RM 2.6 graphic_character
- STRING_LITERAL_: accept '%' as alternative string delimiter with '%%' for embedded percent (e.g. %%%%%345%)
- Apply same CHARACTER_LITERAL_ fix to PRAGMA_CHAR_LITERAL in PRAGMA_MODE

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
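The two replacement forms can be checked with ordinary regular expressions. This is a Python sketch of the RM J.2 rules, not a translation of the grammar's lexer rules: the RM requires that both '#' delimiters be replaced by ':', and the backreference below enforces that pairing, but whether BASED_LITERAL itself enforces it is not stated above. For percent-delimited strings, the RM forbids quotation marks inside and requires each embedded '%' to be doubled.

```python
import re

# Based literal: base, delimiter, digits, optional fraction, same delimiter, optional exponent.
# (?P=delim) forces the closing delimiter to match the opening one ('#' or ':').
BASED_LITERAL = re.compile(
    r"\d+(?:_\d+)*"                     # base
    r"(?P<delim>[#:])"                  # '#', or ':' as the Ada 83 replacement
    r"[0-9A-F]+(?:_[0-9A-F]+)*"         # integer part
    r"(?:\.[0-9A-F]+(?:_[0-9A-F]+)*)?"  # optional fraction
    r"(?P=delim)"
    r"(?:E[+-]?\d+(?:_\d+)*)?",         # optional exponent
    re.IGNORECASE,
)

# Percent-delimited string: no '"' inside, and each embedded '%' is written as '%%'.
PERCENT_STRING = re.compile(r'%(?:%%|[^%"\r\n])*%')


def percent_string_value(literal: str) -> str:
    """Strip the delimiters and collapse each doubled percent sign."""
    return literal[1:-1].replace("%%", "%")
```

With these, `2:10:` and `16#ff#E2` are accepted, the mismatched `2#10:` is rejected, and `%%%%%345%` denotes the string value `%%345`.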

Ada is case-insensitive, so T'DELTA, T'delta, and T'Delta are all valid. The lexer matches DELTA/DIGITS/MOD/ACCESS as keyword tokens regardless of case, but attribute_designator only accepted the title-case-only tokens (DELTA__, DIGITS__, etc.). Add the keyword tokens ACCESS, DELTA, DIGITS, MOD as alternatives so attributes work in any casing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ts. Fix block_statement scope underflow.

Add TypeClassification, Symbol, SymbolTable data structures and update AdaParserBase with IsAggregate/IsTypeName/EnterDeclaration/EnterScope/ExitScope/PushExpectedType/PopExpectedType/OutputSymbolTable for the remaining 5 targets. Fix block_statement grammar rule to always call EnterScope before the optional DECLARE clause, preventing scope stack underflow on blocks without DECLARE.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
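Why a symbol table matters here: in Ada, a construct such as `F(X)` may be a function call, an indexed component, or a type conversion, and knowing whether `F` names a type decides the parse. A minimal Python model of a scoped table follows. Method names loosely mirror EnterScope/ExitScope/EnterDeclaration/IsTypeName, but the fields and the TypeClassification members are assumptions, not the PR's definitions.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Dict, List, Optional


class TypeClassification(Enum):
    # Hypothetical members; the PR's enum may classify types differently.
    NOT_A_TYPE = auto()
    SCALAR = auto()
    ARRAY = auto()
    RECORD = auto()


@dataclass
class Symbol:
    name: str
    classification: TypeClassification


class SymbolTable:
    """Stack of scopes; Ada identifiers are case-insensitive, so keys are upper-cased."""

    def __init__(self) -> None:
        self._scopes: List[Dict[str, Symbol]] = [{}]  # outermost scope, never popped

    def enter_scope(self) -> None:
        self._scopes.append({})

    def exit_scope(self) -> None:
        # An unmatched exit signals a grammar bug, such as the block_statement
        # underflow on blocks without DECLARE.
        if len(self._scopes) == 1:
            raise RuntimeError("scope stack underflow")
        self._scopes.pop()

    def enter_declaration(self, symbol: Symbol) -> None:
        self._scopes[-1][symbol.name.upper()] = symbol

    def resolve(self, name: str) -> Optional[Symbol]:
        # Search innermost scope first, so inner declarations hide outer ones.
        for scope in reversed(self._scopes):
            found = scope.get(name.upper())
            if found is not None:
                return found
        return None

    def is_type_name(self, name: str) -> bool:
        symbol = self.resolve(name)
        return symbol is not None and symbol.classification is not TypeClassification.NOT_A_TYPE
```

Raising on underflow, rather than silently ignoring it, makes an unbalanced EnterScope/ExitScope pair in the grammar visible immediately.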

…context, not array Each alternative of generic_instantiation has at most one defining_program_unit_name and one defining_designator. The ANTLR4 generated accessor returns a single context, not an array. Fixed in CSharp, Java, Cpp, Dart, and Go targets. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Package spec symbols (types, subtypes, etc.) must be visible in the corresponding package body and to users of the package. The EnterScope/ExitScope calls in package_specification were discarding these symbols, so qualified expressions such as ARR2'(A1) failed because ARR2 was no longer resolved as a type name. Fixes the a74205e.ada parse failure. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…daParser members In antlr4ng, context classes are top-level exports from AdaParser.ts, not nested static members of the AdaParser class. Import all 39 context classes directly and reference them without the AdaParser. prefix. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…Source().GetTokenSource() The Go ANTLR4 runtime's TokenSourceCharStreamPair has unexported fields. Use GetTokenSource() directly on the token to get the source name. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- TypeScript & Antlr4ng: replace instanceof with ruleIndex type guards to avoid tsx module identity issues (matching c grammar pattern) - JavaScript: add missing Symbol import, remove broken await import Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The Python antlr4 runtime's FileStream lacks sourceName, causing AttributeError in _define_symbol. Wrap in try/except. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

When parsing a file with `with Foo;`, the parser automatically locates and parses `foo.ads` to import its visible symbols into the current symbol table. Supports --I<path> for search paths, caches parsed packages, and detects cycles. Updated readme with disambiguation docs and CLI options. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
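The caching and cycle detection can be sketched independently of ANTLR. In this Python model, `parse_spec` is a hypothetical callback standing in for locating and parsing the `.ads` file: it returns the package's exported symbol names and the packages its own context clause names, or None when no spec file is found. The class and attribute names are illustrative only.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical callback: given an upper-cased package name, return
# (exported symbol names, packages named in its own with clauses),
# or None when no .ads file can be found on the search path.
ParseSpec = Callable[[str], Optional[Tuple[Set[str], List[str]]]]


class WithClauseImporter:
    def __init__(self, parse_spec: ParseSpec) -> None:
        self._parse_spec = parse_spec
        self._cache: Dict[str, Set[str]] = {}  # package name -> exported symbols
        self._in_progress: Set[str] = set()    # packages currently being imported
        self.parse_count = 0                   # how many specs were actually parsed

    def import_package(self, name: str) -> Set[str]:
        key = name.upper()  # Ada names are case-insensitive
        if key in self._cache:
            return self._cache[key]  # each spec is parsed at most once
        if key in self._in_progress:
            return set()  # cycle (A withs B, B withs A): stop instead of recursing forever
        self._in_progress.add(key)
        try:
            result = self._parse_spec(key)
            self.parse_count += 1
            exported: Set[str] = set()
            if result is not None:
                exported, withs = result
                for dependency in withs:
                    self.import_package(dependency)  # load the spec's own dependencies
            self._cache[key] = exported
            return exported
        finally:
            self._in_progress.discard(key)
```

Returning an empty set on a cycle keeps the recursion finite; a fuller version might also report the cycle through the symbol-table debug output.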

Add with-clause import support to Java, Go, Python3, TypeScript, JavaScript, Dart, Antlr4ng, and Cpp targets. Each port includes:

- GetExportedSymbols() in SymbolTable
- ImportWithClause() with cache, cycle detection, and auto-detection of current file from token stream
- --I<path> search path parsing (where applicable)
- GNAT file naming convention (dots to hyphens, lowercase, .ads)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
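The file-lookup side can be sketched in a few Python functions. The naming rule follows the text (lowercase, dots to hyphens, `.ads`); note that GNAT's predefined library units use shortened ("krunched") names such as `a-textio.ads`, which this simple rule does not reproduce. Searching the current file's directory before the `--I` paths is an assumption, and all function names here are hypothetical.

```python
import os
from typing import List, Optional, Sequence, Tuple


def gnat_spec_file_name(unit_name: str) -> str:
    """Default GNAT naming for a spec: lowercase, dots become hyphens, '.ads' suffix."""
    return unit_name.lower().replace(".", "-") + ".ads"


def split_search_paths(argv: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Separate --I<path> options from the remaining command-line arguments."""
    paths: List[str] = []
    rest: List[str] = []
    for arg in argv:
        if arg.startswith("--I") and len(arg) > len("--I"):
            paths.append(arg[len("--I"):])
        else:
            rest.append(arg)
    return paths, rest


def locate_spec(unit_name: str, search_paths: Sequence[str],
                current_file: Optional[str] = None) -> Optional[str]:
    """Return the first existing spec file; the current file's directory is tried first (assumed order)."""
    directories: List[str] = []
    if current_file:
        directories.append(os.path.dirname(current_file) or ".")
    directories.extend(search_paths)
    file_name = gnat_spec_file_name(unit_name)
    for directory in directories:
        candidate = os.path.join(directory, file_name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

For example, a `with Pack.Child;` clause would be resolved by looking for `pack-child.ads` in each candidate directory in turn.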

The Dart ANTLR4 runtime generates names() (plural) for the list accessor of repeated rule references, not name() (which takes an index parameter). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…etection GetTokenSource().GetSourceName() returns the lexer grammar name (e.g., "AdaLexer.g4"), not the input file. Use GetInputStream().GetSourceName() instead, which returns the actual file path from the CharStream. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

tokenSource.sourceName is undefined in the JS ANTLR4 runtime. Use tokenSource.inputStream.name instead, which holds the CharStream's name property set by the test harness. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

tokenSource.sourceName raises AttributeError in the Python3 ANTLR4 runtime (FileStream has no sourceName attribute). Use tokenSource.inputStream.name instead, and filter out "<empty>" default. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
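In the Python3 port, the lookup can be written defensively so that a missing attribute never raises. A sketch, assuming a lexer object that exposes `inputStream` (the Python antlr4 runtime's Lexer provides it as a property) whose `name` defaults to "<empty>" when nothing set it; the function name is hypothetical.

```python
from typing import Any, Optional


def detect_source_file(token_source: Any) -> Optional[str]:
    """Best-effort input file name for the current parse, or None if unknown.

    Reads tokenSource.inputStream.name instead of tokenSource.sourceName;
    getattr with a default avoids AttributeError on runtimes that lack either
    attribute, and the runtime's "<empty>" default is treated as unknown.
    """
    input_stream = getattr(token_source, "inputStream", None)
    name = getattr(input_stream, "name", None)
    if not name or name == "<empty>":
        return None
    return name
```

Returning None lets the caller fall back to the `--I` search paths alone when the current file cannot be determined.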

Two fixes:

1. Use tokenSource.inputStream.name instead of tokenSource.sourceName for file auto-detection (sourceName is undefined in the antlr4 JS runtime)
2. Replace require("antlr4") with ESM imports (CharStream, CommonTokenStream) to fix the "require is not defined" error in ESM context

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Updated runtime information for examples in README.

kaby76 and others added 14 commits February 14, 2026 21:55

Fix label parsing. Fix relational_operator bug.

76dc11a

Fixes to the Antlr4ng port. Add pragma test to examples/.

314e8db

Add semantics.

7ea6ac5

Update settings.local.json

b06d39e

Merge remote-tracking branch 'upstream/master' into with-semantics

cb233e3

teverett added the ada label Feb 16, 2026

kaby76 and others added 14 commits February 16, 2026 17:36

Fix Dart AdaParserBase: add missing dart:io import for stderr

25e1e9f

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Fix Python3 AdaParserBase: handle missing sourceName attribute

f7f2576


Add package test.

018a62f

Fix Dart AdaParserBase: use names() not name() for list accessor

a578fa4


Update Claude file and readme.

ce424e6

kaby76 marked this pull request as ready for review February 18, 2026 00:20

kaby76 marked this pull request as draft February 18, 2026 01:09

kaby76 added 2 commits February 17, 2026 20:09

Fix acats testing.

d531848

Add performance.

cb23a5c

kaby76 marked this pull request as ready for review February 18, 2026 10:05

kaby76 added 2 commits February 18, 2026 05:31

.m files seem to interfere with Go builds.

9b49289

Modify performance section in readme.md

6f17f7d

kaby76 wants to merge 32 commits into antlr:master from kaby76:with-semantics
