[ada2012] Implement disambiguating semantics#4755

Open
kaby76 wants to merge 32 commits into antlr:master from kaby76:with-semantics
Conversation

kaby76 (Contributor) commented Feb 16, 2026

This PR fixes #4754 by addressing, in part, the inherent syntactic ambiguity in the grammar. As a first step, it adds disambiguating predicates. To handle Ada `with` clauses, the parser now reads package specification (.ads) files. Options are provided to specify the search path for Ada packages and to debug the symbol table constructed for disambiguation.

Almost all ports are supported. I didn't bother with the PHP port because that target has serious bugs.

A new readme.md gives a full description of the grammar. It's modeled after the one for the C grammar.

kaby76 and others added 14 commits February 14, 2026 21:55
…pression, lexer character literal handling

  - Add missing SEMI terminators to 11 parser rules: subtype_declaration, generic_instantiation (3 alternatives),
    record_representation_clause, number_declaration, abstract_subprogram_declaration, goto_statement,
    procedure_call_statement (1st alternative), delay_until_statement, delay_relative_statement, abort_statement,
    formal_object_declaration (2nd alternative)
  - Fix record_representation_clause: remove incorrect optional marker on END RECORD
  - Fix subtype_mark to accept dotted names (identifier (DOT identifier)*) instead of just identifier, allowing
    qualified names like PACK2.ACC1
  - Fix qualified_expression: remove extra parentheses around aggregate alternative, matching Ada RM definition
  - Fix formal_signed_integer_type_definition: use RANGE_ token instead of range parser rule
  - Merge '!' into VL token to support Ada 83 obsolescent replacement character (RM J.2)
  - Add lexer predicate IsCharLiteralAllowed() to disambiguate character literals from attribute ticks, using
    AdaLexerBase superclass to track previous token
  - Add AdaLexerBase implementations for all targets: CSharp, Java, Python3, Dart, JavaScript, TypeScript, Antlr4ng,
    Cpp, Go
  - Add transformGrammar.py for Python3, Antlr4ng, Cpp, and Go targets
  - Add the ACATS (Ada Conformity Assessment Test Suite).
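
The character-literal predicate in the list above decides from the previous token whether an apostrophe can start a character literal. The commit message does not spell out the rule, so the following is only a sketch of one common heuristic: after an identifier, a closing parenthesis, or the keyword ALL, the apostrophe is treated as an attribute or qualification tick. The function name and the token-type names are hypothetical and need not match AdaLexerBase.

```python
from typing import Optional

# Hypothetical token-type names; the real AdaLexer token names may differ.
# After any of these, "'" is assumed to be a tick (X'First, F(1)'Image, P.all'Address).
TICK_CONTEXT = {"IDENTIFIER", "RPAREN", "ALL"}


def is_char_literal_allowed(previous_token_type: Optional[str]) -> bool:
    """Return True when "'" at this position may start a character literal.

    previous_token_type is the type of the last token on the default channel,
    or None at the start of the input.
    """
    return previous_token_type not in TICK_CONTEXT
```

Under this assumption, in `Character'('a')` the first apostrophe follows an identifier and is a tick, while the apostrophe before `a` follows a left parenthesis and may start a literal.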
The Ada RM defines pragma placement rules in prose, not in EBNF productions. Rather than deviating from the RM by inserting pragma into multiple parser rules, this uses a two-pass approach: the lexer tokenizes pragma content onto a separate PRAGMA_CHANNEL via a lexer mode, the main parser ignores them, and a ParsePragmas() action in the compilation rule triggers second-pass parsing of each pragma.

  - AdaLexer.g4: Add PRAGMA_CHANNEL, PRAGMA_MODE with token rules
    that map back to existing token types via type()
  - AdaParser.g4: Add superClass=AdaParserBase, ParsePragmas() action
    in compilation, pragmaRule and pragma_argument_association per RM 2.8
  - Create AdaParserBase for all 9 targets (CSharp, Java, Python3,
    Dart, JavaScript, TypeScript, Antlr4ng, Cpp, Go)
  - Update transformGrammar.py for Go (parser this.->p.), Cpp and
    Antlr4ng (parser header includes)
  - Fixes parse failure on ba/ba1020f1.ada (PRAGMA ELABORATE)

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
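
The two-pass idea in this commit can be pictured without the ANTLR runtime. The sketch below is an illustration only: the `Token` class, the channel number, and the rule that a pragma ends at its semicolon are assumptions, not the code in AdaParserBase.

```python
from dataclasses import dataclass
from typing import List

DEFAULT_CHANNEL = 0
PRAGMA_CHANNEL = 2  # assumed channel number, for illustration only


@dataclass
class Token:
    text: str
    channel: int = DEFAULT_CHANNEL


def main_pass_tokens(tokens: List[Token]) -> List[Token]:
    """First pass: the main parser only sees default-channel tokens."""
    return [t for t in tokens if t.channel == DEFAULT_CHANNEL]


def pragma_groups(tokens: List[Token]) -> List[List[Token]]:
    """Second pass: collect pragma-channel tokens, one group per pragma.

    Assumes each pragma ends at its ';' token.
    """
    groups: List[List[Token]] = []
    current: List[Token] = []
    for t in tokens:
        if t.channel != PRAGMA_CHANNEL:
            continue
        current.append(t)
        if t.text == ";":
            groups.append(current)
            current = []
    if current:
        # Unterminated pragma: hand it over so the second-pass parser can report it.
        groups.append(current)
    return groups
```

Each group would then be fed to a separate parse of the pragma rule, so the main grammar never has to mention pragmas.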
…on3, TypeScript

Target-specific API fixes for the two-pass pragma parsing:
- Antlr4ng: errorListeners -> getErrorListeners()
- Cpp: getErrorListeners() loop -> addErrorListener(&getErrorListenerDispatch())
- Dart: getTokens() returns nullable List<Token>?, add ?? [] fallback
- Go: no SetLine/SetColumn, no ListTokenSource, no GetChildren() on ErrorListener; replaced with custom pragmaTokenSource embedding BaseLexer, use GetSource() from original tokens, use dispatch proxy
- JavaScript: CommonToken(null,...) crashes, ListTokenSource missing; use t.source from original token, add SimpleTokenSource class
- Python3: CommonToken not in antlr4 wildcard import, add explicit import from antlr4.Token
- TypeScript: fill()/tokens on CommonTokenStream not BufferedTokenStream, ListTokenSource missing, _listeners not in type decls; cast to CommonTokenStream, add SimpleTokenSource class, use as-any casts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
caseInsensitive=true already maps lowercase to uppercase, so [A-Za-z] and [0-9A-Fa-f] and [Ee] produce duplicate character warnings. Simplified to [A-Z], [0-9A-F], and [E].

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
● Support Ada 83 obsolescent lexer forms (RM Annex J.2)

  - BASED_LITERAL: accept ':' as alternative delimiter to '#' (e.g. 2:10:)
  - CHARACTER_LITERAL_: allow apostrophe and backslash as the enclosed
    character (~['\\\r\n] -> ~[\r\n]), matching RM 2.6 graphic_character
  - STRING_LITERAL_: accept '%' as alternative string delimiter with
    '%%' for embedded percent (e.g. %%%%%345%)
  - Apply same CHARACTER_LITERAL_ fix to PRAGMA_CHAR_LITERAL in
    PRAGMA_MODE

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
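
For readers unfamiliar with the obsolescent forms, here is a rough Python approximation of the based-literal and percent-string shapes. It is a sketch under assumptions: the regular expressions are not the ANTLR lexer rules from this PR, and they follow RM J.2 in requiring the same delimiter at both ends, which the actual lexer rule may or may not enforce.

```python
import re

NUMERAL = r"[0-9](?:_?[0-9])*"
BASED_NUMERAL = r"[0-9A-F](?:_?[0-9A-F])*"

# base, delimiter ('#' or ':'), based numeral, optional fraction,
# the same delimiter again, optional exponent.
BASED_LITERAL = re.compile(
    rf"{NUMERAL}(?P<delim>[#:]){BASED_NUMERAL}(?:\.{BASED_NUMERAL})?"
    rf"(?P=delim)(?:E[+-]?{NUMERAL})?",
    re.IGNORECASE,
)

# '%' as string bracket: no quotation mark inside, inner '%' written as '%%'.
PERCENT_STRING = re.compile(r'%((?:[^%"\r\n]|%%)*)%')


def percent_string_value(literal: str) -> str:
    """Return the character sequence denoted by a %-delimited string literal."""
    match = PERCENT_STRING.fullmatch(literal)
    if match is None:
        raise ValueError(f"not a %-delimited string literal: {literal!r}")
    return match.group(1).replace("%%", "%")
```

With these patterns, `2:10:` and `16#FF#` are accepted, the mixed form `2:10#` is rejected, and `%%%%%345%` denotes the five-character string `%%345`.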
  Ada is case-insensitive, so T'DELTA, T'delta, and T'Delta are all
  valid. The lexer matches DELTA/DIGITS/MOD/ACCESS as keyword tokens
  regardless of case, but attribute_designator only accepted the
  title-case-only tokens (DELTA__, DIGITS__, etc.). Add the keyword
  tokens ACCESS, DELTA, DIGITS, MOD as alternatives so attributes
  work in any casing.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ts. Fix block_statement scope underflow.

Add TypeClassification, Symbol, SymbolTable data structures and update AdaParserBase
with IsAggregate/IsTypeName/EnterDeclaration/EnterScope/ExitScope/PushExpectedType/
PopExpectedType/OutputSymbolTable for the remaining 5 targets. Fix block_statement
grammar rule to always call EnterScope before the optional DECLARE clause, preventing
scope stack underflow on blocks without DECLARE.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
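
The scope underflow is easier to see in a stripped-down model. The sketch below is not the AdaParserBase code: the class layout, the string classifications, and `parse_block` are hypothetical stand-ins for SymbolTable, TypeClassification, and the block_statement actions.

```python
from typing import Dict, Iterable, List, Optional


class SymbolTable:
    """Stack of scopes mapping lower-cased names to a classification string."""

    def __init__(self) -> None:
        self._scopes: List[Dict[str, str]] = [{}]  # outermost scope

    def enter_scope(self) -> None:
        self._scopes.append({})

    def exit_scope(self) -> None:
        if len(self._scopes) == 1:
            raise RuntimeError("scope stack underflow")
        self._scopes.pop()

    def define(self, name: str, classification: str) -> None:
        self._scopes[-1][name.lower()] = classification

    def resolve(self, name: str) -> Optional[str]:
        # Innermost scope wins; Ada names are case-insensitive.
        for scope in reversed(self._scopes):
            if name.lower() in scope:
                return scope[name.lower()]
        return None

    def is_type_name(self, name: str) -> bool:
        return self.resolve(name) == "type"


def parse_block(table: SymbolTable, declared_types: Iterable[str], probe: str) -> bool:
    """Mimic block_statement: enter the scope even when there is no DECLARE part."""
    table.enter_scope()  # unconditional, so the exit below always has a match
    for name in declared_types:
        table.define(name, "type")
    visible = table.is_type_name(probe)
    table.exit_scope()
    return visible
```

If `enter_scope` were called only when a DECLARE part is present, the unconditional `exit_scope` at the end of a plain `begin ... end` block would pop a scope it never pushed.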
…context, not array

Each alternative of generic_instantiation has at most one defining_program_unit_name
and one defining_designator. The ANTLR4 generated accessor returns a single context,
not an array. Fixed in CSharp, Java, Cpp, Dart, and Go targets.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Package spec symbols (types, subtypes, etc.) must be visible in the
corresponding package body and to users of the package. The EnterScope/
ExitScope in package_specification was discarding these symbols, causing
qualified_expression like ARR2'(A1) to fail because ARR2 was no longer
resolved as a type name. Fixes a74205e.ada parse failure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@teverett teverett added the ada label Feb 16, 2026
kaby76 and others added 14 commits February 16, 2026 17:36
…daParser members

In antlr4ng, context classes are top-level exports from AdaParser.ts,
not nested static members of the AdaParser class. Import all 39 context
classes directly and reference them without the AdaParser. prefix.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…Source().GetTokenSource()

The Go ANTLR4 runtime's TokenSourceCharStreamPair has unexported fields.
Use GetTokenSource() directly on the token to get the source name.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- TypeScript & Antlr4ng: replace instanceof with ruleIndex type guards
  to avoid tsx module identity issues (matching c grammar pattern)
- JavaScript: add missing Symbol import, remove broken await import

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Python antlr4 runtime's FileStream lacks sourceName, causing
AttributeError in _define_symbol. Wrap in try/except.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When parsing a file with `with Foo;`, the parser automatically locates
and parses `foo.ads` to import its visible symbols into the current
symbol table. Supports --I<path> for search paths, caches parsed
packages, and detects cycles. Updated readme with disambiguation
docs and CLI options.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add with-clause import support to Java, Go, Python3, TypeScript,
JavaScript, Dart, Antlr4ng, and Cpp targets. Each port includes:
- GetExportedSymbols() in SymbolTable
- ImportWithClause() with cache, cycle detection, and auto-detection
  of current file from token stream
- --I<path> search path parsing (where applicable)
- GNAT file naming convention (dots to hyphens, lowercase, .ads)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
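
As a rough illustration of the with-clause import described in these commits, the sketch below combines the file-name mapping, a search path, a cache, and cycle detection. All names here (`gnat_spec_file_name`, `find_spec`, `WithClauseImporter`, the `parse_spec` callback) are hypothetical and do not reproduce the ports' ImportWithClause(). The mapping covers only the general convention; GNAT shortens the file names of predefined units, which this sketch ignores.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, Optional, Set


def gnat_spec_file_name(unit_name: str) -> str:
    """Map an Ada unit name to a GNAT-style spec file name (dots to hyphens, lowercase, .ads)."""
    return unit_name.strip().lower().replace(".", "-") + ".ads"


def find_spec(unit_name: str, search_paths: Iterable[str]) -> Optional[Path]:
    """Return the first matching spec file on the search path, or None."""
    file_name = gnat_spec_file_name(unit_name)
    for directory in search_paths:
        candidate = Path(directory) / file_name
        if candidate.is_file():
            return candidate
    return None


class WithClauseImporter:
    """Resolve `with` clauses to exported symbols, with a cache and cycle detection."""

    def __init__(
        self,
        search_paths: Iterable[str],
        parse_spec: Callable[[Path, "WithClauseImporter"], Dict[str, str]],
    ) -> None:
        self.search_paths = list(search_paths)
        self.parse_spec = parse_spec  # hypothetical callback: parses a spec, returns its exports
        self.cache: Dict[str, Dict[str, str]] = {}
        self.in_progress: Set[str] = set()

    def import_unit(self, unit_name: str) -> Dict[str, str]:
        key = unit_name.lower()
        if key in self.cache:
            return self.cache[key]
        if key in self.in_progress:
            return {}  # cycle: this unit is already being parsed further up the stack
        path = find_spec(unit_name, self.search_paths)
        if path is None:
            return {}  # spec not found: nothing to import
        self.in_progress.add(key)
        try:
            symbols = self.parse_spec(path, self)
        finally:
            self.in_progress.discard(key)
        self.cache[key] = symbols
        return symbols
```

A parser action for `with Foo;` would call `import_unit("Foo")` and merge the returned symbols into the current symbol table.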
The Dart ANTLR4 runtime generates names() (plural) for the list
accessor of repeated rule references, not name() (which takes an
index parameter).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…etection

GetTokenSource().GetSourceName() returns the lexer grammar name (e.g.,
"AdaLexer.g4"), not the input file. Use GetInputStream().GetSourceName()
instead, which returns the actual file path from the CharStream.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
tokenSource.sourceName is undefined in the JS ANTLR4 runtime. Use
tokenSource.inputStream.name instead, which holds the CharStream's
name property set by the test harness.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
tokenSource.sourceName raises AttributeError in the Python3 ANTLR4
runtime (FileStream has no sourceName attribute). Use
tokenSource.inputStream.name instead, and filter out "<empty>" default.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two fixes:
1. Use tokenSource.inputStream.name instead of tokenSource.sourceName
   for file auto-detection (sourceName is undefined in antlr4 JS runtime)
2. Replace require("antlr4") with ESM imports (CharStream, CommonTokenStream)
   to fix "require is not defined" error in ESM context

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@kaby76 kaby76 marked this pull request as ready for review February 18, 2026 00:20
@kaby76 kaby76 marked this pull request as draft February 18, 2026 01:09
@kaby76 kaby76 marked this pull request as ready for review February 18, 2026 10:05
Development

Successfully merging this pull request may close these issues.

[ada2012] Ambiguity in the grammar and poor performance.

2 participants