Commit b64fad6

Fixed NGrammar ParseTree construction. Added ParseTree.posOpt. Added a new rewriteBinary implementation. Added CLAUDE.md.
1 parent 152f6ee commit b64fad6

3 files changed: +113 -3 lines changed

CLAUDE.md

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+# Runtime Library
+
+## Parse Tree Structure (NGrammar.parse)
+
+The LL(k) parser (`NGrammar.parse`) produces a tree of `ParseTree.Leaf` and `ParseTree.Node` values.
+
+### ParseTree.Leaf (tokens)
+Created by `LexerDfas.lex` for each token. Fields:
+- `text: String` — the matched source text (empty `""` for synthetic EOF)
+- `ruleName: String` — the lexer rule name from the grammar (e.g., `"ID"`, `"INT"`, `"LBRACE"`); for string literals, the quoted form (e.g., `"'val'"`); for EOF, `"EOF"`
+- `tipe: U32` — unique token type ID from `PredictiveTable.nameMap`
+- `isHidden: B` — `T` for whitespace/comment tokens (skipped by `LexerDfas.tokens` when `skipHidden = T`)
+- `posOpt: Option[Position]` — source position
+
+`Leaf` also extends `Token`, so `num` is an alias for `tipe`, and `toLeaf` returns `this`.
+
+### ParseTree.Node (grammar rules)
+Created by `NGrammar.parse` for non-terminal rules. Fields:
+- `children: ISZ[ParseTree]` — child nodes (Leaf or Node)
+- `ruleName: String` — the grammar rule name (e.g., `"file"`, `"exp3"`, `"infixSuffix"`)
+- `tipe: U32` — the rule's unique ID from `PredictiveTable.nameMap` (same namespace as token types)
+- `posOpt: Option[Position]` — computed from first/last child positions
+
+### Synthetic Rules (isSynthetic)
+Grammar normalization (`Grammar.normalize`) converts `*`, `+`, `?` into synthetic recursive rules named `baseName$N` (e.g., `exp3$0`, `program$1`). These have `isSynthetic = T` in the `NRule`.
+
+**Key behavior**: When `NRule.isSynthetic = T`, the parser **does not wrap** the children in a `ParseTree.Node`. Instead, children are inlined flat into the parent. This means:
+- `rule*` / `rule+` / `rule?` do NOT produce their own nodes in the parse tree
+- Their matched children appear directly as children of the enclosing non-synthetic rule
+- For example, `exp3: exp2 infixSuffix*` produces a single `exp3` Node whose children are `[exp2_node, infixSuffix_node, infixSuffix_node, ...]`
+
+### Two NRule Kinds
+- `NRule.Elements` — a sequence of elements (single production). If non-synthetic, wraps children in `ParseTree.Node(trees, name, num)`.
+- `NRule.Alts` — a choice among alternatives (multi-production). If non-synthetic, wraps the chosen alternative's result in `ParseTree.Node(trees, name, num)`. If synthetic, delegates directly to the chosen alternative without wrapping.
+
+### Name/Type ID Mapping
+`PredictiveTable.nameMap: HashSMap[String, U32]` maps both token names and rule names to unique `U32` IDs. `reverseNameMap` provides the inverse. String literal tokens use quoted keys like `"'val'"`. The same `U32` value appears in both `ParseTree.tipe` and `NRule.num`.
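
Note on the synthetic-rule behavior described in CLAUDE.md above: the flattening can be illustrated with a minimal, self-contained sketch in plain Scala. The `Tree`, `Leaf`, `Node`, and `finish` names below are hypothetical stand-ins written for this illustration, not the Sireum types or the actual `NGrammar.parse` code. The only assumption carried over from the text is that a non-synthetic rule wraps its trees in one node while a synthetic rule returns them unwrapped.

```scala
object FlattenSketch {
  // Hypothetical stand-ins for ParseTree.Leaf / ParseTree.Node (illustration only).
  sealed trait Tree { def ruleName: String }
  final case class Leaf(text: String, ruleName: String) extends Tree
  final case class Node(children: Vector[Tree], ruleName: String) extends Tree

  // Every rule yields a flat sequence of trees. A non-synthetic rule wraps its
  // trees in a single Node; a synthetic rule hands them back unwrapped, so the
  // caller splices them into its own child list.
  def finish(name: String, isSynthetic: Boolean, trees: Vector[Tree]): Vector[Tree] =
    if (isSynthetic) trees else Vector(Node(trees, name))

  def id(text: String): Vector[Tree] = Vector(Leaf(text, "ID"))

  // exp3: exp2 infixSuffix*   (the star is assumed normalized to the synthetic rule exp3$0)
  val exp2: Vector[Tree] = finish("exp2", isSynthetic = false, id("a"))
  val suffix1: Vector[Tree] =
    finish("infixSuffix", isSynthetic = false, Vector(Leaf("+", "'+'")) ++ id("b"))
  val suffix2: Vector[Tree] =
    finish("infixSuffix", isSynthetic = false, Vector(Leaf("+", "'+'")) ++ id("c"))
  val star: Vector[Tree] = finish("exp3$0", isSynthetic = true, suffix1 ++ suffix2)
  val exp3: Vector[Tree] = finish("exp3", isSynthetic = false, exp2 ++ star)

  def main(args: Array[String]): Unit = exp3.head match {
    case Node(children, name) =>
      val names = children.map(_.ruleName).mkString(", ")
      println(name + " -> " + names) // prints "exp3 -> exp2, infixSuffix, infixSuffix"
    case leaf =>
      println(leaf.ruleName)
  }
}
```

No `exp3$0` node appears in the result: the two `infixSuffix` nodes sit directly under `exp3`, next to `exp2`.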

library/shared/src/main/scala/org/sireum/parser/NGrammar.scala

Lines changed: 9 additions & 1 deletion
@@ -267,7 +267,15 @@ object NGrammar {
     }
     def parseAlts(alts: NRule.Alts, i: Z): Option[(Z, ISZ[ParseTree])] = {
       pt.predict(alts.num, lookahead(i)) match {
-        case Some(n) => return parseRule(alts.alts(n), i)
+        case Some(n) =>
+          if (alts.isSynthetic) {
+            return parseRule(alts.alts(n), i)
+          } else {
+            parseRule(alts.alts(n), i) match {
+              case Some((j, trees)) => return Some((j, ISZ(ParseTree.Node(trees, alts.name, alts.num))))
+              case _ => return None()
+            }
+          }
         case _ =>
           // For synthetic choice rules (star/opt), if the last alt is an empty
           // synthetic rule, use it as a default stop/skip when prediction fails.

library/shared/src/main/scala/org/sireum/parser/ParseTree.scala

Lines changed: 67 additions & 2 deletions
@@ -36,6 +36,7 @@ import org.sireum.message.Position
   @pure def ruleName: String
   @pure def toST: ST
   @pure def tipe: U32
+  @pure def posOpt: Option[Position]
 
   override def string: String = {
     return toST.render
@@ -63,7 +64,17 @@ object ParseTree {
       st"""$ruleName(
           |  ${(for (child <- children) yield child.toST, ",\n")}
           |)"""
-
+    @memoize def posOpt: Option[Position] = {
+      if (children.isEmpty) {
+        return None()
+      }
+      (children(0).posOpt, children(children.size - 1).posOpt) match {
+        case (Some(pos1), Some(pos2)) => return Some(pos1.to(pos2))
+        case (Some(pos1), _) => return Some(pos1)
+        case (_, Some(pos2)) => return Some(pos2)
+        case (_, _) => return None()
+      }
+    }
   }
 
   @record class DotGenerator {
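
Note on `Node.posOpt` in the hunk above: the merge logic can be tried in isolation with the sketch below. `Span` is a hypothetical stand-in for `org.sireum.message.Position`, and its `to` method is assumed to mean "from the start of this span to the end of the other"; the real `Position.to` may behave differently.

```scala
object PosOptSketch {
  // Hypothetical stand-in for Position: a range of character offsets.
  final case class Span(begin: Int, end: Int) {
    // Assumed semantics: start of this span through the end of `other`.
    def to(other: Span): Span = Span(begin, other.end)
  }

  // Mirrors the four-way match in the diff: only the first and last child
  // positions are consulted; children in between are never inspected.
  def spanOf(children: Vector[Option[Span]]): Option[Span] =
    if (children.isEmpty) None
    else (children.head, children.last) match {
      case (Some(first), Some(last)) => Some(first.to(last))
      case (Some(first), None)       => Some(first)
      case (None, last)              => last // Some(last) or None
    }
}
```

Under these assumptions, `spanOf(Vector(Some(Span(0, 3)), None, Some(Span(8, 10))))` gives `Some(Span(0, 10))`. A node whose first and last children both lack a position gets `None` even if a middle child has one, which follows directly from looking only at the two ends.
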
@@ -133,10 +144,65 @@ object ParseTree {
 
 
   // T1[exp] ( T1[op] T1[exp] )* => T2[exp]
+  // Uses divide-and-conquer: finds the lowest-precedence operator as the root,
+  // then recursively builds the left and right subtrees.
+  // For same-precedence operators: picks the rightmost for left-associative (so the
+  // left subtree is larger), or the leftmost for right-associative (so the right
+  // subtree is larger).
   def rewriteBinary[Builder, T1, T2](builder: Builder,
                                      bp: BinaryPrecedenceOps[Builder, T1, T2],
                                      trees: ISZ[T1],
                                      reporter: message.Reporter): T2 = {
+    val acs: ISZ[T2] = for (t <- trees) yield bp.transform(builder, t)
+    // acs layout: [operand0, op0, operand1, op1, operand2, ...]
+    // Operand at acs(i * 2), operator at acs(i * 2 + 1)
+    // lo..hi are operand indices (inclusive), with hi - lo operators between them
+    def build(lo: Z, hi: Z): T2 = {
+      if (lo == hi) {
+        return acs(lo * 2)
+      }
+      // Find the split operator: lowest precedence to be the root
+      var splitIdx: Z = lo
+      var splitPrec: Z = bp.precedence(acs(lo * 2 + 1)) match {
+        case Some(n) => n
+        case _ => bp.lowestPrecedence
+      }
+      for (i <- lo + 1 until hi) {
+        val op = acs(i * 2 + 1)
+        val p: Z = bp.precedence(op) match {
+          case Some(n) => n
+          case _ => bp.lowestPrecedence
+        }
+        val isLower = bp.isHigherPrecedence(splitPrec, p)
+        val isEqual = !isLower && !bp.isHigherPrecedence(p, splitPrec)
+        if (isLower || (isEqual && !bp.isRightAssoc(op))) {
+          splitPrec = p
+          splitIdx = i
+        }
+      }
+      val left = build(lo, splitIdx)
+      val right = build(splitIdx + 1, hi)
+      val op = acs(splitIdx * 2 + 1)
+      var l = left
+      var r = right
+      if (bp.shouldParenthesizeOperands(op)) {
+        if (bp.isBinary(l)) {
+          l = bp.parenthesize(builder, l)
+        }
+        if (bp.isBinary(r)) {
+          r = bp.parenthesize(builder, r)
+        }
+      }
+      return bp.binary(builder, l, op, r)
+    }
+    return build(0, (acs.size - 1) / 2)
+  }
+
+  // T1[exp] ( T1[op] T1[exp] )* => T2[exp]
+  def rewriteBinaryOld[Builder, T1, T2](builder: Builder,
+                                        bp: BinaryPrecedenceOps[Builder, T1, T2],
+                                        trees: ISZ[T1],
+                                        reporter: message.Reporter): T2 = {
     def construct(ts: ISZ[T2], rightAssoc: B, start: Z, stop: Z): T2 = {
       if (rightAssoc) {
         var r = ts(stop)
@@ -244,5 +310,4 @@ object ParseTree {
     }
     return acs(0)
   }
-
 }
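
Note on the new `rewriteBinary`: the split rule in its comments (lowest precedence becomes the root; ties go to the rightmost operator when left-associative and to the leftmost when right-associative) can be checked on small inputs with the standalone sketch below. This is not the Sireum implementation. `Exp`, `Num`, `Bin`, the `prec` table, and `rightAssoc` are hypothetical, operands and operators are passed as two separate vectors instead of one interleaved `acs` sequence, and the sketch assumes a larger number binds tighter. It also leaves out the `shouldParenthesizeOperands` step and the `lowestPrecedence` fallback, so an operator missing from `prec` throws.

```scala
object RewriteBinarySketch {
  sealed trait Exp
  final case class Num(value: Int) extends Exp
  final case class Bin(left: Exp, op: String, right: Exp) extends Exp

  // Hypothetical operator table: a larger number binds tighter.
  val prec: Map[String, Int] = Map("+" -> 1, "-" -> 1, "*" -> 2, "/" -> 2, "^" -> 3)
  val rightAssoc: Set[String] = Set("^")

  // ops(i) sits between operands(i) and operands(i + 1).
  def rewrite(operands: Vector[Exp], ops: Vector[String]): Exp = {
    require(operands.size == ops.size + 1, "expected: operand (op operand)*")
    // lo..hi are operand indices (inclusive); operators lo..hi-1 lie between them.
    def build(lo: Int, hi: Int): Exp =
      if (lo == hi) operands(lo)
      else {
        var split = lo
        for (i <- lo + 1 until hi) {
          val p = prec(ops(i))
          val q = prec(ops(split))
          // A strictly lower precedence always takes over as the root. On a tie,
          // a left-associative operator moves the split to the right, while a
          // right-associative one keeps the leftmost candidate.
          if (p < q || (p == q && !rightAssoc(ops(i)))) split = i
        }
        Bin(build(lo, split), ops(split), build(split + 1, hi))
      }
    build(0, operands.size - 1)
  }
}
```

With these tables, `1 - 2 - 3` becomes `Bin(Bin(Num(1), "-", Num(2)), "-", Num(3))`, and `2 ^ 3 ^ 2` becomes `Bin(Num(2), "^", Bin(Num(3), "^", Num(2)))`. Each call scans its whole operator range before recursing, so a long chain of same-precedence operators costs quadratic time in the worst case.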
