Commit 7b3505e

Add documentation for infix operators (#285)
* Moved the infix operator pages to the reference pages
* Add a link with "tree-rewriting actions" section

Signed-off-by: Markus Rudolph <markus.rudolph@typefox.io>
1 parent 7428549 commit 7b3505e

5 files changed: +209 -2 lines changed

hugo/content/docs/reference/grammar-language.md renamed to hugo/content/docs/reference/grammar-language/_index.md

Lines changed: 20 additions & 1 deletion
@@ -300,6 +300,7 @@ SimpleExpression:
     '(' Addition ')' | value=INT;
 ```
 Essentially this means that when a `+` keyword is found, a new object of type `Addition` is created and the current object is assigned to the `left` property of the new object. The `Addition` then becomes the new current object. In imperative pseudo code it may look like this:
+
 ```js
 function Addition() {
     let current = SimpleExpression()
@@ -311,7 +312,10 @@ function Addition() {
     }
 }
 ```
-Please refer to [this blog post](https://www.typefox.io/blog/parsing-expressions-with-xtext) for further details.
+
+If you want to learn more about parsing binary operators in Langium, please refer to this [dedicated documentation page](/docs/reference/grammar-language/infix-operators).
+
+You can also refer to [this blog post](https://www.typefox.io/blog/parsing-expressions-with-xtext) for further details on how this was done in Xtext.
 
 ### Data Type Rules
 Data type rules are similar to terminal rules as they match a sequence of characters. However, they are parser rules and are therefore context-dependent. This allows for more flexible parsing, as they can be interspersed with hidden terminals, such as whitespaces or comments. Contrary to terminal rules, they cannot use *regular expressions* to match a stream of characters, so they have to be composed of keywords, terminal rules or other data type rules.
@@ -398,6 +402,21 @@ Element<isRoot>:
 
 The parser will always exclude alternatives whose guard conditions evaluate to `false`. All other alternatives remain possible options for the parser to choose from.
 
+### Infix operators
+
+[Infix operators](/docs/reference/grammar-language/infix-operators) can be defined using the `infix` keyword. This syntax lets you define operators with different precedence levels and associativity in a more readable way. For more information on infix operators, please refer to the [dedicated documentation page](/docs/reference/grammar-language/infix-operators/syntactical-implementation). Be aware that the `infix` notation is syntactic sugar: you can achieve the same result the [manual way using parser rules](/docs/reference/grammar-language/infix-operators/manual-implementation).
+
+```langium
+Expression: BinaryExpr;
+
+infix BinaryExpr on PrimaryExpression:
+    right assoc '^'
+    > '*' | '/'
+    > '+' | '-'
+    ;
+
+PrimaryExpression: '(' expr=Expression ')' | value=Number;
+```
 
 ### More Examples
 

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
+---
+title: "Infix Operators"
+weight: 125
+---
+
+Infix operators are binary operators: they require two operands and an operator in between them. For example, in the expression `a + b`, `+` is an infix operator with operands `a` and `b`.
+
+## Grammar Ambiguities
+
+Infix operators confront us with grammar ambiguities.
+
+Ambiguities are a problem for the parsing process, because the parser needs to be able to determine the structure of an expression unambiguously.
+
+For example, consider these expressions:
+
+```plain
+a + b * c
+a - b - c
+```
+
+### Ambiguity between different operators
+
+Without additional rules, the first expression can be interpreted in two different ways:
+
+1. As `(a + b) * c`, where addition is performed first, followed by multiplication.
+2. As `a + (b * c)`, where multiplication is performed first, followed by addition.
+
+Solution: The parser needs to know the **precedence** of the operators. In this case, multiplication has higher precedence than addition, so the correct interpretation is `a + (b * c)`. The precedence of an operator is higher than, lower than, or equal to the precedence of another operator.
+
+### Ambiguity between same operators
+
+The second expression can also be interpreted in two different ways:
+
+1. As `(a - b) - c`, where the first subtraction is performed first.
+2. As `a - (b - c)`, where the second subtraction is performed first.
+
+Solution: The parser needs to know the **associativity** of the operator. In this case, subtraction is left-associative, so the correct interpretation is `(a - b) - c`. An operator can be left-associative, right-associative or non-associative. A way to remember which is which:
+Imagine an operand between two identical operators (like `... - b - ...`): which operator is executed first? If the left one is executed first, the operator is left-associative. If the right one is executed first, the operator is right-associative. If neither can be executed first (like in `a < b < c`), the operator is non-associative.
+
+Operator candidates for right-associativity are usually assignment operators (e.g., `=`) and exponentiation operators (e.g., `^`).
+
+## Implementation
+
+In order to embed precedence and associativity rules into the parsing process, we can use a technique called **precedence climbing** (a form of operator-precedence parsing). Precedence climbing is a recursive parsing technique that allows us to handle infix operators with different precedences and associativities in a straightforward manner.
+
+Regardless of the parser generator or language framework used, this technique can often be implemented using grammar rules. In Langium, you have two options to implement precedence climbing:
+
+* using [non-terminal rules](/docs/reference/grammar-language/infix-operators/manual-implementation)
+* using the [infix syntax](/docs/reference/grammar-language/infix-operators/syntactical-implementation)
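
The page added above names precedence climbing without showing it. As a rough illustration, here is a minimal table-driven sketch in JavaScript. It is not Langium's implementation: `OPERATORS`, `tokenize`, `parseExpression`, `parsePrimary` and `show` are hypothetical names chosen for this example, and the operator table simply reuses the precedence levels and associativities from the operator tables in this commit. Non-associative operators and error handling are left out.

```javascript
// Hypothetical operator table: precedence level and associativity per operator.
const OPERATORS = new Map([
    ['+', { prec: 1, assoc: 'left' }],
    ['-', { prec: 1, assoc: 'left' }],
    ['*', { prec: 2, assoc: 'left' }],
    ['/', { prec: 2, assoc: 'left' }],
    ['^', { prec: 3, assoc: 'right' }],
]);

// Split the input into identifiers, numbers, operators and parentheses.
function tokenize(text) {
    return text.match(/[A-Za-z_]\w*|\d+|[-+*\/^()]/g) ?? [];
}

// Precedence climbing: parse one operand, then keep consuming operators
// whose precedence is at least `minPrec`.
function parseExpression(tokens, minPrec = 1) {
    let left = parsePrimary(tokens);
    while (tokens.length > 0) {
        const info = OPERATORS.get(tokens[0]);
        if (info === undefined || info.prec < minPrec) {
            break;
        }
        const operator = tokens.shift();
        // Left-associative: the right operand may only contain tighter operators.
        // Right-associative: the right operand may contain the same level again.
        const nextMinPrec = info.assoc === 'left' ? info.prec + 1 : info.prec;
        const right = parseExpression(tokens, nextMinPrec);
        left = { left, operator, right };
    }
    return left;
}

// Operands are identifiers, numbers or parenthesized expressions.
function parsePrimary(tokens) {
    const token = tokens.shift();
    if (token === '(') {
        const inner = parseExpression(tokens);
        tokens.shift(); // consume ')'
        return inner;
    }
    return token;
}

// Render the tree with explicit parentheses to make the grouping visible.
function show(node) {
    return typeof node === 'string'
        ? node
        : `(${show(node.left)} ${node.operator} ${show(node.right)})`;
}

console.log(show(parseExpression(tokenize('a + b * c')))); // (a + (b * c))
console.log(show(parseExpression(tokenize('a - b - c')))); // ((a - b) - c)
console.log(show(parseExpression(tokenize('a ^ b ^ c')))); // (a ^ (b ^ c))
```

The only place where associativity enters is the choice of `nextMinPrec`; everything else is driven by the precedence numbers in the table.
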
Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
+---
+title: "Using non-terminal rules"
+weight: 100
+---
+
+Infix operators can also be implemented using non-terminal rules. This approach provides more flexibility and gives you explicit control over operator precedence and associativity.
+
+## Operator table
+
+Let's implement a simple expression grammar with the following operators:
+
+| Operator | Precedence | Associativity |
+|----------|------------|---------------|
+| `+` | 1 | Left |
+| `-` | 1 | Left |
+| `*` | 2 | Left |
+| `/` | 2 | Left |
+| `^` | 3 | Right |
+
+## Grammar rules
+
+```langium
+Expression:
+    Addition;
+
+Addition infers Expression:
+    Multiplication ({infer BinaryExpression.left=current} operator=('+' | '-') right=Multiplication)*;
+
+Multiplication infers Expression:
+    Exponentiation ({infer BinaryExpression.left=current} operator=('*' | '/') right=Exponentiation)*;
+
+Exponentiation infers Expression:
+    Primary ({infer BinaryExpression.left=current} operator='^' right=Exponentiation)?;
+
+Primary infers Expression:
+    '(' Expression ')'
+    | ... //prefix, literals, identifiers
+    ;
+```
+
+## Explanation
+
+### Precedence
+
+Precedence is handled by creating a hierarchy of non-terminal rules. Each rule corresponds to a level of precedence, with higher-precedence operators being defined in rules that are called by lower-precedence rules. For example, `Addition` calls `Multiplication`, which in turn calls `Exponentiation`. This ensures that multiplication and division bind more tightly than addition and subtraction, and that exponentiation binds more tightly than both.
+
+### Associativity
+
+Associativity is handled by three patterns. They differ in the cardinality of the operator group (`*` or `?`) and in the rule called for the right operand (`Next` or `Current`), which makes sure that operators are grouped correctly based on their associativity.
+
+#### Pattern 1 (Left Associative):
+
+```langium
+Current infers Expression:
+    Next ({infer BinaryExpression.left=current} operator=('op1' | 'op2' | ...) right=Next)*;
+```
+
+#### Pattern 2 (Right Associative):
+
+```langium
+Current infers Expression:
+    Next ({infer BinaryExpression.left=current} operator=('op1' | 'op2' | ...) right=Current)?;
+```
+
+#### Pattern 3 (Non-Associative):
+
+```langium
+Current infers Expression:
+    Next ({infer BinaryExpression.left=current} operator=('op1' | 'op2' | ...) right=Next)?;
+```
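
To see why the rule hierarchy and the associativity patterns produce the intended grouping, the grammar rules above can be mirrored as a hand-written recursive-descent parser, in the same spirit as the imperative pseudo code in `grammar-language/_index.md`. This is only a sketch under assumptions: `createParser`, `peek`, `next`, `show` and `parse` are hypothetical helpers, the `Primary` rule only handles parentheses and integer literals, and nothing here reflects how Langium generates its parser internally.

```javascript
// Build a parser over a simple token array (hypothetical helper, not part of Langium).
function createParser(text) {
    const tokens = text.match(/\d+|[-+*\/^()]/g) ?? [];
    let position = 0;
    const peek = () => tokens[position];
    const next = () => tokens[position++];

    // Pattern 1 (left associative): loop, right operand comes from the next level.
    function Addition() {
        let current = Multiplication();
        while (peek() === '+' || peek() === '-') {
            current = { left: current, operator: next(), right: Multiplication() };
        }
        return current;
    }

    // Same pattern, one precedence level higher.
    function Multiplication() {
        let current = Exponentiation();
        while (peek() === '*' || peek() === '/') {
            current = { left: current, operator: next(), right: Exponentiation() };
        }
        return current;
    }

    // Pattern 2 (right associative): optional group, right operand re-enters the same rule.
    function Exponentiation() {
        const current = Primary();
        if (peek() === '^') {
            return { left: current, operator: next(), right: Exponentiation() };
        }
        return current;
    }

    // Parenthesized expression or integer literal.
    function Primary() {
        if (peek() === '(') {
            next(); // consume '('
            const inner = Addition();
            next(); // consume ')'
            return inner;
        }
        return { value: Number(next()) };
    }

    return Addition;
}

// Render the tree with explicit parentheses to make the grouping visible.
function show(node) {
    return 'value' in node
        ? String(node.value)
        : `(${show(node.left)} ${node.operator} ${show(node.right)})`;
}

const parse = (text) => createParser(text)();

console.log(show(parse('1 + 2 * 3')));  // (1 + (2 * 3))
console.log(show(parse('10 - 4 - 3'))); // ((10 - 4) - 3)
console.log(show(parse('2 ^ 3 ^ 2')));  // (2 ^ (3 ^ 2))
```

A non-associative level (Pattern 3) would look like `Exponentiation`, except that the right operand would call the next level instead of re-entering the same function, so a second operator of the same level is simply not consumed.
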
Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
+---
+title: "Using the infix syntax"
+weight: 200
+---
+
+## Operator table
+
+Let's implement a simple expression grammar with the following operators:
+
+| Operator | Precedence | Associativity |
+|----------|------------|---------------|
+| `+` | 1 | Left |
+| `-` | 1 | Left |
+| `*` | 2 | Left |
+| `/` | 2 | Left |
+| `^` | 3 | Right |
+
+## Grammar rules
+
+```langium
+Expression: BinaryExpr;
+
+infix BinaryExpr on PrimaryExpression:
+    right assoc '^'
+    > '*' | '/'
+    > '+' | '-'
+    ;
+
+PrimaryExpression: '(' expr=Expression ')' | value=Number;
+```
+
+In addition to better readability, the new notation also makes use of **performance optimizations** to speed up expression parsing by roughly 50% compared to the typical way of writing expressions.
+
+### Primary expression
+
+The `PrimaryExpression` rule defines the basic building blocks of our expressions, which can be (for example) a parenthesized expression, a unary expression, or a number literal.
+
+```langium
+Expression: BinaryExpr;
+
+infix BinaryExpr on PrimaryExpression:
+    ...
+    ;
+
+PrimaryExpression: '(' expr=Expression ')' | '-' value=PrimaryExpression | value=Number | ...;
+```
+
+### Precedence
+
+Use the `>` operator to define precedence levels. Operators listed after a `>` have lower precedence than those before it. In the example above, `^` has the highest precedence, followed by `*` and `/`, and finally `+` and `-` with the lowest precedence.
+
+```langium
+infix BinaryExpr on ...:
+    A > B > C
+    //A has higher precedence than B, B higher than C
+    ;
+```
+
+If you have multiple operators with the same precedence, list them on the same line separated by `|`.
+
+### Associativity
+
+Infix operators are left-associative by default. To make an operator right-associative, put `right assoc` in front of it.
+
+```langium
+infix BinaryExpr on ...:
+    ...> right assoc '^' >...
+    ;
+```
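
As a small illustration of what the precedence and associativity declarations mean for code that consumes the parsed tree, here is a sketch of an evaluator in JavaScript. It works on plain objects and rests on assumptions: binary nodes are assumed to carry `left`, `operator` and `right` properties (mirroring the `BinaryExpression` shape used on the non-terminal-rules page), while `expr` and `value` come from the `PrimaryExpression` rule above. The unary minus alternative is not handled, and `evaluate`, `power` and `product` are hypothetical names.

```javascript
// Assumed node shapes (plain objects standing in for generated AST nodes):
//   binary:        { left, operator, right }
//   parenthesized: { expr }
//   number:        { value }
function evaluate(node) {
    if ('operator' in node) {
        const left = evaluate(node.left);
        const right = evaluate(node.right);
        switch (node.operator) {
            case '+': return left + right;
            case '-': return left - right;
            case '*': return left * right;
            case '/': return left / right;
            case '^': return left ** right;
            default: throw new Error(`Unknown operator: ${node.operator}`);
        }
    }
    if ('expr' in node) {
        return evaluate(node.expr);
    }
    return node.value;
}

// Tree expected for `2 ^ 3 ^ 2` when `^` is right-associative: 2 ^ (3 ^ 2).
const power = {
    left: { value: 2 },
    operator: '^',
    right: { left: { value: 3 }, operator: '^', right: { value: 2 } },
};
console.log(evaluate(power)); // 512

// Tree expected for `(1 + 2) * 3`: the parenthesized part is a primary expression.
const product = {
    left: { expr: { left: { value: 1 }, operator: '+', right: { value: 2 } } },
    operator: '*',
    right: { value: 3 },
};
console.log(evaluate(product)); // 9
```

Because grouping is already encoded in the tree, the evaluator needs no knowledge of precedence or associativity.
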

hugo/static/prism/langium-prism.js

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ Prism.languages.langium = {
         greedy: true
     },
     keyword: {
-        pattern: /\b(interface|fragment|terminal|boolean|current|extends|grammar|returns|bigint|hidden|import|infers|number|string|entry|false|infer|Date|true|type|with)\b/
+        pattern: /\b(interface|fragment|terminal|boolean|current|extends|grammar|returns|bigint|hidden|import|infers|number|string|entry|false|infer|Date|true|type|with|on|infix|left|right|assoc)\b/
     },
     property: {
         pattern: /\b[a-z][\w]*(?==|\?=|\+=|\??:|>)\b/
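
The effect of the extended keyword pattern can be checked in isolation. The snippet below copies both patterns from the hunk above and applies them to one line of the new syntax. It only exercises the regular expressions; Prism itself applies its token patterns in order (strings and comments are matched separately), so this is a simplified check, and `oldKeywords`, `newKeywords` and `sample` are names made up for the example.

```javascript
// Patterns copied from the diff, with the `g` flag added to collect all matches.
const oldKeywords = /\b(interface|fragment|terminal|boolean|current|extends|grammar|returns|bigint|hidden|import|infers|number|string|entry|false|infer|Date|true|type|with)\b/g;
const newKeywords = /\b(interface|fragment|terminal|boolean|current|extends|grammar|returns|bigint|hidden|import|infers|number|string|entry|false|infer|Date|true|type|with|on|infix|left|right|assoc)\b/g;

const sample = "infix BinaryExpr on PrimaryExpression: right assoc '^'";

console.log(sample.match(oldKeywords)); // null
console.log(sample.match(newKeywords)); // [ 'infix', 'on', 'right', 'assoc' ]
```

The word boundaries keep the trailing `on` in `PrimaryExpression` from being highlighted as a keyword.
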
