hugo/content/guides/multi-mode-lexing.md
98 additions & 39 deletions
@@ -7,66 +7,96 @@ Many modern programming languages such as [JavaScript](https://developer.mozilla
 They are a way to easily concatenate or interpolate string values while maintaining great code readability.
 This guide will show you how to support template literals in Langium.
 
+In this example, a template literal starts and ends with a backtick `` ` `` and is interrupted by expressions wrapped in curly braces `{}`.
+A template literal in our language might look like this:
+
+```js
+println(`hello {name}!`);
+```
+
+Conceptually, template strings work by reading a start terminal which starts with `` ` `` and ends with `{`,
+followed by an expression and then an end terminal which is effectively just the start terminal in reverse using `}` and `` ` ``.
+Since we don't want to restrict users to only a single expression in their template literals, we also need a "middle" terminal reading from `}` to `{`.
+Of course, a user might also write a template literal without any expressions in it.
+So we additionally need a "full" terminal that reads from the start of the literal all the way to the end in one go.
+
+To achieve this, we will define a `TemplateLiteral` parser rule and a few terminals.
+These terminals will adhere to the requirements that we just defined.
+To make it a bit easier to read and maintain, we also define a special terminal fragment that we can reuse in all our terminal definitions:
+
 ```antlr
 TemplateLiteral:
     // Either just the full content
-    content+=TemplateContent |
-    // Or template string parts with expressions in between
+    content+=TEMPLATE_LITERAL_FULL |
+    // Or template literal parts with expressions in between
     (
-        content+=TemplateContentStart
-        content+=Expression?
+        content+=TEMPLATE_LITERAL_START
+        content+=Expression?
         (
-            content+=TemplateContentMiddle
+            content+=TEMPLATE_LITERAL_MIDDLE
             content+=Expression?
-        )*
-        content+=TemplateContentEnd
-    );
+        )*
+        content+=TEMPLATE_LITERAL_END
+    )
+;
 
-TemplateContent returns TextLiteral:
-    value=RICH_TEXT;
+terminal TEMPLATE_LITERAL_FULL:
+    '`' IN_TEMPLATE_LITERAL* '`';
 
-TemplateContentStart returns TextLiteral:
-    value=RICH_TEXT_START;
+terminal TEMPLATE_LITERAL_START:
+    '`' IN_TEMPLATE_LITERAL* '{';
 
-TemplateContentMiddle returns TextLiteral:
-    value=RICH_TEXT_INBETWEEN;
+terminal TEMPLATE_LITERAL_MIDDLE:
+    '}' IN_TEMPLATE_LITERAL* '{';
 
-TemplateContentEnd returns TextLiteral:
-    value=RICH_TEXT_END;
+terminal TEMPLATE_LITERAL_END:
+    '}' IN_TEMPLATE_LITERAL* '`';
 
-terminal RICH_TEXT:
-    '`' IN_RICH_TEXT* '`';
+// '{{' is handled in a special way so we can escape normal '{' characters
+// '``' is doing the same for the '`' character
+terminal fragment IN_TEMPLATE_LITERAL:
+    /[^{`]|{{|``/;
+```
 
-terminal RICH_TEXT_START:
-    '`' IN_RICH_TEXT* '{';
+If we go ahead and start parsing files with these changes, most things should work as expected.
+However, depending on the structure of your existing grammar, some of these new terminals might be in conflict with existing terminals of your language.
+For example, if your language supports block statements, chaining multiple blocks together will make this issue apparent:
 
-terminal RICH_TEXT_INBETWEEN:
-    '}' IN_RICH_TEXT* '{';
+```js
+{
+    console.log('hi');
+}
+{
+    console.log('hello');
+}
+```
 
-terminal RICH_TEXT_END:
-    '}' IN_RICH_TEXT* '`';
+The `} ... {` block in this example won't be lexed as separate `}` and `{` tokens, but instead as a single `TEMPLATE_LITERAL_MIDDLE` token, resulting in a parser error due to the unexpected token.
+This doesn't make a lot of sense, since we aren't in the middle of a template literal at this point anyway.
+However, our lexer doesn't know yet that the `TEMPLATE_LITERAL_MIDDLE` and `TEMPLATE_LITERAL_END` terminals are only allowed to show up within a `TemplateLiteral` rule.
+To rectify this, we will need to make use of lexer modes. They will give us the necessary context to know whether we're inside a template literal or outside of it.
+Depending on the currently selected mode, we can lex different terminals. In our case, we want to exclude the `TEMPLATE_LITERAL_MIDDLE` and `TEMPLATE_LITERAL_END` terminals whenever we're outside of a template literal.
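
To make the conflict described here concrete, the following minimal sketch checks the `TEMPLATE_LITERAL_MIDDLE` pattern against the two-block example. The regular expression is a hand-written rendering of the grammar's terminal and is an assumption made for illustration; it does not go through Langium's own terminal-to-regex conversion.

```typescript
// Hypothetical regex rendering of the grammar's terminals (an assumption for
// illustration; the grammar itself remains the source of truth).
const IN_TEMPLATE_LITERAL = "(?:[^{`]|\\{\\{|``)";
// '}' IN_TEMPLATE_LITERAL* '{', anchored to the start of the remaining input.
const TEMPLATE_LITERAL_MIDDLE = new RegExp("^\\}" + IN_TEMPLATE_LITERAL + "*\\{");

const twoBlocks = "{\n    console.log('hi');\n}\n{\n    console.log('hello');\n}";

// Continue at the first closing brace, where a lexer would be after the first block body.
const rest = twoBlocks.slice(twoBlocks.indexOf("}"));
const middleMatch = TEMPLATE_LITERAL_MIDDLE.exec(rest);

// The pattern covers the closing brace, the newline and the next opening brace in one match.
const swallowed: string = middleMatch === null ? "" : middleMatch[0];
console.log(JSON.stringify(swallowed));
```

Because the pattern matches a longer stretch of input than the plain `}` keyword, nothing in the terminal definition alone tells the lexer that it is outside of a template literal.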
 
-terminal fragment IN_RICH_TEXT:
-    /[^{`]|{{|``/;
-```
+The following implementation of a `TokenBuilder` will do the job for us. It creates two lexing modes, which are almost identical except for the `TEMPLATE_LITERAL_MIDDLE` and `TEMPLATE_LITERAL_END` terminals.
+We will also need to make sure that the modes are switched based on the `TEMPLATE_LITERAL_START` and `TEMPLATE_LITERAL_END` terminals. We use `PUSH_MODE` and `POP_MODE` for this.
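
As a rough, library-independent illustration of the two-mode idea (this is not the guide's `TokenBuilder`, and it uses neither Langium's nor Chevrotain's API), the sketch below hand-rolls a tiny lexer with a mode stack. The names `Mode`, `TokenDef`, `pushMode`, `popMode`, `MODES` and `tokenize`, as well as the helper tokens `ID`, `LBRACE`, `RBRACE` and `PUNCT`, are hypothetical and exist only for this example.

```typescript
type Mode = "regular" | "template";

interface TokenDef {
    name: string;
    pattern: RegExp;   // sticky, so it only matches at the current offset
    skip?: boolean;    // matched but not emitted (whitespace)
    pushMode?: Mode;   // enter this mode after the token
    popMode?: boolean; // leave the current mode after the token
}

// Regex rendering of IN_TEMPLATE_LITERAL* (an assumption for illustration).
const IN_TEMPLATE = "(?:[^{`]|\\{\\{|``)*";
const sticky = (source: string): RegExp => new RegExp(source, "y");

const FULL: TokenDef = { name: "TEMPLATE_LITERAL_FULL", pattern: sticky("`" + IN_TEMPLATE + "`") };
const START: TokenDef = { name: "TEMPLATE_LITERAL_START", pattern: sticky("`" + IN_TEMPLATE + "\\{"), pushMode: "template" };
const MIDDLE: TokenDef = { name: "TEMPLATE_LITERAL_MIDDLE", pattern: sticky("\\}" + IN_TEMPLATE + "\\{") };
const END: TokenDef = { name: "TEMPLATE_LITERAL_END", pattern: sticky("\\}" + IN_TEMPLATE + "`"), popMode: true };

const shared: TokenDef[] = [
    { name: "WS", pattern: sticky("\\s+"), skip: true },
    { name: "ID", pattern: sticky("[A-Za-z_]\\w*") },
    { name: "LBRACE", pattern: sticky("\\{") },
    { name: "RBRACE", pattern: sticky("\\}") },
    { name: "PUNCT", pattern: sticky("[();.!]") },
];

// The two modes are identical except for MIDDLE and END.
const MODES: Record<Mode, TokenDef[]> = {
    regular: [FULL, START, ...shared],
    template: [FULL, START, MIDDLE, END, ...shared],
};

function tokenize(input: string): string[] {
    const modeStack: Mode[] = ["regular"];
    const names: string[] = [];
    let pos = 0;
    while (pos < input.length) {
        const mode = modeStack[modeStack.length - 1];
        let matched = false;
        for (const def of MODES[mode]) {
            def.pattern.lastIndex = pos;
            const m = def.pattern.exec(input);
            if (m === null) { continue; }
            if (!def.skip) { names.push(def.name); }
            if (def.pushMode !== undefined) { modeStack.push(def.pushMode); }
            if (def.popMode && modeStack.length > 1) { modeStack.pop(); }
            pos += m[0].length;
            matched = true;
            break;
        }
        if (!matched) { throw new Error("Unexpected character at offset " + pos); }
    }
    return names;
}

console.log(tokenize("{ a; }\n{ b; }"));
console.log(tokenize("println(`hello {name}!`);"));
```

In the `regular` mode the `}` and `{` of adjacent blocks come out as separate `RBRACE` and `LBRACE` tokens, while inside a template literal the same characters are free to form `TEMPLATE_LITERAL_MIDDLE` or `TEMPLATE_LITERAL_END`. Using a stack rather than a single flag keeps nested template literals inside expressions working.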
+With this change in place, the parser will work as expected. There is one last issue which we need to resolve in order to get everything working perfectly.
+When inspecting our AST, the `TemplateLiteral` object will contain strings that still include input artifacts (mainly `` ` ``, `{` and `}`).
+These aren't actually part of the semantic value of these strings, so we should get rid of them.
+We will need to create a custom `ValueConverter` and remove these artifacts:
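
A minimal sketch of that cleanup as a plain function, independent of Langium's `ValueConverter` interface: every template terminal carries exactly one delimiter character on each side, so the first and last character can be dropped. The function name `convertTemplateLiteral` is hypothetical, and unescaping `{{` and ` `` ` is an assumption based on the comments in the grammar rather than something the guide states.

```typescript
// Hypothetical helper: strips the delimiters of a template terminal's text.
function convertTemplateLiteral(raw: string): string {
    // FULL: `...`   START: `...{   MIDDLE: }...{   END: }...`
    // In every case the first and last character are delimiters.
    const inner = raw.substring(1, raw.length - 1);
    // Assumption: '{{' and '``' are escape sequences for '{' and '`'.
    return inner.replace(/\{\{/g, "{").replace(/``/g, "`");
}

console.log(JSON.stringify(convertTemplateLiteral("`hello {")));
console.log(JSON.stringify(convertTemplateLiteral("}!`")));
```

In a real converter this logic would only apply to the `TEMPLATE_LITERAL_*` terminals, with every other rule falling through to the default conversion.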