As many other language recognition tools, `better-parse` abstracts away from raw character input by
-pre-processing it with a `Lexer`, that can match `Token`s by their patterns (regular expressions) against an input sequence.
+pre-processing it with a `Tokenizer`, that can match `Token`s by their patterns (regular expressions) against an input sequence.

-A `Lexer` tokenizes an input sequence such as `InputStream` or a `String` into a `Sequence<TokenMatch>`, providing each with a position in the input.
+A `Tokenizer` tokenizes an input sequence such as `InputStream` or a `String` into a `Sequence<TokenMatch>`, providing each with a position in the input.

-One way to create a `Lexer` is to first define the `Tokens` to be matched:
+One way to create a `Tokenizer` is to first define the `Tokens` to be matched:

```kotlin
-val id = Token("identifier", pattern = "\\w+")
-val cm = Token("comma", pattern = ",")
-val ws = Token("whitespace", pattern = "\\s+", ignore = true)
+val id = Token("\\w+")
+val cm = Token(",")
+val ws = Token("\\s+", ignore = true)
```

>A `Token` can be ignored by setting its `ignore = true`. An ignored token can still be matched explicitly, but if
@@ -59,14 +59,16 @@ another token is expected, the ignored one is just dropped from the sequence.
val tokenizer = DefaultTokenizer(listOf(id, cm, ws))
```

->Note: the tokens order matters in some cases, because the tokenizer tries to match them in exactly this order. For instance, if `Token("singleA", "a")`
-is listed before `Token("doubleA", "aa")`, the latter will never be matched. Be careful with keyword tokens!
+>Note: the tokens order matters in some cases, because the tokenizer tries to match them in exactly this order. For instance, if `Token("a")`
+is listed before `Token("aa")`, the latter will never be matched. Be careful with keyword tokens!
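
The ordering note above can be illustrated with a small, self-contained sketch. It does not use the `better-parse` classes: `SketchToken`, `SketchMatch` and `tokenizeFirstMatch` are hypothetical names introduced only for this illustration, and the loop assumes a plain "first listed token that matches wins" strategy, which is how the note describes the tokenizer's behaviour.

```kotlin
// Hypothetical stand-ins for illustration only; these are NOT the better-parse classes.
data class SketchToken(val name: String, val pattern: Regex)
data class SketchMatch(val token: SketchToken, val text: String, val position: Int)

// Assumed strategy: at each position, try the tokens in list order and take the first match.
fun tokenizeFirstMatch(tokens: List<SketchToken>, input: String): List<SketchMatch> {
    val result = mutableListOf<SketchMatch>()
    var pos = 0
    while (pos < input.length) {
        val hit = tokens.firstNotNullOfOrNull { token ->
            // find() may scan ahead, so accept only non-empty matches that start exactly at `pos`.
            token.pattern.find(input, pos)
                ?.takeIf { it.range.first == pos && it.value.isNotEmpty() }
                ?.let { SketchMatch(token, it.value, pos) }
        } ?: error("No token matches at position $pos")
        result += hit
        pos += hit.text.length
    }
    return result
}

fun main() {
    val singleA = SketchToken("singleA", Regex("a"))
    val doubleA = SketchToken("doubleA", Regex("aa"))

    // "a" is listed first, so "aa" is consumed as two single-a matches.
    println(tokenizeFirstMatch(listOf(singleA, doubleA), "aa").map { it.token.name }) // [singleA, singleA]

    // Listing the longer pattern first lets it match.
    println(tokenizeFirstMatch(listOf(doubleA, singleA), "aa").map { it.token.name }) // [doubleA]
}
```

The same reasoning applies to keywords: under this strategy, a general identifier pattern listed before a keyword pattern would shadow the keyword.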

```kotlin
val tokenMatches: Sequence<TokenMatch> = tokenizer.tokenize("hello, world") // Support other types of input as well.
```

->A more convenient way of defining tokens and creating a tokenizer is described in the **Grammar** section.
+>A more convenient way of defining tokens is described in the [**Grammar**](#grammar) section.
+
+It is possible to provide a custom implementation of a `Tokenizer`.

## Parser ##

@@ -86,8 +88,8 @@ with the match of this token itself _(possibly, skipping some **ignored** tokens
_(and, possibly, some ignored tokens)_ from the remainder.

```kotlin
-val a = Token(name = "a", pattern = "a+")
-val b = Token(name = "b", pattern = "b+")
+val a = Token("a+")
+val b = Token("b+")
val tokenMatches = Lexer(listOf(a, b)).tokenize("aabbaaa")
val result = a.tryParse(tokenMatches) // contains the match for "aa" and the remainder with "bbaaa" in it
```
@@ -103,7 +105,7 @@ There are several kinds of combinators included in `better-parse`:
The error results are returned unchanged.

```kotlin
-val id = Token("identifier", pattern = "\\w+")
+val id = Token("\\w+")
val aText = a map { it.text } // Parser<String>, returns the matched text from the input sequence
```

@@ -156,6 +158,8 @@ There are several kinds of combinators included in `better-parse`:

* ```val fCall = id and lpar and id and rpar use { FunctionCall(t1, t3) }```

+* ```val fCall = id * -lpar * id * -rpar use { FunctionCall(t1, t2) }``` (see operators below)
+
> There are `Tuple` classes up to `Tuple16` and the corresponding `and` overloads.

##### Operators
@@ -177,7 +181,7 @@ There are several kinds of combinators included in `better-parse`:
The result type for the combined parsers is the least common supertype (which is possibly `Any`).
There are optional arguments for customizing the transformation:
+
+* `LiftToSyntaxTreeOptions`
+* `retainSkipped` -- whether the resulting syntax tree should include skipped `and` components;
+* `retainSeparators` -- whether the `Separated` combinator parsed separators should be included;
+* `structureParsers` -- defines the parsers that are retained in the syntax tree; the nodes with parsers that are
+not in this set are flattened so that their children are attached to their parents in their place.
+
+For `Parser<T>`, the default is `null`, which means no nodes are flattened.
+
+In case of `Grammar<T>`, `structureParsers` defaults to the grammar's `declaredParsers`.
+
+* `transformer` -- a strategy to transform non-built-in parsers. If you define your own combinators and want them
+to be lifted to syntax tree parsers, pass a `LiftToSyntaxTreeTransformer` that will be called on the parsers. When
+a custom combinator nests another parser, a transformer implementation should call `default.transform(...)` on that parser.
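
The flattening rule described for `structureParsers` can be sketched independently of the library. The `Node` class and `flatten` function below are hypothetical and exist only for this illustration; they are not the `better-parse` syntax tree types, and parsers are identified by plain strings for simplicity. The sketch assumes that `null` means "retain everything" and that a node whose parser is not in the set is replaced by its (already flattened) children.

```kotlin
// Hypothetical tree node for illustration only; NOT the better-parse syntax tree class.
data class Node(val parserName: String, val children: List<Node> = emptyList())

// Returns the nodes that should stand in place of `node` in its parent's child list.
fun flatten(node: Node, structureParsers: Set<String>?): List<Node> {
    // Flatten the subtree first, so nested non-structure nodes collapse as well.
    val newChildren = node.children.flatMap { flatten(it, structureParsers) }
    return if (structureParsers == null || node.parserName in structureParsers) {
        listOf(node.copy(children = newChildren)) // retained node
    } else {
        newChildren // flattened: the children are attached to the parent in its place
    }
}

fun main() {
    val tree = Node("expr", listOf(Node("and", listOf(Node("id"), Node("comma"), Node("id")))))

    // "and" is not in the set, so its children move up under "expr".
    val flattened = flatten(tree, setOf("expr", "id", "comma"))
    println(flattened.single().children.map { it.parserName }) // [id, comma, id]

    // null means no nodes are flattened.
    println(flatten(tree, null).single().children.map { it.parserName }) // [and]
}
```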
+
+See [`SyntaxTreeDemo.kt`](https://github.com/h0tk3y/better-parse/blob/master/demo/src/main/kotlin/com/example/SyntaxTreeDemo.kt) for an example of working with syntax trees.
+
# Examples

* A boolean expressions parser that constructs a simple AST: [`BooleanExpression.kt`](https://github.com/h0tk3y/better-parse/blob/master/demo/src/main/kotlin/com/example/BooleanExpression.kt)