
Commit 813d1cb

Refactor default params of syntax tree lifting.
Advance version to 0.3.0
Update README.md
1 parent 1282b8c commit 813d1cb


5 files changed: 89 additions & 41 deletions


README.md

Lines changed: 66 additions & 18 deletions
@@ -34,22 +34,22 @@ repositories {
 }
 
 dependencies {
-compile 'com.github.h0tk3y.betterParse:better-parse:0.2.1'
+compile 'com.github.h0tk3y.betterParse:better-parse:0.3.0'
 }
 ```
 
-## Lexer & tokens ##
+## Tokens ##
 As many other language recognition tools, `better-parse` abstracts away from raw character input by
-pre-processing it with a `Lexer`, that can match `Token`s by their patterns (regular expressions) against an input sequence.
+pre-processing it with a `Tokenizer`, that can match `Token`s by their patterns (regular expressions) against an input sequence.
 
-A `Lexer` tokenizes an input sequence such as `InputStream` or a `String` into a `Sequence<TokenMatch>`, providing each with a position in the input.
+A `Tokenizer` tokenizes an input sequence such as `InputStream` or a `String` into a `Sequence<TokenMatch>`, providing each with a position in the input.
 
-One way to create a `Lexer` is to first define the `Tokens` to be matched:
+One way to create a `Tokenizer` is to first define the `Tokens` to be matched:
 
 ```kotlin
-val id = Token("identifier", pattern = "\\w+")
-val cm = Token("comma", pattern = ",")
-val ws = Token("whitespace", pattern = "\\s+", ignore = true)
+val id = Token("\\w+")
+val cm = Token(",")
+val ws = Token("\\s+", ignore = true)
 ```
 
 > A `Token` can be ignored by setting its `ignore = true`. An ignored token can still be matched explicitly, but if
@@ -59,14 +59,16 @@ another token is expected, the ignored one is just dropped from the sequence.
 val tokenizer = DefaultTokenizer(listOf(id, cm, ws))
 ```
 
-> Note: the tokens order matters in some cases, because the tokenizer tries to match them in exactly this order. For instance, if `Token("singleA", "a")`
-is listed before `Token("doubleA", "aa")`, the latter will never be matched. Be careful with keyword tokens!
+> Note: the tokens order matters in some cases, because the tokenizer tries to match them in exactly this order. For instance, if `Token("a")`
+is listed before `Token("aa")`, the latter will never be matched. Be careful with keyword tokens!
 
 ```kotlin
 val tokenMatches: Sequence<TokenMatch> = tokenizer.tokenize("hello, world") // Support other types of input as well.
 ```
 
-> A more convenient way of defining tokens and creating a tokenizer is described in the **Grammar** section.
+> A more convenient way of defining tokens is described in the [**Grammar**](#grammar) section.
+
+It is possible to provide a custom implementation of a `Tokenizer`.
 
 ## Parser ##
 
@@ -86,8 +88,8 @@ with the match of this token itself _(possibly, skipping some **ignored** tokens
 _(and, possibly, some ignored tokens)_ from the remainder.
 
 ```kotlin
-val a = Token(name = "a", pattern = "a+")
-val b = Token(name = "b", pattern = "b+")
+val a = Token("a+")
+val b = Token("b+")
 val tokenMatches = Lexer(listOf(a, b)).tokenize("aabbaaa")
 val result = a.tryParse(tokenMatches) // contains the match for "aa" and the remainder with "bbaaa" in it
 ```
@@ -103,7 +105,7 @@ There are several kinds of combinators included in `better-parse`:
 The error results are returned unchanged.
 
 ```kotlin
-val id = Token("identifier", pattern = "\\w+")
+val id = Token("\\w+")
 val aText = a map { it.text } // Parser<String>, returns the matched text from the input sequence
 ```
 
@@ -156,6 +158,8 @@ There are several kinds of combinators included in `better-parse`:
 
 * ```val fCall = id and lpar and id and rpar use { FunctionCall(t1, t3) }```
 
+* ```val fCall = id * -lpar * id * -rpar use { FunctionCall(t1, t2) }``` (see operators below)
+
 > There are `Tuple` classes up to `Tuple16` and the corresponding `and` overloads.
 
 ##### Operators
@@ -177,7 +181,7 @@ There are several kinds of combinators included in `better-parse`:
 The result type for the combined parsers is the least common supertype (which is possibly `Any`).
 
 ```kotlin
-val expr = const or var or fCall
+val expr = const or variable or fCall
 ```
 
 * `zeroOrMore(...)`, `oneOrMore(...)`, `N times`, `N timesOrMore`, `N..M times`
@@ -211,8 +215,11 @@ There are several kinds of combinators included in `better-parse`:
 
 # Grammar
 
-As a convenient way of defining a grammar of a language, there is an abstract class `Grammar`, that collects the `by token(...)`-delegated
-properties into a `Lexer` automatically, and also behaves as a composition of the `Lexer` and the `rootParser`.
+As a convenient way of defining a grammar of a language, there is an abstract class `Grammar`, that collects the `by`-delegated
+properties into a `Tokenizer` automatically, and also behaves as a composition of the `Lexer` and the `rootParser`.
+
+*Note:* a `Grammar` also collects `by`-delegated `Parser<T>` properties so that they can be accessed as
+`declaredParsers` along with the tokens. As a good style, declare the parsers inside a `Grammar` by delegation as well.
 
 ```kotlin
 interface Item
@@ -241,7 +248,48 @@ val term by
 variableParser or
 (-lpar and parser(this::term) and -rpar)
 ```
-
+
+A `Grammar` implementation can override the `tokenizer` property to provide a custom implementation of `Tokenizer`.
+
+# Syntax trees
+
+A `Parser<T>` can be converted to another `Parser<SyntaxTree<T>>`, where a `SyntaxTree<T>`, along with the parsed `T`
+contains the children syntax trees, the reference to the parser and the positions in the input sequence.
+This can be done with `parser.liftToSyntaxTreeParser()`.
+
+This can be used for syntax highlighting and inspecting the resulting tree in case the parsed result
+does not contain the full syntactic structure.
+
+For convenience, a `Grammar` can also be lifted to that parsing a `SyntaxTree` with
+`grammar.liftToSyntaxTreeGrammar()`.
+
+```kotlin
+val treeGrammar = booleanGrammar.liftToSyntaxTreeGrammar()
+val tree = treeGrammar.parseToEnd("a & !b | c -> d")
+assertTrue(tree.parser == booleanGrammar.implChain)
+val firstChild = tree.children.first()
+assertTrue(firstChild.parser == booleanGrammar.orChain)
+assertTrue(firstChild.range == 0..9)
+```
+
+There are optional arguments for customizing the transformation:
+
+* `LiftToSyntaxTreeOptions`
+* `retainSkipped` -- whether the resulting syntax tree should include skipped `and` components;
+* `retainSeparators` -- whether the `Separated` combinator parsed separators should be included;
+* `structureParsers` -- defines the parsers that are retained in the syntax tree; the nodes with parsers that are
+not in this set are flattened so that their children are attached to their parents in their place.
+
+For `Parser<T>`, the default is `null`, which means no nodes are flattened.
+
+In case of `Grammar<T>`, `structureParsers` defaults to the grammar's `declaredParsers`.
+
+* `transformer` -- a strategy to transform non-built-in parsers. If you define your own combinators and want them
+to be lifted to syntax tree parsers, pass a `LiftToSyntaxTreeTransformer` that will be called on the parsers. When
+a custom combinator nests another parser, a trnsformer implementation should call `default.transform(...)` on that parser.
+
+See [`SyntaxTreeDemo.kt`](https://github.com/h0tk3y/better-parse/blob/master/demo/src/main/kotlin/com/example/SyntaxTreeDemo.kt) for an example of working with syntax trees.
+
 # Examples
 
 * A boolean expressions parser that constructs a simple AST: [`BooleanExpression.kt`](https://github.com/h0tk3y/better-parse/blob/master/demo/src/main/kotlin/com/example/BooleanExpression.kt)
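The updated README note says the tokenizer tries tokens in exactly the order they are listed, so `Token("a")` placed before `Token("aa")` makes the longer token unreachable. The sketch below models only that first-match-wins rule. `SimpleToken`, `SimpleMatch` and `tokenizeInOrder` are hypothetical names, not part of better-parse, and `DefaultTokenizer` may be implemented differently; matching uses the standard `java.util.regex` API.

```kotlin
import java.util.regex.Matcher
import java.util.regex.Pattern

// Hypothetical token model: a name plus a regular expression.
// This is not better-parse's Token class; it only illustrates the ordering rule.
class SimpleToken(val name: String, pattern: String) {
    val regex: Pattern = Pattern.compile(pattern)
}

// A matched token together with its text and its start offset in the input.
data class SimpleMatch(val token: SimpleToken, val text: String, val position: Int)

// At each position, tries the tokens in list order and takes the first one that
// matches a non-empty prefix of the remaining input (first match wins, not longest).
fun tokenizeInOrder(tokens: List<SimpleToken>, input: String): List<SimpleMatch> {
    val result = mutableListOf<SimpleMatch>()
    var pos = 0
    while (pos < input.length) {
        val hit: Pair<SimpleToken, Matcher> = tokens.asSequence()
            .map { it to it.regex.matcher(input).region(pos, input.length) }
            .firstOrNull { (_, m) -> m.lookingAt() && m.end() > pos }
            ?: error("No token matches at position $pos")
        val (token, m) = hit
        result += SimpleMatch(token, input.substring(pos, m.end()), pos)
        pos = m.end()
    }
    return result
}
```

With `listOf(a, aa)` the input `"aaa"` yields three `a` matches and `aa` never fires; with `listOf(aa, a)` it yields `aa` followed by `a`. This is why longer or keyword-like tokens are safer listed first.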

build.gradle

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 group 'com.github.h0tk3y.betterParse'
-version '0.2.1'
+version '0.3.0'
 
 buildscript {
 ext.kotlin_version = '1.1.51'

demo/src/main/kotlin/com/example/SyntaxTreeDemo.kt

Lines changed: 14 additions & 13 deletions
@@ -17,29 +17,30 @@ fun main(args: Array<String>) {
 if (result.isNullOrBlank()) null else result
 }
 
-(exprs.asSequence() + readExprSequence).forEach { printParseTree(it); println("\n") }
+(exprs.asSequence() + readExprSequence).forEach { parseAndPrintTree(it); println("\n") }
 }
 
 val booleanSyntaxTreeGrammar = BooleanGrammar.liftToSyntaxTreeGrammar()
 
-fun printParseTree(expr: String) {
+fun parseAndPrintTree(expr: String) {
 println(expr)
 
 val result = booleanSyntaxTreeGrammar.tryParseToEnd(expr)
 
 when (result) {
 is ErrorResult -> println("Could not parse expression: $result")
-is Parsed<SyntaxTree<BooleanExpression>> -> {
-val tree = result.value
-var currentLayer: List<SyntaxTree<*>> = listOf(tree)
-while (currentLayer.isNotEmpty()) {
-val underscores = currentLayer.flatMap { t -> t.range.map { index -> index to charByTree(t) } }.toMap()
-val underscoreStr = expr.indices.map { underscores[it] ?: ' ' }.joinToString("")
-println(underscoreStr)
-
-currentLayer = currentLayer.flatMap { it.children }
-}
-}
+is Parsed<SyntaxTree<BooleanExpression>> -> printSyntaxTree(expr, result.value)
+}
+}
+
+fun printSyntaxTree(expr: String, syntaxTree: SyntaxTree<*>) {
+var currentLayer: List<SyntaxTree<*>> = listOf(syntaxTree)
+while (currentLayer.isNotEmpty()) {
+val underscores = currentLayer.flatMap { t -> t.range.map { index -> index to charByTree(t) } }.toMap()
+val underscoreStr = expr.indices.map { underscores[it] ?: ' ' }.joinToString("")
+println(underscoreStr)
+
+currentLayer = currentLayer.flatMap { it.children }
 }
 }

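The extracted `printSyntaxTree` walks the tree breadth-first, one depth level per printed line, painting each node's `range` over the input string. Below is a self-contained sketch of the same loop. It assumes a hypothetical `MiniTree` whose `mark` character stands in for the demo's `charByTree(t)` (that function's definition is not part of this diff), and it returns the lines instead of printing them so they can be checked.

```kotlin
// Hypothetical stand-in for SyntaxTree<*>: only the input range and the children
// are used by the rendering loop; `mark` replaces the demo's charByTree(t).
data class MiniTree(val mark: Char, val range: IntRange, val children: List<MiniTree> = emptyList())

// Same breadth-first loop as printSyntaxTree: one output line per depth level,
// where every node in the level paints its range with its mark character.
fun renderLayers(expr: String, root: MiniTree): List<String> {
    val lines = mutableListOf<String>()
    var currentLayer: List<MiniTree> = listOf(root)
    while (currentLayer.isNotEmpty()) {
        val marks = currentLayer.flatMap { t -> t.range.map { index -> index to t.mark } }.toMap()
        lines += expr.indices.map { marks[it] ?: ' ' }.joinToString("")
        currentLayer = currentLayer.flatMap { it.children }
    }
    return lines
}
```

Each returned line corresponds to one depth of the tree, so a node's line shows exactly which input characters it covers. Positions not covered by any node at that depth stay blank.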
src/main/kotlin/com/github/h0tk3y/betterParse/st/LiftToSyntaxTree.kt

Lines changed: 6 additions & 6 deletions
@@ -35,23 +35,23 @@ data class LiftToSyntaxTreeOptions(
 * empty set to retain all nodes. */
 fun <T> Parser<T>.liftToSyntaxTreeParser(
 liftOptions: LiftToSyntaxTreeOptions = LiftToSyntaxTreeOptions(),
-transformer: LiftToSyntaxTreeTransformer? = null,
-structureParsers: Set<Parser<*>> = emptySet()
+structureParsers: Set<Parser<*>>? = null,
+transformer: LiftToSyntaxTreeTransformer? = null
 ): Parser<SyntaxTree<T>> {
 val astParser = ParserToSyntaxTreeLifter(liftOptions, transformer).lift(this)
-return if (structureParsers.isEmpty())
+return if (structureParsers == null)
 astParser else
 astParser.flattened(structureParsers)
 }
 
 /** Converts a [Grammar] so that its [Grammar.rootParser] parses a [SyntaxTree]. See: [liftToSyntaxTreeParser]. */
 fun <T> Grammar<T>.liftToSyntaxTreeGrammar(
 liftOptions: LiftToSyntaxTreeOptions = LiftToSyntaxTreeOptions(),
-tranformer: LiftToSyntaxTreeTransformer? = null,
-structureParsers: Set<Parser<*>> = declaredParsers
+structureParsers: Set<Parser<*>> = declaredParsers,
+transformer: LiftToSyntaxTreeTransformer? = null
 ) = object : Grammar<SyntaxTree<T>>() {
 override val rootParser: Parser<SyntaxTree<T>> = this@liftToSyntaxTreeGrammar.rootParser
-.liftToSyntaxTreeParser(liftOptions, tranformer, structureParsers)
+.liftToSyntaxTreeParser(liftOptions, structureParsers, transformer)
 
 override val tokens: List<Token> get() = this@liftToSyntaxTreeGrammar.tokens
 override val declaredParsers: Set<Parser<Any?>> = this@liftToSyntaxTreeGrammar.declaredParsers
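After this change, `structureParsers` is nullable and defaults to `null`, and the lifting code tests `structureParsers == null` instead of `isEmpty()`. As a result, `null` now means "do not flatten", while any set, including an empty one, is passed on to `flattened`. The sketch below models that flattening rule on a hypothetical `Node` type whose `parser` field is a plain string. The helpers `flattenInto` and `structure` are illustrative names, and keeping the root unconditionally is an assumption of the sketch, not something the diff shows.

```kotlin
// Hypothetical syntax-tree node: `parser` names the parser that produced it.
data class Node(val parser: String, val children: List<Node> = emptyList())

// Returns what should replace `node` in its parent's child list: the node itself
// (with flattened children) if its parser is retained, otherwise just its flattened children.
fun flattenInto(node: Node, retained: Set<String>): List<Node> {
    val flatChildren = node.children.flatMap { flattenInto(it, retained) }
    return if (node.parser in retained) listOf(node.copy(children = flatChildren)) else flatChildren
}

// Mirrors the new default: null skips flattening entirely, while any set
// (even an empty one) flattens. Keeping the root unconditionally is an assumption.
fun structure(root: Node, structureParsers: Set<String>? = null): Node =
    if (structureParsers == null) root
    else root.copy(children = root.children.flatMap { flattenInto(it, structureParsers) })
```

Because `structureParsers` now precedes `transformer`, the positional call inside `liftToSyntaxTreeGrammar` had to be reordered to match. Calls that use named arguments compile regardless of parameter order.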

src/test/kotlin/TestLiftToAst.kt

Lines changed: 2 additions & 3 deletions
@@ -2,7 +2,6 @@ import com.github.h0tk3y.betterParse.combinators.*
 import com.github.h0tk3y.betterParse.grammar.Grammar
 import com.github.h0tk3y.betterParse.grammar.parseToEnd
 import com.github.h0tk3y.betterParse.grammar.parser
-import com.github.h0tk3y.betterParse.grammar.token
 import com.github.h0tk3y.betterParse.lexer.TokenMatch
 import com.github.h0tk3y.betterParse.parser.*
 import com.github.h0tk3y.betterParse.st.*
@@ -181,8 +180,8 @@ class TestLiftToAst {
 val parser = ForcedDuplicate(listOf(booleanGrammar.and, booleanGrammar.or, booleanGrammar.impl))
 
 val lifted = parser.liftToSyntaxTreeParser(
-transformer = transformer,
-structureParsers = booleanGrammar.declaredParsers
+structureParsers = booleanGrammar.declaredParsers,
+transformer = transformer
 )
 
 val result = lifted.tryParse(booleanGrammar.tokenizer.tokenize("||"))
