@@ -9,145 +9,197 @@ vignette: >
%\VignetteEncoding{UTF-8}
---

- This vignette illustrates how the core of `styler` currently^[at commit `e6ddee0f510d3c9e3e22ef68586068fa5c6bc140`] works, i.e. how
- rules are applied to a parse table and how limitations of this approach can be
- overcome with a refined approach.
-
- ## Status quo - the flat approach
-
- Roughly speaking, a string containing code to be formatted is parsed with `parse()`
- and the output is passed to `getParseData()` in order to obtain a parse
- table with detailed information about every token. For a simple example string
- `a <- function(x) { if(x > 1) { 1+1 } else {x} }` to be formatted, the parse
- table on which `styler` performs the manipulations looks similar to the one
- presented below.
-
- ``` {r, message = FALSE}
- library("styler")
- library("dplyr")
-
- code <- "a <- function(x) { if(x > 1) { 1+1 } else {x} }"
-
- (parse_table <- styler:::compute_parse_data_flat_enhanced(code))
- ```
- The column `spaces` was computed from the columns `col1` and `col2`, and `newlines`
- from `line1` and `line2`.
-
- So far, styler can set the spaces around the operators correctly. In our example,
- that involves adding spaces around `+`, so in the `spaces` column, the entries in
- rows 16 and 17 must be set to one. This means that a space is added after `1` and after `+`.
- To get the spacing right and cover the various cases, a set of functions has to
- be applied to the parse table one after another (in the right order),
- which is essentially done via `Reduce()`.
- After all modifications on the table are completed, `serialize_parse_data_flat()`
- collapses the `text` column and adds the number of spaces and
- line breaks specified in `spaces` and `newlines` in between the elements of
- `text`. If we serialize our table and don't perform any modification, we
- obviously just get back what we started with.
- ``` {r}
- styler:::serialize_parse_data_flat(parse_table)
- ```
-
- ## Refining the flat approach - nesting the parse table
-
- Although the flat approach is a good place to start, e.g. for fixing spaces
- between operators, it has its limitations. In particular, it treats each token
- the same way: it does not account for the context of the token,
- i.e. in which sub-expression it appears.
- To set the indentation correctly, we need a hierarchical view on the parse data,
- since all tokens in a sub-expression have the same indentation level. Hence,
- a natural approach would be to create a nested parse table instead of a flat
- parse table and then recurse over all elements in the table, so that for
- each sub(-sub etc.)-expression, a separate parse table would be created and the
- modifications would be applied to this table before putting everything back
- together. A function to create a nested parse table already exists in `styler`.
- Let's have a look at the top level:
-
- ``` {r}
- (l1 <- styler:::compute_parse_data_nested(code)[-1])
-
- ```
-
- The tibble contains the column `child`, which itself contains tibbles.
- If we "enter" the first child, we can see that the expression was split up
- further.
-
- ``` {r}
- l1$child[[1]] %>%
-   select(text, terminal, child, token)
- ```
+ This vignette illustrates how the core of `styler` currently[1] works,
+ i.e. how rules are applied to a parse table and how limitations of this
+ approach can be overcome with a refined approach.
+
+ Status quo - the flat approach
+ ------------------------------
+
+ Roughly speaking, a string containing code to be formatted is parsed
+ with `parse()` and the output is passed to `getParseData()` in order to
+ obtain a parse table with detailed information about every token. For a
+ simple example string
+ `a <- function(x) { if(x > 1) { 1+1 } else {x} }` to be formatted, the
+ parse table on which `styler` performs the manipulations looks similar
+ to the one presented below.
+
+     library("styler")
+     library("dplyr")
+
+     code <- "a <- function(x) { if(x > 1) { 1+1 } else {x} }"
+
+     (parse_table <- styler:::compute_parse_data_flat_enhanced(code))
+
+     ## # A tibble: 24 x 14
+     ## line1 col1 line2 col2 token text terminal short newlines
+     ## <int> <int> <int> <int> <chr> <chr> <lgl> <chr> <int>
+     ## 1 1 0 1 0 START NA <NA> 0
+     ## 2 1 1 1 1 SYMBOL a TRUE a 0
+     ## 3 1 3 1 4 LEFT_ASSIGN <- TRUE <- 0
+     ## 4 1 6 1 13 FUNCTION function TRUE funct 0
+     ## 5 1 14 1 14 '(' ( TRUE ( 0
+     ## 6 1 15 1 15 SYMBOL_FORMALS x TRUE x 0
+     ## 7 1 16 1 16 ')' ) TRUE ) 0
+     ## 8 1 18 1 18 '{' { TRUE { 0
+     ## 9 1 20 1 21 IF if TRUE if 0
+     ## 10 1 22 1 22 '(' ( TRUE ( 0
+     ## # ... with 14 more rows, and 5 more variables: lag_newlines <int>,
+     ## # spaces <int>, multi_line <lgl>, indention_ref_id <lgl>, indent <dbl>
+
+ The column `spaces` was computed from the columns `col1` and `col2`,
+ and `newlines` from `line1` and `line2`.
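A minimal sketch of how these two columns could be derived with base R alone follows; it is not `styler`'s implementation. It assumes comment-free code and treats `spaces` and `newlines` as the gap after each terminal token (an assumption about the column semantics). Variable names are illustrative.

```r
# Sketch only (not styler's code): derive `spaces` and `newlines`
# from the position columns of base R's parse data.
code <- "a <- function(x) { if(x > 1) { 1+1 } else {x} }"

# keep.source = TRUE makes sure the parse data is retained
pd <- utils::getParseData(parse(text = code, keep.source = TRUE))

# Only terminals carry text; sort them by their position in the source
terms <- pd[pd$terminal, ]
terms <- terms[order(terms$line1, terms$col1), ]
n <- nrow(terms)

# Start of the following token; the last token is padded so that
# it gets zero spaces and zero line breaks
next_line1 <- c(terms$line1[-1], terms$line2[n])
next_col1 <- c(terms$col1[-1], terms$col2[n] + 1L)

# Line breaks after each token
terms$newlines <- next_line1 - terms$line2
# Spaces after each token, only meaningful if the next one is on the same line
terms$spaces <- ifelse(terms$newlines == 0, next_col1 - terms$col2 - 1L, 0L)

terms[, c("token", "text", "newlines", "spaces")]
```

For the one-line example, every `newlines` entry is zero, and `1+1` shows up as zero spaces after both `1` and `+`.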
+
+ So far, styler can set the spaces around the operators correctly. In our
+ example, that involves adding spaces around `+`, so in the `spaces`
+ column, the entries in rows 16 and 17 must be set to one. This means that
+ a space is added after `1` and after `+`. To get the spacing right and
+ cover the various cases, a set of functions has to be applied to the
+ parse table one after another (in the right order), which is essentially
+ done via `Reduce()`. After all modifications on the table are completed,
+ `serialize_parse_data_flat()` collapses the `text` column and adds the
+ number of spaces and line breaks specified in `spaces` and `newlines` in
+ between the elements of `text`. If we serialize our table and don't
+ perform any modification, we obviously just get back what we started
+ with.
+
+     styler:::serialize_parse_data_flat(parse_table)
+
+     ## [1] "a <- function(x) { if(x > 1) { 1+1 } else {x} }"
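The `Reduce()` step can be illustrated with a small, self-contained sketch. The transformers `add_space_after_if()` and `add_space_around_plus()` and the serializer `serialize_flat()` are hypothetical stand-ins for `styler`'s rules and `serialize_parse_data_flat()`, and the sketch assumes single-line input.

```r
code <- "a <- function(x) { if(x > 1) { 1+1 } else {x} }"

# Flat table of terminals with `spaces` / `newlines` after each token
pd <- utils::getParseData(parse(text = code, keep.source = TRUE))
pd <- pd[pd$terminal, ]
pd <- pd[order(pd$line1, pd$col1), ]
n <- nrow(pd)
pd$newlines <- c(pd$line1[-1], pd$line2[n]) - pd$line2
pd$spaces <- ifelse(
  pd$newlines == 0,
  c(pd$col1[-1], pd$col2[n] + 1L) - pd$col2 - 1L,
  0L
)

# Hypothetical transformers: each takes and returns a flat parse table
add_space_after_if <- function(pd) {
  pd$spaces[pd$token == "IF"] <- 1L
  pd
}
add_space_around_plus <- function(pd) {
  is_plus <- pd$text == "+"
  pd$spaces[is_plus] <- 1L                # space after `+`
  pd$spaces[c(is_plus[-1], FALSE)] <- 1L  # space after the token before `+`
  pd
}

# Apply the transformers one after another, in order
transformers <- list(add_space_after_if, add_space_around_plus)
styled_pd <- Reduce(function(pd, f) f(pd), transformers, init = pd)

# Hypothetical serializer: glue text, line breaks and spaces back together
serialize_flat <- function(pd) {
  paste0(pd$text, strrep("\n", pd$newlines), strrep(" ", pd$spaces),
         collapse = "")
}

serialize_flat(styled_pd)
```

Because every transformer receives and returns the whole table, later rules see the changes made by earlier ones, which is why the order of `transformers` matters.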
69
+
70
+ Refining the flat approach - nesting the parse table
71
+ ----------------------------------------------------
72
+
73
+ Although the flat approach is good place to start, e.g. for fixing
74
+ spaces between operators, it has its limitations. In particular, it
75
+ treats each token the same way in the sense that it does not account for
76
+ the context of the token, i.e. in which sub-expression it appears. To
77
+ set the indention correctly, we need a hierarchical view on the parse
78
+ data, since all tokens in a sub-expression have the same indention
79
+ level. Hence, a natural approach would be to create a nested parse table
80
+ instead of a flat parse table and then take a recursion over all
81
+ elements in the table, so for each sub(-sub etc.)-expression, a separate
82
+ parse table would be created and the modifications would be applied to
83
+ this table before putting everything back together. A function to create
84
+ a nested parse table already exists in ` styler ` . Let's have a look at
85
+ the top level:
86
+
87
+ (l1 <- styler:::compute_parse_data_nested(code)[-1])
88
+
89
+ ## # A tibble: 1 x 13
90
+ ## col1 line2 col2 id parent token terminal text short token_before
91
+ ## <int> <int> <int> <int> <int> <chr> <lgl> <chr> <chr> <chr>
92
+ ## 1 1 1 47 49 0 expr FALSE <NA>
93
+ ## # ... with 3 more variables: token_after <chr>, internal <lgl>,
94
+ ## # child <list>
95
+
96
+ The tibble contains the column ` child ` , which itself contains a tibble.
97
+ If we "enter" the first child, we can see that the expression was split
98
+ up further.
99
+
100
+ l1$child[[1]] %>%
101
+ select(text, terminal, child, token)
102
+
103
+ ## # A tibble: 3 x 4
104
+ ## text terminal child token
105
+ ## <chr> <lgl> <list> <chr>
106
+ ## 1 FALSE <tibble [1 x 14]> expr
107
+ ## 2 <- TRUE <NULL> LEFT_ASSIGN
108
+ ## 3 FALSE <tibble [5 x 14]> expr
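To make the nesting idea concrete, here is a sketch of how a nested table could be built from base R's parse data by grouping rows on their `parent` id. The helper `nest_parse_data()` is hypothetical and uses plain data frames with a list column instead of `styler`'s tibbles.

```r
# Sketch only: build a nested parse table from base R's parse data.
code <- "a <- function(x) { if(x > 1) { 1+1 } else {x} }"
pd <- utils::getParseData(parse(text = code, keep.source = TRUE))
pd <- pd[order(pd$line1, pd$col1), c("id", "parent", "token", "terminal", "text")]

# Hypothetical helper: return the rows whose parent is `id`. Every
# non-terminal row gets its own nested table in the list column `child`;
# terminals get NULL.
nest_parse_data <- function(pd, id = 0L) {
  rows <- pd[pd$parent == id, ]
  rows$child <- lapply(seq_len(nrow(rows)), function(i) {
    if (rows$terminal[i]) NULL else nest_parse_data(pd, rows$id[i])
  })
  rows
}

nested <- nest_parse_data(pd)
nested$child[[1]]$token
```

The single top-level row has three children (`expr`, `LEFT_ASSIGN`, `expr`), mirroring the split of `l1$child[[1]]` shown above.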
And further...
- ``` {r}
- l1$child[[1]]$child[[3]]$child[[5]]
- ```
-
- ... and so on. Every child that is not a terminal contains another tibble where
- the sub-expression is split up further, until we are left with tibbles that
- only contain terminals.
-
-
- Recall the example `a <- function(x) { if(x > 1) { 1+1 } else {x} }`.
- The last printed parse table shows that the whole if-else statement
- is a sub-expression of `code`, surrounded by two curly brackets. Hence,
- one would like to set the indentation level for this sub-expression before
- doing anything with it in more detail. Progressing deeper into
- the nested table, we hit a similar pattern:
-
- ``` {r}
- l1$child[[1]]$child[[3]]$child[[5]]$child[[2]]$child[[5]]
- ```
- Again, we have two curly brackets and an expression inside. We would like to
- set the indentation level for the expression `1+1` in the same way as for the
- whole if-else statement.
-
- The simple example above suggests that a recursive approach to this
- problem is the most natural one.
-
- The function below sketches the idea and illustrates such a
- recursion.
-
- It takes a nested parse table as input and recurses over all
- children. If a child is a terminal, it returns its text; otherwise,
- it "enters" the child, finds the terminals inside it and returns them.
-
- ``` {r}
- serialize <- function(x) {
-   out <- Map(
-     function(terminal, text, child) {
-       if (terminal)
-         text
-       else
-         serialize(child)
-     },
-     x$terminal, x$text, x$child
-   )
-   out
- }
-
- x <- styler:::compute_parse_data_nested(code)
- serialize(x) %>% unlist
- ```
-
- How exactly to implement a similar recursion that returns the styled text as
- one string (or one string per line) instead of each text token separately is
- subject to future work, as are the functions that create the correct
- indentation when applied to a sub-expression parse table.
- Similar to `compute_parse_data_flat_enhanced`, `compute_parse_data_nested` would
- need to compute the columns `spaces` and `newlines`, as well as a
- new column `indention`.
-
-
- ## Final Remarks
-
- Although a flat structure might also allow us to solve the problem of
- indentation, it is a less elegant and less flexible solution. It would
- involve looking for an opening curly bracket in the parse table, setting the
- indentation level for all subsequent rows until the next
- opening or closing curly bracket is hit, and then indenting one level further or
- resetting the indentation to where it was at the beginning of the table.
-
- Note that this vignette only addresses indentation caused by
- curly brackets and does not deal with other tokens that would trigger
- indentation, such as `(` or `+`.
+
+     l1$child[[1]]$child[[3]]$child[[5]]
+
+     ## # A tibble: 3 x 14
+     ## line1 col1 line2 col2 id parent token terminal text short
+     ## <int> <int> <int> <int> <int> <int> <chr> <lgl> <chr> <chr>
+     ## 1 1 18 1 18 9 45 '{' TRUE { {
+     ## 2 1 20 1 45 42 45 expr FALSE
+     ## 3 1 47 1 47 40 45 '}' TRUE } }
+     ## # ... with 4 more variables: token_before <chr>, token_after <chr>,
+     ## # internal <lgl>, child <list>
+
+ ... and so on. Every child that is not a terminal contains another
+ tibble where the sub-expression is split up further, until we are left
+ with tibbles that only contain terminals.
+
+ Recall the example
+ `a <- function(x) { if(x > 1) { 1+1 } else {x} }`. The last printed
+ parse table shows that the whole if-else statement is a
+ sub-expression of `code`, surrounded by two curly brackets. Hence, one
+ would like to set the indentation level for this sub-expression before
+ doing anything with it in more detail. Progressing deeper into the
+ nested table, we hit a similar pattern:
+
+     l1$child[[1]]$child[[3]]$child[[5]]$child[[2]]$child[[5]]
+
+     ## # A tibble: 3 x 14
+     ## line1 col1 line2 col2 id parent token terminal text short
+     ## <int> <int> <int> <int> <int> <int> <chr> <lgl> <chr> <chr>
+     ## 1 1 30 1 30 20 30 '{' TRUE { {
+     ## 2 1 32 1 34 27 30 expr FALSE
+     ## 3 1 36 1 36 26 30 '}' TRUE } }
+     ## # ... with 4 more variables: token_before <chr>, token_after <chr>,
+     ## # internal <lgl>, child <list>
+
+ Again, we have two curly brackets and an expression inside. We would
+ like to set the indentation level for the expression `1+1` in the same
+ way as for the whole if-else statement.
149
+
150
+ The simple example above makes it evident that a recursive approach to
151
+ this problem would be the most natural.
152
+
153
+ The code for a function that kind of sketches the idea and illustrates
154
+ such a recursion is given below.
155
+
156
+ It takes a nested parse table as input and then does the recursion over
157
+ all children. If the child is a terminal, it returns the text,
158
+ otherwise, it "enters" the child to find the terminals inside of the
159
+ child and returns them.
160
+
161
+ serialize <- function(x) {
162
+ out <- Map(
163
+ function(terminal, text, child) {
164
+ if (terminal)
165
+ text
166
+ else
167
+ serialize(child)
168
+ },
169
+ x$terminal, x$text, x$child
170
+ )
171
+ out
172
+ }
173
+
174
+ x <- styler:::compute_parse_data_nested(code)
175
+ serialize(x) %>% unlist
176
+
177
+ ## [1] "a" "<-" "function" "(" "x" ")"
178
+ ## [7] "{" "if" "(" "x" ">" "1"
179
+ ## [13] ")" "{" "1" "+" "1" "}"
180
+ ## [19] "else" "{" "x" "}" "}"
181
+
182
+ How to exactly implement a similar recursion to not just return each
183
+ text token separately, but the styled text as one string (or one string
184
+ per line) is subject to future work, so would be the functions to be
185
+ applied to a sub-expression parse table that create correct indention.
186
+ Similar to ` compute_parse_data_flat_enhanced ` , the column ` spaces ` and
187
+ ` newlines ` would be required to be computed by
188
+ ` compute_parse_data_nested ` as well as a new column ` indention ` .
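One way such a recursion could carry indentation information is to pass a level down the tree and raise it inside curly braces. The helper `indent_terminals()` below is hypothetical and not part of `styler`; it works directly on base R's parse data via the `parent` ids and, like this vignette, only considers indentation caused by `{`.

```r
# Sketch only: attach an indentation level to every terminal by recursing
# over base R's parse data. Only `{` ... `}` raise the level here.
code <- "a <- function(x) { if(x > 1) { 1+1 } else {x} }"
pd <- utils::getParseData(parse(text = code, keep.source = TRUE))
pd <- pd[order(pd$line1, pd$col1), ]

# Hypothetical helper: returns the terminals below `id` in source order,
# together with the indentation level they would get.
indent_terminals <- function(pd, id = 0L, level = 0L) {
  rows <- pd[pd$parent == id, ]
  # An expression that starts with `{` is a block: its inner
  # sub-expressions are one level deeper, the braces themselves are not.
  is_block <- nrow(rows) > 0 && rows$token[1] == "'{'"
  inner_level <- if (is_block) level + 1L else level
  pieces <- lapply(seq_len(nrow(rows)), function(i) {
    if (rows$terminal[i]) {
      data.frame(text = rows$text[i], level = level, stringsAsFactors = FALSE)
    } else {
      indent_terminals(pd, rows$id[i], inner_level)
    }
  })
  do.call(rbind, pieces)
}

indented <- indent_terminals(pd)
indented
```

The three occurrences of `x` end up on levels 0, 1 and 2, and `1+1` sits two levels deep. Turning this into one string per line would additionally need the `newlines` information and is left open here.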
+
+ Final Remarks
+ -------------
+
+ Although a flat structure might also allow us to solve the problem of
+ indentation, it is a less elegant and less flexible solution. It would
+ involve looking for an opening curly bracket in the parse table, setting
+ the indentation level for all subsequent rows until the next opening or
+ closing curly bracket is hit, and then indenting one level further or
+ resetting the indentation to where it was at the beginning of the table.
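For comparison, the flat alternative described above can be sketched with a running count of curly braces over the terminal tokens, assuming that only `{` and `}` change the indentation. The variable names are illustrative.

```r
# Sketch only: the flat alternative, tracking curly braces with a running count.
code <- "a <- function(x) { if(x > 1) { 1+1 } else {x} }"
pd <- utils::getParseData(parse(text = code, keep.source = TRUE))
terms <- pd[pd$terminal, ]
terms <- terms[order(terms$line1, terms$col1), c("token", "text")]

opening <- terms$token == "'{'"
closing <- terms$token == "'}'"

# Number of braces still open after each token
depth <- cumsum(opening) - cumsum(closing)

# An opening brace stays on the outer level (undo its own +1); a closing
# brace has already dropped back to the outer level.
terms$level <- depth - opening

terms
```

Each further trigger such as `(` would need its own bookkeeping in this scheme, which illustrates why the nested approach is the more flexible one.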
+
+ Note that this vignette only addresses indentation caused by curly
+ brackets and does not deal with other tokens that would trigger
+ indentation, such as `(` or `+`.
+
+ [1] at commit `e6ddee0f510d3c9e3e22ef68586068fa5c6bc140`