Skip to content

Commit 096851d

Browse files
authored
Merge pull request #146 from lorenzwalthert/customizing_vignette
- Vignette on customizing styler (#145).
2 parents 1b2573e + 60806e5 commit 096851d

File tree

1 file changed

+279
-0
lines changed

1 file changed

+279
-0
lines changed

vignettes/customizing_styler.Rmd

Lines changed: 279 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,279 @@
1+
---
2+
title: "Customizing styler"
3+
author: "Lorenz Walthert"
4+
date: "8/10/2017"
5+
output: rmarkdown::html_vignette
6+
vignette: >
7+
%\VignetteIndexEntry{Customizing styler}
8+
%\VignetteEngine{knitr::rmarkdown}
9+
%\VignetteEncoding{UTF-8}
10+
---
11+
12+
This vignette gives a high-level overview about how styler works and how you
13+
can define your own style guide and format code according to it.
14+
15+
# How styler works
16+
17+
There are three major steps that styler performs in order to style code:
18+
19+
1. Create a abstract syntax tree (AST) from `utils::getParseData()` that
20+
contains positional information of every token. We call
21+
this a nested parse table. You can learn more about how that is done
22+
exactly in the vignettes "Data Structures" and "Manipulating the nested
23+
parse table".
24+
2. Apply transformer functions at each level of the nested parse table. We
25+
use a visitor approach, i.e. a function that takes functions as arguments and
26+
applies them to every level of nesting. You can find out more about it on
27+
the help file for `visit`. Note that the function is not exported by styler.
28+
The visitor will take care of applying the functions on every
29+
level of nesting - and we can supply transformer functions that operate on
30+
one level of nesting. In the sequel, we use the term *nest* to refer to
31+
such a parse table at one level of nesting. A *nest* always represents a
32+
complete expression. Before we apply the transformers, we have to initialize
33+
two columns `lag_newlines` and `spaces`, which contain
34+
the number of line breaks before the token and the number of spaces
35+
after the token. These will be the columns that most of our transformer
36+
functions will modify.
37+
3. Serialize the nested parse table, that is, extract the terminal tokens from
38+
the nested parse table and add spaces and line breaks between them as
39+
specified in the nested parse table.
40+
41+
The `transformers` argument is, apart from the code to style, the key argument
42+
of functions such as `style_text()` and friends. By default, it is created
43+
via the argument `style`. The transformers are a named
44+
list of transformer functions and other arguments passed to styler. To use the
45+
default style guide of styler (the tidyverse style guide), call
46+
`tidyverse_style()` to get the list of the transformer functions. Let's quickly
47+
look at what those are.
48+
```{r, message = FALSE}
49+
library("styler")
50+
library("dplyr")
51+
names(tidyverse_style())
52+
str(tidyverse_style(), give.attr = FALSE, list.len = 3)
53+
```
54+
55+
We note that there are different types of transformer functions. `filler` is
56+
initializing some variables in the nested parse table (so it is not actually a
57+
transformer), the other elements modify either spacing, line break or tokens.
58+
`use_raw_indention` is not a function, it is just an option. All transformer
59+
functions have a similar structure. Let's pick one and look at it:
60+
```{r}
61+
tidyverse_style()$space$remove_space_after_opening_paren
62+
```
63+
64+
As the name says, this function removes spaces after the opening parenthesis. But
65+
how?
66+
Its input is a *nest*. Since the visitor will go through all levels of nesting,
67+
we just need a function that can be applied to a *nest*, that is, to a parse
68+
table at one level of nesting.
69+
We can compute the nested parse table and look at one of the levels of nesting
70+
that is interesting for us (more on the data structure in the vignettes
71+
"Data structures" and "Manipulating the parse table"):
72+
73+
```{r}
74+
string_to_format <- "call( 3)"
75+
pd <- styler:::compute_parse_data_nested(string_to_format) %>%
76+
styler:::pre_visit(c(styler:::create_filler))
77+
pd$child[[1]] %>%
78+
select(token, terminal, text, newlines, spaces)
79+
```
80+
81+
`create_filler()` is called to initialize some variables, it does not actually
82+
transform the parse table.
83+
84+
All the function `remove_space_after_opening_paren()` now does is looking for
85+
the opening bracket and setting the column `spaces` of the token to zero. Note
86+
that it is very important to check whether there is also a line break
87+
following after that token. If so, `spaces` should not be touched because of
88+
the way `spaces` and `newlines` are defined. `spaes` are the number of spaces
89+
after a token and `newlines`. Hence, if a line break follows, spaces are not
90+
EOL spaces, but rather the spaces directly before the next token. If there is
91+
a line break after the token and the value of `use_raw_indention` is set to
92+
`TRUE` (which means indention is not touched) and the rule would not check for
93+
that, indention for the token following `(` would be removed, which we don't
94+
want.
95+
If we apply the rule to our parse table, we can see that the column `spaes`
96+
changes and is now zero for all tokens:
97+
98+
```{r}
99+
styler:::remove_space_after_opening_paren(pd$child[[1]]) %>%
100+
select(token, terminal, text, newlines, spaces)
101+
```
102+
103+
All top-level styling functions have an argument `style` (which defaults
104+
to `tidyverse_style`). If you check out the help file, you can see that the
105+
argument `style` is only used to create the default argument `transformers`,
106+
which defaults to `style(...)`. This allows to specify options of the styling
107+
without specifying them inside the function passed to `transformers`.
108+
109+
Let's clarify that with an example. The following yields the same result:
110+
```{r}
111+
all.equal(
112+
style_text(string_to_format, transformers = tidyverse_style(strict = FALSE)),
113+
style_text(string_to_format, style = tidyverse_style, strict = FALSE),
114+
style_text(string_to_format, strict = FALSE),
115+
)
116+
```
117+
118+
Now let's do the whole styling of a string with just this
119+
one transformer introduced above. We do this by first creating a style guide
120+
with the designated wrapper function `create_style_guide()`.
121+
It takes transformer functions as input and returns them in a named list that
122+
meets the formal requirements for styling functions.
123+
124+
```{r}
125+
space_after_opening_style <- function(are_you_sure) {
126+
create_style_guide(space = if (are_you_sure) styler:::remove_space_after_opening_paren)
127+
}
128+
style_text("call( 1,1)", style = space_after_opening_style, are_you_sure = FALSE)
129+
```
130+
131+
Well, we probably want:
132+
133+
```{r}
134+
style_text("call( 1,1)", style = space_after_opening_style, are_you_sure = TRUE)
135+
```
136+
137+
Note that the return value of your `style` function may not contain `NULL`
138+
elements.
139+
140+
I hope you have acquired a basic understanding of how styler transforms code.
141+
You can provide your own transformer functions and use `create_style_guide()`
142+
to create customized code styling. If you do so, there are a few more things you
143+
should be aware of, which are described in the next section.
144+
145+
# Implementation details
146+
147+
For both spaces and line break information in the nested parse table, we use
148+
four attributes in total: `lag_newlines`, `newlines`, `spaces`, `lag_spaces`.
149+
`lag_spaces` is created from `spaces` only just before the parse table is
150+
serialized, so it is not relevant for manipulating the parse table as
151+
described above. These columns are to some degree redundant, but with just lag
152+
or lead, we would lose information on the first or the last element
153+
respectively, so we need both.
154+
155+
The sequence in which styler applies rules on each level of nesting
156+
is given in the list below:
157+
158+
* call `create_filler()` to initialize some variables.
159+
* modify the line breaks (modifying `lag_newlines` only based on
160+
`token`, `token_before`, `token_after` and `text`).
161+
* modify the spaces (modifying `spaces` only based on `lag_newlines`,
162+
`newlines`, `multi_line`, `token`, `token_before`, `token_after` and `text`).
163+
* modify the tokens (based on `newlines` `lag_newlines`, `spaces` `multi_line`,
164+
`token`, `token_before`, `token_after` and `text`).
165+
* modify the indention by changing `indention_ref_id` (based on `newlines`
166+
`lag_newlines`, `spaces` `multi_line`, `token`, `token_before`, `token_after`
167+
and `text`).
168+
169+
You can also look it up in the function that applies the transformers:
170+
`apply_transformers()`:
171+
```{r}
172+
styler:::apply_transformers
173+
```
174+
175+
This means that the order of the styling is clearly defined and it is for
176+
example not possible to modify line breaks based on spacing, because spacing
177+
will be set after line breaks are set. Do not rely on the column `col1`,
178+
`col2`, `line1` and `line2` in the parse table in any of your function since
179+
these columns do only reflect the position of tokens at the point of parsing,
180+
i.e. they are not kept up to date through the process of styling.
181+
182+
Also, as indicated above, work with `lag_nelwines` only in your line break
183+
rules. For development purposes, you also may want to use the unexported
184+
function `test_collection()` to help you with testing your style guide. You can
185+
find more information in the help file of the function.
186+
187+
If you write functions that modify spaces, don't forget make sure
188+
you don't modify EOL spacing, since that is needed for `use_raw_indention`, as
189+
indicated in the previous paragraph.
190+
191+
Finally, take note of the naming convention. All function names starting with
192+
`set-*` correspond to the `strict` option, that is, setting some value to an
193+
exact number. `add-*` Is softer: For example, `add_spaces_around_op()`, only
194+
makes sure that there is at least one space around operators, but if the
195+
code to style contains multiple, the transformer will not change that.
196+
197+
# Showcasing the development of a styling rule
198+
199+
For illustrative purposes, we create a new style guide that has one rule only:
200+
Curly braces are always on a new line. So for example,
201+
```{r}
202+
add_one <- function(x) {
203+
x + 1
204+
}
205+
```
206+
207+
Should be transformed to
208+
209+
```{r}
210+
add_one <- function(x)
211+
{
212+
x + 1
213+
}
214+
```
215+
216+
We first need to get familiar with the structure of the nested parse table.
217+
Note that the structure of the nested parse table is not affected by the
218+
position of line breaks and spaces.
219+
Let's first create the nested parse table.
220+
```{r}
221+
code <- c("add_one <- function(x) { x + 1 }")
222+
styler:::create_tree(code)
223+
pd <- styler:::compute_parse_data_nested(code)
224+
```
225+
The token of interest here has id number 10. Let's navigate there. Since
226+
line break rules manipulate the lags *before* the token, we need to change
227+
`lag_newlines` at the token "'{'".
228+
```{r}
229+
pd$child[[1]]$child[[3]]$child[[5]]
230+
```
231+
232+
Remember what we said above: A transformer takes a flat parse table as input,
233+
updates it and returns it. So here it's actually simple:
234+
```{r}
235+
set_line_break_before_curly_opening <- function(pd_flat) {
236+
op <- pd_flat$token %in% "'{'"
237+
pd_flat$lag_newlines[op] <- 1L
238+
pd_flat
239+
}
240+
```
241+
242+
Almost done. Now, the last thing we need to do is to use `create_style_guide()`
243+
to create our style guide consisting of that function.
244+
```{r}
245+
set_line_break_before_curly_opening_style <- function() {
246+
create_style_guide(line_break = set_line_break_before_curly_opening)
247+
}
248+
```
249+
250+
Now you can style your string according to it.
251+
```{r}
252+
style_text(code, style = set_line_break_before_curly_opening_style)
253+
```
254+
255+
Note that when removing line breaks, always take care of comments, since you
256+
don't want
257+
```{r, eval = FALSE}
258+
a <- function() # comments should remain EOL
259+
{
260+
3
261+
}
262+
```
263+
264+
to become
265+
266+
```{r, eval = FALSE}
267+
a <- function() # comments should remain EOL {
268+
3
269+
}
270+
```
271+
272+
The easiest way of taking care of that is not applying the rule if there is a
273+
comment before the token of interest, which can be checked for within your
274+
transformer function. The transformer function from the tidyverse style that
275+
removes line breaks before the curly opening bracket looks as follows:
276+
```{r}
277+
styler:::remove_line_break_before_curly_opening
278+
```
279+

0 commit comments

Comments
 (0)