Skip to content

Commit 0d94260

Browse files
draft on customizing styler vignette
1 parent c9339ba commit 0d94260

File tree

1 file changed

+268
-0
lines changed

1 file changed

+268
-0
lines changed

vignettes/customizing_styler.Rmd

Lines changed: 268 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,268 @@
1+
---
2+
title: "Customizing styler"
3+
author: "Lorenz Walthert"
4+
date: "8/10/2017"
5+
output: rmarkdown::html_vignette
6+
vignette: >
7+
%\VignetteIndexEntry{Customizing styler}
8+
%\VignetteEngine{knitr::rmarkdown}
9+
%\VignetteEncoding{UTF-8}
10+
---
11+
12+
This vignette gives a high-level overview about how styler works and how you
13+
can define your own style guide and format code according to it.
14+
15+
# How styler works
16+
17+
There are three major steps that styler performs in order to style code:
18+
19+
1. Create a abstract syntax tree (AST) from `utils::getParseData()` that
20+
contains positional information of every token. We call
21+
this a nested parse table. You can learn more about how that is done
22+
exactly in the vignettes "Data Structures" and "Manipulating the nested
23+
parse table".
24+
2. Apply transformer functions at each level of the nested parse table. In
25+
particular, we create two columns `lag_newlines` and `spaces`, which contain
26+
the number of line breaks before the token and the number of spaces
27+
after the token. These will be the columns that most of our transformer
28+
functions will modify. To go through the whole nested structure and apply
29+
the transformer functions repeatedly on every level of nesting, we use
30+
a visitor approach, i.e. a function that takes functions as arguments and
31+
applies them to every level of nesting.
32+
3. Serialize the nested parse table, that is, extract the terminal tokens from
33+
the nested parse table and add spaces and line breaks between them as
34+
specified in the nested parse table.
35+
36+
The argument transformers is, apart from the code to style, the key argument
37+
of functions such as `style_text()` and friends. By default, it is created
38+
via the argument `style`. The transformers are a named
39+
list of transformer functions and other arguments passed to styler. To use the
40+
default style guide of styler (the tidyverse style guide), call
41+
`tidyverse_style()` to get the list of the transformer functions. Let's quickly
42+
look at what those are.
43+
```{r, message = FALSE}
44+
library("styler")
45+
library("dplyr")
46+
names(tidyverse_style())
47+
str(tidyverse_style(), give.attr = FALSE, list.len = 3)
48+
```
49+
50+
We note that there are different types of transformer functions. `filler` is
51+
initializing some variables in the nested parse table (so it is not actually a
52+
transformer), the other elements modify either spacing, line break or tokens.
53+
`use_raw_indention` is not a function, it is just an option. All transformer
54+
functions have a similar structure. Let's pick one and look at it:
55+
```{r}
56+
tidyverse_style()$space$remove_space_after_opening_paren
57+
```
58+
59+
As the name says, this function removes spaces after the opening parenthesis. But
60+
how? Its input is a flat parse table. Since the visitor functions `pre_visit()`
61+
and `post_visit()` will go through all levels of nesting, we don't need to care
62+
about that - we just need a function that can be applied to a flat parse table,
63+
i.e at one level of nesting.
64+
We can compute the nested parse table and look at one of the levels of nesting
65+
that is interesting for us (more on the data structure in the vignettes
66+
"Data structures" and "Manipulating the parse table"):
67+
68+
```{r}
69+
string_to_format <- "call( 3)"
70+
pd <- styler:::compute_parse_data_nested(string_to_format) %>%
71+
styler:::pre_visit(c(styler:::create_filler))
72+
pd$child[[1]] %>%
73+
select(token, terminal, text, newlines, spaces)
74+
```
75+
76+
`create_filler()` is called to initialize some variables, it does not actually
77+
transform the parse table.
78+
79+
All the function `remove_space_after_opening_paren()` now does is looking for
80+
the opening bracket and setting the column `spaces` of the token to zero. Note
81+
that it is very important to check whether there is also a line break
82+
following after that token. If so, `spaces` should not be touched because of
83+
the way `spaces` and `newlines` are defined. `spaes` are the number of spaces
84+
after a token and `newlines`. Hence, if a line break follows, spaces are not
85+
EOL spaces, but rather the spaces directly before the next token. If there is
86+
a line break after the token and the value of `use_raw_indention` is set to
87+
`TRUE` (which means indention is not touched) and the rule would not check for
88+
that, indention for the token following `(` would be removed, which we don't
89+
want.
90+
If we apply the rule to our parse table, we can see that the column `spaes`
91+
changes and is now zero for all tokens:
92+
93+
```{r}
94+
styler:::remove_space_after_opening_paren(pd$child[[1]]) %>%
95+
select(token, terminal, text, newlines, spaces)
96+
```
97+
98+
All top-level styling functions have an argument `style` (which defaults
99+
to `tidyverse_style`). If you check out the help file, you can see that the
100+
argument `style` is only used to create the default argument `transformers`,
101+
which defaults to `style(...)`. This allows to specify options of the styling
102+
without specifying them inside the function passed to `transformers`.
103+
104+
Let's clarify that with an example. The following yields the same result:
105+
```{r}
106+
all.equal(
107+
style_text(string_to_format, transformers = tidyverse_style(strict = FALSE)),
108+
style_text(string_to_format, style = tidyverse_style, strict = FALSE),
109+
style_text(string_to_format, strict = FALSE),
110+
)
111+
```
112+
113+
Now let's do the whole styling of a string with just this
114+
one transformer introduced above. We do this by first creating a style guide
115+
with the designated wrapper function `create_style_guide()`.
116+
It takes transformer functions as input and returns them in a named list that
117+
meets the formal requirements for styling functions.
118+
119+
```{r}
120+
space_after_opening_style <- function(are_you_sure) {
121+
create_style_guide(space = if (are_you_sure) styler:::remove_space_after_opening_paren)
122+
}
123+
style_text("call( 1,1)", style = space_after_opening_style, are_you_sure = FALSE)
124+
```
125+
126+
Well, we probably want:
127+
128+
```{r}
129+
style_text("call( 1,1)", style = space_after_opening_style, are_you_sure = TRUE)
130+
```
131+
132+
Note that the return value of your `style` function may not contain `NULL`
133+
elements.
134+
135+
I hope you have acquired a basic understanding of how styler transforms code.
136+
You can provide your own transformer functions and use `create_style_guide()`
137+
to create customized code styling. If you do so, there are a few more things you
138+
should be aware of, which are described in the next section.
139+
140+
# Implementation details
141+
142+
For both spaces and line break information in the nested parse table, we use
143+
four attributes in total: `lag_newlines`, `newlines`, `spaces`, `lag_spaces`.
144+
`lag_spaces` is created from `spaces` only just before the parse table is
145+
serialized, so it is not relevant for manipulating the parse table as
146+
described above. These columns are to some degree redundant, but with just lag
147+
or lead, we would loose information on the first or the last element
148+
respectively, so we need both.
149+
150+
The sequence in which styler applies rules on each level of nesting
151+
is given in the list below:
152+
153+
* call `create_filler()` to initialize some variables.
154+
* modify the line breaks (modifying `lag_newlines` only based on
155+
`token`, `token_before`, `token_after` and `text`).
156+
* modify the spaces (modifying `spaces` only based on `lag_newlines`,
157+
`newlines`, `multi_line`, `token`, `token_before`, `token_after` and `text`).
158+
* modify the tokens (based on `newlines` `lag_newlines`, `spaces` `multi_line`,
159+
`token`, `token_before`, `token_after` and `text`).
160+
* modify the indention by changing `indention_ref_id` (based on `newlines`
161+
`lag_newlines`, `spaces` `multi_line`, `token`, `token_before`, `token_after`
162+
and `text`).
163+
164+
You can also look it up in the function that applies the transformers:
165+
`apply_transformers()`:
166+
```{r}
167+
styler:::apply_transformers
168+
```
169+
170+
This means that the order of the styling is clearly defined and it is for
171+
example not possible to modify line breaks based on spacing, because spacing
172+
will be set after line breaks are set. Do not rely on the column `col1`,
173+
`col2`, `line1` and `line2` in the parse table in any of your function since
174+
these columns do only reflect the position of tokens at the point of parsing,
175+
i.e. they are not kept up to date through the process of styling.
176+
177+
Also, as indicated above, work with `lag_nelwines` only in your line break
178+
rules. For development purposes, you also may want to use the unexported
179+
function `test_collection()` to help you with testing your style guide. You can
180+
find more information in the help file of the function.
181+
182+
Finally, take note of the naming convention. All function names starting with
183+
`set-*` correspond to the `strict` option, that is, setting some value to an
184+
exact number. `add-*` Is softer: For example, `add_spaces_around_op()`, only
185+
makes sure that there is at least one space around operators, but if the
186+
code to style contains multiple, the transformer will not change that.
187+
188+
# Showcasing the development of a styling rule
189+
190+
For illustrative purposes, we create a new style guide that has one rule only:
191+
Curly braces are always on a new line. So for example,
192+
```{r}
193+
add_one <- function(x) {
194+
x + 1
195+
}
196+
```
197+
198+
Should be transformed to
199+
200+
```{r}
201+
add_one <- function(x)
202+
{
203+
x + 1
204+
}
205+
```
206+
207+
We first need to get familiar with the structure of the nested parse table.
208+
Note that the structure of the nested parse table is not affected by the
209+
position of line breaks and spaces.
210+
Let's first create the nested parse table.
211+
```{r}
212+
code <- c("add_one <- function(x) { x + 1 }")
213+
styler:::create_tree(code)
214+
pd <- styler:::compute_parse_data_nested(code)
215+
```
216+
The token of interest here has id number 10. Let's navigate there. Since
217+
line break rules manipulate the lags *before* the token, we need to change
218+
`lag_newlines` at the token "'{'".
219+
```{r}
220+
pd$child[[1]]$child[[3]]$child[[5]]
221+
```
222+
223+
Remember what we said above: A transformer takes a flat parse table as input,
224+
updates it and returns it. So here it's actually simple:
225+
```{r}
226+
set_line_break_before_curly_opening <- function(pd_flat) {
227+
op <- pd_flat$token %in% "'{'"
228+
pd_flat$lag_newlines[op] <- 1L
229+
pd_flat
230+
}
231+
```
232+
233+
Almost done. Now, the last thing we need to do is to use `create_style_guide()`
234+
to create our style guide consisting of that function.
235+
```{r}
236+
set_line_break_before_curly_opening_style <- function() {
237+
create_style_guide(line_break = set_line_break_before_curly_opening)
238+
}
239+
```
240+
241+
Now you can style your string according to it.
242+
```{r}
243+
style_text(code, style = set_line_break_before_curly_opening_style)
244+
```
245+
Note that:
246+
247+
* We have not implemented a rule that sets the line breaks before the curly
248+
opening brace, which you probably want to for the above rule to be complete.
249+
* Note that when removing line breaks, always take care of comments, since you
250+
don't want this
251+
```{r, eval = FALSE}
252+
a <- function() # comments should remain EOL
253+
{
254+
3
255+
}
256+
```
257+
258+
to become
259+
260+
```{r, eval = FALSE}
261+
a <- function() # comments should remain EOL {
262+
3
263+
}
264+
```
265+
266+
The easiest way of taking care of that is not applying the rule
267+
if there is a comment, which can be checked for within your transformer
268+
function.

0 commit comments

Comments
 (0)