|
| 1 | +--- |
| 2 | +title: "Customizing styler" |
| 3 | +author: "Lorenz Walthert" |
| 4 | +date: "8/10/2017" |
| 5 | +output: rmarkdown::html_vignette |
| 6 | +vignette: > |
| 7 | + %\VignetteIndexEntry{Customizing styler} |
| 8 | + %\VignetteEngine{knitr::rmarkdown} |
| 9 | + %\VignetteEncoding{UTF-8} |
| 10 | +--- |
| 11 | + |
| 12 | +This vignette gives a high-level overview about how styler works and how you |
| 13 | +can define your own style guide and format code according to it. |
| 14 | + |
| 15 | +# How styler works |
| 16 | + |
| 17 | +There are three major steps that styler performs in order to style code: |
| 18 | + |
| 19 | +1. Create a abstract syntax tree (AST) from `utils::getParseData()` that |
| 20 | + contains positional information of every token. We call |
| 21 | + this a nested parse table. You can learn more about how that is done |
| 22 | + exactly in the vignettes "Data Structures" and "Manipulating the nested |
| 23 | + parse table". |
| 24 | +2. Apply transformer functions at each level of the nested parse table. We |
| 25 | + use a visitor approach, i.e. a function that takes functions as arguments and |
| 26 | + applies them to every level of nesting. You can find out more about it on |
| 27 | + the help file for `visit`. Note that the function is not exported by styler. |
| 28 | + The visitor will take care of applying the functions on every |
| 29 | + level of nesting - and we can supply transformer functions that operate on |
| 30 | + one level of nesting. In the sequel, we use the term *nest* to refer to |
| 31 | + such a parse table at one level of nesting. A *nest* always represents a |
| 32 | + complete expression. Before we apply the transformers, we have to initialize |
| 33 | + two columns `lag_newlines` and `spaces`, which contain |
| 34 | + the number of line breaks before the token and the number of spaces |
| 35 | + after the token. These will be the columns that most of our transformer |
| 36 | + functions will modify. |
| 37 | +3. Serialize the nested parse table, that is, extract the terminal tokens from |
| 38 | + the nested parse table and add spaces and line breaks between them as |
| 39 | + specified in the nested parse table. |
| 40 | + |
| 41 | +The `transformers` argument is, apart from the code to style, the key argument |
| 42 | +of functions such as `style_text()` and friends. By default, it is created |
| 43 | +via the argument `style`. The transformers are a named |
| 44 | +list of transformer functions and other arguments passed to styler. To use the |
| 45 | +default style guide of styler (the tidyverse style guide), call |
| 46 | +`tidyverse_style()` to get the list of the transformer functions. Let's quickly |
| 47 | +look at what those are. |
| 48 | +```{r, message = FALSE} |
| 49 | +library("styler") |
| 50 | +library("dplyr") |
| 51 | +names(tidyverse_style()) |
| 52 | +str(tidyverse_style(), give.attr = FALSE, list.len = 3) |
| 53 | +``` |
| 54 | + |
| 55 | +We note that there are different types of transformer functions. `filler` is |
| 56 | +initializing some variables in the nested parse table (so it is not actually a |
| 57 | +transformer), the other elements modify either spacing, line break or tokens. |
| 58 | +`use_raw_indention` is not a function, it is just an option. All transformer |
| 59 | +functions have a similar structure. Let's pick one and look at it: |
| 60 | +```{r} |
| 61 | +tidyverse_style()$space$remove_space_after_opening_paren |
| 62 | +``` |
| 63 | + |
| 64 | +As the name says, this function removes spaces after the opening parenthesis. But |
| 65 | +how? |
| 66 | +Its input is a *nest*. Since the visitor will go through all levels of nesting, |
| 67 | +we just need a function that can be applied to a *nest*, that is, to a parse |
| 68 | +table at one level of nesting. |
| 69 | +We can compute the nested parse table and look at one of the levels of nesting |
| 70 | +that is interesting for us (more on the data structure in the vignettes |
| 71 | +"Data structures" and "Manipulating the parse table"): |
| 72 | + |
| 73 | +```{r} |
| 74 | +string_to_format <- "call( 3)" |
| 75 | +pd <- styler:::compute_parse_data_nested(string_to_format) %>% |
| 76 | + styler:::pre_visit(c(styler:::create_filler)) |
| 77 | +pd$child[[1]] %>% |
| 78 | + select(token, terminal, text, newlines, spaces) |
| 79 | +``` |
| 80 | + |
| 81 | +`create_filler()` is called to initialize some variables, it does not actually |
| 82 | +transform the parse table. |
| 83 | + |
| 84 | +All the function `remove_space_after_opening_paren()` now does is looking for |
| 85 | +the opening bracket and setting the column `spaces` of the token to zero. Note |
| 86 | +that it is very important to check whether there is also a line break |
| 87 | +following after that token. If so, `spaces` should not be touched because of |
| 88 | +the way `spaces` and `newlines` are defined. `spaes` are the number of spaces |
| 89 | +after a token and `newlines`. Hence, if a line break follows, spaces are not |
| 90 | +EOL spaces, but rather the spaces directly before the next token. If there is |
| 91 | +a line break after the token and the value of `use_raw_indention` is set to |
| 92 | +`TRUE` (which means indention is not touched) and the rule would not check for |
| 93 | +that, indention for the token following `(` would be removed, which we don't |
| 94 | +want. |
| 95 | +If we apply the rule to our parse table, we can see that the column `spaes` |
| 96 | +changes and is now zero for all tokens: |
| 97 | + |
| 98 | +```{r} |
| 99 | +styler:::remove_space_after_opening_paren(pd$child[[1]]) %>% |
| 100 | + select(token, terminal, text, newlines, spaces) |
| 101 | +``` |
| 102 | + |
| 103 | +All top-level styling functions have an argument `style` (which defaults |
| 104 | +to `tidyverse_style`). If you check out the help file, you can see that the |
| 105 | +argument `style` is only used to create the default argument `transformers`, |
| 106 | +which defaults to `style(...)`. This allows to specify options of the styling |
| 107 | +without specifying them inside the function passed to `transformers`. |
| 108 | + |
| 109 | +Let's clarify that with an example. The following yields the same result: |
| 110 | +```{r} |
| 111 | +all.equal( |
| 112 | + style_text(string_to_format, transformers = tidyverse_style(strict = FALSE)), |
| 113 | + style_text(string_to_format, style = tidyverse_style, strict = FALSE), |
| 114 | + style_text(string_to_format, strict = FALSE), |
| 115 | +) |
| 116 | +``` |
| 117 | + |
| 118 | +Now let's do the whole styling of a string with just this |
| 119 | +one transformer introduced above. We do this by first creating a style guide |
| 120 | +with the designated wrapper function `create_style_guide()`. |
| 121 | +It takes transformer functions as input and returns them in a named list that |
| 122 | +meets the formal requirements for styling functions. |
| 123 | + |
| 124 | +```{r} |
| 125 | +space_after_opening_style <- function(are_you_sure) { |
| 126 | + create_style_guide(space = if (are_you_sure) styler:::remove_space_after_opening_paren) |
| 127 | +} |
| 128 | +style_text("call( 1,1)", style = space_after_opening_style, are_you_sure = FALSE) |
| 129 | +``` |
| 130 | + |
| 131 | +Well, we probably want: |
| 132 | + |
| 133 | +```{r} |
| 134 | +style_text("call( 1,1)", style = space_after_opening_style, are_you_sure = TRUE) |
| 135 | +``` |
| 136 | + |
| 137 | +Note that the return value of your `style` function may not contain `NULL` |
| 138 | +elements. |
| 139 | + |
| 140 | +I hope you have acquired a basic understanding of how styler transforms code. |
| 141 | +You can provide your own transformer functions and use `create_style_guide()` |
| 142 | +to create customized code styling. If you do so, there are a few more things you |
| 143 | +should be aware of, which are described in the next section. |
| 144 | + |
| 145 | +# Implementation details |
| 146 | + |
| 147 | +For both spaces and line break information in the nested parse table, we use |
| 148 | +four attributes in total: `lag_newlines`, `newlines`, `spaces`, `lag_spaces`. |
| 149 | +`lag_spaces` is created from `spaces` only just before the parse table is |
| 150 | +serialized, so it is not relevant for manipulating the parse table as |
| 151 | +described above. These columns are to some degree redundant, but with just lag |
| 152 | +or lead, we would lose information on the first or the last element |
| 153 | +respectively, so we need both. |
| 154 | + |
| 155 | +The sequence in which styler applies rules on each level of nesting |
| 156 | +is given in the list below: |
| 157 | + |
| 158 | +* call `create_filler()` to initialize some variables. |
| 159 | +* modify the line breaks (modifying `lag_newlines` only based on |
| 160 | + `token`, `token_before`, `token_after` and `text`). |
| 161 | +* modify the spaces (modifying `spaces` only based on `lag_newlines`, |
| 162 | + `newlines`, `multi_line`, `token`, `token_before`, `token_after` and `text`). |
| 163 | +* modify the tokens (based on `newlines` `lag_newlines`, `spaces` `multi_line`, |
| 164 | + `token`, `token_before`, `token_after` and `text`). |
| 165 | +* modify the indention by changing `indention_ref_id` (based on `newlines` |
| 166 | + `lag_newlines`, `spaces` `multi_line`, `token`, `token_before`, `token_after` |
| 167 | + and `text`). |
| 168 | + |
| 169 | +You can also look it up in the function that applies the transformers: |
| 170 | +`apply_transformers()`: |
| 171 | +```{r} |
| 172 | +styler:::apply_transformers |
| 173 | +``` |
| 174 | + |
| 175 | +This means that the order of the styling is clearly defined and it is for |
| 176 | +example not possible to modify line breaks based on spacing, because spacing |
| 177 | +will be set after line breaks are set. Do not rely on the column `col1`, |
| 178 | +`col2`, `line1` and `line2` in the parse table in any of your function since |
| 179 | +these columns do only reflect the position of tokens at the point of parsing, |
| 180 | +i.e. they are not kept up to date through the process of styling. |
| 181 | + |
| 182 | +Also, as indicated above, work with `lag_nelwines` only in your line break |
| 183 | +rules. For development purposes, you also may want to use the unexported |
| 184 | +function `test_collection()` to help you with testing your style guide. You can |
| 185 | +find more information in the help file of the function. |
| 186 | + |
| 187 | +If you write functions that modify spaces, don't forget make sure |
| 188 | +you don't modify EOL spacing, since that is needed for `use_raw_indention`, as |
| 189 | +indicated in the previous paragraph. |
| 190 | + |
| 191 | +Finally, take note of the naming convention. All function names starting with |
| 192 | +`set-*` correspond to the `strict` option, that is, setting some value to an |
| 193 | +exact number. `add-*` Is softer: For example, `add_spaces_around_op()`, only |
| 194 | +makes sure that there is at least one space around operators, but if the |
| 195 | +code to style contains multiple, the transformer will not change that. |
| 196 | + |
| 197 | +# Showcasing the development of a styling rule |
| 198 | + |
| 199 | +For illustrative purposes, we create a new style guide that has one rule only: |
| 200 | +Curly braces are always on a new line. So for example, |
| 201 | +```{r} |
| 202 | +add_one <- function(x) { |
| 203 | + x + 1 |
| 204 | +} |
| 205 | +``` |
| 206 | + |
| 207 | +Should be transformed to |
| 208 | + |
| 209 | +```{r} |
| 210 | +add_one <- function(x) |
| 211 | +{ |
| 212 | + x + 1 |
| 213 | +} |
| 214 | +``` |
| 215 | + |
| 216 | +We first need to get familiar with the structure of the nested parse table. |
| 217 | +Note that the structure of the nested parse table is not affected by the |
| 218 | +position of line breaks and spaces. |
| 219 | +Let's first create the nested parse table. |
| 220 | +```{r} |
| 221 | +code <- c("add_one <- function(x) { x + 1 }") |
| 222 | +styler:::create_tree(code) |
| 223 | +pd <- styler:::compute_parse_data_nested(code) |
| 224 | +``` |
| 225 | +The token of interest here has id number 10. Let's navigate there. Since |
| 226 | +line break rules manipulate the lags *before* the token, we need to change |
| 227 | +`lag_newlines` at the token "'{'". |
| 228 | +```{r} |
| 229 | +pd$child[[1]]$child[[3]]$child[[5]] |
| 230 | +``` |
| 231 | + |
| 232 | +Remember what we said above: A transformer takes a flat parse table as input, |
| 233 | +updates it and returns it. So here it's actually simple: |
| 234 | +```{r} |
| 235 | +set_line_break_before_curly_opening <- function(pd_flat) { |
| 236 | + op <- pd_flat$token %in% "'{'" |
| 237 | + pd_flat$lag_newlines[op] <- 1L |
| 238 | + pd_flat |
| 239 | +} |
| 240 | +``` |
| 241 | + |
| 242 | +Almost done. Now, the last thing we need to do is to use `create_style_guide()` |
| 243 | +to create our style guide consisting of that function. |
| 244 | +```{r} |
| 245 | +set_line_break_before_curly_opening_style <- function() { |
| 246 | + create_style_guide(line_break = set_line_break_before_curly_opening) |
| 247 | +} |
| 248 | +``` |
| 249 | + |
| 250 | +Now you can style your string according to it. |
| 251 | +```{r} |
| 252 | +style_text(code, style = set_line_break_before_curly_opening_style) |
| 253 | +``` |
| 254 | + |
| 255 | +Note that when removing line breaks, always take care of comments, since you |
| 256 | +don't want |
| 257 | +```{r, eval = FALSE} |
| 258 | +a <- function() # comments should remain EOL |
| 259 | +{ |
| 260 | + 3 |
| 261 | +} |
| 262 | +``` |
| 263 | + |
| 264 | +to become |
| 265 | + |
| 266 | +```{r, eval = FALSE} |
| 267 | +a <- function() # comments should remain EOL { |
| 268 | + 3 |
| 269 | +} |
| 270 | +``` |
| 271 | + |
| 272 | +The easiest way of taking care of that is not applying the rule if there is a |
| 273 | +comment before the token of interest, which can be checked for within your |
| 274 | +transformer function. The transformer function from the tidyverse style that |
| 275 | +removes line breaks before the curly opening bracket looks as follows: |
| 276 | +```{r} |
| 277 | +styler:::remove_line_break_before_curly_opening |
| 278 | +``` |
| 279 | + |
0 commit comments