Merge pull request #146 from lorenzwalthert/customizing_vignette

krlmlr · web-flow · commit 096851d57385 · 2017-08-24T14:37:24.000+02:00
- Vignette on customizing styler (#145).
diff --git a/vignettes/customizing_styler.Rmd b/vignettes/customizing_styler.Rmd
@@ -0,0 +1,279 @@
+---
+title: "Customizing styler"
+author: "Lorenz Walthert"
+date: "8/10/2017"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Customizing styler}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+This vignette gives a high-level overview about how styler works and how you
+can define your own style guide and format code according to it. 
+
+# How styler works
+
+There are three major steps that styler performs in order to style code:
+
+1. Create a abstract syntax tree (AST) from `utils::getParseData()` that
+   contains positional information of every token. We call
+   this a nested parse table. You can learn more about how that is done
+   exactly in the vignettes "Data Structures" and "Manipulating the nested 
+   parse table".
+2. Apply transformer functions at each level of the nested parse table. We 
+   use a visitor approach, i.e. a function that takes functions as arguments and
+   applies them to every level of nesting. You can find out more about it on
+   the help file for `visit`. Note that the function is not exported by styler.
+   The visitor will take care of applying the functions on every 
+   level of nesting - and we can supply transformer functions that operate on 
+   one level of nesting. In the sequel, we use the term *nest* to refer to 
+   such a parse table at one level of nesting. A *nest* always represents a 
+   complete expression. Before we apply the transformers, we have to initialize
+   two columns `lag_newlines` and `spaces`, which contain
+   the number of line breaks before the token and the number of spaces 
+   after the token. These will be the columns that most of our transformer 
+   functions will modify. 
+3. Serialize the nested parse table, that is, extract the terminal tokens from 
+   the nested parse table and add spaces and line breaks between them as 
+   specified in the nested parse table. 
+
+The `transformers` argument is, apart from the code to style, the key argument 
+of functions such as `style_text()` and friends. By default, it is created 
+via the argument `style`. The transformers are a named
+list of transformer functions and other arguments passed to styler. To use the
+default style guide of styler (the tidyverse style guide), call
+`tidyverse_style()` to get the list of the transformer functions. Let's quickly
+look at what those are.
+```{r, message = FALSE}
+library("styler")
+library("dplyr")
+names(tidyverse_style())
+str(tidyverse_style(), give.attr = FALSE, list.len = 3)
+```
+
+We note that there are different types of transformer functions. `filler` is
+initializing some variables in the nested parse table (so it is not actually a
+transformer), the other elements modify either spacing, line break or tokens.
+`use_raw_indention` is not a function, it is just an option. All transformer
+functions have a similar structure. Let's pick one and look at it:
+```{r}
+tidyverse_style()$space$remove_space_after_opening_paren
+```
+
+As the name says, this function removes spaces after the opening parenthesis. But
+how?
+Its input is a *nest*. Since the visitor will go through all levels of nesting, 
+we just need a function that can be applied to a *nest*, that is, to a parse 
+table at one level of nesting.
+We can compute the nested parse table and look at one of the levels of nesting
+that is interesting for us (more on the data structure in the vignettes 
+"Data structures" and "Manipulating the parse table"):
+
+```{r}
+string_to_format <- "call( 3)"
+pd <- styler:::compute_parse_data_nested(string_to_format) %>%
+  styler:::pre_visit(c(styler:::create_filler))
+pd$child[[1]] %>%
+  select(token, terminal, text, newlines, spaces)
+```
+
+`create_filler()` is called to initialize some variables, it does not actually
+transform the parse table.
+
+All the function `remove_space_after_opening_paren()` now does is looking for
+the opening bracket and setting the column `spaces` of the token to zero. Note
+that it is very important to check whether there is also a line break
+following after that token. If so, `spaces` should not be touched because of
+the way `spaces` and `newlines` are defined. `spaes` are the number of spaces
+after a token and `newlines`. Hence, if a line break follows, spaces are not
+EOL spaces, but rather the spaces directly before the next token. If there is
+a line break after the token and the value of `use_raw_indention` is set to
+`TRUE` (which means indention is not touched) and the rule would not check for
+that, indention for the token following `(` would be removed, which we don't
+want.
+If we apply the rule to our parse table, we can see that the column `spaes` 
+changes and is now zero for all tokens:
+
+```{r}
+styler:::remove_space_after_opening_paren(pd$child[[1]]) %>%
+  select(token, terminal, text, newlines, spaces)
+```
+
+All top-level styling functions have an argument `style` (which defaults
+to `tidyverse_style`). If you check out the help file, you can see that the 
+argument `style` is only used to create the default argument `transformers`, 
+which defaults to `style(...)`. This allows to specify options of the styling
+without specifying them inside the function passed to `transformers`.
+
+Let's clarify that with an example. The following yields the same result:
+```{r}
+all.equal(
+  style_text(string_to_format, transformers = tidyverse_style(strict = FALSE)),
+  style_text(string_to_format, style = tidyverse_style, strict = FALSE),
+  style_text(string_to_format, strict = FALSE),
+)
+```
+
+Now let's do the whole styling of a string with just this
+one transformer introduced above. We do this by first creating a style guide 
+with the designated wrapper function `create_style_guide()`.
+It takes transformer functions as input and returns them in a named list that 
+meets the formal requirements for styling functions.
+
+```{r}
+space_after_opening_style <- function(are_you_sure) {
+  create_style_guide(space = if (are_you_sure) styler:::remove_space_after_opening_paren)
+}
+style_text("call( 1,1)", style = space_after_opening_style, are_you_sure = FALSE)  
+```
+
+Well, we probably want:
+
+```{r}
+style_text("call( 1,1)", style = space_after_opening_style, are_you_sure = TRUE)  
+```
+
+Note that the return value of your `style` function may not contain `NULL` 
+elements.
+
+I hope you have acquired a basic understanding of how styler transforms code.
+You can provide your own transformer functions and use `create_style_guide()`
+to create customized code styling. If you do so, there are a few more things you
+should be aware of, which are described in the next section.
+
+# Implementation details
+
+For both spaces and line break information in the nested parse table, we use
+four attributes in total: `lag_newlines`, `newlines`, `spaces`, `lag_spaces`.
+`lag_spaces` is created from `spaces` only just before the parse table is
+serialized, so it is not relevant for manipulating the parse table as
+described above. These columns are to some degree redundant, but with just lag
+or lead, we would lose information on the first or the last element 
+respectively, so we need both.
+
+The sequence in which styler applies rules on each level of nesting 
+is given in the list below:
+
+* call `create_filler()` to initialize some variables.
+* modify the line breaks (modifying `lag_newlines` only based on 
+  `token`, `token_before`, `token_after` and `text`).
+* modify the spaces (modifying `spaces` only based on `lag_newlines`, 
+  `newlines`, `multi_line`, `token`, `token_before`, `token_after` and `text`).
+* modify the tokens (based on `newlines` `lag_newlines`, `spaces` `multi_line`, 
+  `token`, `token_before`, `token_after` and `text`).
+* modify the indention by changing `indention_ref_id` (based on `newlines` 
+  `lag_newlines`, `spaces` `multi_line`, `token`, `token_before`, `token_after` 
+  and `text`).
+
+You can also look it up in the function that applies the transformers: 
+`apply_transformers()`:
+```{r}
+styler:::apply_transformers
+```
+
+This means that the order of the styling is clearly defined and it is for 
+example not possible to modify line breaks based on spacing, because spacing
+will be set after line breaks are set. Do not rely on the column `col1`,
+`col2`, `line1` and `line2` in the parse table in any of your function since 
+these columns do only reflect the position of tokens at the point of parsing, 
+i.e. they are not kept up to date through the process of styling.
+
+Also, as indicated above, work with `lag_nelwines` only in your line break 
+rules. For development purposes, you also may want to use the unexported 
+function `test_collection()` to help you with testing your style guide. You can 
+find more information in the help file of the function. 
+
+If you write functions that modify spaces, don't forget make sure
+you don't modify EOL spacing, since that is needed for `use_raw_indention`, as 
+indicated in the previous paragraph.
+
+Finally, take note of the naming convention. All function names starting with
+`set-*` correspond to the `strict` option, that is, setting some value to an 
+exact number. `add-*` Is softer: For example, `add_spaces_around_op()`, only
+makes sure that there is at least one space around operators, but if the 
+code to style contains multiple, the transformer will not change that.
+
+# Showcasing the development of a styling rule
+
+For illustrative purposes, we create a new style guide that has one rule only:
+Curly braces are always on a new line. So for example, 
+```{r}
+add_one <- function(x) {
+  x + 1
+}
+```
+
+Should be transformed to 
+
+```{r}
+add_one <- function(x) 
+{
+  x + 1
+}
+```
+
+We first need to get familiar with the structure of the nested parse table.
+Note that the structure of the nested parse table is not affected by the 
+position of line breaks and spaces.
+Let's first create the nested parse table.
+```{r}
+code <- c("add_one <- function(x) { x + 1 }")
+styler:::create_tree(code)
+pd <- styler:::compute_parse_data_nested(code)
+```
+The token of interest here has id number 10. Let's navigate there. Since
+line break rules manipulate the lags *before* the token, we need to change 
+`lag_newlines` at the token "'{'".
+```{r}
+pd$child[[1]]$child[[3]]$child[[5]]
+```
+
+Remember what we said above: A transformer takes a flat parse table as input, 
+updates it and returns it. So here it's actually simple:
+```{r}
+set_line_break_before_curly_opening <- function(pd_flat) {
+  op <- pd_flat$token %in% "'{'"
+  pd_flat$lag_newlines[op] <- 1L
+  pd_flat
+}
+```
+
+Almost done. Now, the last thing we need to do is to use `create_style_guide()` 
+to create our style guide consisting of that function.
+```{r}
+set_line_break_before_curly_opening_style <- function() {
+  create_style_guide(line_break = set_line_break_before_curly_opening)
+}
+```
+
+Now you can style your string according to it.
+```{r}
+style_text(code, style = set_line_break_before_curly_opening_style)
+```
+
+Note that when removing line breaks, always take care of comments, since you 
+don't want
+```{r, eval = FALSE}
+a <- function() # comments should remain EOL
+{ 
+  3
+}
+```
+
+to become 
+
+```{r, eval = FALSE}
+a <- function() # comments should remain EOL { 
+  3
+}
+```
+
+The easiest way of taking care of that is not applying the rule if there is a
+comment before the token of interest, which can be checked for within your
+transformer function. The transformer function from the tidyverse style that
+removes line breaks before the curly opening bracket looks as follows: 
+```{r}
+styler:::remove_line_break_before_curly_opening
+```
+