Commit a87198f

Merge pull request #578 from lorenzwalthert/caching-top-level-expr
- Caching top level expressions (#578).
2 parents b7c4d9b + 4f6a890 commit a87198f

192 files changed: +6193 -5029 lines changed


DESCRIPTION

Lines changed: 2 additions & 1 deletion
@@ -1,7 +1,7 @@
 Type: Package
 Package: styler
 Title: Non-Invasive Pretty Printing of R Code
-Version: 1.2.0.9000
+Version: 1.2.0.9001
 Authors@R:
     c(person(given = "Kirill",
              family = "Müller",
@@ -81,6 +81,7 @@ Collate:
     'testing-public-api.R'
     'testing.R'
     'token-create.R'
+    'transform-block.R'
     'transform-code.R'
     'transform-files.R'
     'ui-caching.R'

NEWS.md

Lines changed: 4 additions & 2 deletions
@@ -20,8 +20,10 @@
   before will be instantaneous. This brings large speed boosts in many
   situations, e.g. when `style_pkg()` is run but only a few files have changed
   since the last styling or when using the [styler pre-commit
-  hook](https://github.com/lorenzwalthert/precommit). See `help("caching")`
-  for details (#538).
+  hook](https://github.com/lorenzwalthert/precommit). Because styler caches
+  by expression, you will also get speed boosts in large files with many
+  expressions when you only change a few of them. See `help("caching")` for
+  details (#538, #578).

 * `create_style_guide()` gains two arguments `style_guide_name` and
   `style_guide_version` that are carried as meta data, in particular to version

R/initialize.R

Lines changed: 2 additions & 0 deletions
@@ -20,6 +20,8 @@ default_style_guide_attributes <- function(pd_flat) {
     validate_parse_data()
 }

+
+
 #' Initialize attributes
 #'
 #' @name initialize_attributes

R/nest.R

Lines changed: 111 additions & 13 deletions
@@ -7,23 +7,112 @@
 #' of the parse table.
 #' @importFrom purrr when
 #' @keywords internal
-compute_parse_data_nested <- function(text) {
+compute_parse_data_nested <- function(text,
+                                      transformers) {
   parse_data <- tokenize(text) %>%
     add_terminal_token_before() %>%
     add_terminal_token_after() %>%
-    add_stylerignore()
+    add_stylerignore() %>%
+    add_attributes_caching(transformers) %>%
+    drop_cached_children()

   env_add_stylerignore(parse_data)

   parse_data$child <- rep(list(NULL), length(parse_data$text))
   pd_nested <- parse_data %>%
     nest_parse_data() %>%
     flatten_operators() %>%
-    when(any(parse_data$token == "EQ_ASSIGN") ~ relocate_eq_assign(.), ~.)
+    when(any(parse_data$token == "EQ_ASSIGN") ~ relocate_eq_assign(.), ~.) %>%
+    add_cache_block()

   pd_nested
 }

+#' Add the block id to a parse table
+#'
+#' Must be after [nest_parse_data()] because it requires a nested parse table
+#' as input.
+#' @param pd_nested A top level nest.
+#' @keywords internal
+#' @importFrom rlang seq2
+add_cache_block <- function(pd_nested) {
+  if (cache_is_activated()) {
+    pd_nested$block <- cache_find_block(pd_nested)
+  } else {
+    pd_nested$block <- rep(1, nrow(pd_nested))
+  }
+  pd_nested
+}
+
+#' Drop all children of a top level expression that are cached
+#'
+#' Note that we do cache top-level comments. Because package code has a lot of
+#' roxygen comments and each of them is a top level expression, checking is
+#' very expensive.
+#' @param pd A top-level nest.
+#' @details
+#' Because we process in blocks of expressions for speed, a cached expression
+#' will always end up in a block that won't be styled again (usual case), unless
+#' it's on a line where multiple expressions sit and at least one is not styled
+#' (exception).
+#'
+#' **usual case: All other expressions in a block are cached**
+#'
+#' Cached expressions don't need to be transformed with `transformers` in
+#' [parse_transform_serialize_r_block()], we simply return `text` for the top
+#' level token. For that
+#' reason, the nested parse table can, at the rows where these expressions are
+#' located, be shallow, i.e. it does not have to contain a child, because it
+#' will neither be transformed nor serialized anytime. This function drops all
+#' associated tokens except the top-level token for such expressions, which will
+#' result in large speed improvements in [compute_parse_data_nested()] because
+#' nesting is expensive and will not be done for cached expressions.
+#'
+#' **exception: Not all other expressions in a block are cached**
+#'
+#' As described in [cache_find_block()], expressions on the same line are always
+#' put into one block. If any element of a block is not cached, the block will
+#' be styled as a whole. If the parse table was made shallow (and the top level
+#' expression is still marked as non-terminal), `text` will never be used in the
+#' transformation process and eventually lost. Hence, we must change the top
+#' level expression to a terminal. It will act like a comment in the sense that
+#' it is a fixed `text`.
+#'
+#' Because for the usual case, it does not even matter if the cached expression
+#' is a terminal or not (because it is not processed), we can safely set
+#' `terminal = TRUE` in general.
+#' @section Implementation:
+#' Because the structure of the parse table is not always "top-level expression
+#' first, then children", this function creates a temporary parse table that has
+#' this property and then extracts the ids and subsets the original parse table
+#' so it is shallow in the right places.
+#' @keywords internal
+drop_cached_children <- function(pd) {
+
+  if (cache_is_activated()) {
+
+    pd_parent_first <- pd[order(pd$line1, pd$col1, -pd$line2, -pd$col2, as.integer(pd$terminal)),]
+    pos_ids_to_keep <- pd_parent_first %>%
+      split(cumsum(pd_parent_first$parent == 0)) %>%
+      map(find_pos_id_to_keep) %>%
+      unlist() %>%
+      unname()
+    pd[pd$pos_id %in% pos_ids_to_keep,]
+  } else {
+    pd
+  }
+
+}
+
+find_pos_id_to_keep <- function(pd) {
+  if (pd$is_cached[1]) {
+    pd$pos_id[1]
+  } else {
+    pd$pos_id
+  }
+}
+
+
 #' Turn off styling for parts of the code
 #'
 #' Using stylerignore markers, you can temporarily turn off styler. See a
@@ -137,6 +226,25 @@ add_terminal_token_before <- function(pd_flat) {
     left_join(pd_flat, ., by = "id")
 }

+#' Initialise variables related to caching
+#'
+#' @param transformers A list with transformer functions, used to check if
+#' the code is cached.
+#' @describeIn add_token_terminal Initializes `block` and `is_cached`.
+#' @keywords internal
+add_attributes_caching <- function(pd_flat, transformers) {
+  pd_flat$block <- pd_flat$is_cached <- rep(NA, nrow(pd_flat))
+  if (cache_is_activated()) {
+    pd_flat$is_cached[pd_flat$parent == 0] <- map_lgl(
+      pd_flat$text[pd_flat$parent == 0],
+      is_cached, transformers, cache_dir_default()
+    )
+    is_comment <- pd_flat$token == "COMMENT"
+    pd_flat$is_cached[is_comment] <- rep(FALSE, sum(is_comment))
+  }
+  pd_flat
+}
+
 #' @describeIn add_token_terminal Removes column `terminal_token_before`. Might
 #' be used to prevent the use of invalidated information, e.g. if tokens were
 #' added to the nested parse table.
@@ -220,13 +328,3 @@ combine_children <- function(child, internal_child) {
   }
   bound[order(bound$pos_id), ]
 }
-
-#' Get the start right
-#'
-#' On what line does the first token occur?
-#' @param pd_nested A nested parse table.
-#' @return The line number on which the first token occurs.
-#' @keywords internal
-find_start_line <- function(pd_nested) {
-  pd_nested$line1[1]
-}
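
The parent-first sort and the per-expression split in `drop_cached_children()` can be illustrated with a small base R sketch. The toy parse table below is hypothetical (made-up `pos_id`, `parent` and position values), and `lapply()` stands in for `purrr::map()`; it is an illustration under those assumptions, not code from this commit.

```r
# Hypothetical flat parse table for "a <- 1" (line 1, cached) and "b"
# (line 2, not cached). The first child deliberately precedes its parent
# in row order to show why the sort is needed.
pd <- data.frame(
  pos_id    = 1:6,
  token     = c("SYMBOL", "expr", "LEFT_ASSIGN", "NUM_CONST", "expr", "SYMBOL"),
  line1     = c(1, 1, 1, 1, 2, 2),
  col1      = c(1, 1, 3, 6, 1, 1),
  line2     = c(1, 1, 1, 1, 2, 2),
  col2      = c(1, 6, 4, 6, 1, 1),
  parent    = c(10, 0, 10, 10, 0, 20), # 0 marks a top-level expression
  terminal  = c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE),
  is_cached = c(NA, TRUE, NA, NA, FALSE, NA)
)

# Sort so that every top-level expression comes before its children:
# same start, but the widest span and non-terminals first.
parent_first <- pd[
  order(pd$line1, pd$col1, -pd$line2, -pd$col2, as.integer(pd$terminal)),
]

# One group per top-level expression; a new group starts at parent == 0.
groups <- split(parent_first, cumsum(parent_first$parent == 0))

# Cached expression: keep only the top-level row. Otherwise keep all rows.
keep <- unname(unlist(lapply(groups, function(g) {
  if (isTRUE(g$is_cached[1])) g$pos_id[1] else g$pos_id
})))

shallow <- pd[pd$pos_id %in% keep, ]
print(shallow[, c("pos_id", "token", "is_cached")])
```

In this toy table the cached expression shrinks to its single top-level row, while the uncached expression keeps its child, so only the uncached part still has to be nested.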

R/nested-to-tree.R

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@ create_tree_from_pd_with_default_style_attributes <- function(pd, structure_only
 #' @return An object of class "Node" and "R6".
 #' @examples
 #' if (rlang::is_installed("data.tree")) {
+#' cache_deactivate() # keep things simple
 #' code <- "a <- function(x) { if(x > 1) { 1+1 } else {x} }"
 #' nested_pd <- styler:::compute_parse_data_nested(code)
 #' initialized <- styler:::pre_visit(nested_pd, c(default_style_guide_attributes))

R/parse.R

Lines changed: 6 additions & 2 deletions
@@ -71,19 +71,22 @@ has_crlf_as_first_line_sep <- function(message, initial_text) {
 #' * A column "child" that contains *nest*s.
 #'
 #' @param text A character vector.
+#' @inheritParams get_parse_data
 #' @return A flat parse table
 #' @importFrom rlang seq2
 #' @keywords internal
 tokenize <- function(text) {
-  get_parse_data(text, include_text = NA) %>%
+  get_parse_data(text, include_text = TRUE) %>%
     ensure_correct_str_txt(text) %>%
     enhance_mapping_special()
 }

 #' Obtain robust parse data
 #'
 #' Wrapper around `utils::getParseData(parse(text = text))` that returns a flat
-#' parse table.
+#' parse table. When caching information should be added, make sure that
+#' the cache is activated with `cache_activate()` and both `transformers` and
+#' `cache_dir` are non-`NULL`.
 #' @param text The text to parse.
 #' @param include_text Passed to [utils::getParseData()] as `includeText`.
 #' @param ... Other arguments passed to [utils::getParseData()].
@@ -96,6 +99,7 @@ get_parse_data <- function(text, include_text = TRUE, ...) {
     utils::getParseData(parsed, includeText = include_text),
     .name_repair = "minimal") %>%
     add_id_and_short()
+
   parser_version_set(parser_version_find(pd))
   pd
 }
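
The switch to `include_text = TRUE` in `tokenize()` matters because the cache lookup in R/nest.R reads `text` for rows with `parent == 0`. A minimal base R sketch of what `utils::getParseData()` returns for such rows is shown below; the sample code and variable names are illustrative assumptions, and the sketch bypasses styler's own wrappers.

```r
# Two top-level expressions; keep.source = TRUE is needed so that parse
# data is recorded at all (it is off by default in non-interactive R).
code <- c("a <- 1", "f(x, y)")
parsed <- parse(text = code, keep.source = TRUE)

# includeText = TRUE also fills `text` for non-terminal tokens.
pd <- utils::getParseData(parsed, includeText = TRUE)

# Top-level expressions are the rows without a parent.
top_level <- pd[pd$parent == 0, c("line1", "token", "terminal", "text")]
print(top_level)
```

Each top-level row is a non-terminal `expr` whose `text` holds the full source of the expression, which is the string a per-expression cache can be keyed on.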

R/relevel.R

Lines changed: 4 additions & 1 deletion
@@ -183,7 +183,10 @@ relocate_eq_assign_nest <- function(pd) {
 #' Two assignment tokens `EQ_ASSIGN` belong to the same block if they are not
 #' separated by more than one token. Tokens between `EQ_ASSIGN` tokens belong
 #' to the `EQ_ASSIGN` token occurring before them, except the token right before
-#' `EQ_ASSIGN` already belongs to the `EQ_ASSIGN` after it.
+#' `EQ_ASSIGN` already belongs to the `EQ_ASSIGN` after it. Note that this
+#' notion is unrelated to the column *block* in the parse table, which is used
+#' to [parse_transform_serialize_r()] code blocks and leave out the ones that
+#' are cached.
 #' @param pd A parse table.
 #' @keywords internal
 find_block_id <- function(pd) {

R/serialize.R

Lines changed: 2 additions & 3 deletions
@@ -2,11 +2,10 @@
 #'
 #' Collapses a flattened parse table into character vector representation.
 #' @inheritParams apply_stylerignore
-#' @param start_line The line number on which the code starts.
 #' @keywords internal
-serialize_parse_data_flattened <- function(flattened_pd, start_line = 1) {
-  flattened_pd$lag_newlines[1] <- start_line - 1
+serialize_parse_data_flattened <- function(flattened_pd) {
   flattened_pd <- apply_stylerignore(flattened_pd)
+  flattened_pd$lag_newlines[1] <- 0 # resolve start_line elsewhere
   res <- with(
     flattened_pd,
     paste0(

R/token-create.R

Lines changed: 10 additions & 2 deletions
@@ -18,6 +18,10 @@
 #' @param terminal Boolean vector indicating whether a token is a terminal or
 #' not.
 #' @param child The children of the tokens.
+#' @param stylerignore Boolean to indicate if the line should be ignored by
+#' styler.
+#' @param block The block (of caching) to which the token belongs. An integer.
+#' @param is_cached Whether the token is cached already.
 #' @family token creators
 #' @keywords internal
 create_tokens <- function(tokens,
@@ -31,7 +35,9 @@ create_tokens <- function(tokens,
                           indents = 0,
                           terminal = TRUE,
                           child = NULL,
-                          stylerignore = FALSE) {
+                          stylerignore = FALSE,
+                          block = NA,
+                          is_cached = NA) {
   len_text <- length(texts)
   new_tibble(
     list(
@@ -50,7 +56,9 @@ create_tokens <- function(tokens,
       indention_ref_pos_id = indention_ref_pos_ids,
       indent = indents,
       child = rep(list(child), len_text),
-      stylerignore = stylerignore
+      stylerignore = stylerignore,
+      block = block,
+      is_cached = is_cached
     ),
     nrow = len_text
   )

R/transform-block.R

Lines changed: 96 additions & 0 deletions
@@ -0,0 +1,96 @@
+#' Parse, transform and serialize a nested parse table
+#'
+#' We process blocks of nested parse tables for speed. See [cache_find_block()]
+#' for details on how a top level nest is split into blocks.
+#' @param pd_nested A block of the nested parse table.
+#' @param start_line The line number on which the code starts.
+#' @inheritParams apply_transformers
+#' @keywords internal
+parse_transform_serialize_r_block <- function(pd_nested,
+                                              start_line,
+                                              transformers) {
+  if (!all(pd_nested$is_cached, na.rm = TRUE) || !cache_is_activated()) {
+    transformed_pd <- apply_transformers(pd_nested, transformers)
+    flattened_pd <- post_visit(transformed_pd, list(extract_terminals)) %>%
+      enrich_terminals(transformers$use_raw_indention) %>%
+      apply_ref_indention() %>%
+      set_regex_indention(
+        pattern = transformers$reindention$regex_pattern,
+        target_indention = transformers$reindention$indention,
+        comments_only = transformers$reindention$comments_only
+      )
+    serialized_transformed_text <- serialize_parse_data_flattened(flattened_pd)
+  } else {
+    serialized_transformed_text <- map2(
+      c(0, find_blank_lines_to_next_expr(pd_nested)[-1] - 1L),
+      pd_nested$text,
+      ~ c(rep("", .x), .y)
+    ) %>%
+      unlist()
+  }
+  c(rep("", start_line - 1), serialized_transformed_text)
+}
+
+#' Find the groups of expressions that should be processed together
+#'
+#' Every top-level expression is an expression in itself. Expressions on the
+#' same line are put into the same block.
+#' Multiple expressions can sit on one row, e.g. inline comments and commands
+#' separated with ";". This creates a problem when processing each expression
+#' separately because when putting them together, we need complicated handling
+#' of line breaks between them, as it is not a priori clear that there is a line
+#' break separating them. To avoid this, we put top level expressions that sit
+#' on the same line into one block, so the assumption that there is a line break
+#' between each block of expressions holds.
+#' @param pd A top level parse table.
+#' @details
+#' We look for turning points:
+#' - change in cache state is a turning point
+#' - expressions that are not on a new line cannot be a turning point. In this
+#' case, the turning point is moved to the first expression on the line
+#' @param pd A top level nest.
+#' @keywords internal
+cache_find_block <- function(pd) {
+
+  first_after_cache_state_switch <- pd$is_cached != lag(pd$is_cached, default = !pd$is_cached[1])
+
+  not_first_on_line <- find_blank_lines_to_next_expr(pd) == 0
+  invalid_turning_point_idx <- which(
+    not_first_on_line & first_after_cache_state_switch
+  )
+
+  first_on_line_idx <- which(!not_first_on_line)
+  valid_replacements <- map_int(invalid_turning_point_idx, function(x) {
+    first_on_line_idx[last(which(x > first_on_line_idx))]
+  })
+  sort(unique(c(
+    setdiff(which(first_after_cache_state_switch), invalid_turning_point_idx),
+    valid_replacements
+  ))) %>%
+    unwhich(nrow(pd)) %>%
+    cumsum()
+}
+
+
+#' Find blank lines
+#'
+#' What number of line breaks lie between the expressions?
+#' @param pd_nested A nested parse table.
+#' @return The number of line breaks between each expression and the previous
+#' one. A value of 0 means the expression starts on the line the previous ends.
+#' @keywords internal
+find_blank_lines_to_next_expr <- function(pd_nested) {
+  pd_nested$line1 - lag(pd_nested$line2, default = 0)
+}
+
+#' Number of lines between cache blocks
+#'
+#' This is relevant when putting expressions together into a block and
+#' preserving blank lines between them. Note that because code does not need to
+#' start on line 1, the first element of the output is the number of lines until
+#' the first block.
+#' @param pd A top level nest.
+find_blank_lines_to_next_block <- function(pd) {
+  block_boundary <- pd$block != lag(pd$block, default = 0)
+  find_blank_lines_to_next_expr(pd)[block_boundary]
+}
+
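
The turning-point rule documented for `cache_find_block()` can be traced on a toy table with a base R sketch. Everything below is an assumption made for illustration: the `lag1()` helper stands in for the `lag()` used in the diff, the parse table is invented, and an invalid turning point is mapped back to the row index of the first expression on its line, as the roxygen text describes.

```r
# Hypothetical helper: shift a vector by one position, filling the first
# slot with `default`.
lag1 <- function(x, default) c(default, x[-length(x)])

# Hypothetical top-level parse table with five expressions. Expressions 1
# and 2 share line 1; expressions 3 and 4 share line 2.
pd <- data.frame(
  line1     = c(1, 1, 2, 2, 3),
  line2     = c(1, 1, 2, 2, 3),
  is_cached = c(TRUE, TRUE, TRUE, FALSE, FALSE)
)

# A change in cache state is a candidate turning point (row 1 always is).
switched <- pd$is_cached != lag1(pd$is_cached, default = !pd$is_cached[1])

# Zero line breaks to the previous expression means "same line".
not_first_on_line <- (pd$line1 - lag1(pd$line2, default = 0)) == 0

# Candidates in the middle of a line are invalid and are moved back to the
# first expression on that line.
invalid <- which(not_first_on_line & switched)
first_on_line <- which(!not_first_on_line)
moved <- vapply(
  invalid,
  function(i) max(first_on_line[first_on_line < i]),
  integer(1)
)

turning_points <- sort(unique(c(setdiff(which(switched), invalid), moved)))
block <- cumsum(seq_len(nrow(pd)) %in% turning_points)
print(block)
```

Here the cache state switches at expression 4, which is not the first on its line, so the boundary moves to expression 3. Expression 3 is cached but lands in the block that gets styled again, which is the "exception" case described in `drop_cached_children()`.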
