86 | 86 | 15. New function `isoyear()` has been implemented as a complement to `isoweek()`, returning the ISO 8601 year corresponding to a given date, [#7154](https://github.com/Rdatatable/data.table/issues/7154). Thanks to @ben-schwen and @MichaelChirico for the suggestion and @venom1204 for the implementation. |
87 | 87 |
88 | 88 | 16. Multiple improvements have been added to rolling functions. The request came from @gpierard, who needed a left-aligned, adaptive rolling max, [#5438](https://github.com/Rdatatable/data.table/issues/5438). There was no `frollmax` function yet, adaptive rolling functions did not support `align="left"`, and `frollapply` did not support `adaptive=TRUE`. The available alternatives were base R `mapply` or a self-join using `max` and grouping `by=.EACHI`. As a follow-up to this request, the following features have been added: |
89 | | -- new function `frollmax`, which applies `max` over a rolling window. |
90 | | -- support for `align="left"` in adaptive rolling functions. |
91 | | -- support for `adaptive=TRUE` in `frollapply`. |
92 | | -- `partial` argument to trim the window width to the available observations rather than returning `NA` whenever the window is incomplete. |
93 | | -- `give.names` argument to automatically name the results based on the names of `x` and `n`. |
94 | | -- `frollmean` and `frollsum` no longer treat `Inf` and `-Inf` as `NA`s, as they used to for `algo="fast"` (breaking change). |
95 | | -- `hasNA` argument has been renamed to `has.nf` to convey that it relates not only to `NA/NaN` but to other non-finite values (`Inf/-Inf`) as well. |
96 | | - |
97 | | -Thanks to @jangorecki for the implementation and to @MichaelChirico and others for their work on splitting it into smaller PRs and for reviews. |
98 | | -For a comprehensive description of all available features, see the `?froll` manual. |
99 | | - |
100 | | -Adaptive `frollmax` has been observed to be around 80 times faster than the fastest previously available alternative (a data.table self-join using `max` and grouping `by=.EACHI`). Note that an important factor in performance is the width of the rolling window. The code for the benchmark below was taken from [this SO answer](https://stackoverflow.com/a/73408459/2490497). |
101 | | -```r |
102 | | -set.seed(108) |
103 | | -setDTthreads(16) |
104 | | -x = data.table( |
105 | | - value = cumsum(rnorm(1e6, 0.1)), |
106 | | - end_window = 1:1e6 + sample(50:500, 1e6, TRUE), |
107 | | - row = 1:1e6 |
108 | | -)[, "end_window" := pmin(end_window, .N) |
109 | | - ][, "len_window" := end_window-row+1L] |
110 | | - |
111 | | -baser = function(x) x[, mapply(function(from, to) max(value[from:to]), row, end_window)] |
112 | | -sj = function(x) x[x, max(value), on=.(row >= row, row <= end_window), by=.EACHI]$V1 |
113 | | -frmax = function(x) x[, frollmax(value, len_window, adaptive=TRUE, align="left", has.nf=FALSE)] |
114 | | -frapply = function(x) x[, frollapply(value, len_window, max, adaptive=TRUE, align="left")] |
115 | | -microbenchmark::microbenchmark( |
116 | | - baser(x), sj(x), frmax(x), frapply(x), |
117 | | - times=10, check="identical" |
118 | | -) |
119 | | -#Unit: milliseconds |
120 | | -# expr min lq mean median uq max neval |
121 | | -# baser(x) 3094.88357 3097.84966 3186.74832 3163.58050 3251.66753 3370.33785 10 |
122 | | -# sj(x) 2221.55456 2255.12083 2306.61382 2303.47883 2346.70293 2412.62975 10 |
123 | | -# frmax(x) 17.45124 24.16809 28.10062 28.58153 32.79802 34.83941 10 |
124 | | -# frapply(x) 272.07830 316.47060 366.94771 396.23566 416.06699 421.38701 10 |
125 | | -``` |
| 89 | + - new function `frollmax`, which applies `max` over a rolling window. |
| 90 | + - support for `align="left"` in adaptive rolling functions. |
| 91 | + - support for `adaptive=TRUE` in `frollapply`. |
| 92 | + - `partial` argument to trim the window width to the available observations rather than returning `NA` whenever the window is incomplete. |
| 93 | + - `give.names` argument to automatically name the results based on the names of `x` and `n`. |
| 94 | + - `frollmean` and `frollsum` no longer treat `Inf` and `-Inf` as `NA`s, as they used to for `algo="fast"` (breaking change). |
| 95 | + - `hasNA` argument has been renamed to `has.nf` to convey that it relates not only to `NA/NaN` but to other non-finite values (`Inf/-Inf`) as well. |
| 96 | + |
| 97 | + Thanks to @jangorecki for the implementation and to @MichaelChirico and others for their work on splitting it into smaller PRs and for reviews. |
| 98 | + For a comprehensive description of all available features, see the `?froll` manual. |
| 99 | + |
| 100 | + Adaptive `frollmax` has been observed to be around 80 times faster than the fastest previously available alternative (a data.table self-join using `max` and grouping `by=.EACHI`). Note that an important factor in performance is the width of the rolling window. The code for the benchmark below was taken from [this SO answer](https://stackoverflow.com/a/73408459/2490497). |
| 101 | + ```r |
| 102 | + set.seed(108) |
| 103 | + setDTthreads(16) |
| 104 | + x = data.table( |
| 105 | + value = cumsum(rnorm(1e6, 0.1)), |
| 106 | + end_window = 1:1e6 + sample(50:500, 1e6, TRUE), |
| 107 | + row = 1:1e6 |
| 108 | + )[, "end_window" := pmin(end_window, .N) |
| 109 | + ][, "len_window" := end_window-row+1L] |
| 110 | + baser = function(x) x[, mapply(function(from, to) max(value[from:to]), row, end_window)] |
| 111 | + sj = function(x) x[x, max(value), on=.(row >= row, row <= end_window), by=.EACHI]$V1 |
| 112 | + frmax = function(x) x[, frollmax(value, len_window, adaptive=TRUE, align="left", has.nf=FALSE)] |
| 113 | + frapply = function(x) x[, frollapply(value, len_window, max, adaptive=TRUE, align="left")] |
| 114 | + microbenchmark::microbenchmark( |
| 115 | + baser(x), sj(x), frmax(x), frapply(x), |
| 116 | + times=10, check="identical" |
| 117 | + ) |
| 118 | + #Unit: milliseconds |
| 119 | + # expr min lq mean median uq max neval |
| 120 | + # baser(x) 3094.88357 3097.84966 3186.74832 3163.58050 3251.66753 3370.33785 10 |
| 121 | + # sj(x) 2221.55456 2255.12083 2306.61382 2303.47883 2346.70293 2412.62975 10 |
| 122 | + # frmax(x) 17.45124 24.16809 28.10062 28.58153 32.79802 34.83941 10 |
| 123 | + # frapply(x) 272.07830 316.47060 366.94771 396.23566 416.06699 421.38701 10 |
| 124 | + ``` |
126 | 125 |
127 | | -As of now, adaptive rolling max has no _on-line_ implementation (`algo="fast"`); it uses a naive approach (`algo="exact"`). A further speed-up is therefore still possible if `algo="fast"` gets implemented. |
| 126 | + As of now, adaptive rolling max has no _on-line_ implementation (`algo="fast"`); it uses a naive approach (`algo="exact"`). A further speed-up is therefore still possible if `algo="fast"` gets implemented. |
128 | 127 |
129 | 128 | ### BUG FIXES |
130 | 129 |