|
20 | 20 | frollsum(c(1,2,3,Inf,5,6), 2) |
21 | 21 | #[1] NA 3 5 Inf Inf 11 |
22 | 22 |
|
| 23 | +4. `frollapply` no longer coerces its result to numeric. Users' code could break if it depends on forced coercion of input/output to numeric type. |
| 24 | + ```r |
| 25 | + ## before |
| 26 | + frollapply(c(F,T,F,F,F,T), 2, any) |
| 27 | + #[1] NA 1 1 0 0 1 |
| 28 | +
|
| 29 | + ## now |
| 30 | + frollapply(c(F,T,F,F,F,T), 2, any) |
| 31 | + #[1] NA TRUE TRUE FALSE FALSE TRUE |
| 32 | + ``` |
| 33 | + Additionally, the argument names in `frollapply` have been renamed from `x` to `X` and from `n` to `N` to avoid conflicts with common argument names that may be passed to `...`, aligning with the base R API of `lapply`. `x` and `n` continue to work with a warning, for now. |
| 34 | +
|
| 35 | +5. Negative and missing values of the `n` argument of adaptive rolling functions now trigger an error. |
| 36 | +
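A minimal sketch of items 4 and 5, assuming a data.table build that already contains these changes. The `FUN` argument name and the use of `frollmean` as the adaptive function are illustrative assumptions, and `n_bad` is a hypothetical vector of per-observation window widths. No output is shown because it depends on the installed version.

```r
library(data.table)

# Item 4: call frollapply with the new-style argument names X and N.
res = frollapply(X = c(1, 2, 3, 4, 5), N = 2, FUN = sum)
print(res)

# Item 5: pass a negative window width to an adaptive rolling function.
# tryCatch captures the error message instead of stopping the script.
n_bad = c(1L, 2L, -1L, 2L, 3L)
msg = tryCatch(
  frollmean(c(1, 2, 3, 4, 5), n_bad, adaptive = TRUE),
  error = function(e) conditionMessage(e)
)
print(msg)
```

Wrapping the adaptive call in `tryCatch` is one way for calling code to decide how to handle invalid window widths without aborting.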
|
23 | 37 | ### NOTICE OF INTENDED FUTURE POTENTIAL BREAKING CHANGES |
24 | 38 |
|
25 | 39 | 1. `data.table(x=1, <expr>)`, where `<expr>` is an expression resulting in a 1-column matrix without column names, will eventually have names `x` and `V2`, not `x` and `V1`, consistent with `data.table(x=1, <expr>)` where `<expr>` results in an atomic vector, for example `data.table(x=1, cbind(1))` and `data.table(x=1, 1)` will both have columns named `x` and `V2`. In this release, the matrix case continues to be named `V1`, but the new behavior can be activated by setting `options(datatable.old.matrix.autoname)` to `FALSE`. See point 5 under Bug Fixes for more context; this change will provide more internal consistency as well as more consistency with `data.frame()`. |
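The naming difference described above can be checked with a short sketch. It assumes a data.table version that recognizes the `datatable.old.matrix.autoname` option; on older versions the option is simply ignored. The variable names `nm_mat` and `nm_vec` are illustrative.

```r
library(data.table)

# Opt in to the future behavior, remembering the previous option value.
old = options(datatable.old.matrix.autoname = FALSE)

nm_mat = names(data.table(x = 1, cbind(1)))  # unnamed 1-column matrix
nm_vec = names(data.table(x = 1, 1))         # atomic vector

print(nm_mat)
print(nm_vec)

# Restore the previous option value.
options(old)
```

With the new behavior active, both calls should report the same column names, which is the consistency this notice is about.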
|
93 | 107 |
|
94 | 108 | 13. New `mergelist()` and `setmergelist()` similarly work _a la_ `Reduce()` to recursively merge a `list` of data.tables, [#599](https://github.com/Rdatatable/data.table/issues/599). Different join modes (_left_, _inner_, _full_, _right_, _semi_, _anti_, and _cross_) are supported through the `how` argument; duplicate handling goes through the `mult` argument. `setmergelist()` carefully avoids copies where one is not needed, e.g. in a 1:1 left join. Thanks Patrick Nicholson for the FR (in 2013!), @jangorecki for the PR, and @MichaelChirico for extensive reviews and fine-tuning. |
95 | 109 |
|
96 | | -```r |
| 110 | + ```r |
97 | 111 | l = list( |
98 | 112 | data.table(id = c(1L, 2L, 3L), x = c("a", "b", "c")), |
99 | 113 | data.table(id = c(1L, 2L, 4L), y = c("d", "e", "f")), |
|
157 | 171 |
|
158 | 172 | As of now, adaptive rolling max has no _on-line_ implementation (`algo="fast"`); it uses a naive approach (`algo="exact"`). Further speed-up is therefore still possible if `algo="fast"` gets implemented. |
159 | 173 |
|
| 174 | +17. Function `frollapply` has been completely rewritten. Thanks to @jangorecki for the implementation. Be sure to read the `frollapply` manual before using the function. The changes are as follows: |
| 175 | + - all basic types are now supported on input/output, not only double. Users' code could break if it depends on forced coercion of input/output to double type. |
| 176 | + - new argument `by.column` allows passing a multi-column subset of a data.table into a rolling function, closes [#4887](https://github.com/Rdatatable/data.table/issues/4887). |
| 177 | + ```r |
| 178 | + x = data.table(v1=rnorm(120), v2=rnorm(120)) |
| 179 | + f = function(x) coef(lm(v2 ~ v1, data=x)) |
| 180 | + frollapply(x, 4, f, by.column=FALSE) |
| 181 | + # (Intercept) v1 |
| 182 | + # <num> <num> |
| 183 | + # 1: NA NA |
| 184 | + # 2: NA NA |
| 185 | + # 3: NA NA |
| 186 | + # 4: -0.04648236 -0.6349687 |
| 187 | + # 5: 0.09208733 -0.4964023 |
| 188 | + #--- |
| 189 | + #116: -0.21169439 0.7421358 |
| 190 | + #117: -0.19729119 0.4926939 |
| 191 | + #118: -0.04217896 0.0452713 |
| 192 | + #119: 0.22472549 -0.5245874 |
| 193 | + #120: 0.54540359 -0.1638333 |
| 194 | + ``` |
| 195 | + - uses multiple CPU threads (on a decent OS); evaluating a user-defined function (UDF) is inherently slow, so this can be a great help. |
| 196 | + ```r |
| 197 | + x = rnorm(1e5) |
| 198 | + n = 500 |
| 199 | + setDTthreads(1) |
| 200 | + system.time( |
| 201 | + th1 <- frollapply(x, n, median, simplify=unlist) |
| 202 | + ) |
| 203 | + # user system elapsed |
| 204 | + # 3.078 0.005 3.084 |
| 205 | + setDTthreads(4) |
| 206 | + system.time( |
| 207 | + th4 <- frollapply(x, n, median, simplify=unlist) |
| 208 | + ) |
| 209 | + # user system elapsed |
| 210 | + # 2.453 0.135 0.897 |
| 211 | + all.equal(th1, th4) |
| 212 | + #[1] TRUE |
| 213 | + ``` |
| 214 | + |
| 215 | +18. New helper `frolladapt` facilitates applying rolling functions over windows of fixed calendar-time width in irregularly spaced data sets, thereby bypassing the need to "augment" such data with placeholder rows, [#3241](https://github.com/Rdatatable/data.table/issues/3241). Thanks to @jangorecki for the implementation. |
| 216 | + ```r |
| 217 | + idx = as.Date("2025-09-05") + c(0,4,7,8,9,10,12,13,17) |
| 218 | + dt = data.table(index=idx, value=seq_along(idx)) |
| 219 | + dt |
| 220 | + # index value |
| 221 | + # <Date> <int> |
| 222 | + #1: 2025-09-05 1 |
| 223 | + #2: 2025-09-09 2 |
| 224 | + #3: 2025-09-12 3 |
| 225 | + #4: 2025-09-13 4 |
| 226 | + #5: 2025-09-14 5 |
| 227 | + #6: 2025-09-15 6 |
| 228 | + #7: 2025-09-17 7 |
| 229 | + #8: 2025-09-18 8 |
| 230 | + #9: 2025-09-22 9 |
| 231 | + dt[, c("rollmean3","rollmean3days") := list( |
| 232 | + frollmean(value, 3), |
| 233 | + frollmean(value, frolladapt(index, 3), adaptive=TRUE) |
| 234 | + )] |
| 235 | + dt |
| 236 | + # index value rollmean3 rollmean3days |
| 237 | + # <Date> <int> <num> <num> |
| 238 | + #1: 2025-09-05 1 NA NA |
| 239 | + #2: 2025-09-09 2 NA 2.0 |
| 240 | + #3: 2025-09-12 3 2 3.0 |
| 241 | + #4: 2025-09-13 4 3 3.5 |
| 242 | + #5: 2025-09-14 5 4 4.0 |
| 243 | + #6: 2025-09-15 6 5 5.0 |
| 244 | + #7: 2025-09-17 7 6 6.5 |
| 245 | + #8: 2025-09-18 8 7 7.5 |
| 246 | + #9: 2025-09-22 9 8 9.0 |
| 247 | + ``` |
| 248 | + |
| 249 | +19. New rolling functions `frollmin` and `frollprod` have been implemented, towards [#2778](https://github.com/Rdatatable/data.table/issues/2778). Thanks to @jangorecki for the implementation. |
| 250 | + |
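As a rough illustration of item 19, the sketch below compares the new functions against a plain base R reference. It assumes `frollmin` and `frollprod` accept `(x, n)` positionally like `frollsum` in the example further up, and that the installed data.table version provides them. `ref_roll` is a hypothetical helper written for this example only.

```r
library(data.table)

x = c(3, 1, 2, 5, 4)

# Hypothetical base R reference: right-aligned window of width n, NA-filled.
ref_roll = function(x, n, f) {
  c(rep(NA_real_, n - 1L),
    sapply(n:length(x), function(i) f(x[(i - n + 1L):i])))
}

rmin = frollmin(x, 2)
rprod = frollprod(x, 2)

# Compare against the reference implementation.
print(all.equal(rmin, ref_roll(x, 2, min)))
print(all.equal(rprod, ref_roll(x, 2, prod)))
```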
160 | 251 | ### BUG FIXES |
161 | 252 |
|
162 | 253 | 1. `fread()` no longer warns on certain systems on R 4.5.0+ where the file owner can't be resolved, [#6918](https://github.com/Rdatatable/data.table/issues/6918). Thanks @ProfFancyPants for the report and PR. |
|
222 | 313 |
|
223 | 314 | 6. Using a double vector in `set()`'s `i=` and/or `j=` no longer throws a warning about preferring integer, [#6594](https://github.com/Rdatatable/data.table/issues/6594). While it may improve efficiency to use integer, there's no guarantee it's an improvement and the difference is likely to be minimal. The coercion will still be reported under `datatable.verbose=TRUE`. For package/production use cases, static analyzers such as `lintr::implicit_integer_linter()` can also report when numeric literals should be rewritten as integer literals. |
224 | 315 |
|
| 316 | +7. In rare situations, a data.table object may lose the internal attribute that holds its self-reference. The new helper function `.selfref.ok()` tests for exactly that. It is intended only for technical use cases; see the manual for examples. |
| 317 | + |
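To illustrate item 6 above, here is a small sketch of `set()` called with double and with integer indices. The table `DT` and its values are made up for the example. Versions before this fix are described above as warning on the first call.

```r
library(data.table)

DT = data.table(a = 1:3, b = 4:6)

# i and j given as double literals (2 rather than 2L).
set(DT, i = 2, j = 2, value = 0L)

# The same kind of update with integer literals.
set(DT, i = 3L, j = 2L, value = 9L)

print(DT)
```

Both calls update column `b` by reference; the only difference is the type of the index literals.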
225 | 318 | ## data.table [v1.17.8](https://github.com/Rdatatable/data.table/milestone/41) (6 July 2025) |
226 | 319 |
|
227 | 320 | 1. Internal functions used to signal errors are now marked as non-returning, silencing a compiler warning about potentially unchecked allocation failure. Thanks to Prof. Brian D. Ripley for the report and @aitap for the fix, [#7070](https://github.com/Rdatatable/data.table/pull/7070). |
|