|
6 | 6 |
|
7 | 7 | ### BREAKING CHANGE |
8 | 8 |
|
9 | | -1. Rolling functions `frollmean` and `frollsum` distinguish `Inf`/`-Inf` from `NA` to match the same rules as base R when `algo="fast"` (previously they were considered the same). If your input into those functions has `Inf` or `-Inf` then you will be affected by this change. As a result, the argument that controls the handling of `NA`s has been renamed from `hasNA` to `has.nf` (_has non-finite_). `hasNA` continues to work with a warning, for now. |
| 9 | +1. `dcast()` now errors when `fun.aggregate` returns length != 1 (consistent with documentation), regardless of `fill`, [#6629](https://github.com/Rdatatable/data.table/issues/6629). Previously, when `fill` was not `NULL`, `dcast` warned and returned an undefined result. This change has been planned since 1.16.0 (25 Aug 2024). |
| 10 | + |
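A minimal sketch of what the stricter check means in practice, assuming a small made-up table `DT` (the column names `g`, `k`, and `v` are illustrative only, not from the changelog). An aggregator that returns exactly one value per cell, such as `sum`, keeps working with a non-`NULL` `fill`.

```r
library(data.table)

# Hypothetical toy data: two rows fall into the same (g, k) cell
DT = data.table(g = c("a", "a", "b"), k = c("x", "x", "y"), v = c(1, 2, 3))

# sum() returns a single value per cell, so this call is valid;
# cells with no rows are filled with 0
wide = dcast(DT, g ~ k, value.var = "v", fun.aggregate = sum, fill = 0)
wide
```

An aggregator such as `range`, which returns two values, is the kind of function that the entry above describes as now raising an error.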
| 11 | +2. `melt()` returns an integer column for `variable` when `measure.vars` is a list of length=1, consistent with the documented behavior, [#5209](https://github.com/Rdatatable/data.table/issues/5209). Thanks to @tdhock for reporting. Any users who were relying on this behavior can change `measure.vars=list("col_name")` (output `variable` was column name, now is column index/integer) to `measure.vars="col_name"` (`variable` still is column name). This change has been planned since 1.16.0 (25 Aug 2024). |
| 12 | + |
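The migration suggested above can be sketched as follows, assuming a toy table whose columns `id` and `col_name` are placeholders. Passing `measure.vars` as a character vector rather than a length-1 list keeps the column name in `variable`.

```r
library(data.table)

# Hypothetical toy data
DT = data.table(id = 1:2, col_name = c(10, 20))

# Character measure.vars: 'variable' holds the column name
long = melt(DT, id.vars = "id", measure.vars = "col_name")
as.character(long$variable)
```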
| 13 | +3. Rolling functions `frollmean` and `frollsum` distinguish `Inf`/`-Inf` from `NA` to match the same rules as base R when `algo="fast"` (previously they were considered the same). If your input into those functions has `Inf` or `-Inf` then you will be affected by this change. As a result, the argument that controls the handling of `NA`s has been renamed from `hasNA` to `has.nf` (_has non-finite_). `hasNA` continues to work with a warning, for now. |
10 | 14 | ```r |
11 | 15 | ## before |
12 | 16 | frollsum(c(1,2,3,Inf,5,6), 2) |
|
16 | 20 | frollsum(c(1,2,3,Inf,5,6), 2) |
17 | 21 | #[1] NA 3 5 Inf Inf 11 |
18 | 22 |
|
19 | | -### NOTICE OF INTENDED FUTURE POTENTIAL BREAKING CHANGES |
| 23 | +4. `frollapply` result is not coerced to numeric anymore. Users' code could possibly break if it depends on forced coercion of input/output to numeric type. |
| 24 | + ```r |
| 25 | + ## before |
| 26 | + frollapply(c(F,T,F,F,F,T), 2, any) |
| 27 | + #[1] NA 1 1 0 0 1 |
20 | 28 |
|
21 | | -1. `data.table(x=1, <expr>)`, where `<expr>` is an expression resulting in a 1-column matrix without column names, will eventually have names `x` and `V2`, not `x` and `V1`, consistent with `data.table(x=1, <expr>)` where `<expr>` results in an atomic vector, for example `data.table(x=1, cbind(1))` and `data.table(x=1, 1)` will both have columns named `x` and `V2`. In this release, the matrix case continues to be named `V1`, but the new behavior can be activated by setting `options(datatable.old.matrix.autoname)` to `FALSE`. See point 5 under Bug Fixes for more context; this change will provide more internal consistency as well as more consistency with `data.frame()`. |
| 29 | + ## now |
| 30 | + frollapply(c(F,T,F,F,F,T), 2, any) |
| 31 | + #[1] NA TRUE TRUE FALSE FALSE TRUE |
| 32 | + ``` |
| 33 | + Additionally, the argument names in `frollapply` have been renamed from `x` to `X` and `n` to `N` to avoid conflicts with common argument names that may be passed to `...`, aligning with the base R API of `lapply`. `x` and `n` continue to work with a warning, for now. |
22 | 34 |
|
23 | | -### BREAKING CHANGE |
| 35 | +5. Negative and missing values of the `n` argument of adaptive rolling functions now trigger an error. |
24 | 36 |
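As a hedged illustration of item 5, the sketch below validates a vector of adaptive window widths before calling a rolling function. The pre-check is an illustrative convention of this example, not part of data.table, and the data are made up.

```r
library(data.table)

x = c(1, 2, 3, 4, 5)
n = c(1L, 2L, 2L, 3L, 3L)  # one window width per observation

# Illustrative pre-check (not part of data.table):
# reject NA or negative widths before they reach the rolling function
stopifnot(!anyNA(n), all(n >= 0L))

res = frollmean(x, n, adaptive = TRUE)
res
#[1] 1.0 1.5 2.5 3.0 4.0
```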
|
25 | | -1. `dcast()` now errors when `fun.aggregate` returns length != 1 (consistent with documentation), regardless of `fill`, [#6629](https://github.com/Rdatatable/data.table/issues/6629). Previously, when `fill` was not `NULL`, `dcast` warned and returned an undefined result. This change has been planned since 1.16.0 (25 Aug 2024). |
| 37 | +### NOTICE OF INTENDED FUTURE POTENTIAL BREAKING CHANGES |
26 | 38 |
|
27 | | -2. `melt()` returns an integer column for `variable` when `measure.vars` is a list of length=1, consistent with the documented behavior, [#5209](https://github.com/Rdatatable/data.table/issues/5209). Thanks to @tdhock for reporting. Any users who were relying on this behavior can change `measure.vars=list("col_name")` (output `variable` was column name, now is column index/integer) to `measure.vars="col_name"` (`variable` still is column name). This change has been planned since 1.16.0 (25 Aug 2024). |
| 39 | +1. `data.table(x=1, <expr>)`, where `<expr>` is an expression resulting in a 1-column matrix without column names, will eventually have names `x` and `V2`, not `x` and `V1`, consistent with `data.table(x=1, <expr>)` where `<expr>` results in an atomic vector, for example `data.table(x=1, cbind(1))` and `data.table(x=1, 1)` will both have columns named `x` and `V2`. In this release, the matrix case continues to be named `V1`, but the new behavior can be activated by setting `options(datatable.old.matrix.autoname)` to `FALSE`. See point 5 under Bug Fixes for more context; this change will provide more internal consistency as well as more consistency with `data.frame()`. |
28 | 40 |
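A short sketch of opting in to the future naming rule for a single session, using the option named in the notice above. The object names `dt_matrix` and `dt_vector` are illustrative. According to the notice, both results should have columns `x` and `V2` once the option is set to `FALSE`.

```r
library(data.table)

# Opt in to the future naming rule, remembering the previous setting
old = options(datatable.old.matrix.autoname = FALSE)

dt_matrix = data.table(x = 1, cbind(1))  # unnamed 1-column matrix
dt_vector = data.table(x = 1, 1)         # atomic vector
names(dt_matrix)
names(dt_vector)

# Restore the previous setting
options(old)
```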
|
29 | 41 | ### NEW FEATURES |
30 | 42 |
|
|
81 | 93 |
|
82 | 94 | 12. New `cbindlist()` and `setcbindlist()` for concatenating a `list` of data.tables column-wise, evocative of the analogous `do.call(rbind, l)` <-> `rbindlist(l)`, [#2576](https://github.com/Rdatatable/data.table/issues/2576). `setcbindlist()` does so without making any copies. Thanks @MichaelChirico for the FR, @jangorecki for the PR, and @MichaelChirico for extensive reviews and fine-tuning. |
83 | 95 |
|
| 96 | + ```r |
| 97 | + l = list( |
| 98 | + data.table(id = 1:3, a = letters[1:3]), |
| 99 | + data.table(b = 4:6, c = 7:9) |
| 100 | + ) |
| 101 | + cbindlist(l) |
| 102 | + # id a b c |
| 103 | + # 1: 1 a 4 7 |
| 104 | + # 2: 2 b 5 8 |
| 105 | + # 3: 3 c 6 9 |
| 106 | + ``` |
| 107 | +
|
84 | 108 | 13. New `mergelist()` and `setmergelist()` similarly work _a la_ `Reduce()` to recursively merge a `list` of data.tables, [#599](https://github.com/Rdatatable/data.table/issues/599). Different join modes (_left_, _inner_, _full_, _right_, _semi_, _anti_, and _cross_) are supported through the `how` argument; duplicate handling goes through the `mult` argument. `setmergelist()` carefully avoids copies where one is not needed, e.g. in a 1:1 left join. Thanks Patrick Nicholson for the FR (in 2013!), @jangorecki for the PR, and @MichaelChirico for extensive reviews and fine-tuning. |
85 | 109 |
|
| 110 | + ```r |
| 111 | + l = list( |
| 112 | + data.table(id = c(1L, 2L, 3L), x = c("a", "b", "c")), |
| 113 | + data.table(id = c(1L, 2L, 4L), y = c("d", "e", "f")), |
| 114 | + data.table(id = c(1L, 3L, 4L), z = c("g", "h", "i")) |
| 115 | + ) |
| 116 | +
|
| 117 | + # Recursive inner join |
| 118 | + mergelist(l, on = "id", how = "inner") |
| 119 | + # id x y z |
| 120 | + # 1: 1 a d g |
| 121 | +
|
| 122 | + # Recursive left join (the default 'how') |
| 123 | + mergelist(l, on = "id", how = "left") |
| 124 | + # id x y z |
| 125 | + # 1: 1 a d g |
| 126 | + # 2: 2 b e <NA> |
| 127 | + # 3: 3 c <NA> h |
| 128 | + ``` |
| 129 | +
|
86 | 130 | 14. `fcoalesce()` and `setcoalesce()` gain `nan` argument to control whether `NaN` values should be treated as missing (`nan=NA`, the default) or non-missing (`nan=NaN`), [#4567](https://github.com/Rdatatable/data.table/issues/4567). This provides full compatibility with `nafill()` behavior. Thanks to @ethanbsmith for the feature request and @Mukulyadav2004 for the implementation. |
87 | 131 |
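A minimal sketch of the two `nan` settings described in item 14, on a made-up vector. The comments restate the behavior given in the entry. No output is shown, since it depends on the installed version supporting the new argument.

```r
library(data.table)

x = c(NaN, NA, 3)

# Default (nan=NA): NaN is treated as missing, like NA
filled = fcoalesce(x, 0)
filled

# nan=NaN: NaN is treated as a non-missing value
fcoalesce(x, 0, nan = NaN)
```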
|
88 | 132 | 15. New function `isoyear()` has been implemented as a complement to `isoweek()`, returning the ISO 8601 year corresponding to a given date, [#7154](https://github.com/Rdatatable/data.table/issues/7154). Thanks to @ben-schwen and @MichaelChirico for the suggestion and @venom1204 for the implementation. |
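To illustrate why an ISO year differs from the calendar year, the sketch below compares `year()`, `isoweek()`, and `isoyear()` around a year boundary. The dates are chosen for illustration, and the call assumes `isoyear()` accepts a `Date` vector the same way `isoweek()` does.

```r
library(data.table)

d = as.Date(c("2020-12-31", "2021-01-01", "2021-01-04"))

# Under ISO 8601, 2021-01-01 (a Friday) still belongs to week 53 of 2020;
# week 1 of 2021 starts on Monday 2021-01-04
data.table(date = d, year = year(d), isoweek = isoweek(d), isoyear = isoyear(d))
```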
|
127 | 171 |
|
128 | 172 | As of now, adaptive rolling max has no _on-line_ implementation (`algo="fast"`); it uses a naive approach (`algo="exact"`). Further speed-up is therefore still possible if `algo="fast"` gets implemented. |
129 | 173 |
|
| 174 | +17. Function `frollapply` has been completely rewritten. Thanks to @jangorecki for the implementation. Be sure to read the `frollapply` manual before using the function. The changes are as follows: |
| 175 | + - all basic types are now supported on input/output, not only double. Users' code could possibly break if it depends on forced coercion of input/output to double type. |
| 176 | + - new argument `by.column` allows passing a multi-column subset of a data.table into a rolling function, closes [#4887](https://github.com/Rdatatable/data.table/issues/4887). |
| 177 | + ```r |
| 178 | + x = data.table(v1=rnorm(120), v2=rnorm(120)) |
| 179 | + f = function(x) coef(lm(v2 ~ v1, data=x)) |
| 180 | + coef.fill = c("(Intercept)"=NA_real_, "v1"=NA_real_) |
| 181 | + frollapply(x, 4, f, by.column=FALSE, fill=coef.fill) |
| 182 | + # (Intercept) v1 |
| 183 | + # 1: NA NA |
| 184 | + # 2: NA NA |
| 185 | + # 3: NA NA |
| 186 | + # 4: 0.65456931 0.3138012 |
| 187 | + # 5: -1.07977441 -2.0588094 |
| 188 | + #--- |
| 189 | + #116: 0.15828417 0.3570216 |
| 190 | + #117: -0.09083424 1.5494507 |
| 191 | + #118: -0.18345878 0.6424837 |
| 192 | + #119: -0.28964772 0.6116575 |
| 193 | + #120: -0.40598313 0.6112854 |
| 194 | + ``` |
| 195 | + - uses multiple CPU threads (on a decent OS); evaluation of UDF is inherently slow so this can be a great help. |
| 196 | + ```r |
| 197 | + x = rnorm(1e5) |
| 198 | + n = 500 |
| 199 | + setDTthreads(1) |
| 200 | + system.time( |
| 201 | + th1 <- frollapply(x, n, median, simplify=unlist) |
| 202 | + ) |
| 203 | + # user system elapsed |
| 204 | + # 3.078 0.005 3.084 |
| 205 | + setDTthreads(4) |
| 206 | + system.time( |
| 207 | + th4 <- frollapply(x, n, median, simplify=unlist) |
| 208 | + ) |
| 209 | + # user system elapsed |
| 210 | + # 2.453 0.135 0.897 |
| 211 | + all.equal(th1, th4) |
| 212 | + #[1] TRUE |
| 213 | + ``` |
| 214 | + |
| 215 | +18. New helper `frolladapt` to facilitate applying rolling functions over windows of fixed calendar-time width in irregularly-spaced data sets, thereby bypassing the need to "augment" such data with placeholder rows, [#3241](https://github.com/Rdatatable/data.table/issues/3241). Thanks to @jangorecki for implementation. |
| 216 | + ```r |
| 217 | + idx = as.Date("2025-09-05") + c(0,4,7,8,9,10,12,13,17) |
| 218 | + dt = data.table(index=idx, value=seq_along(idx)) |
| 219 | + dt |
| 220 | + # index value |
| 221 | + # <Date> <int> |
| 222 | + #1: 2025-09-05 1 |
| 223 | + #2: 2025-09-09 2 |
| 224 | + #3: 2025-09-12 3 |
| 225 | + #4: 2025-09-13 4 |
| 226 | + #5: 2025-09-14 5 |
| 227 | + #6: 2025-09-15 6 |
| 228 | + #7: 2025-09-17 7 |
| 229 | + #8: 2025-09-18 8 |
| 230 | + #9: 2025-09-22 9 |
| 231 | + dt[, c("rollmean3","rollmean3days") := list( |
| 232 | + frollmean(value, 3), |
| 233 | + frollmean(value, frolladapt(index, 3), adaptive=TRUE) |
| 234 | + )] |
| 235 | + dt |
| 236 | + # index value rollmean3 rollmean3days |
| 237 | + # <Date> <int> <num> <num> |
| 238 | + #1: 2025-09-05 1 NA NA |
| 239 | + #2: 2025-09-09 2 NA 2.0 |
| 240 | + #3: 2025-09-12 3 2 3.0 |
| 241 | + #4: 2025-09-13 4 3 3.5 |
| 242 | + #5: 2025-09-14 5 4 4.0 |
| 243 | + #6: 2025-09-15 6 5 5.0 |
| 244 | + #7: 2025-09-17 7 6 6.5 |
| 245 | + #8: 2025-09-18 8 7 7.5 |
| 246 | + #9: 2025-09-22 9 8 9.0 |
| 247 | + ``` |
| 248 | + |
| 249 | +19. New rolling functions, `frollmin` and `frollprod`, have been implemented, towards [#2778](https://github.com/Rdatatable/data.table/issues/2778). Thanks to @jangorecki for implementation. |
| 250 | + |
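A brief sketch of the two new functions from item 19, assuming they follow the same calling convention as `frollsum` (data first, then window width). The input vector is made up, and no output is shown because the exact signature is an assumption here.

```r
library(data.table)

x = c(3, 1, 2, 5, 4)

# Rolling minimum and rolling product over windows of width 2
rmin = frollmin(x, 2)
rprod = frollprod(x, 2)
rmin
rprod
```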
130 | 251 | ### BUG FIXES |
131 | 252 |
|
132 | 253 | 1. `fread()` no longer warns on certain systems on R 4.5.0+ where the file owner can't be resolved, [#6918](https://github.com/Rdatatable/data.table/issues/6918). Thanks @ProfFancyPants for the report and PR. |
|