Commit 3278bd9

Merge branch 'master' into parserFunctionRename
2 parents edbe3f5 + 88635ad commit 3278bd9

34 files changed, +3405 -885 lines changed

NAMESPACE

Lines changed: 3 additions & 0 deletions
@@ -55,7 +55,10 @@ S3method(rollup, data.table)
5555
export(frollmean)
5656
export(frollsum)
5757
export(frollmax)
58+
export(frollmin)
59+
export(frollprod)
5860
export(frollapply)
61+
export(frolladapt)
5962
export(nafill)
6063
export(setnafill)
6164
export(.Last.updated)

NEWS.md

Lines changed: 127 additions & 6 deletions
@@ -6,7 +6,11 @@
66

77
### BREAKING CHANGE
88

9-
1. Rolling functions `frollmean` and `frollsum` distinguish `Inf`/`-Inf` from `NA` to match the same rules as base R when `algo="fast"` (previously they were considered the same). If your input into those functions has `Inf` or `-Inf` then you will be affected by this change. As a result, the argument that controls the handling of `NA`s has been renamed from `hasNA` to `has.nf` (_has non-finite_). `hasNA` continues to work with a warning, for now.
9+
1. `dcast()` now errors when `fun.aggregate` returns length != 1 (consistent with documentation), regardless of `fill`, [#6629](https://github.com/Rdatatable/data.table/issues/6629). Previously, when `fill` was not `NULL`, `dcast` warned and returned an undefined result. This change has been planned since 1.16.0 (25 Aug 2024).
10+
11+
2. `melt()` returns an integer column for `variable` when `measure.vars` is a list of length=1, consistent with the documented behavior, [#5209](https://github.com/Rdatatable/data.table/issues/5209). Thanks to @tdhock for reporting. Any users who were relying on this behavior can change `measure.vars=list("col_name")` (output `variable` was column name, now is column index/integer) to `measure.vars="col_name"` (`variable` still is column name). This change has been planned since 1.16.0 (25 Aug 2024).
12+
13+
3. Rolling functions `frollmean` and `frollsum` distinguish `Inf`/`-Inf` from `NA` to match the same rules as base R when `algo="fast"` (previously they were considered the same). If your input into those functions has `Inf` or `-Inf` then you will be affected by this change. As a result, the argument that controls the handling of `NA`s has been renamed from `hasNA` to `has.nf` (_has non-finite_). `hasNA` continues to work with a warning, for now.
1014
```r
1115
## before
1216
frollsum(c(1,2,3,Inf,5,6), 2)
@@ -16,15 +20,23 @@
1620
frollsum(c(1,2,3,Inf,5,6), 2)
1721
#[1] NA 3 5 Inf Inf 11
1822

19-
### NOTICE OF INTENDED FUTURE POTENTIAL BREAKING CHANGES
23+
4. The result of `frollapply` is no longer coerced to numeric. User code may break if it depends on the forced coercion of input/output to numeric type.
24+
```r
25+
## before
26+
frollapply(c(F,T,F,F,F,T), 2, any)
27+
#[1] NA 1 1 0 0 1
2028
21-
1. `data.table(x=1, <expr>)`, where `<expr>` is an expression resulting in a 1-column matrix without column names, will eventually have names `x` and `V2`, not `x` and `V1`, consistent with `data.table(x=1, <expr>)` where `<expr>` results in an atomic vector, for example `data.table(x=1, cbind(1))` and `data.table(x=1, 1)` will both have columns named `x` and `V2`. In this release, the matrix case continues to be named `V1`, but the new behavior can be activated by setting `options(datatable.old.matrix.autoname)` to `FALSE`. See point 5 under Bug Fixes for more context; this change will provide more internal consistency as well as more consistency with `data.frame()`.
29+
## now
30+
frollapply(c(F,T,F,F,F,T), 2, any)
31+
#[1] NA TRUE TRUE FALSE FALSE TRUE
32+
```
33+
Additionally, the argument names in `frollapply` have been renamed from `x` to `X` and from `n` to `N` to avoid conflicts with common argument names that may be passed to `...`, aligning with the base R API of `lapply`. `x` and `n` continue to work with a warning, for now.
2234
23-
### BREAKING CHANGE
35+
5. Negative and missing values of the `n` argument of adaptive rolling functions now trigger an error.
2436
25-
1. `dcast()` now errors when `fun.aggregate` returns length != 1 (consistent with documentation), regardless of `fill`, [#6629](https://github.com/Rdatatable/data.table/issues/6629). Previously, when `fill` was not `NULL`, `dcast` warned and returned an undefined result. This change has been planned since 1.16.0 (25 Aug 2024).
37+
### NOTICE OF INTENDED FUTURE POTENTIAL BREAKING CHANGES
2638
27-
2. `melt()` returns an integer column for `variable` when `measure.vars` is a list of length=1, consistent with the documented behavior, [#5209](https://github.com/Rdatatable/data.table/issues/5209). Thanks to @tdhock for reporting. Any users who were relying on this behavior can change `measure.vars=list("col_name")` (output `variable` was column name, now is column index/integer) to `measure.vars="col_name"` (`variable` still is column name). This change has been planned since 1.16.0 (25 Aug 2024).
39+
1. `data.table(x=1, <expr>)`, where `<expr>` is an expression resulting in a 1-column matrix without column names, will eventually have names `x` and `V2`, not `x` and `V1`, consistent with `data.table(x=1, <expr>)` where `<expr>` results in an atomic vector, for example `data.table(x=1, cbind(1))` and `data.table(x=1, 1)` will both have columns named `x` and `V2`. In this release, the matrix case continues to be named `V1`, but the new behavior can be activated by setting `options(datatable.old.matrix.autoname)` to `FALSE`. See point 5 under Bug Fixes for more context; this change will provide more internal consistency as well as more consistency with `data.frame()`.
2840
2941
### NEW FEATURES
3042
@@ -81,8 +93,40 @@
8193
8294
12. New `cbindlist()` and `setcbindlist()` for concatenating a `list` of data.tables column-wise, evocative of the analogous `do.call(rbind, l)` <-> `rbindlist(l)`, [#2576](https://github.com/Rdatatable/data.table/issues/2576). `setcbindlist()` does so without making any copies. Thanks @MichaelChirico for the FR, @jangorecki for the PR, and @MichaelChirico for extensive reviews and fine-tuning.
8395
96+
```r
97+
l = list(
98+
data.table(id = 1:3, a = letters[1:3]),
99+
data.table(b = 4:6, c = 7:9)
100+
)
101+
cbindlist(l)
102+
# id a b c
103+
# 1: 1 a 4 7
104+
# 2: 2 b 5 8
105+
# 3: 3 c 6 9
106+
```
107+
84108
13. New `mergelist()` and `setmergelist()` similarly work _a la_ `Reduce()` to recursively merge a `list` of data.tables, [#599](https://github.com/Rdatatable/data.table/issues/599). Different join modes (_left_, _inner_, _full_, _right_, _semi_, _anti_, and _cross_) are supported through the `how` argument; duplicate handling goes through the `mult` argument. `setmergelist()` carefully avoids copies where one is not needed, e.g. in a 1:1 left join. Thanks Patrick Nicholson for the FR (in 2013!), @jangorecki for the PR, and @MichaelChirico for extensive reviews and fine-tuning.
85109
110+
```r
111+
l = list(
112+
data.table(id = c(1L, 2L, 3L), x = c("a", "b", "c")),
113+
data.table(id = c(1L, 2L, 4L), y = c("d", "e", "f")),
114+
data.table(id = c(1L, 3L, 4L), z = c("g", "h", "i"))
115+
)
116+
117+
# Recursive inner join
118+
mergelist(l, on = "id", how = "inner")
119+
# id x y z
120+
# 1: 1 a d g
121+
122+
# Recursive left join (the default 'how')
123+
mergelist(l, on = "id", how = "left")
124+
# id x y z
125+
# 1: 1 a d g
126+
# 2: 2 b e <NA>
127+
# 3: 3 c <NA> h
128+
```
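For readers coming from base R, the "_a la_ `Reduce()`" description can be sketched with `merge()` on plain data frames. This is only an analogy, assuming base `merge()` semantics are close enough for this small input; it is not how `mergelist()` is implemented, and row order, column handling, and performance may differ.

```r
# Base R analogy only: plain data.frames, not data.tables.
l = list(
  data.frame(id = c(1L, 2L, 3L), x = c("a", "b", "c")),
  data.frame(id = c(1L, 2L, 4L), y = c("d", "e", "f")),
  data.frame(id = c(1L, 3L, 4L), z = c("g", "h", "i"))
)

# Recursive inner join: keep only ids present in every table.
inner = Reduce(function(a, b) merge(a, b, by = "id"), l)

# Recursive left join: keep every id of the first table.
left = Reduce(function(a, b) merge(a, b, by = "id", all.x = TRUE), l)

inner
left
```

On this input the two results contain the same rows as the `mergelist()` output shown above: a single row for `id` 1 in the inner join, and ids 1 to 3 with `NA` where a later table has no match in the left join.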
129+
86130
14. `fcoalesce()` and `setcoalesce()` gain `nan` argument to control whether `NaN` values should be treated as missing (`nan=NA`, the default) or non-missing (`nan=NaN`), [#4567](https://github.com/Rdatatable/data.table/issues/4567). This provides full compatibility with `nafill()` behavior. Thanks to @ethanbsmith for the feature request and @Mukulyadav2004 for the implementation.
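The distinction that `nan` controls can be illustrated in base R, where `is.na()` is `TRUE` for both `NA` and `NaN` while `is.nan()` is `TRUE` only for `NaN`. The helper below, `coalesce_sketch`, is a hypothetical two-vector stand-in written for this note; it is not data.table code and only mimics the two settings described above.

```r
# Hypothetical helper, for illustration only; not part of data.table.
coalesce_sketch = function(x, y, nan = NA) {
  # nan = NA  -> NaN counts as missing; nan = NaN -> NaN is kept as a value
  nan_is_missing = !is.nan(nan)
  miss = if (nan_is_missing) is.na(x) else is.na(x) & !is.nan(x)
  x[miss] = rep_len(y, length(x))[miss]
  x
}

x = c(1, NA, NaN)
a = coalesce_sketch(x, 9)             # NA and NaN are both replaced
b = coalesce_sketch(x, 9, nan = NaN)  # only NA is replaced, NaN is kept
```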
87131
88132
15. New function `isoyear()` has been implemented as a complement to `isoweek()`, returning the ISO 8601 year corresponding to a given date, [#7154](https://github.com/Rdatatable/data.table/issues/7154). Thanks to @ben-schwen and @MichaelChirico for the suggestion and @venom1204 for the implementation.
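The ISO 8601 year differs from the calendar year only around the turn of the year. A base R sketch of the same concept, assuming `isoyear()` follows the standard ISO 8601 week-based definition and that the platform's `format()` supports the `%G` and `%V` conversions:

```r
d = as.Date(c("2024-12-29", "2024-12-30", "2025-01-01"))

cal_year = as.integer(format(d, "%Y"))  # calendar year
iso_year = as.integer(format(d, "%G"))  # ISO 8601 week-based year
iso_week = as.integer(format(d, "%V"))  # ISO 8601 week number

data.frame(d, cal_year, iso_year, iso_week)
```

Monday 2024-12-30 already belongs to week 1 of ISO year 2025, which is why a year taken from the calendar date would pair badly with `isoweek()` there.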
@@ -127,6 +171,83 @@
127171
128172
As of now, adaptive rolling max has no _on-line_ implementation (`algo="fast"`); it uses a naive approach (`algo="exact"`). Therefore further speed-up is still possible if `algo="fast"` gets implemented.
129173
174+
17. Function `frollapply` has been completely rewritten. Thanks to @jangorecki for the implementation. Be sure to read the `frollapply` manual before using the function. The changes are as follows:
175+
- all basic types are now supported on input/output, not only double. User code may break if it depends on forced coercion of input/output to double type.
176+
- new argument `by.column` allows passing a multi-column subset of a data.table into a rolling function, closes [#4887](https://github.com/Rdatatable/data.table/issues/4887).
177+
```r
178+
x = data.table(v1=rnorm(120), v2=rnorm(120))
179+
f = function(x) coef(lm(v2 ~ v1, data=x))
180+
coef.fill = c("(Intercept)"=NA_real_, "v1"=NA_real_)
181+
frollapply(x, 4, f, by.column=FALSE, fill=coef.fill)
182+
# (Intercept) v1
183+
# 1: NA NA
184+
# 2: NA NA
185+
# 3: NA NA
186+
# 4: 0.65456931 0.3138012
187+
# 5: -1.07977441 -2.0588094
188+
#---
189+
#116: 0.15828417 0.3570216
190+
#117: -0.09083424 1.5494507
191+
#118: -0.18345878 0.6424837
192+
#119: -0.28964772 0.6116575
193+
#120: -0.40598313 0.6112854
194+
```
195+
- uses multiple CPU threads (on a decent OS); evaluation of UDF is inherently slow so this can be a great help.
196+
```r
197+
x = rnorm(1e5)
198+
n = 500
199+
setDTthreads(1)
200+
system.time(
201+
th1 <- frollapply(x, n, median, simplify=unlist)
202+
)
203+
# user system elapsed
204+
# 3.078 0.005 3.084
205+
setDTthreads(4)
206+
system.time(
207+
th4 <- frollapply(x, n, median, simplify=unlist)
208+
)
209+
# user system elapsed
210+
# 2.453 0.135 0.897
211+
all.equal(th1, th4)
212+
#[1] TRUE
213+
```
214+
215+
18. New helper `frolladapt` to facilitate applying rolling functions over windows of fixed calendar-time width in irregularly spaced data sets, thereby bypassing the need to "augment" such data with placeholder rows, [#3241](https://github.com/Rdatatable/data.table/issues/3241). Thanks to @jangorecki for the implementation.
216+
```r
217+
idx = as.Date("2025-09-05") + c(0,4,7,8,9,10,12,13,17)
218+
dt = data.table(index=idx, value=seq_along(idx))
219+
dt
220+
# index value
221+
# <Date> <int>
222+
#1: 2025-09-05 1
223+
#2: 2025-09-09 2
224+
#3: 2025-09-12 3
225+
#4: 2025-09-13 4
226+
#5: 2025-09-14 5
227+
#6: 2025-09-15 6
228+
#7: 2025-09-17 7
229+
#8: 2025-09-18 8
230+
#9: 2025-09-22 9
231+
dt[, c("rollmean3","rollmean3days") := list(
232+
frollmean(value, 3),
233+
frollmean(value, frolladapt(index, 3), adaptive=TRUE)
234+
)]
235+
dt
236+
# index value rollmean3 rollmean3days
237+
# <Date> <int> <num> <num>
238+
#1: 2025-09-05 1 NA NA
239+
#2: 2025-09-09 2 NA 2.0
240+
#3: 2025-09-12 3 2 3.0
241+
#4: 2025-09-13 4 3 3.5
242+
#5: 2025-09-14 5 4 4.0
243+
#6: 2025-09-15 6 5 5.0
244+
#7: 2025-09-17 7 6 6.5
245+
#8: 2025-09-18 8 7 7.5
246+
#9: 2025-09-22 9 8 9.0
247+
```
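What a 3-day calendar window means for the irregular index above can be checked in base R. The sketch below is not data.table code and says nothing about how `frolladapt` works internally. It assumes a window covers the 3 calendar days ending at each row, and that a window reaching back before the first observation is treated as incomplete, which is consistent with the `NA` in row 1 of the output above.

```r
idx = as.Date("2025-09-05") + c(0, 4, 7, 8, 9, 10, 12, 13, 17)
value = seq_along(idx)
days = 3

# Mean of the values observed in the `days` calendar days ending at each index.
roll_mean_days = vapply(seq_along(idx), function(i) {
  start = idx[i] - (days - 1)
  # Assumption: a window starting before the first observation is incomplete.
  if (start < idx[1]) return(NA_real_)
  mean(value[idx >= start & idx <= idx[i]])
}, numeric(1))

roll_mean_days
```

Under these assumptions the vector matches the `rollmean3days` column, while the row-based `rollmean3` always averages exactly three rows regardless of the gaps between dates.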
248+
249+
19. New rolling functions, `frollmin` and `frollprod`, have been implemented, towards [#2778](https://github.com/Rdatatable/data.table/issues/2778). Thanks to @jangorecki for the implementation.
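As a reference for what a rolling minimum and a rolling product compute, here is a slow base R sketch. `roll_ref` is a hypothetical helper written for this note, not data.table's implementation. It assumes right-aligned windows with `NA` for incomplete leading windows, as in the `frollsum` output shown earlier.

```r
# Hypothetical reference helper in base R; not data.table's implementation.
roll_ref = function(x, n, f) {
  vapply(seq_along(x), function(i) {
    # The first n-1 positions have no complete window behind them.
    if (i < n) NA_real_ else f(x[(i - n + 1):i])
  }, numeric(1))
}

x = c(3, 1, 2, 5)
roll_min = roll_ref(x, 2, min)    # rolling minimum over windows of 2
roll_prod = roll_ref(x, 2, prod)  # rolling product over windows of 2
```

A naive loop like this can serve as a correctness check on small inputs, but it does no threading or special handling of `NA` values.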
250+
130251
### BUG FIXES
131252

132253
1. `fread()` no longer warns on certain systems on R 4.5.0+ where the file owner can't be resolved, [#6918](https://github.com/Rdatatable/data.table/issues/6918). Thanks @ProfFancyPants for the report and PR.
