Commit de27098

Merge branch 'master' into litedown
2 parents cc9bf5d + f1d5c27 commit de27098

66 files changed, 4687 additions and 1318 deletions

.Rbuildignore

Lines changed: 1 addition & 0 deletions
@@ -19,6 +19,7 @@
 ^\.github$
 ^\.vscode$
 ^\.zed$
+^\.lintr$

 ^\.gitlab-ci\.yml$

.github/workflows/R-CMD-check.yaml

Lines changed: 1 addition & 0 deletions
@@ -36,6 +36,7 @@ jobs:
 R_REMOTES_NO_ERRORS_FROM_WARNINGS: true
 RSPM: ${{ matrix.config.rspm }}
 GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
+_R_CHECK_RD_CHECKRD_MINLEVEL_: -Inf

 steps:
 - uses: actions/checkout@v4

GOVERNANCE.md

Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
-Governance for the R data.table project
+# Governance for the R data.table project

 # Purpose and scope

@@ -121,7 +121,7 @@ Funds acquired by the data.table project will be disbursed at the discretion of

 # Code of conduct

-The full Code of Conduct can be found [here](CODE_OF_CONDUCT.md), including details for reporting violations.
+The full Code of Conduct can be found [here](.github/CODE_OF_CONDUCT.md), including details for reporting violations.

 ## Reporting Responsibility

NAMESPACE

Lines changed: 3 additions & 0 deletions
@@ -54,7 +54,9 @@ S3method(cube, data.table)
 S3method(rollup, data.table)
 export(frollmean)
 export(frollsum)
+export(frollmax)
 export(frollapply)
+export(frolladapt)
 export(nafill)
 export(setnafill)
 export(.Last.updated)
@@ -158,6 +160,7 @@ if (getRversion() >= "3.6.0") {
 export(as.IDate,as.ITime,IDateTime)
 export(second,minute,hour,yday,wday,mday,week,isoweek,isoyear,month,quarter,year,yearmon,yearqtr)

+if (getRversion() >= "4.3.0") S3method(chooseOpsMethod, IDate)
 S3method("[", ITime)
 S3method("+", IDate)
 S3method("-", IDate)

NEWS.md

Lines changed: 183 additions & 0 deletions
@@ -4,6 +4,36 @@

 ## data.table [v1.17.99](https://github.com/Rdatatable/data.table/milestone/35) (in development)

+### BREAKING CHANGE
+
+1. `dcast()` now errors when `fun.aggregate` returns length != 1 (consistent with documentation), regardless of `fill`, [#6629](https://github.com/Rdatatable/data.table/issues/6629). Previously, when `fill` was not `NULL`, `dcast` warned and returned an undefined result. This change has been planned since 1.16.0 (25 Aug 2024).
+
+2. `melt()` returns an integer column for `variable` when `measure.vars` is a list of length=1, consistent with the documented behavior, [#5209](https://github.com/Rdatatable/data.table/issues/5209). Thanks to @tdhock for reporting. Any users who were relying on this behavior can change `measure.vars=list("col_name")` (output `variable` was column name, now is column index/integer) to `measure.vars="col_name"` (`variable` still is column name). This change has been planned since 1.16.0 (25 Aug 2024).
+
+3. Rolling functions `frollmean` and `frollsum` distinguish `Inf`/`-Inf` from `NA` to match the same rules as base R when `algo="fast"` (previously they were considered the same). If your input into those functions has `Inf` or `-Inf` then you will be affected by this change. As a result, the argument that controls the handling of `NA`s has been renamed from `hasNA` to `has.nf` (_has non-finite_). `hasNA` continues to work with a warning, for now.
+```r
+## before
+frollsum(c(1,2,3,Inf,5,6), 2)
+#[1] NA 3 5 NA NA 11
+
+## now
+frollsum(c(1,2,3,Inf,5,6), 2)
+#[1] NA 3 5 Inf Inf 11
+
+4. `frollapply` result is not coerced to numeric anymore. Users' code could possibly break if it depends on forced coercion of input/output to numeric type.
+```r
+## before
+frollapply(c(F,T,F,F,F,T), 2, any)
+#[1] NA 1 1 0 0 1
+
+## now
+frollapply(c(F,T,F,F,F,T), 2, any)
+#[1] NA TRUE TRUE FALSE FALSE TRUE
+```
+Additionally argument names in `frollapply` has been renamed from `x` to `X` and `n` to `N` to avoid conflicts with common argument names that may be passed to `...`, aligning to base R API of `lapply`. `x` and `n` continue to work with a warning, for now.
+
+5. Negative and missing values of `n` argument of adaptive rolling functions trigger an error.
+
 ### NOTICE OF INTENDED FUTURE POTENTIAL BREAKING CHANGES

 1. `data.table(x=1, <expr>)`, where `<expr>` is an expression resulting in a 1-column matrix without column names, will eventually have names `x` and `V2`, not `x` and `V1`, consistent with `data.table(x=1, <expr>)` where `<expr>` results in an atomic vector, for example `data.table(x=1, cbind(1))` and `data.table(x=1, 1)` will both have columns named `x` and `V2`. In this release, the matrix case continues to be named `V1`, but the new behavior can be activated by setting `options(datatable.old.matrix.autoname)` to `FALSE`. See point 5 under Bug Fixes for more context; this change will provide more internal consistency as well as more consistency with `data.frame()`.
@@ -63,12 +93,159 @@

 12. New `cbindlist()` and `setcbindlist()` for concatenating a `list` of data.tables column-wise, evocative of the analogous `do.call(rbind, l)` <-> `rbindlist(l)`, [#2576](https://github.com/Rdatatable/data.table/issues/2576). `setcbindlist()` does so without making any copies. Thanks @MichaelChirico for the FR, @jangorecki for the PR, and @MichaelChirico for extensive reviews and fine-tuning.

+```r
+l = list(
+data.table(id = 1:3, a = letters[1:3]),
+data.table(b = 4:6, c = 7:9)
+)
+cbindlist(l)
+# id a b c
+# 1: 1 a 4 7
+# 2: 2 b 5 8
+# 3: 3 c 6 9
+```
+
 13. New `mergelist()` and `setmergelist()` similarly work _a la_ `Reduce()` to recursively merge a `list` of data.tables, [#599](https://github.com/Rdatatable/data.table/issues/599). Different join modes (_left_, _inner_, _full_, _right_, _semi_, _anti_, and _cross_) are supported through the `how` argument; duplicate handling goes through the `mult` argument. `setmergelist()` carefully avoids copies where one is not needed, e.g. in a 1:1 left join. Thanks Patrick Nicholson for the FR (in 2013!), @jangorecki for the PR, and @MichaelChirico for extensive reviews and fine-tuning.

+```r
+l = list(
+data.table(id = c(1L, 2L, 3L), x = c("a", "b", "c")),
+data.table(id = c(1L, 2L, 4L), y = c("d", "e", "f")),
+data.table(id = c(1L, 3L, 4L), z = c("g", "h", "i"))
+)
+
+# Recursive inner join
+mergelist(l, on = "id", how = "inner")
+# id x y z
+# 1: 1 a d g
+
+# Recursive left join (the default 'how')
+mergelist(l, on = "id", how = "left")
+# id x y z
+# 1: 1 a d g
+# 2: 2 b e <NA>
+# 3: 3 c <NA> h
+```
+
 14. `fcoalesce()` and `setcoalesce()` gain `nan` argument to control whether `NaN` values should be treated as missing (`nan=NA`, the default) or non-missing (`nan=NaN`), [#4567](https://github.com/Rdatatable/data.table/issues/4567). This provides full compatibility with `nafill()` behavior. Thanks to @ethanbsmith for the feature request and @Mukulyadav2004 for the implementation.

 15. New function `isoyear()` has been implemented as a complement to `isoweek()`, returning the ISO 8601 year corresponding to a given date, [#7154](https://github.com/Rdatatable/data.table/issues/7154). Thanks to @ben-schwen and @MichaelChirico for the suggestion and @venom1204 for the implementation.

+16. Multiple improvements have been added to rolling functions. Request came from @gpierard who needed left aligned, adaptive, rolling max, [#5438](https://github.com/Rdatatable/data.table/issues/5438). There was no `frollmax` function yet. Adaptive rolling functions did not have support for `align="left"`. `frollapply` did not support `adaptive=TRUE`. Available alternatives were base R `mapply` or self-join using `max` and grouping `by=.EACHI`. As a follow up of his request, the following features have been added:
+- new function `frollmax`, applies `max` over a rolling window.
+- support for `align="left"` for adaptive rolling function.
+- support for `adaptive=TRUE` in `frollapply`.
+- `partial` argument to trim window width to available observations rather than returning `NA` whenever window is not complete.
+- `give.names` argument that can be used to automatically give the names based on the names of `x` and `n`.
+- `frollmean` and `frollsum` no longer treat `Inf` and `-Inf` as `NA`s as it used to be for `algo="fast"` (breaking change).
+- `hasNA` argument has been renamed to `has.nf` to convey that it is not only related to `NA/NaN` but other non-finite values (`Inf/-Inf`) as well.
+
+Thanks to @jangorecki for implementation and @MichaelChirico and others for work on splitting into smaller PRs and reviews.
+For a comprehensive description about all available features see `?froll` manual.
+
+Adaptive `frollmax` has observed to be around 80 times faster than second fastest solution (data.table self-join using `max` and grouping `by=.EACHI`). Note that important factor in performance is width of the rolling window. Code for the benchmark below has been taken from [this SO answer](https://stackoverflow.com/a/73408459/2490497).
+```r
+set.seed(108)
+setDTthreads(16)
+x = data.table(
+value = cumsum(rnorm(1e6, 0.1)),
+end_window = 1:1e6 + sample(50:500, 1e6, TRUE),
+row = 1:1e6
+)[, "end_window" := pmin(end_window, .N)
+][, "len_window" := end_window-row+1L]
+baser = function(x) x[, mapply(function(from, to) max(value[from:to]), row, end_window)]
+sj = function(x) x[x, max(value), on=.(row >= row, row <= end_window), by=.EACHI]$V1
+frmax = function(x) x[, frollmax(value, len_window, adaptive=TRUE, align="left", has.nf=FALSE)]
+frapply = function(x) x[, frollapply(value, len_window, max, adaptive=TRUE, align="left")]
+microbenchmark::microbenchmark(
+baser(x), sj(x), frmax(x), frapply(x),
+times=10, check="identical"
+)
+#Unit: milliseconds
+# expr min lq mean median uq max neval
+# baser(x) 3094.88357 3097.84966 3186.74832 3163.58050 3251.66753 3370.33785 10
+# sj(x) 2221.55456 2255.12083 2306.61382 2303.47883 2346.70293 2412.62975 10
+# frmax(x) 17.45124 24.16809 28.10062 28.58153 32.79802 34.83941 10
+# frapply(x) 272.07830 316.47060 366.94771 396.23566 416.06699 421.38701 10
+```
+
+As of now, adaptive rolling max has no _on-line_ implementation (`algo="fast"`), it uses a naive approach (`algo="exact"`). Therefore further speed up is still possible if `algo="fast"` gets implemented.
+
+17. Function `frollapply` has been completely rewritten. Thanks to @jangorecki for implementation. Be sure to read `frollapply` manual before using the function. There are following changes:
+- all basic types are now supported on input/output, not only double. Users' code could possibly break if it depends on forced coercion of input/output to double type.
+- new argument `by.column` allowing to pass a multi-column subset of a data.table into a rolling function, closes [#4887](https://github.com/Rdatatable/data.table/issues/4887).
+```r
+x = data.table(v1=rnorm(120), v2=rnorm(120))
+f = function(x) coef(lm(v2 ~ v1, data=x))
+coef.fill = c("(Intercept)"=NA_real_, "v1"=NA_real_)
+frollapply(x, 4, f, by.column=FALSE, fill=coef.fill)
+# (Intercept) v1
+# 1: NA NA
+# 2: NA NA
+# 3: NA NA
+# 4: 0.65456931 0.3138012
+# 5: -1.07977441 -2.0588094
+#---
+#116: 0.15828417 0.3570216
+#117: -0.09083424 1.5494507
+#118: -0.18345878 0.6424837
+#119: -0.28964772 0.6116575
+#120: -0.40598313 0.6112854
+```
+- uses multiple CPU threads (on a decent OS); evaluation of UDF is inherently slow so this can be a great help.
+```r
+x = rnorm(1e5)
+n = 500
+setDTthreads(1)
+system.time(
+th1 <- frollapply(x, n, median, simplify=unlist)
+)
+# user system elapsed
+# 3.078 0.005 3.084
+setDTthreads(4)
+system.time(
+th4 <- frollapply(x, n, median, simplify=unlist)
+)
+# user system elapsed
+# 2.453 0.135 0.897
+all.equal(th1, th4)
+#[1] TRUE
+```
+
+18. New helper `frolladapt` to facilitate applying rolling functions over windows of fixed calendar-time width in irregularly-spaced data sets, thereby bypassing the need to "augment" such data with placeholder rows, [#3241](https://github.com/Rdatatable/data.table/issues/3241). Thanks to @jangorecki for implementation.
+```r
+idx = as.Date("2025-09-05") + c(0,4,7,8,9,10,12,13,17)
+dt = data.table(index=idx, value=seq_along(idx))
+dt
+# index value
+# <Date> <int>
+#1: 2025-09-05 1
+#2: 2025-09-09 2
+#3: 2025-09-12 3
+#4: 2025-09-13 4
+#5: 2025-09-14 5
+#6: 2025-09-15 6
+#7: 2025-09-17 7
+#8: 2025-09-18 8
+#9: 2025-09-22 9
+dt[, c("rollmean3","rollmean3days") := list(
+frollmean(value, 3),
+frollmean(value, frolladapt(index, 3), adaptive=TRUE)
+)]
+dt
+# index value rollmean3 rollmean3days
+# <Date> <int> <num> <num>
+#1: 2025-09-05 1 NA NA
+#2: 2025-09-09 2 NA 2.0
+#3: 2025-09-12 3 2 3.0
+#4: 2025-09-13 4 3 3.5
+#5: 2025-09-14 5 4 4.0
+#6: 2025-09-15 6 5 5.0
+#7: 2025-09-17 7 6 6.5
+#8: 2025-09-18 8 7 7.5
+#9: 2025-09-22 9 8 9.0
+```
+
 ### BUG FIXES

 1. `fread()` no longer warns on certain systems on R 4.5.0+ where the file owner can't be resolved, [#6918](https://github.com/Rdatatable/data.table/issues/6918). Thanks @ProfFancyPants for the report and PR.
@@ -103,6 +280,12 @@

 16. `between()` is now more robust with `integer64` arguments. Combining small integer `x` with certain large `integer64` bounds no longer misinterprets the bounds as `double`; if a `double` bound cannot be losslessly converted into `integer64` for comparison with `integer64` `x`, an error is signalled instead of returning a wrong answer with a warning; [#7164](https://github.com/Rdatatable/data.table/issues/7164). Thanks @aitap for the bug report and the fix.

+17. `t1 - t2`, where one is an `IDate` and the other is a `Date`, are now consistent with the case where both are `IDate` or both are `Date`, [#4749](https://github.com/Rdatatable/data.table/issues/4749). Thanks @George9000 for the report and @MichaelChirico for the fix.
+
+18. `fwrite` now allows `dec` to be the same as `sep` for edge cases where only one will be written, e.g. 0-row or 1-column tables. [#7227](https://github.com/Rdatatable/data.table/issues/7227). Thanks @MichaelChirico for the report and @venom1204 for the fix.
+
+19. Ellipsis elements like `..1` are correctly excluded when searching for variables in "up-a-level" syntax inside `[`, [#5460](https://github.com/Rdatatable/data.table/issues/5460). Thanks @ggrothendieck for the report and @MichaelChirico for the fix.
+
 ### NOTES

 1. The following in-progress deprecations have proceeded:

R/IDateTime.R

Lines changed: 9 additions & 2 deletions
@@ -91,6 +91,8 @@ round.IDate = function(x, digits=c("weeks", "months", "quarters", "years"), ...)
 years = ISOdate(year(x), 1L, 1L)))
 }

+chooseOpsMethod.IDate = function(x, y, mx, my, cl, reverse) inherits(y, "Date")
+
 #Adapted from `+.Date`
 `+.IDate` = function(e1, e2) {
 if (nargs() == 1L)
@@ -115,7 +117,7 @@ round.IDate = function(x, digits=c("weeks", "months", "quarters", "years"), ...)
 if (storage.mode(e1) != "integer")
 internal_error("storage mode of IDate is somehow no longer integer") # nocov
 if (nargs() == 1L)
-stopf("unary - is not defined for \"IDate\" objects")
+stopf('unary - is not defined for "IDate" objects')
 if (inherits(e2, "difftime"))
 internal_error("difftime objects may not be subtracted from IDate, but Ops dispatch should have intervened to prevent this") # nocov

@@ -127,7 +129,12 @@ round.IDate = function(x, digits=c("weeks", "months", "quarters", "years"), ...)
 # ii) .Date was newly exposed in R some time after 3.4.4
 }
 ans = as.integer(unclass(e1) - unclass(e2))
-if (!inherits(e2, "Date")) setattr(ans, "class", c("IDate", "Date"))
+if (inherits(e2, "Date")) {
+setattr(ans, "class", "difftime")
+setattr(ans, "units", "days")
+} else {
+setattr(ans, "class", c("IDate", "Date"))
+}
 ans
 }
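The last hunk above changes what `-.IDate` returns when the right-hand operand is a `Date`: a `difftime` in days instead of a bare integer. The branch can be sketched in base R roughly as follows. `minus_sketch()` is a hypothetical stand-in, not data.table's code, and it uses `structure()` where the package uses `setattr()`.

```r
# Hypothetical stand-in for the tail of `-.IDate`; not data.table's actual code.
minus_sketch = function(e1, e2) {
  ans = as.integer(unclass(e1) - unclass(e2))
  if (inherits(e2, "Date")) {
    # date minus date: a time difference measured in days
    structure(ans, class = "difftime", units = "days")
  } else {
    # date minus a plain number of days: still a date
    structure(ans, class = c("IDate", "Date"))
  }
}

# Build an IDate-like object by hand (integer days since 1970-01-01),
# so the sketch runs without data.table attached.
d1 = structure(as.integer(as.Date("2025-09-10")), class = c("IDate", "Date"))
d2 = as.Date("2025-09-05")

gap = minus_sketch(d1, d2)      # right operand is a Date
earlier = minus_sketch(d1, 4L)  # right operand is a day count
class(gap)
class(earlier)
```

With data.table attached, the same distinction would presumably be reached through `as.IDate(...) - as.Date(...)`, which the new `chooseOpsMethod.IDate` method routes to `-.IDate` on R 4.3.0 and later.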

R/data.table.R

Lines changed: 6 additions & 5 deletions
@@ -264,7 +264,8 @@ replace_dot_alias = function(e) {
 if (!missing(j)) {
 jsub = replace_dot_alias(jsub)
 root = root_name(jsub)
-av = all.vars(jsub)
+# exclude '..1' etc. for #5460
+av = grepv("^[.][.](?:[.]|[0-9]+)$", all.vars(jsub), invert=TRUE)
 all..names = FALSE
 if ((.is_withFALSE_range(jsub, x, root, av)) ||
 (root %chin% c("-","!") && jsub[[2L]] %iscall% '(' && jsub[[2L]][[2L]] %iscall% ':') || ## x[, !(V8:V10)]
@@ -312,7 +313,7 @@ replace_dot_alias = function(e) {
 root = root_name(jsub)
 } else if (length(jsub) > 2L && jsub[[2L]] %iscall% ":=") {
 #2142 -- j can be {} and have length 1
-stopf("You have wrapped := with {} which is ok but then := must be the only thing inside {}. You have something else inside {} as well. Consider placing the {} on the RHS of := instead; e.g. DT[,someCol:={tmpVar1<-...;tmpVar2<-...;tmpVar1*tmpVar2}]")
+stopf("Invalid use of `:=` inside `{}`. `:=` must be the only expression inside `{}` when used in `j`. Instead of: DT[{tmp1 <- ...; tmp2 <- ...; someCol := tmp1 * tmp2}], Use: DT[, someCol := {tmp1 <- ...; tmp2 <- ...; tmp1 * tmp2}]")
 }
 }
 if (root=="eval" && !any(all.vars(jsub[[2L]]) %chin% names_x)) {
@@ -1297,8 +1298,8 @@ replace_dot_alias = function(e) {
 SDenv = new.env(parent=parent.frame())

 syms = all.vars(jsub)
-syms = syms[ startsWith(syms, "..") ]
-syms = syms[ substr(syms, 3L, 3L) != "." ] # exclude ellipsis
+syms = syms[startsWith(syms, "..")]
+syms = grepv("^[.][.](?:[.]|[0-9]+)$", syms, invert=TRUE) # exclude ellipsis and '..n' ellipsis elements
 for (sym in syms) {
 if (sym %chin% names_x) {
 # if "..x" exists as column name, use column, for backwards compatibility; e.g. package socialmixr in rev dep checks #2779
@@ -2884,7 +2885,7 @@ address = function(x) .Call(Caddress, eval(substitute(x), parent.frame()))

 ":=" = function(...) {
 # this error is detected when eval'ing isub and replaced with a more helpful one when using := in i due to forgetting a comma, #4227
-stopf('Check that is.data.table(DT) == TRUE. Otherwise, :=, `:=`(...) and let(...) are defined for use in j, once only and in particular ways. Note that namespace-qualification like data.table::`:=`(...) is not supported. See help(":=").', class="dt_invalid_let_error")
+stopf('Check that is.data.table(DT) == TRUE. Otherwise, `:=` is defined for use in j, once only and in particular ways. See help(":=", "data.table"). A common reason for this error is allocating a new column in `j` and using `<-` instead of `:=`; e.g., `DT[, new_col <- 1]` should be `DT[, new_col := 1]`. Another is using `:=` in a multi-statement `{...}` block; please use `:=` as the only statement in `j`.', class="dt_invalid_let_error")
 }

 # TODO(#6197): Export these.
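
Two of the hunks above filter symbol names with the pattern `^[.][.](?:[.]|[0-9]+)$`, which matches `...` and ellipsis elements such as `..1` but leaves "up-a-level" names like `..cols` alone. A minimal sketch of that filter, assuming base `grep()` with `value=TRUE` behaves like the `grepv()` call in the diff; the sample vector `syms` is made up for illustration and `perl=TRUE` is an addition of this sketch:

```r
# Same pattern as in the diff; grep(value=TRUE, invert=TRUE) stands in for
# grepv(..., invert=TRUE). perl=TRUE is an assumption made for this sketch.
ellipsis_pattern = "^[.][.](?:[.]|[0-9]+)$"

# Illustrative symbol names, as all.vars() might return them for a `j` expression.
syms = c("x", "..cols", "...", "..1", "..12")

# Keep everything that is NOT `...` or a `..n` ellipsis element.
kept = grep(ellipsis_pattern, syms, value = TRUE, invert = TRUE, perl = TRUE)
kept
```

The removed `substr(syms, 3L, 3L) != "."` test only looked at the third character, so it excluded `...` but let `..1` through; the anchored pattern covers both cases.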

R/fcast.R

Lines changed: 1 addition & 5 deletions
@@ -186,11 +186,7 @@ dcast.data.table = function(data, formula, fun.aggregate = NULL, sep = "_", ...,
 maybe_err = function(list.of.columns) {
 if (!all(lengths(list.of.columns) == 1L)) {
 msg = gettext("Aggregating functions should take a vector as input and return a single value (length=1), but they do not, so the result is undefined. Please fix by modifying your function so that a single value is always returned.")
-if (is.null(fill)) { # TODO change to always stopf #6329
-stop(msg, domain=NA, call. = FALSE)
-} else {
-warning(msg, domain=NA, call. = FALSE)
-}
+stop(msg, domain=NA, call. = FALSE)
 }
 list.of.columns
 }
