diff --git a/NAMESPACE b/NAMESPACE
Lines changed: 1 addition & 0 deletions
diff --git a/NEWS.md b/NEWS.md
Lines changed: 91 additions & 1 deletion
diff --git a/R/data.table.R b/R/data.table.R
Lines changed: 4 additions & 3 deletions
diff --git a/R/froll.R b/R/froll.R
Lines changed: 164 additions & 12 deletions
@@ -54,6 +54,7 @@ S3method(cube, data.table)
 S3method(rollup, data.table)
 export(frollmean)
 export(frollsum)
+export(frollmax)
 export(frollapply)
 export(nafill)
 export(setnafill)
 
@@ -4,6 +4,22 @@
 
 ## data.table [v1.17.99](https://github.com/Rdatatable/data.table/milestone/35)  (in development)
 
+### BREAKING CHANGE
+
+1. `dcast()` now errors when `fun.aggregate` returns length != 1 (consistent with documentation), regardless of `fill`, [#6629](https://github.com/Rdatatable/data.table/issues/6629). Previously, when `fill` was not `NULL`, `dcast` warned and returned an undefined result. This change has been planned since 1.16.0 (25 Aug 2024).
+
+2. `melt()` returns an integer column for `variable` when `measure.vars` is a list of length=1, consistent with the documented behavior, [#5209](https://github.com/Rdatatable/data.table/issues/5209). Thanks to @tdhock for reporting. Any users who were relying on the previous behavior can change `measure.vars=list("col_name")` (output `variable` was the column name, now is the column index/integer) to `measure.vars="col_name"` (`variable` is still the column name). This change has been planned since 1.16.0 (25 Aug 2024).
+
+3. Rolling functions `frollmean` and `frollsum` distinguish `Inf`/`-Inf` from `NA` to match the same rules as base R when `algo="fast"` (previously they were considered the same). If your input into those functions has `Inf` or `-Inf` then you will be affected by this change. As a result, the argument that controls the handling of `NA`s has been renamed from `hasNA` to `has.nf` (_has non-finite_). `hasNA` continues to work with a warning, for now.
+    ```r
+    ## before
+    frollsum(c(1,2,3,Inf,5,6), 2)
+    #[1] NA  3  5 NA NA 11
+
+    ## now
+    frollsum(c(1,2,3,Inf,5,6), 2)
+    #[1]  NA   3   5 Inf Inf  11
+    ```
+
 ### NOTICE OF INTENDED FUTURE POTENTIAL BREAKING CHANGES 
 
 1. `data.table(x=1, <expr>)`, where `<expr>` is an expression resulting in a 1-column matrix without column names, will eventually have names `x` and `V2`, not `x` and `V1`, consistent with `data.table(x=1, <expr>)` where `<expr>` results in an atomic vector, for example `data.table(x=1, cbind(1))` and `data.table(x=1, 1)` will both have columns named `x` and `V2`. In this release, the matrix case continues to be named `V1`, but the new behavior can be activated by setting `options(datatable.old.matrix.autoname)` to `FALSE`. See point 5 under Bug Fixes for more context; this change will provide more internal consistency as well as more consistency with `data.frame()`.
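
Review note on breaking change 3 in the hunk above: the "now" output is what a plain base R computation gives, because `sum()` propagates `Inf` instead of turning it into `NA`. The sketch below is a naive right-aligned rolling sum written only for illustration; `naive_rollsum` is a hypothetical helper, not part of data.table, and it says nothing about how `algo="fast"` is implemented.

```r
# Hypothetical helper for illustration only: naive right-aligned rolling sum.
# Incomplete windows are left as NA.
naive_rollsum = function(x, n) {
  out = rep(NA_real_, length(x))
  for (i in seq_along(x)) {
    if (i >= n) out[i] = sum(x[(i - n + 1L):i])
  }
  out
}

res = naive_rollsum(c(1, 2, 3, Inf, 5, 6), 2)
print(res)
# [1]  NA   3   5 Inf Inf  11
```

Windows that contain `Inf` yield `Inf`, and only the first, incomplete window yields `NA`.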
@@ -63,12 +79,84 @@
 
 12. New `cbindlist()` and `setcbindlist()` for concatenating a `list` of data.tables column-wise, evocative of the analogous `do.call(rbind, l)` <-> `rbindlist(l)`, [#2576](https://github.com/Rdatatable/data.table/issues/2576). `setcbindlist()` does so without making any copies. Thanks @MichaelChirico for the FR, @jangorecki for the PR, and @MichaelChirico for extensive reviews and fine-tuning.
 
+    ```r
+    l = list(
+      data.table(id = 1:3, a = letters[1:3]),
+      data.table(b = 4:6, c = 7:9)
+    )
+    cbindlist(l)
+    #    id a b c
+    # 1:  1 a 4 7
+    # 2:  2 b 5 8
+    # 3:  3 c 6 9
+    ```
+
 13. New `mergelist()` and `setmergelist()` similarly work _a la_ `Reduce()` to recursively merge a `list` of data.tables, [#599](https://github.com/Rdatatable/data.table/issues/599). Different join modes (_left_, _inner_, _full_, _right_, _semi_, _anti_, and _cross_) are supported through the `how` argument; duplicate handling goes through the `mult` argument. `setmergelist()` carefully avoids copies where one is not needed, e.g. in a 1:1 left join. Thanks Patrick Nicholson for the FR (in 2013!), @jangorecki for the PR, and @MichaelChirico for extensive reviews and fine-tuning.
 
+    ```r
+    l = list(
+      data.table(id = c(1L, 2L, 3L), x = c("a", "b", "c")),
+      data.table(id = c(1L, 2L, 4L), y = c("d", "e", "f")),
+      data.table(id = c(1L, 3L, 4L), z = c("g", "h", "i"))
+    )
+
+    # Recursive inner join
+    mergelist(l, on = "id", how = "inner")
+    #    id x y z
+    # 1:  1 a d g
+
+    # Recursive left join (the default 'how')
+    mergelist(l, on = "id", how = "left")
+    #    id x    y    z
+    # 1:  1 a    d    g
+    # 2:  2 b    e <NA>
+    # 3:  3 c <NA>    h
+    ```
+
 14. `fcoalesce()` and `setcoalesce()` gain `nan` argument to control whether `NaN` values should be treated as missing (`nan=NA`, the default) or non-missing (`nan=NaN`), [#4567](https://github.com/Rdatatable/data.table/issues/4567). This provides full compatibility with `nafill()` behavior. Thanks to @ethanbsmith for the feature request and @Mukulyadav2004 for the implementation.
 
 15. New function `isoyear()` has been implemented as a complement to `isoweek()`, returning the ISO 8601 year corresponding to a given date, [#7154](https://github.com/Rdatatable/data.table/issues/7154). Thanks to @ben-schwen and @MichaelChirico for the suggestion and @venom1204 for the implementation.
 
+16. Multiple improvements have been added to rolling functions. The request came from @gpierard, who needed a left-aligned, adaptive rolling max, [#5438](https://github.com/Rdatatable/data.table/issues/5438). At the time there was no `frollmax` function, adaptive rolling functions did not support `align="left"`, and `frollapply` did not support `adaptive=TRUE`. The available alternatives were base R `mapply` or a self-join using `max` and grouping `by=.EACHI`. As a follow-up to this request, the following features have been added:
+    - new function `frollmax`, which applies `max` over a rolling window.
+    - support for `align="left"` in adaptive rolling functions.
+    - support for `adaptive=TRUE` in `frollapply`.
+    - `partial` argument to trim the window width to the available observations rather than returning `NA` whenever the window is incomplete.
+    - `give.names` argument to automatically name the results based on the names of `x` and `n`.
+    - `frollmean` and `frollsum` no longer treat `Inf` and `-Inf` as `NA`, as they previously did for `algo="fast"` (breaking change).
+    - `hasNA` argument has been renamed to `has.nf` to convey that it relates not only to `NA`/`NaN` but to other non-finite values (`Inf`/`-Inf`) as well.
+
+    Thanks to @jangorecki for the implementation, and to @MichaelChirico and others for their work on splitting it into smaller PRs and for reviews.
+    For a comprehensive description of all available features, see the `?froll` manual.
+
+    Adaptive `frollmax` has been observed to be around 80 times faster than the second-fastest solution (data.table self-join using `max` and grouping `by=.EACHI`). Note that an important factor in performance is the width of the rolling window. The code for the benchmark below has been taken from [this SO answer](https://stackoverflow.com/a/73408459/2490497).
+    ```r
+    set.seed(108)
+    setDTthreads(16)
+    x = data.table(
+      value = cumsum(rnorm(1e6, 0.1)),
+      end_window = 1:1e6 + sample(50:500, 1e6, TRUE),
+      row = 1:1e6
+    )[, "end_window" := pmin(end_window, .N)
+      ][, "len_window" := end_window-row+1L]
+    baser = function(x) x[, mapply(function(from, to) max(value[from:to]), row, end_window)]
+    sj = function(x) x[x, max(value), on=.(row >= row, row <= end_window), by=.EACHI]$V1
+    frmax = function(x) x[, frollmax(value, len_window, adaptive=TRUE, align="left", has.nf=FALSE)]
+    frapply = function(x) x[, frollapply(value, len_window, max, adaptive=TRUE, align="left")]
+    microbenchmark::microbenchmark(
+      baser(x), sj(x), frmax(x), frapply(x),
+      times=10, check="identical"
+    )
+    #Unit: milliseconds
+    #       expr        min         lq       mean     median         uq        max neval
+    #   baser(x) 3094.88357 3097.84966 3186.74832 3163.58050 3251.66753 3370.33785    10
+    #      sj(x) 2221.55456 2255.12083 2306.61382 2303.47883 2346.70293 2412.62975    10
+    #   frmax(x)   17.45124   24.16809   28.10062   28.58153   32.79802   34.83941    10
+    # frapply(x)  272.07830  316.47060  366.94771  396.23566  416.06699  421.38701    10
+    ```
+
+    As of now, adaptive rolling max has no _on-line_ implementation (`algo="fast"`); it uses a naive approach (`algo="exact"`). Further speed-up is therefore still possible if `algo="fast"` gets implemented.
+
 ### BUG FIXES
 
 1. `fread()` no longer warns on certain systems on R 4.5.0+ where the file owner can't be resolved, [#6918](https://github.com/Rdatatable/data.table/issues/6918). Thanks @ProfFancyPants for the report and PR.
@@ -107,7 +195,9 @@
 
 18. `fwrite` now allows `dec` to be the same as `sep` for edge cases where only one will be written, e.g. 0-row or 1-column tables. [#7227](https://github.com/Rdatatable/data.table/issues/7227). Thanks @MichaelChirico for the report and @venom1204 for the fix.
 
-19. `rowwiseDT()` now provides a helpful error message when a complex object that is not a list (e.g., a function) is provided as a cell value, instructing the user to wrap it in `list()`. [#7219](https://github.com/Rdatatable/data.table/issues/7219). Thanks @kylebutts for the report and @venom1204 for the fix.
+19. Ellipsis elements like `..1` are correctly excluded when searching for variables in "up-a-level" syntax inside `[`, [#5460](https://github.com/Rdatatable/data.table/issues/5460). Thanks @ggrothendieck for the report and @MichaelChirico for the fix.
+
+20. `rowwiseDT()` now provides a helpful error message when a complex object that is not a list (e.g., a function) is provided as a cell value, instructing the user to wrap it in `list()`. [#7219](https://github.com/Rdatatable/data.table/issues/7219). Thanks @kylebutts for the report and @venom1204 for the fix.
 
 ### NOTES
 
 
@@ -264,7 +264,8 @@ replace_dot_alias = function(e) {
   if (!missing(j)) {
     jsub = replace_dot_alias(jsub)
     root = root_name(jsub)
-    av = all.vars(jsub)
+    # exclude '..1' etc. for #5460
+    av = grepv("^[.][.](?:[.]|[0-9]+)$", all.vars(jsub), invert=TRUE)
     all..names = FALSE
     if ((.is_withFALSE_range(jsub, x, root, av)) ||
         (root %chin% c("-","!") && jsub[[2L]] %iscall% '(' && jsub[[2L]][[2L]] %iscall% ':') || ## x[, !(V8:V10)]
@@ -1297,8 +1298,8 @@ replace_dot_alias = function(e) {
   SDenv = new.env(parent=parent.frame())
 
   syms = all.vars(jsub)
-  syms = syms[ startsWith(syms, "..") ]
-  syms = syms[ substr(syms, 3L, 3L) != "." ]  # exclude ellipsis
+  syms = syms[startsWith(syms, "..")]
+  syms = grepv("^[.][.](?:[.]|[0-9]+)$", syms, invert=TRUE) # exclude ellipsis and '..n' ellipsis elements
   for (sym in syms) {
     if (sym %chin% names_x) {
       # if "..x" exists as column name, use column, for backwards compatibility; e.g. package socialmixr in rev dep checks #2779
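
Review note on the `..1` exclusion in the two hunks above: the pattern is meant to drop `...` and ellipsis elements such as `..1` while keeping genuine "up-a-level" symbols like `..x`. The sketch below checks that on a few made-up symbol names. It uses base `grep(..., invert=TRUE, value=TRUE)` as a stand-in for `grepv()`, and `perl=TRUE` is my own assumption, added only so the non-capturing group is interpreted the same way regardless of regex engine.

```r
# Illustrative symbol names (assumed for this sketch, not taken from the PR).
syms = c("...", "..1", "..12", "..x", "..cols", "x", "..1a")

# Same pattern as in the diff; grep() with value=TRUE stands in for grepv().
pattern = "^[.][.](?:[.]|[0-9]+)$"
kept = grep(pattern, syms, invert = TRUE, value = TRUE, perl = TRUE)
print(kept)
# [1] "..x"    "..cols" "x"      "..1a"
```

Note that `..1a` is kept: the pattern is anchored, so only names made up entirely of `..` followed by digits (or a third dot) are excluded.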
 
@@ -1,21 +1,173 @@
-froll = function(fun, x, n, fill=NA, algo=c("fast", "exact"), align=c("right", "left", "center"), na.rm=FALSE, hasNA=NA, adaptive=FALSE) {
-  stopifnot(!missing(fun), is.character(fun), length(fun)==1L, !is.na(fun))
-  algo = match.arg(algo)
+# helpers for partial2adaptive
+trimn = function(n, len, align) {
+  n = min(n, len) ## so frollsum(1:2, 3, partial=TRUE) works
+  if (align=="right")
+    c(seq_len(n), rep.int(n, len-n))
+  else
+    c(rep.int(n, len-n), rev(seq_len(n)))
+}
+trimnadaptive = function(n, align) {
+  if (align=="right")
+    pmin(n, seq_along(n))
+  else
+    pmin(n, rev(seq_along(n)))
+}
+
+# partial2adaptive helper function
+## convert 'n' supplied with partial=TRUE into an adaptive 'n' (for adaptive=TRUE), as shown in ?froll examples
+# partial2adaptive(1:4, 2, "right", adaptive=FALSE)
+# partial2adaptive(1:4, 2:3, "right", adaptive=FALSE)
+# partial2adaptive(list(1:4, 2:5), 2:3, "right", adaptive=FALSE)
+# frollsum(1:4, 2, partial=FALSE, adaptive=FALSE)
+# frollsum(1:4, 2, partial=TRUE, adaptive=FALSE)
+# frollsum(1:4, 2:3, partial=FALSE, adaptive=FALSE)
+# frollsum(1:4, 2:3, partial=TRUE, adaptive=FALSE)
+# frollsum(list(1:4, 2:5), 2:3, partial=FALSE, adaptive=FALSE)
+# frollsum(list(1:4, 2:5), 2:3, partial=TRUE, adaptive=FALSE)
+partial2adaptive = function(x, n, align, adaptive) {
+  if (!length(n))
+    stopf("n must be non 0 length")
+  if (align=="center")
+    stopf("'partial' cannot be used together with align='center'")
+  if (is.list(x) && length(unique(lengths(x))) != 1L)
+    stopf("'partial' does not support variable length of columns in 'x'")
+  len = if (is.list(x)) length(x[[1L]]) else length(x)
+  verbose = getOption("datatable.verbose")
+  if (!adaptive) {
+    if (is.list(n))
+      stopf("n must be an integer, list is accepted for adaptive TRUE")
+    if (!is.numeric(n))
+      stopf("n must be an integer vector or a list of integer vectors")
+    if (verbose)
+      catf("partial2adaptive: froll partial=TRUE trimming 'n' and redirecting to adaptive=TRUE\n")
+    if (length(n) > 1L) {
+      ## c(2,3) -> list(c(1,2,2,2),c(1,2,3,3)) ## for x=1:4
+      lapply(n, len, align, FUN=trimn)
+    } else {
+      ## 3 -> c(1,2,3,3) ## for x=1:4
+      trimn(n, len, align)
+    }
+  } else {
+    if (!(is.numeric(n) || (is.list(n) && all(vapply_1b(n, is.numeric)))))
+      stopf("n must be an integer vector or a list of integer vectors")
+    if (length(unique(lengths(n))) != 1L)
+      stopf("adaptive window provided in 'n' must not have different lengths")
+    if (is.numeric(n) && length(n) != len)
+      stopf("length of 'n' argument must be equal to number of observations provided in 'x'")
+    if (is.list(n) && length(n[[1L]]) != len)
+      stopf("length of vectors in 'x' must match to length of adaptive window in 'n'")
+    if (verbose)
+      catf("partial2adaptive: froll adaptive=TRUE and partial=TRUE trimming 'n'\n")
+    if (is.numeric(n)) {
+      ## c(3,3,3,2) -> c(1,2,3,2) ## for x=1:4
+      trimnadaptive(n, align)
+    } else {
+      ## list(c(3,3,3,2),c(4,2,3,3)) -> list(c(1,2,3,2),c(1,2,3,3)) ## for x=1:4
+      lapply(n, align, FUN = trimnadaptive)
+    }
+  }
+}
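
Review note on `partial2adaptive` above: as I read it, `partial=TRUE` with a fixed window is handled by building a per-observation window width and then running the adaptive code path. The sketch below restates that idea in base R. `trim_widths` mirrors the `trimn` helper from this diff, while `adaptive_rollsum` is a hypothetical naive stand-in for the C routine, written only to show the effect of the trimmed widths.

```r
# Mirrors the trimming idea of 'trimn' in this diff (illustrative copy).
trim_widths = function(n, len, align = "right") {
  n = min(n, len)
  if (align == "right") c(seq_len(n), rep.int(n, len - n))
  else c(rep.int(n, len - n), rev(seq_len(n)))
}

# Hypothetical naive right-aligned adaptive rolling sum:
# window i covers the w[i] values ending at position i.
adaptive_rollsum = function(x, w) {
  vapply(seq_along(x), function(i) sum(as.numeric(x[(i - w[i] + 1L):i])), numeric(1))
}

x = 1:4
w = trim_widths(2, length(x))
print(w)
# [1] 1 2 2 2
print(adaptive_rollsum(x, w))
# [1] 1 3 5 7
print(trim_widths(2, length(x), "left"))
# [1] 2 2 2 1
```

With the trimmed widths, the first window shrinks to a single observation instead of producing `NA`.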
+
+# internal helper for handling give.names=TRUE
+make.roll.names = function(x.len, n.len, n, x.nm, n.nm, fun, adaptive) {
+  if (is.null(n.nm)) {
+    if (!adaptive) {
+      if (!is.numeric(n))
+        stopf("internal error: misuse of make.roll.names, n must be numeric for !adaptive") ## nocov
+      n.nm = paste0("roll", fun, as.character(as.integer(n)))
+    } else {
+      n.nm = paste0("aroll", fun, seq_len(n.len))
+    }
+  } else if (!length(n.nm) && !adaptive)
+    stopf("internal error: misuse of make.roll.names, non-null length 0 n is not possible for !adaptive") ## nocov
+  if (is.null(x.nm)) {
+    x.nm = paste0("V", seq_len(x.len))
+  }
+  ans = if (length(x.nm)) { ## is.list(x) && !is.data.frame(x)
+    if (length(n.nm)) { ## !adaptive || is.list(n)
+      paste(rep(x.nm, each=length(n.nm)), n.nm, sep="_")
+    } else { ## adaptive && is.numeric(n)
+      x.nm
+    }
+  } else { ## (by.column && is.atomic(x)) || (!by.column && is.data.frame(x))
+    if (length(n.nm)) { ## !adaptive || is.list(n)
+      n.nm
+    } else { ## adaptive && is.numeric(n)
+      NULL # nocov ## call to make.roll.names is excluded by is.list(ans) condition before calling it, it will be relevant for !by.column in next PR
+    }
+  }
+  if (!is.null(ans) && length(ans) != x.len*n.len)
+    stopf("internal error: make.roll.names generated names of wrong length") ## nocov
+  ans
+}
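
Review note on `make.roll.names` above: the naming scheme may be easier to follow with one concrete case. The sketch below restates only one branch as I understand it (non-adaptive, unnamed `n`, named list `x`); `roll_names` is a hypothetical simplification for illustration, not the function in this PR.

```r
# Hypothetical simplification of one branch: non-adaptive, unnamed 'n', named list 'x'.
roll_names = function(x.nm, n, fun) {
  n.nm = paste0("roll", fun, as.character(as.integer(n)))
  paste(rep(x.nm, each = length(n.nm)), n.nm, sep = "_")
}

nms = roll_names(c("a", "b"), c(2, 3), "sum")
print(nms)
# [1] "a_rollsum2" "a_rollsum3" "b_rollsum2" "b_rollsum3"
```

The result has `length(x) * length(n)` names, ordered by column of `x` first and window in `n` second, which matches the length check at the end of `make.roll.names`.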
+
+froll = function(fun, x, n, fill=NA, algo, align=c("right","left","center"), na.rm=FALSE, has.nf=NA, adaptive=FALSE, partial=FALSE, FUN, rho, give.names=FALSE) {
   align = match.arg(align)
-  ans = .Call(CfrollfunR, fun, x, n, fill, algo, align, na.rm, hasNA, adaptive)
+  if (isTRUE(give.names)) {
+    orig = list(n=n, adaptive=adaptive)
+    xnam = if (is.list(x)) names(x) else character()
+    nnam = if (isTRUE(adaptive)) {
+      if (is.list(n)) names(n) else character()
+    } else names(n)
+    nx = if (is.list(x)) length(x) else 1L
+    nn = if (isTRUE(adaptive)) {
+      if (is.list(n)) length(n) else 1L
+    } else length(n)
+  }
+  if (isTRUE(partial)) {
+    n = partial2adaptive(x, n, align, adaptive)
+    adaptive = TRUE
+  }
+  leftadaptive = isTRUE(adaptive) && align=="left"
+  if (leftadaptive) {
+    verbose = getOption("datatable.verbose")
+    rev2 = function(x) if (is.list(x)) lapply(x, rev) else rev(x)
+    if (verbose)
+      catf("froll: adaptive=TRUE && align='left' pre-processing for align='right'\n")
+    x = rev2(x)
+    n = rev2(n)
+    align = "right"
+  } ## support for left adaptive added in #5441
+  if (missing(FUN))
+    ans = .Call(CfrollfunR, fun, x, n, fill, algo, align, na.rm, has.nf, adaptive)
+  else
+    ans = .Call(CfrollapplyR, FUN, x, n, fill, align, adaptive, rho)
+  if (leftadaptive) {
+    if (verbose)
+      catf("froll: adaptive=TRUE && align='left' post-processing from align='right'\n")
+    ans = rev2(ans)
+  }
+  if (isTRUE(give.names) && is.list(ans)) {
+    nms = make.roll.names(x.len=nx, n.len=nn, n=orig$n, x.nm=xnam, n.nm=nnam, fun=fun, adaptive=orig$adaptive)
+    setattr(ans, "names", nms)
+  }
   ans
 }
 
-frollmean = function(x, n, fill=NA, algo=c("fast", "exact"), align=c("right", "left", "center"), na.rm=FALSE, hasNA=NA, adaptive=FALSE) {
-  froll(fun="mean", x=x, n=n, fill=fill, algo=algo, align=align, na.rm=na.rm, hasNA=hasNA, adaptive=adaptive)
+frollfun = function(fun, x, n, fill=NA, algo=c("fast","exact"), align=c("right","left","center"), na.rm=FALSE, has.nf=NA, adaptive=FALSE, partial=FALSE, hasNA, give.names=FALSE) {
+  stopifnot(!missing(fun), is.character(fun), length(fun)==1L, !is.na(fun))
+  if (!missing(hasNA)) {
+    if (!is.na(has.nf))
+      stopf("hasNA is deprecated, use has.nf instead")
+    warningf("hasNA is deprecated, use has.nf instead")
+    has.nf = hasNA
+  } # remove check on next major release
+  algo = match.arg(algo)
+  froll(fun=fun, x=x, n=n, fill=fill, algo=algo, align=align, na.rm=na.rm, has.nf=has.nf, adaptive=adaptive, partial=partial, give.names=give.names)
+}
+
+frollmean = function(x, n, fill=NA, algo=c("fast","exact"), align=c("right","left","center"), na.rm=FALSE, has.nf=NA, adaptive=FALSE, partial=FALSE, hasNA, give.names=FALSE) {
+  frollfun(fun="mean", x=x, n=n, fill=fill, algo=algo, align=align, na.rm=na.rm, has.nf=has.nf, adaptive=adaptive, partial=partial, hasNA=hasNA, give.names=give.names)
 }
-frollsum = function(x, n, fill=NA, algo=c("fast","exact"), align=c("right", "left", "center"), na.rm=FALSE, hasNA=NA, adaptive=FALSE) {
-  froll(fun="sum", x=x, n=n, fill=fill, algo=algo, align=align, na.rm=na.rm, hasNA=hasNA, adaptive=adaptive)
+frollsum = function(x, n, fill=NA, algo=c("fast","exact"), align=c("right","left","center"), na.rm=FALSE, has.nf=NA, adaptive=FALSE, partial=FALSE, hasNA, give.names=FALSE) {
+  frollfun(fun="sum", x=x, n=n, fill=fill, algo=algo, align=align, na.rm=na.rm, has.nf=has.nf, adaptive=adaptive, partial=partial, hasNA=hasNA, give.names=give.names)
 }
-frollapply = function(x, n, FUN, ..., fill=NA, align=c("right", "left", "center")) {
+frollmax = function(x, n, fill=NA, algo=c("fast","exact"), align=c("right","left","center"), na.rm=FALSE, has.nf=NA, adaptive=FALSE, partial=FALSE, hasNA, give.names=FALSE) {
+  frollfun(fun="max", x=x, n=n, fill=fill, algo=algo, align=align, na.rm=na.rm, has.nf=has.nf, adaptive=adaptive, partial=partial, hasNA=hasNA, give.names=give.names)
+}
+
+frollapply = function(x, n, FUN, ..., fill=NA, align=c("right","left","center"), adaptive=FALSE, partial=FALSE, give.names=FALSE) {
   FUN = match.fun(FUN)
-  align = match.arg(align)
   rho = new.env()
-  ans = .Call(CfrollapplyR, FUN, x, n, fill, align, rho)
-  ans
+  froll(FUN=FUN, rho=rho, x=x, n=n, fill=fill, align=align, adaptive=adaptive, partial=partial, give.names=give.names)
 }
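
Review note on the left-adaptive handling in `froll` above: the code reverses `x` and `n`, runs the right-aligned adaptive routine, and reverses the answer. The base R sketch below checks that this equivalence holds for a small input. `amax_right` and `amax_left` are hypothetical naive helpers written for this note, and the input vectors are made up.

```r
# Hypothetical naive adaptive rolling max, right-aligned:
# window i is x[(i - n[i] + 1):i]; incomplete windows give NA.
amax_right = function(x, n) {
  vapply(seq_along(x), function(i) {
    if (n[i] > i) NA_real_ else max(x[(i - n[i] + 1L):i])
  }, numeric(1))
}

# Hypothetical naive adaptive rolling max, left-aligned:
# window i is x[i:(i + n[i] - 1)]; incomplete windows give NA.
amax_left = function(x, n) {
  len = length(x)
  vapply(seq_along(x), function(i) {
    if (i + n[i] - 1L > len) NA_real_ else max(x[i:(i + n[i] - 1L)])
  }, numeric(1))
}

x = c(3, 1, 4, 1, 5, 9, 2, 6)
n = c(2L, 3L, 1L, 2L, 4L, 2L, 3L, 1L)

direct = amax_left(x, n)
via_reverse = rev(amax_right(rev(x), rev(n)))
print(direct)
# [1]  3  4  4  5  9  9 NA  6
print(identical(direct, via_reverse))
# [1] TRUE
```

The trick works because reversing both `x` and `n` maps a window that starts at position `i` onto a window that ends at position `length(x) - i + 1`.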