Commit 996bce9

frollmax PRs 2:10
1 parent 8647d44 commit 996bce9

12 files changed, +1821 -734 lines changed

NAMESPACE

Lines changed: 1 addition & 0 deletions
@@ -54,6 +54,7 @@ S3method(cube, data.table)
 S3method(rollup, data.table)
 export(frollmean)
 export(frollsum)
+export(frollmax)
 export(frollapply)
 export(nafill)
 export(setnafill)

NEWS.md

Lines changed: 42 additions & 0 deletions
@@ -4,6 +4,10 @@

 ## data.table [v1.17.99](https://github.com/Rdatatable/data.table/milestone/35) (in development)

+### BREAKING CHANGE
+
+1. Rolling functions `frollmean` and `frollsum` used to treat `Inf` and `-Inf` as `NA` when using the default `algo="fast"`. This has changed: infinite values are no longer treated as `NA`. If your input to these functions contains `Inf` or `-Inf`, you will be affected by this change.
+
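To make the breaking change concrete, here is a minimal base R sketch of a right-aligned rolling mean in which `Inf` propagates like any other value. It illustrates the semantics described in the entry above and is not data.table code; `roll_mean_ref` is a hypothetical helper name introduced only for this example.

```r
# Hypothetical base R reference: right-aligned rolling mean of width n.
# Infinite values are passed to mean() untouched, so they propagate.
roll_mean_ref = function(x, n) {
  vapply(seq_along(x), function(i) {
    if (i < n) return(NA_real_)      # incomplete leading window
    mean(x[(i - n + 1L):i])
  }, numeric(1))
}

x = c(1, 2, Inf, 4, 5)
res = roll_mean_ref(x, 2L)
print(res)
# values: NA 1.5 Inf Inf 4.5
```

The windows ending at positions 3 and 4 contain `Inf`, so those are the results affected by the change.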
 ### NEW FEATURES

 1. New `sort_by()` method for data.tables, [#6662](https://github.com/Rdatatable/data.table/issues/6662). It uses `forder()` to improve upon the data.frame method and also match `DT[order(...)]` behavior with respect to locale. Thanks @rikivillalba for the suggestion and PR.
@@ -14,6 +18,44 @@

 4. `as.Date()` method for `IDate` no longer coerces to `double` [#6922](https://github.com/Rdatatable/data.table/issues/6922). Thanks @MichaelChirico for the report and PR. The only effect should be on overly-strict tests that assert `Date` objects have `double` storage, which is not in general true, especially from R 4.5.0.

+5. Multiple improvements have been added to rolling functions. The request came from @gpierard, who needed a left-aligned, adaptive rolling max, [#5438](https://github.com/Rdatatable/data.table/issues/5438). There was no `frollmax` function yet, adaptive rolling functions did not support `align="left"`, and `frollapply` did not support `adaptive=TRUE`. The available alternatives were base R `mapply` or a self-join using `max` and grouping `by=.EACHI`. As a follow-up to this request, the following features have been added:
+- new function `frollmax`, which applies `max` over a rolling window.
+- support for `align="left"` in adaptive rolling functions.
+- support for `adaptive=TRUE` in `frollapply`.
+- `partial` argument to trim the window width to the available observations rather than returning `NA` whenever the window is incomplete.
+- `give.names` argument to automatically name the results based on the names of `x` and `n`.
+- `frollmean` and `frollsum` no longer treat `Inf` and `-Inf` as `NA`, as they used to under `algo="fast"` (breaking change).
+- `hasNA` argument has been renamed to `has.nf` to convey that it relates not only to `NA/NaN` but also to other non-finite values (`Inf/-Inf`).
+
+For a comprehensive description of all available features, see the `?froll` manual.
+
+Adaptive `frollmax` has been observed to be up to 50 times faster than the second-fastest solution (data.table self-join using `max` and grouping `by=.EACHI`). Note that an important factor in performance is the width of the rolling window. The code for the benchmark below is taken from [this SO answer](https://stackoverflow.com/a/73408459/2490497).
+```r
+set.seed(108)
+setDTthreads(8)
+x = data.table(
+  value = cumsum(rnorm(1e6, 0.1)),
+  end_window = 1:1e6 + sample(50:500, 1e6, TRUE),
+  row = 1:1e6
+)[, "end_window" := pmin(end_window, .N)
+  ][, "len_window" := end_window-row+1L]
+
+baser = function(x) x[, mapply(function(from, to) max(value[from:to]), row, end_window)]
+sj = function(x) x[x, max(value), on=.(row >= row, row <= end_window), by=.EACHI]$V1
+frmax = function(x) x[, frollmax(value, len_window, adaptive=TRUE, align="left", has.nf=FALSE)]
+frapply = function(x) x[, frollapply(value, len_window, max, adaptive=TRUE, align="left")]
+microbenchmark::microbenchmark(
+  baser(x), sj(x), frmax(x), frapply(x),
+  times=10, check="identical"
+)
+#Unit: milliseconds
+#       expr        min         lq      mean     median        uq       max neval
+#   baser(x) 5181.36076 5417.57505 5537.2929 5494.73652 5706.2721 5818.6627    10
+#      sj(x) 4608.28940 4627.57186 4792.4031 4785.35306 4856.4475 5054.3301    10
+#   frmax(x)   70.41253   75.28659   91.3774   91.40227  102.0248  116.8622    10
+# frapply(x)  713.23108  742.34657  865.2524  848.31641  965.3599 1114.0531    10
+```
+
 ### BUG FIXES

 1. Custom binary operators from the `lubridate` package now work with objects of class `IDate` as with a `Date` subclass, [#6839](https://github.com/Rdatatable/data.table/issues/6839). Thanks @emallickhossain for the report and @aitap for the fix.
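For readers who want to see what a left-aligned, adaptive rolling max computes, the following is a slow base R reference on a tiny input, in the spirit of the `baser` function from the benchmark in the NEWS entry. It is a sketch of the semantics only: `roll_max_left` is a hypothetical helper name, and returning `NA` for a window that runs past the end of the input is an assumption made for this example.

```r
# Hypothetical reference: window i covers value[i:(i + len_window[i] - 1)].
roll_max_left = function(value, len_window) {
  stopifnot(length(value) == length(len_window))
  idx = seq_along(value)
  end = idx + len_window - 1L
  vapply(idx, function(i) {
    if (end[i] > length(value)) return(NA_real_)  # assumed: incomplete window gives NA
    max(value[i:end[i]])
  }, numeric(1))
}

value = c(3, 1, 4, 1, 5, 9, 2, 6)
len_window = c(2L, 3L, 1L, 4L, 2L, 3L, 2L, 1L)
res = roll_max_left(value, len_window)
print(res)
# values: 3 4 4 9 9 9 6 6
```

Each element gets its own window width, which is what distinguishes the adaptive case from a fixed-width rolling max.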

R/froll.R

Lines changed: 121 additions & 12 deletions
@@ -1,21 +1,130 @@
-froll = function(fun, x, n, fill=NA, algo=c("fast", "exact"), align=c("right", "left", "center"), na.rm=FALSE, hasNA=NA, adaptive=FALSE) {
-  stopifnot(!missing(fun), is.character(fun), length(fun)==1L, !is.na(fun))
-  algo = match.arg(algo)
+# helpers for partial2adaptive
+trimn = function(n, len, align) {
+  n = min(n, len) ## so frollsum(1:2, 3, partial=TRUE) works
+  if (align=="right")
+    c(seq.int(n), rep.int(n, len-n))
+  else
+    c(rep.int(n, len-n), rev(seq.int(n)))
+}
+trimnadaptive = function(n, align) {
+  if (align=="right")
+    pmin(n, seq_along(n))
+  else
+    pmin(n, rev(seq_along(n)))
+}
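As a quick illustration of what the two helpers return, the snippet below copies `trimn` and `trimnadaptive` from the diff above into a standalone script and evaluates them on small inputs. The inputs are chosen for this example only.

```r
# Copied from the diff above so the snippet runs on its own.
trimn = function(n, len, align) {
  n = min(n, len)
  if (align=="right")
    c(seq.int(n), rep.int(n, len-n))
  else
    c(rep.int(n, len-n), rev(seq.int(n)))
}
trimnadaptive = function(n, align) {
  if (align=="right")
    pmin(n, seq_along(n))
  else
    pmin(n, rev(seq_along(n)))
}

# Fixed width 2 over 4 observations: the window grows at the start (right
# alignment) or shrinks at the end (left alignment).
print(trimn(2L, 4L, "right"))                     # values: 1 2 2 2
print(trimn(2L, 4L, "left"))                      # values: 2 2 2 1
# Width larger than the input is capped at the input length.
print(trimn(3L, 2L, "right"))                     # values: 1 2
# Adaptive widths are capped by the number of observations available.
print(trimnadaptive(c(3L, 3L, 3L, 3L), "right"))  # values: 1 2 3 3
print(trimnadaptive(c(3L, 3L, 3L, 3L), "left"))   # values: 3 3 2 1
```

In both cases the result is a per-observation window width, which is why `partial=TRUE` can be redirected to the adaptive code path.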
+
+# partial2adaptive helper function
+## tune provided 'n' via partial=TRUE to adaptive=TRUE by prepared adaptive 'n' as shown in ?froll examples
+# partial2adaptive(1:4, 2, "right", adaptive=FALSE)
+# partial2adaptive(1:4, 2:3, "right", adaptive=FALSE)
+# partial2adaptive(list(1:4, 2:5), 2:3, "right", adaptive=FALSE)
+# frollsum(1:4, 2, partial=FALSE, adaptive=FALSE)
+# frollsum(1:4, 2, partial=TRUE, adaptive=FALSE)
+# frollsum(1:4, 2:3, partial=FALSE, adaptive=FALSE)
+# frollsum(1:4, 2:3, partial=TRUE, adaptive=FALSE)
+# frollsum(list(1:4, 2:5), 2:3, partial=FALSE, adaptive=FALSE)
+# frollsum(list(1:4, 2:5), 2:3, partial=TRUE, adaptive=FALSE)
+partial2adaptive = function(x, n, align, adaptive) {
+  if (align=="center")
+    stopf("'partial' cannot be used together with align='center'")
+  if (is.list(x) && length(unique(lengths(x)))!=1L)
+    stopf("'partial' does not support variable length of columns in 'x'")
+  len = if (is.list(x)) length(x[[1L]]) else length(x)
+  verbose = getOption("datatable.verbose")
+  if (!adaptive) {
+    if (is.list(n))
+      stopf("n must be an integer, list is accepted for adaptive TRUE")
+    if (!is.numeric(n))
+      stopf("n must be an integer vector or a list of integer vectors")
+    if (verbose)
+      cat("partial2adaptive: froll partial=TRUE trimming 'n' and redirecting to adaptive=TRUE\n")
+    if (length(n)>1L) {
+      lapply(n, len, align, FUN=trimn)
+    } else {
+      trimn(n, len, align)
+    }
+  } else {
+    if (!(is.numeric(n) || (is.list(n) && all(vapply_1b(n, is.numeric)))))
+      stopf("n must be an integer vector or a list of integer vectors")
+    if (!is.list(n))
+      n = list(n)
+    if (length(unique(lengths(n))) != 1L)
+      stopf("adaptive window provided in 'n' must not to have different lengths")
+    if (length(n[[1L]]) != len)
+      stopf("length of vectors in 'x' must match to length of adaptive window in 'n'")
+    if (verbose)
+      cat("partial2adaptive: froll adaptive=TRUE and partial=TRUE trimming 'n'\n")
+    lapply(n, align, FUN=trimnadaptive)
+  }
+}
+
+froll = function(fun, x, n, fill=NA, algo, align=c("right","left","center"), na.rm=FALSE, has.nf=NA, adaptive=FALSE, partial=FALSE, FUN, rho, give.names=FALSE) {
   align = match.arg(align)
-  ans = .Call(CfrollfunR, fun, x, n, fill, algo, align, na.rm, hasNA, adaptive)
+  if (isTRUE(give.names))
+    orig = list(n=n, adaptive=adaptive)
+  if (isTRUE(partial)) {
+    if (!length(n))
+      stopf("n must be non 0 length")
+    n = partial2adaptive(x, n, align, adaptive)
+    adaptive = TRUE
+  }
+  leftadaptive = isTRUE(adaptive) && align=="left"
+  if (leftadaptive) {
+    verbose = getOption("datatable.verbose")
+    rev2 = function(x) if (is.list(x)) lapply(x, rev) else rev(x)
+    if (verbose)
+      cat("froll: adaptive=TRUE && align='left' pre-processing for align='right'\n")
+    x = rev2(x)
+    n = rev2(n)
+    align = "right"
+  } ## support for left adaptive added in #5441
+  if (missing(FUN))
+    ans = .Call(CfrollfunR, fun, x, n, fill, algo, align, na.rm, has.nf, adaptive)
+  else
+    ans = .Call(CfrollapplyR, FUN, x, n, fill, align, adaptive, rho)
+  if (leftadaptive) {
+    if (verbose)
+      cat("froll: adaptive=TRUE && align='left' post-processing from align='right'\n")
+    ans = rev2(ans)
+  }
+  if (isTRUE(give.names) && is.list(ans)) {
+    n = orig$n
+    adaptive = orig$adaptive
+    nx = names(x)
+    nn = names(n)
+    if (is.null(nx)) nx = paste0("V", if (is.atomic(x)) 1L else seq_along(x))
+    if (is.null(nn)) nn = if (adaptive) paste0("N", if (is.atomic(n)) 1L else seq_along(n)) else paste("roll", as.character(n), sep="_")
+    setattr(ans, "names", paste(rep(nx, each=length(nn)), nn, sep="_"))
+  }
   ans
 }
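The left-adaptive branch in `froll` above reverses `x` and `n`, runs the right-aligned computation, and reverses the answer. The base R sketch below checks that identity on a small input. The three helper functions are hypothetical stand-ins written for this example (they do not call the C routines), and treating an incomplete window as `NA` is an assumption.

```r
# Right-aligned adaptive roll: window i covers x[(i - n[i] + 1):i].
roll_right = function(x, n, f) {
  vapply(seq_along(x), function(i) {
    if (n[i] > i) return(NA_real_)   # window would start before the first element
    f(x[(i - n[i] + 1L):i])
  }, numeric(1))
}
# Left-aligned adaptive roll computed directly: window i covers x[i:(i + n[i] - 1)].
roll_left_direct = function(x, n, f) {
  len = length(x)
  vapply(seq_along(x), function(i) {
    if (i + n[i] - 1L > len) return(NA_real_)  # window would run past the end
    f(x[i:(i + n[i] - 1L)])
  }, numeric(1))
}
# Same result via the reverse trick: reverse inputs, roll right, reverse output.
roll_left_via_rev = function(x, n, f) rev(roll_right(rev(x), rev(n), f))

x = c(3, 1, 4, 1, 5, 9, 2, 6)
n = c(2L, 3L, 1L, 4L, 2L, 3L, 5L, 1L)
direct = roll_left_direct(x, n, max)
via_rev = roll_left_via_rev(x, n, max)
print(direct)
stopifnot(identical(direct, via_rev))
```

In this sketch the reversed path hands each window to `f` in reversed element order, so the equality is checked here only for an order-insensitive function such as `max`.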

-frollmean = function(x, n, fill=NA, algo=c("fast", "exact"), align=c("right", "left", "center"), na.rm=FALSE, hasNA=NA, adaptive=FALSE) {
-  froll(fun="mean", x=x, n=n, fill=fill, algo=algo, align=align, na.rm=na.rm, hasNA=hasNA, adaptive=adaptive)
+frollfun = function(fun, x, n, fill=NA, algo=c("fast","exact"), align=c("right","left","center"), na.rm=FALSE, has.nf=NA, adaptive=FALSE, partial=FALSE, hasNA, give.names=FALSE) {
+  stopifnot(!missing(fun), is.character(fun), length(fun)==1L, !is.na(fun))
+  if (!missing(hasNA)) {
+    if (!is.na(has.nf))
+      stopf("hasNA is deprecated, use has.nf instead")
+    warningf("hasNA is deprecated, use has.nf instead")
+    has.nf = hasNA
+  } # remove check on next major release
+  algo = match.arg(algo)
+  froll(fun=fun, x=x, n=n, fill=fill, algo=algo, align=align, na.rm=na.rm, has.nf=has.nf, adaptive=adaptive, partial=partial, give.names=give.names)
+}
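The `hasNA` handling in `frollfun` above follows a common deprecation pattern: the old argument has no default, and `missing()` decides whether the caller supplied it. A minimal standalone sketch of that pattern is shown below, using base `stop()` and `warning()` in place of the package's `stopf()` and `warningf()`. The function names `resolve_has_nf` and `wrapper` are hypothetical.

```r
# Hypothetical helper mirroring the deprecation check in the diff above.
resolve_has_nf = function(has.nf=NA, hasNA) {
  if (!missing(hasNA)) {
    if (!is.na(has.nf))
      stop("hasNA is deprecated, use has.nf instead")     # both supplied: error
    warning("hasNA is deprecated, use has.nf instead")    # only old one: warn
    has.nf = hasNA
  }
  has.nf
}
# A wrapper that forwards its own (possibly missing) hasNA, as frollmean does.
wrapper = function(has.nf=NA, hasNA) resolve_has_nf(has.nf=has.nf, hasNA=hasNA)

print(wrapper())                                   # NA, no warning
print(wrapper(has.nf=TRUE))                        # TRUE
print(suppressWarnings(wrapper(hasNA=FALSE)))      # FALSE, taken from hasNA
```

Forwarding `hasNA=hasNA` from the wrapper works because `missing()` also reports `TRUE` for an argument that was passed on from a caller in which it was itself missing.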
+
+frollmean = function(x, n, fill=NA, algo=c("fast","exact"), align=c("right","left","center"), na.rm=FALSE, has.nf=NA, adaptive=FALSE, partial=FALSE, hasNA, give.names=FALSE) {
+  frollfun(fun="mean", x=x, n=n, fill=fill, algo=algo, align=align, na.rm=na.rm, has.nf=has.nf, adaptive=adaptive, partial=partial, hasNA=hasNA, give.names=give.names)
+}
+frollsum = function(x, n, fill=NA, algo=c("fast","exact"), align=c("right","left","center"), na.rm=FALSE, has.nf=NA, adaptive=FALSE, partial=FALSE, hasNA, give.names=FALSE) {
+  frollfun(fun="sum", x=x, n=n, fill=fill, algo=algo, align=align, na.rm=na.rm, has.nf=has.nf, adaptive=adaptive, partial=partial, hasNA=hasNA, give.names=give.names)
 }
-frollsum = function(x, n, fill=NA, algo=c("fast","exact"), align=c("right", "left", "center"), na.rm=FALSE, hasNA=NA, adaptive=FALSE) {
-  froll(fun="sum", x=x, n=n, fill=fill, algo=algo, align=align, na.rm=na.rm, hasNA=hasNA, adaptive=adaptive)
+frollmax = function(x, n, fill=NA, algo=c("fast","exact"), align=c("right","left","center"), na.rm=FALSE, has.nf=NA, adaptive=FALSE, partial=FALSE, hasNA, give.names=FALSE) {
+  frollfun(fun="max", x=x, n=n, fill=fill, algo=algo, align=align, na.rm=na.rm, has.nf=has.nf, adaptive=adaptive, partial=partial, hasNA=hasNA, give.names=give.names)
 }
-frollapply = function(x, n, FUN, ..., fill=NA, align=c("right", "left", "center")) {
+
+frollapply = function(x, n, FUN, ..., fill=NA, align=c("right","left","center"), adaptive=FALSE, partial=FALSE, give.names=FALSE) {
+  if (isTRUE(adaptive) && base::getRversion() < "3.4.0") ## support SET_GROWABLE_BIT
+    stopf("frollapply adaptive=TRUE requires at least R 3.4.0"); # nocov
   FUN = match.fun(FUN)
-  align = match.arg(align)
   rho = new.env()
-  ans = .Call(CfrollapplyR, FUN, x, n, fill, align, rho)
-  ans
+  froll(FUN=FUN, rho=rho, x=x, n=n, fill=fill, align=align, adaptive=adaptive, partial=partial, give.names=give.names)
 }
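All of the wrappers above forward `give.names` to `froll`, which builds result names with `paste(rep(nx, each=length(nn)), nn, sep="_")`. The sketch below reproduces only that naming step for the non-adaptive case with unnamed `n`. `roll_names` is a hypothetical helper introduced for this example and is not part of the package.

```r
# Hypothetical helper: names for each (column of x, window width) pair,
# following the paste() logic shown in froll for non-adaptive, unnamed n.
roll_names = function(nx, n) {
  nn = paste("roll", as.character(n), sep="_")
  paste(rep(nx, each=length(nn)), nn, sep="_")
}

res = roll_names(c("a", "b"), c(2L, 3L))
print(res)
# values: "a_roll_2" "a_roll_3" "b_roll_2" "b_roll_3"
```

The names vary fastest over the window widths, so all results for the first column come before those for the second.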

TODO

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+This is a list of follow-up issues that may be resolved after all rolling function PRs are merged. At the moment none of them is a bug fix or strictly necessary; they are good-practice changes.
+
+- Rename `has.nf` arg to `all.finite`?
+
+- fix malloc calls
+```
+bool *isnan = malloc(nx*sizeof(bool)); // isnan lookup - we use it to reduce ISNAN calls in nested loop
+## into
+bool *isnan = malloc(sizeof(*isnan)*nx);
+```
+
+- remove `free(NULL)`
+
+```
+free(isnan);
+```
+
+- as we moved to a 3.4.0 dependency, remove mentions of R < 3.4.0 related to set growable for frollapply adaptive
+
+- catf() instead of cat()
+
+- use `test(..., options(datatable.verbose=TRUE))` rather than `options(datatable.verbose=TRUE)`
+
+- verbose output `frolladaptivefun: algo 0 not implemented, fall back to 1` could be more intuitive
+
+The only reason it was added was so that we can test whether the expected algo is being used. Rolling functions do not expose a verbose arg directly; it has to be set via options, so full user friendliness is perhaps not the biggest priority at the moment. We will also have to keep in mind the buffer limit of 500 characters set for `snprintf`.
+
+- The `snprintf(end(message), 500, format, ...)` idiom should be more like `len = strlen(ans->message[0]); snprintf(ans->message[0] + len, sizeof(ans->message[0]) - len, format, ...)`, but I don't see a way the current code could overflow the 4096-byte buffer, even with long translation strings; the current code doesn't cause a problem.
