Commit 4f6a365

jangorecki, Toby Dylan Hocking, and tdhock authored
frollapply rewritten, by.column=F, parallel, any type (#7272)
* frollmax PRs 2:10
* frollmax PR as potentially complete
* recoded frollapply but not split
* address issues spot by lintr
* fix codecov
* try possibly a fix for codecov bug?
* codecov runs older R where is.atomic(NULL) was true, this should fix
* codecov last item
* add parallel to suggests
* types.Rraw causes codecov to hang, try no parallel
* manual codecov confirmation for nocov exceptions
* disabling openmp did not help, still stuck
* improve comments
* use new argument name N
* fix codecov, temporary workaround
* handle interrupts nicely
* move interrupt handler just after child processes are created
* duplicate error/warn templates for easier translation
* wrap messages with gettext
* frollapply test
* rm perf test
* minor doc edits

---------

Co-authored-by: Toby Dylan Hocking <[email protected]>
Co-authored-by: Toby Dylan Hocking <[email protected]>
1 parent 22045e5 commit 4f6a365

12 files changed, +1294 -516 lines changed

NEWS.md

Lines changed: 53 additions & 0 deletions
@@ -20,6 +20,18 @@
 frollsum(c(1,2,3,Inf,5,6), 2)
 #[1] NA 3 5 Inf Inf 11
 
+4. `frollapply` result is not coerced to numeric anymore. Users' code could possibly break if it depends on forced coercion of input/output to numeric type.
+```r
+## before
+frollapply(c(F,T,F,F,F,T), 2, any)
+#[1] NA 1 1 0 0 1
+
+## now
+frollapply(c(F,T,F,F,F,T), 2, any)
+#[1] NA TRUE TRUE FALSE FALSE TRUE
+```
+Additionally argument names in `frollapply` has been renamed from `x` to `X` and `n` to `N` to avoid conflicts with common argument names that may be passed to `...`, aligning to base R API of `lapply`. `x` and `n` continue to work with a warning, for now.
+
 ### NOTICE OF INTENDED FUTURE POTENTIAL BREAKING CHANGES
 
 1. `data.table(x=1, <expr>)`, where `<expr>` is an expression resulting in a 1-column matrix without column names, will eventually have names `x` and `V2`, not `x` and `V1`, consistent with `data.table(x=1, <expr>)` where `<expr>` results in an atomic vector, for example `data.table(x=1, cbind(1))` and `data.table(x=1, 1)` will both have columns named `x` and `V2`. In this release, the matrix case continues to be named `V1`, but the new behavior can be activated by setting `options(datatable.old.matrix.autoname)` to `FALSE`. See point 5 under Bug Fixes for more context; this change will provide more internal consistency as well as more consistency with `data.frame()`.
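
Note on item 4 in the hunk above: the before/after outputs differ only in type. The base R sketch below illustrates that difference with a naive right-aligned rolling apply. It is not data.table's implementation, and `roll_naive` and its `coerce` argument are hypothetical names introduced here for illustration only.

```r
# Naive right-aligned rolling apply, for illustration only.
# roll_naive() and coerce= are hypothetical, not part of data.table.
roll_naive = function(x, n, FUN, coerce=FALSE) {
  ans = lapply(seq_along(x), function(i) {
    # incomplete windows at the start are filled with NA
    if (i < n) NA else FUN(x[(i-n+1L):i])
  })
  ans = unlist(ans)
  # coerce=TRUE mimics the old forced conversion to numeric
  if (coerce) as.numeric(ans) else ans
}

x = c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE)
roll_naive(x, 2, any, coerce=TRUE)
#[1] NA  1  1  0  0  1
roll_naive(x, 2, any)
#[1]    NA  TRUE  TRUE FALSE FALSE  TRUE
```

Code that relied on the old numeric result (for example by summing it or comparing it with `identical()` against a double vector) is the kind of code the NEWS item warns about.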
@@ -157,6 +169,47 @@
 
 As of now, adaptive rolling max has no _on-line_ implementation (`algo="fast"`), it uses a naive approach (`algo="exact"`). Therefore further speed up is still possible if `algo="fast"` gets implemented.
 
+17. Function `frollapply` has been completely rewritten. Thanks to @jangorecki for implementation. Be sure to read `frollapply` manual before using the function. There are following changes:
+- all basic types are now supported on input/output, not only double. Users' code could possibly break if it depends on forced coercion of input/output to double type.
+- new argument `by.column` allowing to pass a multi-column subset of a data.table into a rolling function, closes [#4887](https://github.com/Rdatatable/data.table/issues/4887).
+```r
+x = data.table(v1=rnorm(120), v2=rnorm(120))
+f = function(x) coef(lm(v2 ~ v1, data=x))
+coef.fill = c("(Intercept)"=NA_real_, "v1"=NA_real_)
+frollapply(x, 4, f, by.column=FALSE, fill=coef.fill)
+# (Intercept) v1
+# 1: NA NA
+# 2: NA NA
+# 3: NA NA
+# 4: 0.65456931 0.3138012
+# 5: -1.07977441 -2.0588094
+#---
+#116: 0.15828417 0.3570216
+#117: -0.09083424 1.5494507
+#118: -0.18345878 0.6424837
+#119: -0.28964772 0.6116575
+#120: -0.40598313 0.6112854
+```
+- uses multiple CPU threads (on a decent OS); evaluation of UDF is inherently slow so this can be a great help.
+```r
+x = rnorm(1e5)
+n = 500
+setDTthreads(1)
+system.time(
+th1 <- frollapply(x, n, median, simplify=unlist)
+)
+# user system elapsed
+# 3.078 0.005 3.084
+setDTthreads(4)
+system.time(
+th4 <- frollapply(x, n, median, simplify=unlist)
+)
+# user system elapsed
+# 2.453 0.135 0.897
+all.equal(th1, th4)
+#[1] TRUE
+```
+
 ### BUG FIXES
 
 1. `fread()` no longer warns on certain systems on R 4.5.0+ where the file owner can't be resolved, [#6918](https://github.com/Rdatatable/data.table/issues/6918). Thanks @ProfFancyPants for the report and PR.
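
Note on item 17 in the NEWS.md diff above: `by.column=FALSE` passes each window as a multi-column subset instead of rolling over each column separately. The base R sketch below shows that idea on a data.frame. `roll_rows` is a hypothetical helper, not part of data.table, and it does not attempt the multi-threaded evaluation the NEWS entry describes.

```r
# Roll FUN over consecutive n-row slices of a data.frame (right-aligned).
# roll_rows() is a hypothetical illustration, not data.table code.
roll_rows = function(df, n, FUN, fill) {
  rows = lapply(seq_len(nrow(df)), function(i) {
    # incomplete windows get the user-supplied fill, as with fill=coef.fill above
    if (i < n) fill else FUN(df[(i-n+1L):i, , drop=FALSE])
  })
  do.call(rbind, rows)
}

set.seed(1)
df = data.frame(v1=rnorm(10), v2=rnorm(10))
f = function(d) coef(lm(v2 ~ v1, data=d))
coef.fill = c("(Intercept)"=NA_real_, "v1"=NA_real_)
res = roll_rows(df, 4, f, fill=coef.fill)
dim(res)
#[1] 10  2
```

The `fill` value has the same names and length as the value returned by `f`, so every window contributes a row of the same shape.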

R/froll.R

Lines changed: 46 additions & 49 deletions
@@ -25,21 +25,31 @@ trimnadaptive = function(n, align) {
 # frollsum(list(1:4, 2:5), 2:3, partial=FALSE, adaptive=FALSE)
 # frollsum(list(1:4, 2:5), 2:3, partial=TRUE, adaptive=FALSE)
 partial2adaptive = function(x, n, align, adaptive) {
+## do not quote argument x and n arg names because frollapply has them in uppercase
 if (!length(n))
 stopf("n must be non 0 length")
 if (align=="center")
 stopf("'partial' cannot be used together with align='center'")
-if (is.list(x) && length(unique(lengths(x))) != 1L)
-stopf("'partial' does not support variable length of columns in 'x'")
-len = if (is.list(x)) length(x[[1L]]) else length(x)
+if (is.list(x)) {
+if (!is.data.frame(x) && !equal.lengths(x)) ## froll
+stopf("'partial' does not support variable length of columns in x")
+else if (all_data.frame(x) && !equal.nrows(x)) ## frollapply by.column=F, single DT already wrapped into list
+stopf("'partial' does not support variable nrow of data.tables in x")
+}
+len = if (is.list(x)) {
+if (is.data.frame(x[[1L]])) ## frollapply by.column
+nrow(x[[1L]])
+else ## froll, this will work for both x list and x dt on input
+length(x[[1L]])
+} else length(x)
 verbose = getOption("datatable.verbose")
 if (!adaptive) {
 if (is.list(n))
 stopf("n must be an integer, list is accepted for adaptive TRUE")
 if (!is.numeric(n))
 stopf("n must be an integer vector or a list of integer vectors")
 if (verbose)
-catf("partial2adaptive: froll partial=TRUE trimming 'n' and redirecting to adaptive=TRUE\n")
+catf("partial2adaptive: froll partial=TRUE trimming n and redirecting to adaptive=TRUE\n")
 if (length(n) > 1L) {
 ## c(2,3) -> list(c(1,2,2,2),c(1,2,3,3)) ## for x=1:4
 lapply(n, len, align, FUN=trimn)
@@ -50,14 +60,14 @@ partial2adaptive = function(x, n, align, adaptive) {
 } else {
 if (!(is.numeric(n) || (is.list(n) && all(vapply_1b(n, is.numeric)))))
 stopf("n must be an integer vector or a list of integer vectors")
-if (length(unique(lengths(n))) != 1L)
-stopf("adaptive window provided in 'n' must not to have different lengths")
+if (is.list(n) && length(unique(lengths(n))) != 1L)
+stopf("adaptive windows provided in n must not to have different lengths")
 if (is.numeric(n) && length(n) != len)
-stopf("length of 'n' argument must be equal to number of observations provided in 'x'")
+stopf("length of n argument must be equal to number of observations provided in x")
 if (is.list(n) && length(n[[1L]]) != len)
-stopf("length of vectors in 'x' must match to length of adaptive window in 'n'")
+stopf("length of vectors in x must match to length of adaptive window in n")
 if (verbose)
-catf("partial2adaptive: froll adaptive=TRUE and partial=TRUE trimming 'n'\n")
+catf("partial2adaptive: froll adaptive=TRUE and partial=TRUE trimming n\n")
 if (is.numeric(n)) {
 ## c(3,3,3,2) -> c(1,2,3,2) ## for x=1:4
 trimnadaptive(n, align)
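
Note on `partial2adaptive`: its two inline comments (`c(2,3) -> list(c(1,2,2,2),c(1,2,3,3))` and `c(3,3,3,2) -> c(1,2,3,2)`, both for `x=1:4`) describe how `partial=TRUE` is rewritten as an adaptive window. The base R sketch below reproduces those two mappings for right alignment only. `trim_right` and `trim_right_adaptive` are hypothetical names, and the bodies of the real `trimn` and `trimnadaptive` are not shown in this diff, so they may well differ.

```r
# Right-aligned case only: at position i at most i observations are available,
# so the effective window is the smaller of i and the requested width.
# Both helpers are hypothetical stand-ins, not data.table internals.
trim_right = function(n, len) pmin(seq_len(len), n)
trim_right_adaptive = function(n) pmin(seq_along(n), n)

# non-adaptive: one scalar width per requested window, x of length 4
lapply(c(2, 3), trim_right, len=4)
#[[1]]
#[1] 1 2 2 2
#
#[[2]]
#[1] 1 2 3 3

# adaptive: n already holds one width per observation
trim_right_adaptive(c(3, 3, 3, 2))
#[1] 1 2 3 2
```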
@@ -93,27 +103,35 @@ make.roll.names = function(x.len, n.len, n, x.nm, n.nm, fun, adaptive) {
 if (length(n.nm)) { ## !adaptive || is.list(n)
 n.nm
 } else { ## adaptive && is.numeric(n)
-NULL # nocov ## call to make.roll.names is excluded by is.list(ans) condition before calling it, it will be relevant for !by.column in next PR
+stopf("internal error: make.roll.names call should have been escaped in frollapply during 'unpack atomic input'") # nocov ## frollapply(data.frame(x=1:5), rep(2,5), dim, by.column=FALSE, give.names=TRUE, adaptive=TRUE)
 }
 }
 if (!is.null(ans) && length(ans) != x.len*n.len)
 stopf("internal error: make.roll.names generated names of wrong length") ## nocov
 ans
 }
 
-froll = function(fun, x, n, fill=NA, algo, align=c("right","left","center"), na.rm=FALSE, has.nf=NA, adaptive=FALSE, partial=FALSE, FUN, rho, give.names=FALSE) {
+froll = function(fun, x, n, fill=NA, algo=c("fast","exact"), align=c("right","left","center"), na.rm=FALSE, has.nf=NA, adaptive=FALSE, partial=FALSE, give.names=FALSE, hasNA) {
+stopifnot(!missing(fun), is.character(fun), length(fun)==1L, !is.na(fun))
+if (!missing(hasNA)) {
+if (!is.na(has.nf))
+stopf("hasNA is deprecated, use has.nf instead")
+warningf("hasNA is deprecated, use has.nf instead")
+has.nf = hasNA
+} # remove check on next major release
+algo = match.arg(algo)
 align = match.arg(align)
 if (isTRUE(give.names)) {
-orig = list(n=n, adaptive=adaptive)
-xnam = if (is.list(x)) names(x) else character()
-nnam = if (isTRUE(adaptive)) {
-if (is.list(n)) names(n) else character()
-} else names(n)
-nx = if (is.list(x)) length(x) else 1L
-nn = if (isTRUE(adaptive)) {
-if (is.list(n)) length(n) else 1L
-} else length(n)
-}
+orig = list(n=n, adaptive=adaptive)
+xnam = if (is.list(x)) names(x) else character()
+nnam = if (isTRUE(adaptive)) {
+if (is.list(n)) names(n) else character()
+} else names(n)
+nx = if (is.list(x)) length(x) else 1L
+nn = if (isTRUE(adaptive)) {
+if (is.list(n)) length(n) else 1L
+} else length(n)
+}
 if (isTRUE(partial)) {
 n = partial2adaptive(x, n, align, adaptive)
 adaptive = TRUE
@@ -128,10 +146,7 @@ froll = function(fun, x, n, fill=NA, algo, align=c("right","left","center"), na.
 n = rev2(n)
 align = "right"
 } ## support for left adaptive added in #5441
-if (missing(FUN))
-ans = .Call(CfrollfunR, fun, x, n, fill, algo, align, na.rm, has.nf, adaptive)
-else
-ans = .Call(CfrollapplyR, FUN, x, n, fill, align, adaptive, rho)
+ans = .Call(CfrollfunR, fun, x, n, fill, algo, align, na.rm, has.nf, adaptive)
 if (leftadaptive) {
 if (verbose)
 catf("froll: adaptive=TRUE && align='left' post-processing from align='right'\n")
@@ -144,30 +159,12 @@ froll = function(fun, x, n, fill=NA, algo, align=c("right","left","center"), na.
 ans
 }
 
-frollfun = function(fun, x, n, fill=NA, algo=c("fast","exact"), align=c("right","left","center"), na.rm=FALSE, has.nf=NA, adaptive=FALSE, partial=FALSE, hasNA, give.names=FALSE) {
-stopifnot(!missing(fun), is.character(fun), length(fun)==1L, !is.na(fun))
-if (!missing(hasNA)) {
-if (!is.na(has.nf))
-stopf("hasNA is deprecated, use has.nf instead")
-warningf("hasNA is deprecated, use has.nf instead")
-has.nf = hasNA
-} # remove check on next major release
-algo = match.arg(algo)
-froll(fun=fun, x=x, n=n, fill=fill, algo=algo, align=align, na.rm=na.rm, has.nf=has.nf, adaptive=adaptive, partial=partial, give.names=give.names)
-}
-
-frollmean = function(x, n, fill=NA, algo=c("fast","exact"), align=c("right","left","center"), na.rm=FALSE, has.nf=NA, adaptive=FALSE, partial=FALSE, hasNA, give.names=FALSE) {
-frollfun(fun="mean", x=x, n=n, fill=fill, algo=algo, align=align, na.rm=na.rm, has.nf=has.nf, adaptive=adaptive, partial=partial, hasNA=hasNA, give.names=give.names)
+frollmean = function(x, n, fill=NA, algo=c("fast","exact"), align=c("right","left","center"), na.rm=FALSE, has.nf=NA, adaptive=FALSE, partial=FALSE, give.names=FALSE, hasNA) {
+froll(fun="mean", x=x, n=n, fill=fill, algo=algo, align=align, na.rm=na.rm, has.nf=has.nf, adaptive=adaptive, partial=partial, hasNA=hasNA, give.names=give.names)
 }
-frollsum = function(x, n, fill=NA, algo=c("fast","exact"), align=c("right","left","center"), na.rm=FALSE, has.nf=NA, adaptive=FALSE, partial=FALSE, hasNA, give.names=FALSE) {
-frollfun(fun="sum", x=x, n=n, fill=fill, algo=algo, align=align, na.rm=na.rm, has.nf=has.nf, adaptive=adaptive, partial=partial, hasNA=hasNA, give.names=give.names)
+frollsum = function(x, n, fill=NA, algo=c("fast","exact"), align=c("right","left","center"), na.rm=FALSE, has.nf=NA, adaptive=FALSE, partial=FALSE, give.names=FALSE, hasNA) {
+froll(fun="sum", x=x, n=n, fill=fill, algo=algo, align=align, na.rm=na.rm, has.nf=has.nf, adaptive=adaptive, partial=partial, hasNA=hasNA, give.names=give.names)
 }
-frollmax = function(x, n, fill=NA, algo=c("fast","exact"), align=c("right","left","center"), na.rm=FALSE, has.nf=NA, adaptive=FALSE, partial=FALSE, hasNA, give.names=FALSE) {
-frollfun(fun="max", x=x, n=n, fill=fill, algo=algo, align=align, na.rm=na.rm, has.nf=has.nf, adaptive=adaptive, partial=partial, hasNA=hasNA, give.names=give.names)
-}
-
-frollapply = function(x, n, FUN, ..., fill=NA, align=c("right","left","center"), adaptive=FALSE, partial=FALSE, give.names=FALSE) {
-FUN = match.fun(FUN)
-rho = new.env()
-froll(FUN=FUN, rho=rho, x=x, n=n, fill=fill, align=align, adaptive=adaptive, partial=partial, give.names=give.names)
+frollmax = function(x, n, fill=NA, algo=c("fast","exact"), align=c("right","left","center"), na.rm=FALSE, has.nf=NA, adaptive=FALSE, partial=FALSE, give.names=FALSE, hasNA) {
+froll(fun="max", x=x, n=n, fill=fill, algo=algo, align=align, na.rm=na.rm, has.nf=has.nf, adaptive=adaptive, partial=partial, hasNA=hasNA, give.names=give.names)
 }
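
Note on the `hasNA` handling: the wrappers forward `hasNA=hasNA` unconditionally, and the deprecation check now lives in `froll` behind `!missing(hasNA)`. This relies on base R's `missing()` also returning `TRUE` for an argument that was forwarded from a missing argument of the caller. The sketch below shows that pattern with plain `stop()` and `warning()` in place of data.table's internal `stopf()` and `warningf()`. `inner` and `wrapper` are hypothetical names, not functions from this commit.

```r
# inner() plays the role of froll(), wrapper() the role of frollmean() etc.
# Both names are hypothetical; only the argument-forwarding pattern is from the diff.
inner = function(has.nf=NA, hasNA) {
  if (!missing(hasNA)) {
    if (!is.na(has.nf))
      stop("hasNA is deprecated, use has.nf instead")
    warning("hasNA is deprecated, use has.nf instead")
    has.nf = hasNA
  }
  has.nf
}
wrapper = function(has.nf=NA, hasNA) inner(has.nf=has.nf, hasNA=hasNA)

# hasNA not supplied: missingness propagates through wrapper(), no warning
wrapper()
#[1] NA

# hasNA supplied alone: value is used, with a deprecation warning
withCallingHandlers(wrapper(hasNA=TRUE),
                    warning=function(w) invokeRestart("muffleWarning"))
#[1] TRUE

# both supplied: error
both = tryCatch(wrapper(has.nf=TRUE, hasNA=TRUE), error=function(e) conditionMessage(e))
both
#[1] "hasNA is deprecated, use has.nf instead"
```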