Commit 36f09d4

Merge branch 'master' into forder_segfault
2 parents 9f53a32 + 8ab0b2b commit 36f09d4

12 files changed: +123 -72 lines changed

.github/workflows/R-CMD-check-occasional.yaml

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -1,6 +1,7 @@
11
on:
22
schedule:
33
- cron: '17 13 23 * *' # 23rd of month at 13:17 UTC
4+
workflow_dispatch:
45

56
# A more complete suite of checks to run monthly; each PR/merge need not pass all these, but they should pass before CRAN release
67
name: R-CMD-check-occasional

.github/workflows/R-CMD-check.yaml

Lines changed: 1 addition & 0 deletions
@@ -5,6 +5,7 @@ on:
55
branches:
66
- master
77
pull_request:
8+
workflow_dispatch:
89

910
name: R-CMD-check
1011

.github/workflows/code-quality.yaml

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@ on:
22
push:
33
branches: [master]
44
pull_request:
5+
workflow_dispatch:
56

67
name: code-quality
78

.github/workflows/performance-tests.yml

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ on:
1010
- 'R/**'
1111
- 'src/**'
1212
- '.ci/atime/**'
13+
workflow_dispatch:
1314

1415
jobs:
1516
comment:

.github/workflows/rchk.yaml

Lines changed: 7 additions & 0 deletions
@@ -18,7 +18,14 @@
1818
on:
1919
push:
2020
branches: [master]
21+
paths:
22+
- '.github/workflows/rchk.yaml'
23+
- 'src/**'
2124
pull_request:
25+
paths:
26+
- '.github/workflows/rchk.yaml'
27+
- 'src/**'
28+
workflow_dispatch:
2229

2330
name: 'rchk'
2431

.github/workflows/test-coverage.yaml

Lines changed: 13 additions & 0 deletions
@@ -3,7 +3,20 @@
33
on:
44
push:
55
branches: [master]
6+
paths:
7+
- '.github/workflows/test-coverage.yaml'
8+
- 'inst/tests/**'
9+
- 'R/**'
10+
- 'src/**'
11+
- 'tests/**'
612
pull_request:
13+
paths:
14+
- '.github/workflows/test-coverage.yaml'
15+
- 'inst/tests/**'
16+
- 'R/**'
17+
- 'src/**'
18+
- 'tests/**'
19+
workflow_dispatch:
720

821
name: test-coverage.yaml
922

NEWS.md

Lines changed: 5 additions & 0 deletions
@@ -38,6 +38,11 @@
3838
3939
1. `data.table(x=1, <expr>)`, where `<expr>` is an expression resulting in a 1-column matrix without column names, will eventually have names `x` and `V2`, not `x` and `V1`, consistent with `data.table(x=1, <expr>)` where `<expr>` results in an atomic vector, for example `data.table(x=1, cbind(1))` and `data.table(x=1, 1)` will both have columns named `x` and `V2`. In this release, the matrix case continues to be named `V1`, but the new behavior can be activated by setting `options(datatable.old.matrix.autoname)` to `FALSE`. See point 5 under Bug Fixes for more context; this change will provide more internal consistency as well as more consistency with `data.frame()`.
4040
41+
2. The behavior of `week()` will be changed in a future release to calculate weeks sequentially (days 1-7 as week 1), which is a potential breaking change. For now, the current "legacy" behavior, where week numbers advance every 7th day of the year (e.g., day 7 starts week 2), remains the default, and a deprecation warning will be issued when the old and new behaviors differ. Users can control this behavior with the temporary option `options(datatable.week = "...")`:
42+
* `"sequential"`: Opt-in to the new, sequential behavior (no warning).
43+
* `"legacy"`: Continue using the legacy behavior but suppress the deprecation warning.
44+
See [#2611](https://github.com/Rdatatable/data.table/issues/2611) for details. Thanks @MichaelChirico for the report and @venom1204 for the implementation.
45+
4146
### NEW FEATURES
4247
4348
1. New `sort_by()` method for data.tables, [#6662](https://github.com/Rdatatable/data.table/issues/6662). It uses `forder()` to improve upon the data.frame method and also matches `DT[order(...)]` behavior with respect to locale. Thanks @rikivillalba for the suggestion and PR.
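
The `week()` item in the NEWS.md hunk above contrasts two numbering schemes. The following is a minimal base R sketch of the difference, not the package's implementation: `day_of_year()`, `week_legacy()` and `week_sequential()` are hypothetical helpers, and the two formulas are an assumption inferred from the expected values in tests 2236.4 and 2342.x of this commit.

```r
# Day of year, 1-based, using base R only so the sketch runs without data.table.
day_of_year <- function(x) as.POSIXlt(x)$yday + 1L

# Assumed "legacy" rule: the week number advances on every 7th day of the year,
# so day 7 already belongs to week 2.
week_legacy <- function(x) day_of_year(x) %/% 7L + 1L

# Assumed "sequential" rule: days 1-7 form week 1, days 8-14 form week 2, etc.
week_sequential <- function(x) (day_of_year(x) - 1L) %/% 7L + 1L

d <- as.Date("1970-01-01") + 0:7
week_legacy(d)      # 1 1 1 1 1 1 2 2
week_sequential(d)  # 1 1 1 1 1 1 1 2

# The two rules disagree only on days 7, 14, 21, ... of the year.
d[week_legacy(d) != week_sequential(d)]  # "1970-01-07"
```

Under these assumptions, the deprecation warning mentioned in the NEWS item would be expected only for dates such as 1970-01-07, where the two results differ. In the package itself, the choice is made through `options(datatable.week = "sequential")` or `options(datatable.week = "legacy")`, as the NEWS text states.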

inst/tests/tests.Rraw

Lines changed: 12 additions & 3 deletions
@@ -18384,7 +18384,7 @@ x = c("1111-11-11", "2019-01-01", "2019-02-28", "2019-03-01", "2019-12-31", "202
1838418384
test(2236.1, yday(x), c(315L, 1L, 59L, 60L, 365L, 60L, 61L, 366L, 1L, 366L, 60L, NA))
1838518385
test(2236.2, mday(x), c(11L, 1L, 28L, 1L, 31L, 29L, 1L, 31L, 1L, 31L, 1L, NA))
1838618386
test(2236.3, wday(x), c(7L, 3L, 5L, 6L, 3L, 7L, 1L, 5L, 1L, 2L, 2L, NA))
18387-
test(2236.4, week(x), c(46L, 1L, 9L, 9L, 53L, 9L, 9L, 53L, 1L, 53L, 9L, NA))
18387+
test(2236.4, options = c(datatable.week = "legacy"), week(x), c(46L, 1L, 9L, 9L, 53L, 9L, 9L, 53L, 1L, 53L, 9L, NA))
1838818388
test(2236.5, month(x), c(11L, 1L, 2L, 3L, 12L, 2L, 3L, 12L, 1L, 12L, 3L, NA))
1838918389
test(2236.6, quarter(x), c(4L, 1L, 1L, 1L, 4L, 1L, 1L, 4L, 1L, 4L, 1L, NA))
1839018390
test(2236.7, year(x), c(1111L, 2019L, 2019L, 2019L, 2019L, 2020L, 2020L, 2020L, 2040L, 2040L, 2100L, NA))
@@ -21816,13 +21816,22 @@ test(2341.24, fread('a
2181621816
b
2181721817
', comment.char = '#', strip.white = FALSE, sep = ","), data.table(a=c(" ", "b")))
2181821818

21819+
# week() sequential numbering fix tests #2611
21820+
test(2342.1, options = c(datatable.week = "sequential"), week(as.IDate("1970-01-01") + 0:7), c(1L,1L,1L,1L,1L,1L,1L,2L))
21821+
test(2342.2, options = c(datatable.week = "sequential"), week(as.IDate(c("2012-02-28","2012-02-29","2012-03-01"))), c(9L,9L,9L))
21822+
test(2342.3, options = c(datatable.week = "sequential"), week(as.IDate(c("2019-12-31","2020-01-01"))), c(53L,1L))
21823+
test(2342.4, options = c(datatable.week = "sequential"), week(as.IDate(c("2020-12-31","2021-01-01"))), c(53L,1L))
21824+
test(2342.5, options = c(datatable.week = "sequential"), week(as.IDate("2021-01-06") + 0:6), c(1L,1L,2L,2L,2L,2L,2L))
21825+
test(2342.6, options = c(datatable.week = "sequential"), week(as.IDate(c("2016-02-27","2016-02-28","2016-02-29","2016-03-01","2016-03-02"))), c(9L,9L,9L,9L,9L))
21826+
test(2342.7, options = c(datatable.week = "default"), week(as.IDate("1970-01-07")), 2L, warning = "The default behavior of week() is changing")
21827+
2181921828
# forderv should not segfault on long single group keys due to recursion #4300
2182021829
N = 1e4
2182121830
set.seed(1)
2182221831
idx = sort(sample(10, 20, TRUE))
2182321832
x = matrix(rnorm(N), nrow=10)
2182421833
DT = as.data.table(x[idx,])
2182521834
DT[, V1000 := 20:1]
21826-
test(2342.1, forderv(DT, by=names(DT), sort=FALSE, retGrp=TRUE), forderv(DT, by=c("V1", "V1000"), sort=FALSE, retGrp=TRUE))
21835+
test(2343.1, forderv(DT, by=names(DT), sort=FALSE, retGrp=TRUE), forderv(DT, by=c("V1", "V1000"), sort=FALSE, retGrp=TRUE))
2182721836
x = c(rep(0, 7e5), 1e6)
21828-
test(2342.2, forderv(list(x)), integer(0))
21837+
test(2343.2, forderv(list(x)), integer(0))

man/froll.Rd

Lines changed: 21 additions & 41 deletions
@@ -45,59 +45,44 @@
4545
}
4646
\arguments{
4747
\item{x}{ Integer, numeric or logical vector, coerced to numeric, on which sliding window calculates an aggregate function. It supports vectorized input, then it needs to be a \code{data.table}, \code{data.frame} or a \code{list}, in which case a rolling function is applied to each column/vector. }
48-
\item{n}{ Integer, non-negative, rolling window size. This is the \emph{total} number of included values in aggregate function. In case of an adaptive rolling function window size has to be provided as a vector for each indivdual value of \code{x}. It supports vectorized input, then it needs to be a vector, or in case of an adaptive rolling a \code{list} of vectors. }
49-
\item{fill}{ Numeric; value to pad by. Defaults to \code{NA}. }
50-
\item{algo}{ Character, default \code{"fast"}. When set to \code{"exact"}, a slower (but more accurate) algorithm is used. It suffers less from floating point rounding errors by performing an extra pass, and carefully handles all non-finite values. It will use multiple cores where available. See Details for more information. }
48+
\item{n}{ Integer, non-negative, non-NA, rolling window size. This is the \emph{total} number of included values in aggregate function. In case of an adaptive rolling function, the window size has to be provided as a vector for each individual value of \code{x}. It supports vectorized input, then it needs to be a vector, or in case of an adaptive rolling a \code{list} of vectors. }
49+
\item{fill}{ Numeric; value to pad by for an incomplete window iteration. Defaults to \code{NA}. When partial=TRUE this argument is ignored. }
50+
\item{algo}{ Character, default \code{"fast"}. When set to \code{"exact"}, a slower (in some cases more accurate) algorithm is used. It will use multiple cores where available. See Details for more information. }
5151
\item{align}{ Character, specifying the "alignment" of the rolling window, defaulting to \code{"right"}. \code{"right"} covers preceding rows (the window \emph{ends} on the current value); \code{"left"} covers following rows (the window \emph{starts} on the current value); \code{"center"} is halfway in between (the window is \emph{centered} on the current value, biased towards \code{"left"} when \code{n} is even). }
52-
\item{na.rm}{ Logical, default \code{FALSE}. Should missing values be removed when calculating window? }
52+
\item{na.rm}{ Logical, default \code{FALSE}. Should missing values be removed when calculating aggregate function on a window? }
5353
\item{has.nf}{ Logical. If it is known whether \code{x} contains non-finite values (\code{NA}, \code{NaN}, \code{Inf}, \code{-Inf}), then setting this to \code{TRUE} or \code{FALSE} may speed up computation. Defaults to \code{NA}. See \emph{has.nf argument} section below for details. }
5454
\item{adaptive}{ Logical, default \code{FALSE}. Should the rolling function be calculated adaptively? See \emph{Adaptive rolling functions} section below for details. }
55-
\item{partial}{ Logical, default \code{FALSE}. Should the rolling window size(s) provided in \code{n} be computed also for leading incomplete running window. See \emph{\code{partial} argument} section below for details. }
55+
\item{partial}{ Logical, default \code{FALSE}. Should the rolling window size(s) provided in \code{n} be computed also for leading incomplete running window? See \emph{\code{partial} argument} section below for details. }
5656
\item{give.names}{ Logical, default \code{FALSE}. When \code{TRUE}, names are automatically generated corresponding to names of \code{x} and names of \code{n}. If answer is an atomic vector, then the argument is ignored, see examples. }
5757
\item{hasNA}{ Logical. Deprecated, use \code{has.nf} argument instead. }
5858
}
5959
\details{
60-
\code{froll*} functions accept vector, list, \code{data.frame} or \code{data.table}. Functions operate on a single vector; when passing a non-atomic input, then function is applied column-by-column, not to the complete set of columns at once.
60+
\code{froll*} functions accept vector, list, \code{data.frame} or \code{data.table}. Functions operate on a single vector; when passing a non-atomic input, then the function is applied column-by-column, not to the complete set of columns at once.
6161

6262
Argument \code{n} allows multiple values to apply rolling function on multiple window sizes. If \code{adaptive=TRUE}, then \code{n} can be a list to specify multiple window sizes for adaptive rolling computation. See \emph{Adaptive rolling functions} section below for details.
6363

64-
When multiple columns and/or multiple window widths are provided, then computations run in parallel. The exception is for \code{algo="exact"}, which runs in parallel even for single column and single window width. By default, data.table uses only half of available CPUs, see \code{\link{setDTthreads}} for details on how to tune CPU usage.
64+
When multiple columns or multiple window widths are provided, then they are run in parallel. The exception is for \code{algo="exact"} or \code{adaptive=TRUE}, which runs in parallel even for single column and single window width. By default, data.table uses only half of available CPUs, see \code{\link{setDTthreads}} for details on how to tune CPU usage.
6565

66-
Adaptive rolling functions are a special case where each
67-
observation has its own corresponding rolling window width. Due to the logic
68-
of adaptive rolling functions, the following restrictions apply:
69-
\itemize{
70-
\item \code{align} only \code{"right"}.
71-
\item if list of vectors is passed to \code{x}, then all
72-
vectors within it must have equal length.
73-
}
74-
75-
When multiple columns or multiple windows width are provided, then they
76-
are run in parallel. The exception is for \code{algo="exact"}, which runs in
77-
parallel already.
78-
79-
Setting \code{options(datatable.verbose=TRUE)} will display various
80-
information about how rolling function processed. It will not print
81-
information in real-time but only at the end of the processing.
66+
Setting \code{options(datatable.verbose=TRUE)} will display various information about how rolling function processed. It will not print information in real-time but only at the end of the processing.
8267
}
8368
\value{
84-
For a non \emph{vectorized} input (\code{x} is not a list, and \code{n} specify single rolling window) a \code{vector} is returned, for convenience. Thus, rolling functions can be used conveniently within \code{data.table} syntax. For a \emph{vectorized} input a list is returned.
69+
For a non \emph{vectorized} input (\code{x} is not a list, and \code{n} specifies a single rolling window) a \code{vector} is returned, for convenience. Thus, rolling functions can be used conveniently within \code{data.table} syntax. For a \emph{vectorized} input a list is returned.
8570
}
8671
\note{
87-
Be aware that rolling functions operate on the physical order of input. If the intent is to roll values in a vector by a logical window, for example an hour, or a day, then one has to ensure that there are no gaps in the input, or use adaptive rolling function to handle gaps, for which we provide helper function \code{\link{frolladapt}} to generate adaptive window size.
72+
Be aware that rolling functions operate on the physical order of input. If the intent is to roll values in a vector by a logical window, for example an hour, or a day, then one has to ensure that there are no gaps in the input, or use an adaptive rolling function to handle gaps, for which we provide helper function \code{\link{frolladapt}} to generate adaptive window size.
8873
}
8974
\section{\code{has.nf} argument}{
9075
\code{has.nf} can be used to speed up processing in cases when it is known if \code{x} contains (or not) non-finite values (\code{NA}, \code{NaN}, \code{Inf}, \code{-Inf}).
9176
\itemize{
92-
\item Default \code{has.nf=NA} uses faster implementation that does not support non-finite values, but when non-finite values are detected it will re-run non-finite supported implementation.
77+
\item Default \code{has.nf=NA} uses faster implementation that does not support non-finite values, but when non-finite values are detected it will re-run non-finite aware implementation.
9378
\item \code{has.nf=TRUE} uses non-finite aware implementation straightaway.
9479
\item \code{has.nf=FALSE} uses faster implementation that does not support non-finite values. Then depending on the rolling function it will either:
9580
\itemize{
9681
\item (\emph{mean, sum, prod, var, sd}) detect non-finite, re-run non-finite aware.
9782
\item (\emph{max, min, median}) does not detect non-finites and may silently produce an incorrect answer.
9883
}
99-
In general \code{has.nf=FALSE && any(!is.finite(x))} should be considered as undefined behavior. Therefore \code{has.nf=FALSE} should be used with care.
10084
}
85+
In general \code{has.nf=FALSE && any(!is.finite(x))} should be considered undefined behavior. Therefore \code{has.nf=FALSE} should be used with care.
10186
}
10287
\section{Implementation}{
10388
Most of the rolling functions have 4 different implementations. First factor that decides which implementation is used is the \code{adaptive} argument (either \code{TRUE} or \code{FALSE}), see section below for details. Then for each of those two algorithms there are usually two implementations depending on the \code{algo} argument.
@@ -111,16 +96,16 @@
11196
}
11297
\item \code{algo="exact"} will make the rolling functions use a more computationally-intensive algorithm. For each observation in the input vector it will compute a function on a rolling window from scratch (complexity \eqn{O(n^2)}).
11398
\itemize{
114-
\item Depeneding on the function, this algorithm may suffers less from floating point rounding error (the same consideration applies to base \code{\link[base]{mean}}).
99+
\item Depending on the function, this algorithm may suffer less from floating point rounding error (the same consideration applies to base \code{\link[base]{mean}}).
115100
\item In case of \emph{mean}, it will additionally make an extra pass to perform floating point error correction. Error corrections might not be truly exact on some platforms (like Windows) when using multiple threads.
116101
}
117102
}
118103
}
119104
\section{Adaptive rolling functions}{
120-
Adaptive rolling functions are a special case where each observation has its own corresponding rolling window width. Therefore, values passed to \code{n} argument must be series corresponding to observations in \code{x}. If multiple windows are meant to be computed, then a list of integer vectors is expected; each list element must be an integer vector of window size corresponding to observations in \code{x}; see Examples. Due to the logic or implementation of adaptive rolling functions, the following restrictions apply
105+
Adaptive rolling functions are a special case where each observation has its own corresponding rolling window width. Therefore, values passed to \code{n} argument must be series corresponding to observations in \code{x}. If multiple windows are meant to be computed, then a list of integer vectors is expected; each list element must be an integer vector of window size corresponding to observations in \code{x}; see Examples. Due to the logic or implementation of adaptive rolling functions, the following restrictions apply:
121106
\itemize{
122107
\item \code{align} does not support \code{"center"}.
123-
\item if list of vectors is passed to \code{x}, then all vectors within it must have equal length due to the fact that length of adaptive window widths must match the length of vectors in \code{x}.
108+
\item if a list of vectors is passed to \code{x}, then all vectors within it must have equal length due to the fact that length of adaptive window widths must match the length of vectors in \code{x}.
124109
}
125110
}
126111
\section{\code{partial} argument}{
@@ -131,19 +116,14 @@
131116
\section{\code{zoo} package users notice}{
132117
Users coming from most popular package for rolling functions \code{zoo} might expect following differences in \code{data.table} implementation
133118
\itemize{
134-
\item rolling function will always return result of the same length
135-
as input.
119+
\item rolling function will always return result of the same length as input.
136120
\item \code{fill} defaults to \code{NA}.
137-
\item \code{fill} accepts only constant values. It does not support
138-
for \emph{na.locf} or other functions.
121+
\item \code{fill} accepts only constant values. It does not support for \emph{na.locf} or other functions.
139122
\item \code{align} defaults to \code{"right"}.
140-
\item \code{na.rm} is respected, and other functions are not needed
141-
when input contains \code{NA}.
123+
\item \code{na.rm} is respected, and other functions are not needed when input contains \code{NA}.
142124
\item integers and logical are always coerced to numeric.
143-
\item when \code{adaptive=FALSE} (default), then \code{n} must be a
144-
numeric vector. List is not accepted.
145-
\item when \code{adaptive=TRUE}, then \code{n} must be vector of
146-
length equal to \code{nrow(x)}, or list of such vectors.
125+
\item when \code{adaptive=FALSE} (default), then \code{n} must be a numeric vector. List is not accepted.
126+
\item when \code{adaptive=TRUE}, then \code{n} must be vector of length equal to \code{nrow(x)}, or list of such vectors.
147127
}
148128
}
149129
\examples{
@@ -190,7 +170,7 @@ frollsum(list(x=1:5, y=5:1), c(tiny=2, big=4), give.names=TRUE)
190170
frollmax(c(1,2,NA,4,5), 2)
191171
frollmax(c(1,2,NA,4,5), 2, has.nf=FALSE)
192172

193-
# use verobse=TRUE for extra insight
173+
# use verbose=TRUE for extra insight
194174
.op = options(datatable.verbose = TRUE)
195175
frollsd(c(1:5,NA,7:8), 4)
196176
options(.op)
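
The "Adaptive rolling functions" section of `man/froll.Rd` above says that each observation has its own window width and that `n` must therefore match the length of `x`. A minimal base R sketch of that idea for a right-aligned rolling mean follows. `adaptive_roll_mean()` is a hypothetical helper, not the package's C implementation, and returning `fill` when the window reaches before the first element is an assumption about the `partial=FALSE` behavior.

```r
# Hypothetical helper: right-aligned rolling mean where observation i uses
# its own window width n[i], i.e. the values x[(i - n[i] + 1):i].
adaptive_roll_mean <- function(x, n, fill = NA_real_) {
  # One window width per observation; zero-width windows are excluded
  # here to keep the sketch simple.
  stopifnot(length(x) == length(n), all(n >= 1L))
  vapply(seq_along(x), function(i) {
    if (n[i] > i) return(fill)      # incomplete leading window: pad with fill
    mean(x[(i - n[i] + 1L):i])      # complete window ending at position i
  }, numeric(1))
}

x <- c(1, 2, 3, 4, 5)
n <- c(1L, 2L, 3L, 2L, 1L)
adaptive_roll_mean(x, n)                   # 1.0 1.5 2.0 3.5 5.0
adaptive_roll_mean(c(1, 2), c(2L, 2L))     # NA 1.5
```

With data.table loaded, the corresponding call would presumably be `frollmean(x, n, adaptive=TRUE)`. The sketch recomputes every window from scratch, which is closer in spirit to `algo="exact"` than to the default `algo="fast"`.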
