Skip to content

Commit 017cecd

Browse files
authored
Merge branch 'master' into fixSpelling
2 parents d38996b + 59692e7 commit 017cecd

File tree

17 files changed

+348
-177
lines changed

17 files changed

+348
-177
lines changed

.github/workflows/pkgup.yaml

Lines changed: 3 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -12,7 +12,7 @@ on:
1212
branches:
1313
- 'master'
1414

15-
name: pkgdown-deploy
15+
name: pkgup
1616

1717
jobs:
1818
build:
@@ -47,16 +47,9 @@ jobs:
4747
- name: manual
4848
if: github.ref == 'refs/heads/master'
4949
run: |
50-
cp -R ${{ env.R_LIBS_USER }} library
51-
R CMD INSTALL --library="library" $(ls -1t data.table_*.tar.gz | head -n 1) --html
52-
mkdir -p doc/html
53-
cp $(R RHOME)/doc/html/{left.jpg,up.jpg,Rlogo.svg,R.css,index.html} doc/html
54-
Rscript -e 'utils::make.packages.html("library", docdir="doc")'
55-
sed -i "s|file://|../..|g" doc/html/packages.html
50+
R CMD INSTALL $(ls -1t data.table_*.tar.gz | head -n 1)
5651
mkdir -p public
57-
mv doc public/doc
58-
cp -r --parents library/*/{html,help,doc,demo,DESCRIPTION,README,NEWS,README.md,NEWS.md} public 2>/dev/null || :
59-
sed -i 's|"/doc/html/|"/data.table/doc/html/|g' public/library/data.table/doc/index.html 2>/dev/null || :
52+
Rscript -e 'tools::pkg2HTML("data.table", out="public/index.html")'
6053
- name: repo
6154
if: github.ref == 'refs/heads/master'
6255
run: |

DESCRIPTION

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@ Version: 1.17.99
33
Title: Extension of `data.frame`
44
Depends: R (>= 3.4.0)
55
Imports: methods
6-
Suggests: bit64 (>= 4.0.0), bit (>= 4.0.4), R.utils, xts, zoo (>= 1.8-1), yaml, knitr, markdown
6+
Suggests: bit64 (>= 4.0.0), bit (>= 4.0.4), R.utils (>= 2.13.0), xts, zoo (>= 1.8-1), yaml, knitr, markdown
77
Description: Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns, friendly and fast character-separated-value read/write. Offers a natural and flexible syntax, for faster development.
88
License: MPL-2.0 | file LICENSE
99
URL: https://r-datatable.com, https://Rdatatable.gitlab.io/data.table, https://github.com/Rdatatable/data.table

NAMESPACE

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -215,6 +215,7 @@ S3method(format_list_item, data.frame)
215215
export(fdroplevels, setdroplevels)
216216
S3method(droplevels, data.table)
217217
export(frev)
218+
export(.selfref.ok)
218219

219220
# sort_by added in R 4.4.0, #6662, https://stat.ethz.ch/pipermail/r-announce/2024/000701.html
220221
if (getRversion() >= "4.4.0") S3method(sort_by, data.table)

NEWS.md

Lines changed: 11 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -177,20 +177,20 @@
177177
```r
178178
x = data.table(v1=rnorm(120), v2=rnorm(120))
179179
f = function(x) coef(lm(v2 ~ v1, data=x))
180-
coef.fill = c("(Intercept)"=NA_real_, "v1"=NA_real_)
181-
frollapply(x, 4, f, by.column=FALSE, fill=coef.fill)
180+
frollapply(x, 4, f, by.column=FALSE)
182181
# (Intercept) v1
182+
# <num> <num>
183183
# 1: NA NA
184184
# 2: NA NA
185185
# 3: NA NA
186-
# 4: 0.65456931 0.3138012
187-
# 5: -1.07977441 -2.0588094
186+
# 4: -0.04648236 -0.6349687
187+
# 5: 0.09208733 -0.4964023
188188
#---
189-
#116: 0.15828417 0.3570216
190-
#117: -0.09083424 1.5494507
191-
#118: -0.18345878 0.6424837
192-
#119: -0.28964772 0.6116575
193-
#120: -0.40598313 0.6112854
189+
#116: -0.21169439 0.7421358
190+
#117: -0.19729119 0.4926939
191+
#118: -0.04217896 0.0452713
192+
#119: 0.22472549 -0.5245874
193+
#120: 0.54540359 -0.1638333
194194
```
195195
- uses multiple CPU threads (on a decent OS); evaluation of UDF is inherently slow so this can be a great help.
196196
```r
@@ -311,6 +311,8 @@
311311
312312
6. Using a double vector in `set()`'s `i=` and/or `j=` no longer throws a warning about preferring integer, [#6594](https://github.com/Rdatatable/data.table/issues/6594). While it may improve efficiency to use integer, there's no guarantee it's an improvement and the difference is likely to be minimal. The coercion will still be reported under `datatable.verbose=TRUE`. For package/production use cases, static analyzers such as `lintr::implicit_integer_linter()` can also report when numeric literals should be rewritten as integer literals.
313313

314+
7. In rare situations a data.table object may lose its internal attribute that holds a self-reference. New helper function `.selfref.ok()` tests just that. It is only intended for technical use cases. See manual for examples.
315+
314316
## data.table [v1.17.8](https://github.com/Rdatatable/data.table/milestone/41) (6 July 2025)
315317

316318
1. Internal functions used to signal errors are now marked as non-returning, silencing a compiler warning about potentially unchecked allocation failure. Thanks to Prof. Brian D. Ripley for the report and @aitap for the fix, [#7070](https://github.com/Rdatatable/data.table/pull/7070).

R/fread.R

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -119,6 +119,9 @@ yaml=FALSE, tmpdir=tempdir(), tz="UTC")
119119
if ((w <- endsWithAny(file, c(".gz", ".bgz",".bz2"))) || (gzsig <- is_gzip(file_signature)) || is_bzip(file_signature)) {
120120
if (!requireNamespace("R.utils", quietly = TRUE))
121121
stopf("To read %s files directly, fread() requires 'R.utils' package which cannot be found. Please install 'R.utils' using 'install.packages('R.utils')'.", if (w<=2L || gzsig) "gz" else "bz2") # nocov
122+
# not worth doing a behavior test here, so just use getRversion().
123+
if (packageVersion("R.utils") < "2.13.0" && base::getRversion() >= "4.5.0")
124+
stopf("Reading compressed files in fread requires R.utils version 2.13.0 or higher. Please upgrade R.utils.") # nocov
122125
FUN = if (w<=2L || gzsig) gzfile else bzfile
123126
decompFile = tempfile(tmpdir=tmpdir)
124127
on.exit(unlink(decompFile), add=TRUE)

R/frollapply.R

Lines changed: 107 additions & 54 deletions
Original file line numberDiff line numberDiff line change
@@ -1,57 +1,117 @@
1+
all_atomic = function(x) all(vapply_1b(x, is.atomic, use.names=FALSE))
2+
all_data.frame = function(x) all(vapply_1b(x, is.data.frame, use.names=FALSE))
3+
all_list = function(x) all(vapply_1b(x, is.list, use.names=FALSE))
4+
all_types = function(x) vapply_1c(x, typeof, use.names=FALSE)
5+
all_names = function(x) lapply(x, names)
6+
len_unq = function(x) length(unique(x))
7+
any_NA_neg = function(x) anyNA(x) || any(x < 0L)
8+
any_NA_names = function(x) anyNA(names(x))
9+
all_NULL_names = function(x) all(vapply_1b(x, function(y) is.null(names(y)), use.names=FALSE))
10+
equal.lengths = function(x) len_unq(lengths(x)) <= 1L
11+
equal.nrows = function(x) len_unq(lapply(x, nrow)) <= 1L
12+
equal.dims = function(x) len_unq(lapply(x, dim)) <= 1L
13+
equal.types = function(x) len_unq(all_types(x)) <= 1L
14+
equal.names = function(x) len_unq(all_names(x)) <= 1L
15+
116
## ansmask is to handle leading values from fill to match type of the ans
217
simplifylist = function(x, fill, ansmask) {
3-
all.t = vapply_1c(x, typeof, use.names=FALSE)
18+
all.t = all_types(x)
419
all.ut = unique(all.t)
520
if (length(all.ut) > 1L) {
6-
ans.t = vapply_1c(x[ansmask], typeof, use.names=FALSE)
7-
ans.ut = unique(ans.t)
8-
## ans postprocessing to match types
9-
if ((length(ans.ut) == 2L) && all(ans.ut %in% c("double","integer","logical"))) { ## align numeric and integer when function is not type stable: median #7313
10-
if ("double" %in% ans.ut) {
11-
if ("integer" %in% ans.ut)
12-
x[ansmask & all.t=="integer"] = lapply(x[ansmask & all.t=="integer"], as.numeric) ## coerce integer to double
13-
if ("logical" %in% ans.ut)
14-
x[ansmask & all.t=="logical"] = lapply(x[ansmask & all.t=="logical"], as.numeric) ## coerce logical to double
15-
ans.ut = "double"
16-
} else if ("integer" %in% ans.ut) {
17-
if ("logical" %in% ans.ut)
18-
x[ansmask & all.t=="logical"] = lapply(x[ansmask & all.t=="logical"], as.integer) ## coerce logical to integer
19-
else
20-
internal_error("simplifylist aligning return types, at that place there should have been some logical types in the answer") # nocov
21-
ans.ut = "integer"
22-
}
21+
ans.ut = unique(all_types(x[ansmask]))
22+
## coerce int to double when varies within answer results: median #7313
23+
if (
24+
length(ans.ut) == 2L && ## simplifylist(list(NA, 1, 1L, TRUE), NA, ansmask=c(F,T,T,T))
25+
all(c("double","integer") %in% ans.ut) && ## simplifylist(list(NA, 1, TRUE), NA, ansmask=c(F,T,T))
26+
equal.lengths(x[ansmask]) ## simplifylist(list(NA, 1, 1:2), NA, ansmask=c(F,T,T))
27+
) {
28+
x[ansmask & all.t=="integer"] = lapply(x[ansmask & all.t=="integer"], as.numeric) ## coerce integer to double
29+
ans.ut = "double"
2330
}
24-
## file postprocessing to match types
25-
if (length(ans.ut) == 1L && equal.lengths(x[ansmask])) {
26-
if (identical(fill, NA)) { ## different typeof of ans and default fill=NA and lengths of ans equal
27-
filli = which(!ansmask)
28-
ans1 = x[[which.first(ansmask)]] ## first ans from full window
29-
x[filli] = rep_len(list(ans1[NA]), length(filli)) ## this will make NA of matching type to ans1 and recycle for all filli
30-
all.ut = ans.ut
31-
} else if (typeof(fill) != ans.ut && all(c(typeof(fill), ans.ut) %in% c("double","integer","logical"))) { ## fill=-2, ans=1L
32-
filli = which(!ansmask)
33-
cast = switch(ans.ut, double = as.numeric, integer = as.integer, logical = as.logical)
34-
x[filli] = rep_len(list(cast(fill)), length(filli))
35-
all.ut = ans.ut
31+
## coerce fill to answers type and length
32+
if (
33+
length(ans.ut) == 1L && ## simplifylist(list(NA, 1, TRUE), NA, ansmask=c(F,T,T))
34+
equal.lengths(x[ansmask]) && ## simplifylist(list(NA, 1L, 1:2), NA, ansmask=c(F,T,T))
35+
is.atomic(fill) ## simplifylist(list(list(NA), list(1), list(2)), list(NA), ansmask=c(F,T,T))
36+
) {
37+
fill.t = typeof(fill)
38+
ans1 = x[[which.first(ansmask)]] ## first ans from full window, all ans same type by now
39+
## coerce fill to type
40+
if (identical(fill, NA)) {
41+
if (ans.ut == "list") {
42+
fill = lapply(ans1, `[`, NA) ## we want list(NA) rather than list(NULL), this also propagates names
43+
} else {
44+
fill = setNames(ans1[NA], names(ans1))
45+
}
46+
fill.t = ans.ut
47+
} else if (
48+
fill.t != ans.ut && ## simplifylist(list(-1, 1, 2), -1, ansmask=c(F,T,T))
49+
fill.t %in% c("double","integer") && ## simplifylist(list(NULL, 1, 2), NULL, ansmask=c(F,T,T))
50+
ans.ut %in% c("double","integer") ## simplifylist(list(1, "a", "b"), 1, ansmask=c(F,T,T))
51+
) { ## fill=-2, ans=1L
52+
if (fill.t == "integer" && ans.ut == "double") {
53+
fill = as.double(fill)
54+
} else if (fill.t == "double" && ans.ut == "integer") {
55+
fill = as.integer(fill)
56+
} else {
57+
internal_error("coerce fill type reached a branch of unexpected fill type and answer type") # nocov
58+
}
59+
fill.t = ans.ut
3660
}
61+
## name fill if all ans have same names
62+
if (
63+
ans.ut != "list" &&
64+
length(fill) == length(ans1) &&
65+
is.null(names(fill)) &&
66+
!is.null(names(ans1)) && ## simplifylist(list(NA, c(1,2), c(1,2)), NA, ansmask=c(F,T,T))
67+
len_unq(vapply_1b(x[ansmask], any_NA_names, use.names=FALSE)) <= 1L && ## simplifylist(list(NA, c(a=1,b=2), setNames(c(1, 2), c(NA,"b"))), NA, ansmask=c(F,T,T))
68+
equal.names(x[ansmask]) ## simplifylist(list(NA, c(a=1,b=2), c(x=1,y=2)), NA, ansmask=c(F,T,T))
69+
) {
70+
fill = setNames(fill, names(ans1))
71+
}
72+
## recycle fill
73+
filli = which(!ansmask)
74+
x[filli] = rep_len(list(fill), length(filli))
3775
}
76+
all.ut = unique(all_types(x))
77+
}
78+
if (
79+
!is.null(names(fill)) &&
80+
all_NULL_names(x[ansmask]) &&
81+
equal.lengths(x)
82+
) {
83+
nm = names(fill)
84+
x[ansmask] = lapply(x[ansmask], `names<-`, nm)
3885
}
39-
all.ut = unique(vapply_1c(x, typeof, use.names=FALSE))
40-
if ((length(all.ut) == 1L) && all(all.ut %in% c("integer","logical","double","complex","character","raw"))) {
41-
if (identical(unique(lengths(x)), 1L)) { ## length 1
42-
return(unlist(x, recursive=FALSE, use.names=FALSE))
43-
} else if (equal.lengths(x)) { ## length 2+ and equal
44-
return(rbindlist(lapply(x, as.list)))
86+
if (length(all.ut) == 1L) {
87+
if (all.ut %in% c("integer","logical","double","complex","character","raw")) {
88+
if (identical(unique(lengths(x)), 1L)) { ## all length 1
89+
return(unlist(x, recursive=FALSE, use.names=FALSE))
90+
} else if (
91+
equal.lengths(x) &&
92+
equal.names(x)
93+
) { ## length 2+ and equal
94+
return(rbindlist(lapply(x, as.list)))
95+
}
96+
} else if (identical(all.ut,"list")) {
97+
if (
98+
all_data.frame(x) && ## simplifylist(list(NA, list(a=1L, b=2L), data.table(a=1L, b=2L)), NA, ansmask=c(F,T,T))
99+
equal.dims(x) && ## simplifylist(list(NA, data.table(a=1L, b=2L), data.table(a=1L)), NA, ansmask=c(F,T,T))
100+
equal.types(x) && ## simplifylist(list(NA, data.table(a=1L, b=2L), data.table(a=1L, b="b")), NA, ansmask=c(F,T,T))
101+
equal.names(x) ## simplifylist(list(NA, data.table(a=1L, b=2L), data.table(x=1L, y=2L)), NA, ansmask=c(F,T,T))
102+
) {
103+
return(rbindlist(x))
104+
} else if (
105+
equal.lengths(x) &&
106+
len_unq(lapply(x, lengths, use.names=FALSE)) <= 1L && ## nested lengths
107+
len_unq(lapply(lapply(x, unlist, recursive=FALSE, use.names=FALSE), typeof)) <= 1L &&
108+
equal.names(x)
109+
) { ## same length lists: list(list(1:2, 1:2), list(2:3, 2:3))
110+
return(rbindlist(x)) ## simplifylist(list(NA, list(1:2, 1:2), list(2:3, 2:3)), NA, ansmask=c(F,T,T))
111+
}
45112
}
46-
} else if (identical(all.ut, "list")) {
47-
if (all_data.frame(x)) ## list(data.table(...), data.table(...))
48-
return(rbindlist(x))
49-
if (equal.lengths(x)) ## same length lists: list(list(1:2, 1:2), list(2:3, 2:3))
50-
return(rbindlist(x))
51113
}
52-
## not simplified, return as is
53-
# not length stable
54-
# not type stable
114+
## not simplified, return as is, see #7317
55115
# NULL, closure, special, builtin, environment, S4, ...
56116
x
57117
}
@@ -65,13 +125,6 @@ fixselfref = function(x) {
65125
x
66126
}
67127

68-
all_atomic = function(x) all(vapply_1b(x, is.atomic, use.names=FALSE))
69-
all_data.frame = function(x) all(vapply_1b(x, is.data.frame, use.names=FALSE))
70-
all_list = function(x) all(vapply_1b(x, is.list, use.names=FALSE))
71-
equal.lengths = function(x) length(unique(lengths(x))) <= 1L
72-
equal.nrows = function(x) length(unique(vapply(x, nrow, 0L))) <= 1L
73-
anyNAneg = function(x) anyNA(x) || any(x < 0L)
74-
75128
frollapply = function(X, N, FUN, ..., by.column=TRUE, fill=NA, align=c("right","left","center"), adaptive=FALSE, partial=FALSE, give.names=FALSE, simplify=TRUE, x, n) {
76129
if (!missing(x)) {
77130
warningf("'x' is deprecated in frollapply, use 'X' instead")
@@ -159,7 +212,7 @@ frollapply = function(X, N, FUN, ..., by.column=TRUE, fill=NA, align=c("right","
159212
nnam = names(N) ## used for give.names
160213
if (!is.integer(N))
161214
N = as.integer(N)
162-
if (anyNAneg(N))
215+
if (any_NA_neg(N))
163216
stopf("'N' must be non-negative integer values (>= 0)")
164217
nn = length(N) ## top level loop for vectorized n
165218
} else {
@@ -170,7 +223,7 @@ frollapply = function(X, N, FUN, ..., by.column=TRUE, fill=NA, align=c("right","
170223
stopf("length of integer vector(s) provided as list to 'N' argument must be equal to number of observations provided in 'X'")
171224
if (!is.integer(N))
172225
N = as.integer(N)
173-
if (anyNAneg(N))
226+
if (any_NA_neg(N))
174227
stopf("'N' must be non-negative integer values (>= 0)")
175228
nn = 1L
176229
N = list(N)
@@ -184,7 +237,7 @@ frollapply = function(X, N, FUN, ..., by.column=TRUE, fill=NA, align=c("right","
184237
stopf("'N' must be an integer vector or list of integer vectors")
185238
if (!all(vapply_1b(N, is.integer, use.names=FALSE)))
186239
N = lapply(N, as.integer)
187-
if (any(vapply_1b(N, anyNAneg, use.names=FALSE)))
240+
if (any(vapply_1b(N, any_NA_neg, use.names=FALSE)))
188241
stopf("'N' must be non-negative integer values (>= 0)")
189242
nn = length(N)
190243
nnam = names(N)

R/helpers.R

Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,19 @@
1+
# here we have helpers functions that DT exports, on contrary to utils.R file
2+
3+
# convert char to factor retaining order #4837
4+
fctr = function(x, levels=unique(x), ..., sort=FALSE, rev=FALSE) {
5+
if (!isTRUEorFALSE(sort))
6+
stopf("argument 'sort' must be TRUE or FALSE")
7+
if (!isTRUEorFALSE(rev))
8+
stopf("argument 'rev' must be TRUE or FALSE")
9+
if (sort) levels = sort(levels)
10+
if (rev) levels = frev(levels)
11+
factor(x, levels=levels, ...)
12+
}
13+
14+
# add a function for validating data.tables that might need setDT #7329
15+
.selfref.ok = function(x) {
16+
if (!is.data.table(x))
17+
stopf(".selfref.ok expects data.table class object.")
18+
selfrefok(x, verbose=FALSE) > 0L
19+
}

R/utils.R

Lines changed: 0 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -220,17 +220,6 @@ rss = function() { #5515 #5517
220220
# nocov end
221221
}
222222

223-
# convert char to factor retaining order #4837
224-
fctr = function(x, levels=unique(x), ..., sort=FALSE, rev=FALSE) {
225-
if (!isTRUEorFALSE(sort))
226-
stopf("argument 'sort' must be TRUE or FALSE")
227-
if (!isTRUEorFALSE(rev))
228-
stopf("argument 'rev' must be TRUE or FALSE")
229-
if (sort) levels = sort(levels)
230-
if (rev) levels = frev(levels)
231-
factor(x, levels=levels, ...)
232-
}
233-
234223
formula_vars = function(f, x) { # .formula2varlist is not API and seems to have appeared after R-4.2, #6841
235224
terms <- terms(f)
236225
setNames(

README.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@
33

44
<!-- badges: start -->
55
[![CRAN status](https://badges.cranchecks.info/flavor/release/data.table.svg)](https://cran.r-project.org/web/checks/check_results_data.table.html)
6-
[![R-CMD-check](https://github.com/Rdatatable/data.table/workflows/R-CMD-check/badge.svg)](https://github.com/Rdatatable/data.table/actions)
6+
[![R-CMD-check](https://github.com/Rdatatable/data.table/actions/workflows/R-CMD-check.yaml/badge.svg?branch=master)](https://github.com/Rdatatable/data.table/actions)
77
[![Codecov test coverage](https://codecov.io/github/Rdatatable/data.table/coverage.svg?branch=master)](https://app.codecov.io/github/Rdatatable/data.table?branch=master)
88
[![GitLab CI build status](https://gitlab.com/Rdatatable/data.table/badges/master/pipeline.svg)](https://rdatatable.gitlab.io/data.table/web/checks/check_results_data.table.html)
99
[![downloads](https://cranlogs.r-pkg.org/badges/data.table)](https://www.rdocumentation.org/trends)

0 commit comments

Comments
 (0)