Skip to content

Commit d96467b

Browse files
authored
Merge branch 'master' into issue6964
2 parents 5c1d1d7 + ae2b815 commit d96467b

File tree

17 files changed

+310
-29
lines changed

17 files changed

+310
-29
lines changed

NAMESPACE

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -59,6 +59,7 @@ export(nafill)
5959
export(setnafill)
6060
export(.Last.updated)
6161
export(fcoalesce)
62+
export(cbindlist, setcbindlist)
6263
export(substitute2)
6364
#export(DT) # mtcars |> DT(i,j,by) #4872 #5472
6465
export(fctr)

NEWS.md

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -40,6 +40,8 @@
4040
# 2: 2 6 4 5
4141
```
4242

43+
8. `groupingsets()` gets a new argument `enclos` for use together with the `jj` argument in functions wrapping `groupingsets()`, including the existing wrappers `rollup()` and `cube()`. When forwarding a `j`-expression as `groupingsets(jj = substitute(j))`, make sure to pass `enclos = parent.frame()` as well, so that the `j`-expression will be evaluated in the right context. This makes it possible for `j` to refer to variables outside the `data.table`.
44+
4345
### BUG FIXES
4446

4547
1. Custom binary operators from the `lubridate` package now work with objects of class `IDate` as with a `Date` subclass, [#6839](https://github.com/Rdatatable/data.table/issues/6839). Thanks @emallickhossain for the report and @aitap for the fix.
@@ -72,7 +74,11 @@
7274
7375
15. Including an `ITime` object as a named input to `data.frame()` respects the provided name, i.e. `data.frame(a = as.ITime(...))` will have column `a`, [#4673](https://github.com/Rdatatable/data.table/issues/4673). Thanks @shrektan for the report and @MichaelChirico for the fix.
7476
75-
16. `cube()`, `rollup()`, and `groupingsets()` no longer produce a spurious warning when using `min()` or `max()` as aggregations, [#6964](https://github.com/Rdatatable/data.table/issues/6964). Thanks @ferenci-tamas for the report and @venom1204 for the fix.
77+
16. `fread()` now handles the `na.strings` argument for quoted text columns, making it possible to specify `na.strings = '""'` and read empty quoted strings as `NA`s, [#6974](https://github.com/Rdatatable/data.table/issues/6974). Thanks to @AngelFelizR for the report and @aitap for the PR.
78+
79+
17. A data.table with a column of class `vctrs_list_of` (from package {vctrs}) prints as expected, [#5948](https://github.com/Rdatatable/data.table/issues/5948). Before, they could be printed messily, e.g. printing every entry in a nested data.frame. Thanks @jesse-smith for the report, @DavisVaughan and @r2evans for contributing, and @MichaelChirico for the PR.
80+
81+
18. `cube()`, `rollup()`, and `groupingsets()` no longer produce a spurious warning when using `min()` or `max()` as aggregations, [#6964](https://github.com/Rdatatable/data.table/issues/6964). Thanks @ferenci-tamas for the report and @venom1204 for the fix.
7682
7783
### NOTES
7884

R/groupingsets.R

Lines changed: 12 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -13,7 +13,7 @@ rollup.data.table = function(x, j, by, .SDcols, id = FALSE, label = NULL, ...) {
1313
sets = lapply(length(by):0L, function(i) by[0L:i])
1414
# redirect to workhorse function
1515
jj = substitute(j)
16-
groupingsets.data.table(x, by=by, sets=sets, .SDcols=.SDcols, id=id, jj=jj, label=label)
16+
groupingsets.data.table(x, by=by, sets=sets, .SDcols=.SDcols, id=id, jj=jj, label=label, enclos = parent.frame())
1717
}
1818

1919
cube = function(x, ...) {
@@ -35,13 +35,13 @@ cube.data.table = function(x, j, by, .SDcols, id = FALSE, label = NULL, ...) {
3535
sets = lapply((2L^n):1L, function(jj) by[keepBool[jj, ]])
3636
# redirect to workhorse function
3737
jj = substitute(j)
38-
groupingsets.data.table(x, by=by, sets=sets, .SDcols=.SDcols, id=id, jj=jj, label=label)
38+
groupingsets.data.table(x, by=by, sets=sets, .SDcols=.SDcols, id=id, jj=jj, label=label, enclos = parent.frame())
3939
}
4040

4141
groupingsets = function(x, ...) {
4242
UseMethod("groupingsets")
4343
}
44-
groupingsets.data.table = function(x, j, by, sets, .SDcols, id = FALSE, jj, label = NULL, ...) {
44+
groupingsets.data.table = function(x, j, by, sets, .SDcols, id = FALSE, jj, label = NULL, enclos = parent.frame(), ...) {
4545
# input data type basic validation
4646
if (!is.data.table(x))
4747
stopf("Argument 'x' must be a data.table object")
@@ -112,14 +112,10 @@ groupingsets.data.table = function(x, j, by, sets, .SDcols, id = FALSE, jj, labe
112112
.SDcols = if (".SD" %chin% av) setdiff(names(x), by) else NULL
113113
if (length(names(by))) by = unname(by)
114114
# 0 rows template data.table to keep colorder and type
115-
empty = withCallingHandlers(
116-
if (length(.SDcols)) x[0L, eval(jj), by, .SDcols=.SDcols] else x[0L, eval(jj), by],
117-
simpleWarning = function(w) {
118-
if (grepl("no non-missing arguments to min|max", w$message, ignore.case = TRUE)) {
119-
invokeRestart("muffleWarning")
120-
}
121-
}
122-
)
115+
# inline all arguments that might clash with enclosing environment
116+
pcall = substitute(x[0L, jj, by], list(x = x, jj = jj, by = by))
117+
if (length(.SDcols)) pcall$.SDcols = .SDcols
118+
empty = suppressWarnings(eval(pcall, list(.datatable.aware = TRUE), enclos))
123119
if (id && "grouping" %chin% names(empty)) # `j` could have been evaluated to `grouping` field
124120
stopf("When using `id=TRUE` the 'j' expression must not evaluate to a column named 'grouping'.")
125121
if (anyDuplicated(names(empty)) > 0L)
@@ -157,8 +153,12 @@ groupingsets.data.table = function(x, j, by, sets, .SDcols, id = FALSE, jj, labe
157153
stopf("Using integer64 class columns require to have 'bit64' package installed.") # nocov
158154
int64.by.cols = intersect(int64.cols, by)
159155
# aggregate function called for each grouping set
156+
# inline all arguments that might clash with enclosing environment
157+
pcall = substitute(x[, jj], list(x = x, jj = jj))
158+
if (length(.SDcols)) pcall$.SDcols = .SDcols
160159
aggregate.set = function(by.set) {
161-
r = if (length(.SDcols)) x[, eval(jj), by.set, .SDcols=.SDcols] else x[, eval(jj), by.set]
160+
pcall$by = by.set
161+
r = eval(pcall, list(.datatable.aware = TRUE), enclos)
162162
if (id) {
163163
# integer bit mask of aggregation levels: http://www.postgresql.org/docs/9.5/static/functions-aggregate.html#FUNCTIONS-GROUPING-TABLE
164164
# 3267: strtoi("", base = 2L) output apparently unstable across platforms

R/mergelist.R

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,12 @@
1+
cbindlist_impl_ = function(l, copy) {
2+
ans = .Call(Ccbindlist, l, copy)
3+
if (anyDuplicated(names(ans))) { ## invalidate key and index
4+
setattr(ans, "sorted", NULL)
5+
setattr(ans, "index", NULL)
6+
}
7+
setDT(ans)
8+
ans
9+
}
10+
11+
cbindlist = function(l) cbindlist_impl_(l, copy=TRUE)
12+
setcbindlist = function(l) cbindlist_impl_(l, copy=FALSE)

R/print.data.table.R

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -199,8 +199,6 @@ has_format_method = function(x) {
199199
format_col.default = function(x, ...) {
200200
if (!is.null(dim(x)))
201201
"<multi-column>"
202-
else if (has_format_method(x) && length(formatted<-format(x, ...))==length(x))
203-
formatted #PR5224 motivated by package sf where column class is c("sfc_MULTIPOLYGON","sfc") and sf:::format.sfc exists
204202
else if (is.list(x))
205203
vapply_1c(x, format_list_item, ...)
206204
else

inst/tests/mergelist.Rraw

Lines changed: 71 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,71 @@
1+
require(methods)
2+
3+
if (exists("test.data.table", .GlobalEnv, inherits=FALSE)) {
4+
if ((tt<-compiler::enableJIT(-1))>0)
5+
cat("This is dev mode and JIT is enabled (level ", tt, ") so there will be a brief pause around the first test.\n", sep="")
6+
} else {
7+
require(data.table)
8+
test = data.table:::test
9+
}
10+
11+
# cbindlist, setcbindlist
12+
13+
local({
14+
l = list(
15+
d1 = data.table(x=1:3, v1=1L),
16+
d2 = data.table(y=3:1, v2=2L),
17+
d3 = data.table(z=2:4, v3=3L)
18+
)
19+
ans = cbindlist(l)
20+
expected = data.table(l$d1, l$d2, l$d3)
21+
test(11.01, ans, expected)
22+
test(11.02, intersect(vapply(ans, address, ""), unlist(lapply(l, vapply, address, ""))), character())
23+
ans = setcbindlist(l)
24+
expected = setDT(c(l$d1, l$d2, l$d3))
25+
test(11.03, ans, expected)
26+
test(11.04, length(intersect(vapply(ans, address, ""), unlist(lapply(l, vapply, address, "")))), ncol(expected))
27+
})
28+
29+
test(11.05, cbindlist(list(data.table(a=1L), data.table(), data.table(d=2L), data.table(f=3L))), data.table(a=1L, d=2L, f=3L))
30+
## codecov
31+
test(12.01, cbindlist(data.frame(a=1L)), error="must be a list")
32+
test(12.02, cbindlist(TRUE), error="must be a list")
33+
test(12.03, cbindlist(list(data.table(a=1L), 1L)), error="is not a data.table")
34+
test(12.04, options = c(datatable.verbose=TRUE), cbindlist(list(data.table(a=1:2), data.table(b=1:2))), data.table(a=1:2, b=1:2), output="cbindlist.*took")
35+
test(12.05, cbindlist(list(data.table(), data.table(a=1:2), data.table(b=1:2))), data.table(a=1:2, b=1:2))
36+
test(12.06, cbindlist(list(data.table(), data.table(a=1:2), list(b=1:2))), data.table(a=1:2, b=1:2))
37+
test(12.07, cbindlist(list(data.table(a=integer()), list(b=integer()))), data.table(a=integer(), b=integer()))
38+
## duplicated names
39+
test(12.08, cbindlist(list(data.table(a=1L, b=2L), data.table(b=3L, d=4L))), data.table(a=1L, b=2L, b=3L, d=4L))
40+
local({
41+
# also test that keys, indices are wiped
42+
ans = cbindlist(list(setindexv(data.table(a=2:1, b=1:2), "a"), data.table(a=1:2, b=2:1, key="a"), data.table(a=2:1, b=1:2)))
43+
test(12.09, ans, data.table(a=2:1, b=1:2, a=1:2, b=2:1, a=2:1, b=1:2))
44+
test(12.10, indices(ans), NULL)
45+
})
46+
## recycling, first ensure cbind recycling that we want to match to
47+
test(12.11, cbind(data.table(x=integer()), data.table(a=1:2)), data.table(x=c(NA_integer_, NA), a=1:2))
48+
test(12.12, cbind(data.table(x=1L), data.table(a=1:2)), data.table(x=c(1L, 1L), a=1:2))
49+
test(12.13, cbindlist(list(data.table(a=integer()), data.table(b=1:2))), error="Recycling.*not yet implemented")
50+
test(12.14, cbindlist(list(data.table(a=1L), data.table(b=1:2))), error="Recycling.*not yet implemented")
51+
test(12.15, setcbindlist(list(data.table(a=integer()), data.table(b=1:2))), error="have to have the same number of rows")
52+
test(12.16, setcbindlist(list(data.table(a=1L), data.table(b=1:2))), error="have to have the same number of rows")
53+
54+
## retain indices
55+
local({
56+
l = list(
57+
data.table(id1=1:5, id2=5:1, id3=1:5, v1=1:5),
58+
data.table(id4=5:1, id5=1:5, v2=1:5),
59+
data.table(id6=5:1, id7=1:5, v3=1:5),
60+
data.table(id8=5:1, id9=5:1, v4=1:5)
61+
)
62+
setkeyv(l[[1L]], "id1"); setindexv(l[[1L]], list("id1", "id2", "id3", c("id1","id2","id3"))); setindexv(l[[3L]], list("id6", "id7")); setindexv(l[[4L]], "id9")
63+
ii = lapply(l, indices)
64+
ans = cbindlist(l)
65+
test(13.1, key(ans), "id1")
66+
test(13.2, indices(ans), c("id1", "id2", "id3", "id1__id2__id3", "id6", "id7", "id9"))
67+
test(13.3, ii, lapply(l, indices)) ## this tests that original indices have not been touched, shallow_duplicate in mergeIndexAttrib
68+
})
69+
test(13.4, cbindlist(list(data.table(a=1:2), data.table(b=3:4, key="b"))), data.table(a=1:2, b=3:4, key="b"))
70+
# TODO(#7116): this could be supported
71+
# test(13.5, cbindlist(list(data.table(a=1:2, key="a"), data.table(b=3:4, key="b"))), data.table(a=1:2, b=3:4, key=c("a", "b")))

inst/tests/other.Rraw

Lines changed: 8 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
pkgs = c("ggplot2", "hexbin", "plyr", "dplyr", "caret", "zoo", "xts", "gdata", "nlme", "bit64", "knitr", "parallel", "sf", "nanotime", "R.utils", "yaml")
1+
pkgs = c("bit64", "caret", "dplyr", "gdata", "ggplot2", "hexbin", "knitr", "nanotime", "nlme", "parallel", "plyr", "R.utils", "sf", "vctrs", "xts", "yaml", "zoo")
22
# First expression of this file must be as above: .gitlab-ci.yml uses parse(,n=1L) to read one expression from this file and installs pkgs.
33
# So that these dependencies of other.Rraw are maintained in a single place.
44
# TEST_DATA_TABLE_WITH_OTHER_PACKAGES is off by default so this other.Rraw doesn't run on CRAN. It is run by GLCI, locally in dev, and by
@@ -221,6 +221,7 @@ test(14.2, {example('CJ', package='data.table', local=TRUE, echo=FALSE); TRUE})
221221
if (loaded[["sf"]]) { #2273
222222
DT = as.data.table(st_read(system.file("shape/nc.shp", package = "sf"), quiet=TRUE))
223223
test(15, DT[1:3, .(NAME, FIPS, geometry)], output="Ashe.*-81.4.*Surry.*-80.4")
224+
test(15.1, DT, output="MULTIPOLYGON (((") # make sure individual list items are formatted, #6637, #5224
224225

225226
dsf = sf::st_as_sf(data.table(x=1:10, y=1:10, s=sample(1:2, 10, TRUE)), coords=1:2)
226227
test(16, split(dsf, dsf$s), list(`1` = dsf[dsf$s == 1, ], `2` = dsf[dsf$s == 2, ]))
@@ -774,3 +775,9 @@ if (loaded[["nanotime"]]) {
774775
res <- tables(env=.e)
775776
test(32, res[, .(NAME,NROW,NCOL,MB)], data.table(NAME="DT",NROW=20000000L,NCOL=15L,MB=2288.0))
776777
rm(.e, res)
778+
779+
if (loaded[["vctrs"]]) {
780+
# vctrs::list_of() columns are treated the same as other list() columns
781+
DT = data.table(a = 1, b = list_of(mtcars))
782+
test(33, DT, output="<vctrs_list_of>.*<data\\.frame\\[32x11\\]>")
783+
}

inst/tests/tests.Rraw

Lines changed: 53 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -16800,15 +16800,15 @@ registerS3method("format_col", "complex", format_col.default)
1680016800
# then i) test 1610.1 fails if test.data.table() is rerun, ii) user display of complex data would be affected
1680116801
# did try wrapping with on.exit(,add=TRUE) but perhaps because this is a script that is sys.source'd, it ran straight away
1680216802

16803-
# format method for column takes predecedence over format method for each list item
16803+
# as of #6637, format individual list items but not the whole list
1680416804
registerS3method("format", "myclass2130", function(x, ...) paste0("<", class(x)[1L], ":", x$id, ">"))
1680516805
DT = data.table(row=1:2, objs=list(structure(list(id="foo"), class="myclass2130"), structure(list(id="bar"), class="myclass2130")))
1680616806
test(2130.13, print(DT), output="myclass2130:foo.*myclass2130:bar")
1680716807
setattr(DT$objs, "class", "foo2130")
1680816808
registerS3method("format", "foo2130", function(x, ...) "All hail foo")
1680916809
test(2130.14, print(DT), output="myclass2130:foo.*myclass2130:bar") # because length 1 from format but needs to be length(x)
1681016810
registerS3method("format", "foo2130", function(x, ...) rep("All hail foo",length(x)))
16811-
test(2130.15, print(DT), output="All hail foo") # e.g. sf:::format.sfc rather than sf:::format.sfg on each item
16811+
test(2130.15, print(DT), output="myclass2130:foo.*myclass2130:bar") # used to call format(column), not vapply_1c(column, format)
1681216812
setattr(DT$objs, "class", "bar2130_with_no_method")
1681316813
test(2130.16, print(DT), output="myclass2130:foo.*myclass2130:bar")
1681416814
registerS3method("format", "myclass2130", format.default)
@@ -21252,3 +21252,54 @@ it <- as.ITime('00:00:00')
2125221252
test(2323.1, names(data.frame(COL = it)), "COL")
2125321253
test(2323.2, names(data.frame(b = 1, COL = it)), c("b", "COL"))
2125421254
test(2323.3, names(as.data.frame(it, optional=TRUE)), NULL)
21255+
21256+
# 'sets' is a local variable in groupingsets(), cube(), rollup() and shouldn't leak into the 'j' expression
21257+
n = 24L
21258+
set.seed(25)
21259+
DT = data.table(
21260+
color = sample(c("green","yellow","red"), n, TRUE),
21261+
year = as.Date(sample(paste0(2011:2015,"-01-01"), n, TRUE)),
21262+
status = as.factor(sample(c("removed","active","inactive","archived"), n, TRUE)),
21263+
amount = sample(1:5, n, TRUE),
21264+
value = sample(c(3, 3.5, 2.5, 2), n, TRUE)
21265+
)
21266+
sets = 0
21267+
test(2324.0,
21268+
groupingsets(DT, j = c(list(count=.N + ..sets)), by = c("color","year","status"), sets = list("color", c("year","status"), character()), id=TRUE),
21269+
groupingsets(DT, j = c(list(count=.N + 0)), by = c("color","year","status"), sets = list("color", c("year","status"), character()), id=TRUE)
21270+
)
21271+
test(2324.1,
21272+
cube(DT, j = sum(value) + ..sets, by = c("color","year","status"), id=TRUE),
21273+
cube(DT, j = sum(value), by = c("color","year","status"), id=TRUE)
21274+
)
21275+
test(2324.2,
21276+
rollup(DT, j = sum(value) + ..sets, by=c("color","year","status"), label="total"),
21277+
rollup(DT, j = sum(value), by=c("color","year","status"), label="total")
21278+
)
21279+
21280+
# allow na.strings to be quoted, #6974
21281+
f = tempfile()
21282+
DT = data.table(
21283+
"Date Example"=c("12/5/2012", NA),
21284+
"Question 1"=c("Yes", NA),
21285+
"Question 2"=c("Yes", NA),
21286+
"Site: Country"=c("Chile", "Virgin Islands, British")
21287+
)
21288+
fwrite(DT, f, na='""')
21289+
test(2325.1, fread(f, na.strings='""'), DT)
21290+
unlink(f)
21291+
test(2325.2,
21292+
fread('"foo","bar","baz"\n"a","b","c"', na.strings=c('"foo"', '"bar"', '"baz"'), header=FALSE),
21293+
data.table(V1=c(NA, "a"), V2=c(NA, "b"), V3=c(NA, "c")))
21294+
21295+
## ensure setDT will retain key and indices when it is called on the list (cbindlist assumes this)
21296+
local({
21297+
d = data.table(x=1:2, y=2:1, z=2:1, v1=1:2)
21298+
setkeyv(d, "x"); setindexv(d, list("y", "z"))
21299+
a = attributes(d)
21300+
attributes(d) = a[!names(a) %in% c("class", ".internal.selfref", "row.names")]
21301+
test(2326.1, class(d), "list")
21302+
setDT(d)
21303+
test(2326.2, key(d), "x")
21304+
test(2326.3, indices(d), c("y", "z"))
21305+
})

man/cbindlist.Rd

Lines changed: 41 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,41 @@
1+
\name{cbindlist}
2+
\alias{cbindlist}
3+
\alias{setcbindlist}
4+
\alias{cbind}
5+
\alias{cbind.data.table}
6+
\title{Column bind multiple data.tables}
7+
\description{
8+
Column bind multiple \code{data.table}s.
9+
}
10+
\usage{
11+
cbindlist(l)
12+
setcbindlist(l)
13+
}
14+
\arguments{
15+
\item{l}{ \code{list} of \code{data.table}s to merge. }
16+
}
17+
\details{
18+
Column bind only stacks input elements. Works like \code{\link{data.table}}, but takes \code{list} type on input. Zero-column tables in \code{l} are omitted. Tables in \code{l} should have matching row count; recycling of length-1 rows is not yet implemented. Indices of the input tables are transferred to the resulting table, as well as the \emph{key} of the first keyed table.
19+
}
20+
\value{
21+
A new \code{data.table} based on the stacked objects.
22+
23+
For \code{setcbindlist}, columns in the output will be shared with the input, i.e., \emph{no copy is made}.
24+
}
25+
\note{
26+
No attempt is made to deduplicate resulting names. If the result has any duplicate names, keys and indices are removed.
27+
}
28+
\seealso{
29+
\code{\link{data.table}}, \code{\link{rbindlist}}, \code{\link{setDT}}
30+
}
31+
\examples{
32+
d1 = data.table(x=1:3, v1=1L, key="x")
33+
d2 = data.table(y=3:1, v2=2L, key="y")
34+
d3 = data.table(z=2:4, v3=3L)
35+
cbindlist(list(d1, d2, d3))
36+
cbindlist(list(d1, d1))
37+
d4 = setcbindlist(list(d1))
38+
d4[, v1:=2L]
39+
identical(d4, d1)
40+
}
41+
\keyword{ data }

man/fctr.Rd

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@
33
\alias{factor}
44
\title{Create a factor retaining original ordering}
55
\description{
6-
Creates a code{\link[base:factor]{factor}}.
6+
Creates a \code{\link[base]{factor}}.
77

88
By default, the output will have its levels in the original order, i.e., \code{levels = unique(x)}, as opposed to \code{factor}'s default where \code{levels = sort(unique(x))}.
99
}
@@ -13,7 +13,7 @@ fctr(x, levels=unique(x), ..., sort=FALSE, rev=FALSE)
1313
\arguments{
1414
\item{x}{ Object to be turned into a factor. }
1515
\item{levels}{ Levels for the new factor; \code{unique(x)} by default. }
16-
\item{\dots}{ Other arguments passed to code{\link[base:factor]{factor}}. }
16+
\item{\dots}{ Other arguments passed to \code{\link[base]{factor}}. }
1717
\item{sort}{ Logical, default \code{FALSE}. Should \code{levels} be sorted? }
1818
\item{rev}{ Logical, default \code{FALSE}. Should \code{levels} be reversed? Applied \emph{after} \code{sort}. }
1919
}

0 commit comments

Comments
 (0)