Commit 2675539

Add an option for enabling new data.table(<1-column matrix>) auto-naming behavior (#7158)
* implement option, add regression tests
* regression test for dup-named case
* fix for updated behavior
* need option set for original tests
* refine wording
* item number fix
* Regression test for having fixed #5367
* correct NEWS reference
* include other fixed bugs in NEWS
* fix test numbering
1 parent 67670e9 commit 2675539

5 files changed

+61
-14
lines changed

NEWS.md

Lines changed: 5 additions & 1 deletion
@@ -4,6 +4,10 @@
 
 ## data.table [v1.17.99](https://github.com/Rdatatable/data.table/milestone/35) (in development)
 
+### NOTICE OF INTENDED FUTURE POTENTIAL BREAKING CHANGES
+
+1. `data.table(x=1, <expr>)`, where `<expr>` is an expression resulting in a 1-column matrix without column names, will eventually have names `x` and `V2`, not `x` and `V1`, consistent with `data.table(x=1, <expr>)` where `<expr>` results in an atomic vector, for example `data.table(x=1, cbind(1))` and `data.table(x=1, 1)` will both have columns named `x` and `V2`. In this release, the matrix case continues to be named `V1`, but the new behavior can be activated by setting `options(datatable.old.matrix.autoname)` to `FALSE`. See point 5 under Bug Fixes for more context; this change will provide more internal consistency as well as more consistency with `data.frame()`.
+
 ### NEW FEATURES
 
 1. New `sort_by()` method for data.tables, [#6662](https://github.com/Rdatatable/data.table/issues/6662). It uses `forder()` to improve upon the data.frame method and also match `DT[order(...)]` behavior with respect to locale. Thanks @rikivillalba for the suggestion and PR.
@@ -62,7 +66,7 @@
 
 4. In rare cases, `data.table` failed to expand ALTREP columns when assigning a full column by reference. This could result in the target column getting modified unintentionally if the next call to the data.table was a modification by reference of the source column. E.g. in `DT[, b := as.character(a)]` the string conversion gets deferred and subsequent modification of column `a` would also modify column `b`, [#5400](https://github.com/Rdatatable/data.table/issues/5400). Thanks to @aquasync for the report and Václav Tlapák for the PR.
 
-5. `data.table()` function is now more aligned with `data.frame()` with respect to the names of the output when one of its inputs is a single-column matrix object, [#4124](https://github.com/Rdatatable/data.table/issues/4124). Thanks @PavoDive for the report, @jangorecki for the PR, and @MichaelChirico for a follow-up for back-compatibility.
+5. `data.table()` function is now more aligned with `data.frame()` with respect to the names of the output when one of its inputs is a single-column matrix object, [#4124](https://github.com/Rdatatable/data.table/issues/4124), [#3193](https://github.com/Rdatatable/data.table/issues/3193), and [#5367](https://github.com/Rdatatable/data.table/issues/5367). Thanks @PavoDive for the report, @jangorecki for the PR, and @MichaelChirico for a follow-up for back-compatibility.
 
 6. Including an `ITime` object as a named input to `data.frame()` respects the provided name, i.e. `data.frame(a = as.ITime(...))` will have column `a`, [#4673](https://github.com/Rdatatable/data.table/issues/4673). Thanks @shrektan for the report and @MichaelChirico for the fix.
 
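
As a usage sketch of the NEWS notice in this file: opting in to the future naming might look like the following. It assumes an installed data.table build that includes this commit; on older builds the option is simply ignored. The expected names in the comment are taken from the NEWS text, not from a run.

```r
# Sketch only: assumes a data.table build that includes this commit.
if (requireNamespace("data.table", quietly = TRUE)) {
  # Opt in to the future naming; options() returns the previous value.
  old <- options(datatable.old.matrix.autoname = FALSE)
  dt <- data.table::data.table(x = 1, cbind(1))
  print(names(dt))  # per the NEWS entry: "x" "V2" on builds that honor the option
  options(old)      # restore the previous setting
}
```

Restoring the saved value keeps the opt-in from leaking into the rest of the session, which matters while the default is still `TRUE`.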

R/as.data.table.R

Lines changed: 2 additions & 2 deletions
@@ -50,7 +50,7 @@ as.data.table.matrix = function(x, keep.rownames=FALSE, key=NULL, ...) {
 ans = data.table(rn=rownames(x), x, keep.rownames=FALSE)
 # auto-inferred name 'x' is not back-compatible & inconsistent, #7145
 if (ncol(x) == 1L && is.null(colnames(x)))
-  setnames(ans, 'x', 'V1')
+  setnames(ans, 'x', 'V1', skip_absent=TRUE)
 if (is.character(keep.rownames))
   setnames(ans, 'rn', keep.rownames[1L])
 return(ans)
@@ -162,7 +162,7 @@ as.data.table.list = function(x,
   xi = x[[i]] = as.POSIXct(xi)
 } else if (is.matrix(xi) || is.data.frame(xi)) {
   if (!is.data.table(xi)) {
-    if (is.matrix(xi) && NCOL(xi)<=1L && is.null(colnames(xi))) { # 1 column matrix naming #4124
+    if (is.matrix(xi) && NCOL(xi)==1L && is.null(colnames(xi)) && isFALSE(getOption('datatable.old.matrix.autoname'))) { # 1 column matrix naming #4124
       xi = x[[i]] = c(xi)
     } else {
       xi = x[[i]] = as.data.table(xi, keep.rownames=keep.rownames) # we will never allow a matrix to be a column; always unpack the columns
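
The new condition in `as.data.table.list` can be read in isolation with a small base R sketch. The helper name `takes_new_path` is hypothetical and only mirrors the condition in the diff; it is not part of data.table. It illustrates two points: `isFALSE()` treats an unset option like the old behavior, and `==1L` (instead of `<=1L`) keeps 0-column matrices out of the `c(xi)` branch.

```r
# Hypothetical helper mirroring the condition added in the diff.
takes_new_path <- function(xi) {
  is.matrix(xi) && NCOL(xi) == 1L && is.null(colnames(xi)) &&
    isFALSE(getOption("datatable.old.matrix.autoname"))
}

m <- cbind(1:3)  # unnamed 1-column matrix

old <- options(datatable.old.matrix.autoname = NULL)
unset_result <- takes_new_path(m)               # option unset: isFALSE(NULL) is FALSE
options(datatable.old.matrix.autoname = FALSE)
optin_result <- takes_new_path(m)               # explicit FALSE: new path is taken
named_result <- takes_new_path(cbind(b = 1:3))  # named column: never taken
zero_col_result <- takes_new_path(m[, 0L])      # 0 columns: excluded by ==1L
options(old)                                    # restore whatever was set before

# c() drops the dim attribute, so the matrix becomes a plain vector
flat <- c(m)
```

Because the check is `isFALSE(...)` rather than `!isTRUE(...)`, only an explicit `FALSE` opts in; any other value, including a missing option, keeps the old naming.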

R/onLoad.R

Lines changed: 7 additions & 4 deletions
@@ -73,7 +73,8 @@
 # In fread and fwrite we have moved back to using getOption's default argument since it is unlikely fread and fread will be called in a loop many times, plus they
 # are relatively heavy functions where the overhead in getOption() would not be noticed. It's only really [.data.table where getOption default bit.
 # Improvement to base::getOption() now submitted (100x; 5s down to 0.05s): https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17394
-opts = c("datatable.verbose"="FALSE", # datatable.<argument name>
+opts = c(
+  "datatable.verbose"="FALSE", # datatable.<argument name>
   "datatable.optimize"="Inf", # datatable.<argument name>
   "datatable.print.nrows"="100L", # datatable.<argument name>
   "datatable.print.topn"="5L", # datatable.<argument name>
@@ -85,12 +86,14 @@
   "datatable.show.indices"="FALSE", # for print.data.table
   "datatable.allow.cartesian"="FALSE", # datatable.<argument name>
   "datatable.join.many"="TRUE", # mergelist, [.data.table #4383 #914
-  "datatable.dfdispatchwarn"="TRUE", # not a function argument
-  "datatable.warnredundantby"="TRUE", # not a function argument
+  "datatable.dfdispatchwarn"="TRUE", # not a function argument
+  "datatable.warnredundantby"="TRUE", # not a function argument
   "datatable.alloccol"="1024L", # argument 'n' of alloc.col. Over-allocate 1024 spare column slots
   "datatable.auto.index"="TRUE", # DT[col=="val"] to auto add index so 2nd time faster
   "datatable.use.index"="TRUE", # global switch to address #1422
-  "datatable.prettyprint.char" = NULL # FR #1091
+  "datatable.prettyprint.char" = NULL, # FR #1091
+  "datatable.old.matrix.autoname"="TRUE", # #7145: how data.table(x=1, matrix(1)) is auto-named set to change
+  NULL
 )
 for (i in setdiff(names(opts),names(options()))) {
   eval(parse(text=paste0("options(",i,"=",opts[i],")")))
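
The defaults-registration pattern in this hunk can be tried out in base R. The sketch below uses hypothetical option names (`demo.pkg.verbose`, `demo.pkg.topn`) rather than data.table's own, and shows why the trailing `NULL` is harmless: `c()` drops `NULL` elements, so every real entry can end with a comma.

```r
# Hypothetical option names; defaults are stored as strings to be parsed.
opts <- c(
  "demo.pkg.verbose" = "FALSE",
  "demo.pkg.topn"    = "5L",
  NULL  # dropped by c(); lets each real entry end with a comma
)
opt_names <- names(opts)  # only the two real entries remain

# Pretend the user already set one option before the package loaded.
options(demo.pkg.topn = 10L)

# Fill in defaults only for options that are not set yet.
for (nm in setdiff(names(opts), names(options()))) {
  eval(parse(text = paste0("options(", nm, "=", opts[nm], ")")))
}

verbose_value <- getOption("demo.pkg.verbose")  # default applied
topn_value <- getOption("demo.pkg.topn")        # user's value kept
```

The `setdiff()` against `names(options())` is what lets a user set an option in `.Rprofile` before loading the package without having it overwritten.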

inst/tests/tests.Rraw

Lines changed: 38 additions & 7 deletions
@@ -21277,10 +21277,15 @@ if (test_R.utils) local({
 })
 
 # Create a data.table when one vector is transposed doesn't respect the name defined by user #4124
-test(2321.01, DT <- data.table(a=1:2, b=matrix(1:2)), data.table(a=1:2, b=1:2))
-test(2321.02, names(DT), names(data.frame(a=1:2, b=matrix(1:2))))
-test(2321.03, DT <- data.table(a=integer(), b=matrix(1L, nrow=0L, ncol=1L)), data.table(a=integer(), b=integer()))
-test(2321.04, names(DT), names(data.frame(a=integer(), b=matrix(1L, nrow=0L, ncol=1L))))
+local({
+  old = options(datatable.old.matrix.autoname=FALSE)
+  on.exit(options(old))
+
+  test(2321.01, DT <- data.table(a=1:2, b=matrix(1:2)), data.table(a=1:2, b=1:2))
+  test(2321.02, names(DT), names(data.frame(a=1:2, b=matrix(1:2))))
+  test(2321.03, DT <- data.table(a=integer(), b=matrix(1L, nrow=0L, ncol=1L)), data.table(a=integer(), b=integer()))
+  test(2321.04, names(DT), names(data.frame(a=integer(), b=matrix(1L, nrow=0L, ncol=1L))))
+})
 ## but respect named column vectors
 test(2321.05, DT <- data.table(a=1:2, cbind(b=3:4)), data.table(a=1:2, b=3:4))
 test(2321.06, names(DT), names(data.frame(a=1:2, cbind(b=3:4))))
@@ -21318,6 +21323,30 @@ colnames(M) = c('A', '')
 test(2321.26, as.data.table(M), data.table(A=1:3, V2=4:6))
 test(2321.27, as.data.table(M, keep.rownames='id'), data.table(id=c('a', 'b', 'c'), A=1:3, V2=4:6))
 
+# also respect old auto-naming rules by default (to be deprecated)
+test(2321.28, names(data.table(a=1, cbind(2), c=3, 4)), c("a", "V1", "c", "V4"))
+test(2321.29, names(data.table(cbind(1), cbind(2))), c("V1", "V1"))
+# also test behavior with a 0-column matrix
+M = cbind(1:3)
+test(2321.30, data.table(M[, 0L]), data.table(NULL))
+test(2321.31, data.table(a=1:3, M[, 0L]), data.table(a=1:3))
+
+local({
+  old = options(datatable.old.matrix.autoname=FALSE)
+  on.exit(options(old))
+
+  test(2321.32, names(data.table(a=1, cbind(2), c=3, 4)), c("a", "V2", "c", "V4"))
+  # particularly buggy old behavior: can easily result in duplicate names
+  test(2321.33, names(data.table(cbind(1), cbind(2))), c("V1", "V2"))
+  M = cbind(1:3)
+  test(2321.34, data.table(M[, 0L]), data.table(NULL))
+  test(2321.35, data.table(a=1:3, M[, 0L]), data.table(a=1:3))
+
+  # a more subtle version of this as expressed in #5367
+  DT <- data.table(Counts=c(10, 20), Severity=c(1, 2))
+  test(2321.36, names(DT[,.(New_name = Severity %*% Counts)]), "New_name")
+})
+
 # New fctr() helper: like factor() but retaining order by default #4837
 test(2322.01, levels(fctr(c("b","a","c"))), c("b","a","c"))
 test(2322.02, levels(fctr(c(3,1,2))), c("3","1","2"))
@@ -21423,10 +21452,12 @@ DF <- data.frame(row.names = letters[1:6], V = 1:6) # Test data.frame with e
 test(2330.6, as.data.table(list(a = 6:1, DF), keep.rownames=TRUE), data.table(rn=letters[1:6], a=6:1, V=1:6))
 
 z <- setNames(1:3, rep("", 3)) # vector with all-empty names # behaviour with all-empty row names
-test(2330.7, as.data.table(list(z), keep.rownames=TRUE), data.table(rn=rep("", 3), V1=1:3))
+test(2330.7, as.data.table(list(z), keep.rownames=TRUE), data.table(rn="", V1=1:3))
 
-M <- matrix(1:6, nrow=3, dimnames=list(rep("", 3), c("V1", "V2"))) # test of list(M) for empty-rowname'd matrix input
-test(2330.8, as.data.table(list(M), keep.rownames=TRUE), data.table(rn=rep("", 3), V1=1:3, V2=4:6))
+M <- matrix(1:6, nrow=3, dimnames=list(rep("", 3L), c("V1", "V2"))) # test of list(M) for empty-rowname'd matrix input
+test(2330.8, as.data.table(list(M), keep.rownames=TRUE), data.table(rn="", V1=1:3, V2=4:6))
+# 0-column input can still provide rownames
+test(2330.9, as.data.table(list(M[, 0L], 1:3), keep.rownames=TRUE), data.table(rn="", V2=1:3))
 
 # .SD reference in '...' passed to lapply(FUN=) is recognized as data.table
 test(2331, lapply(list(data.table(a=1:2)), `[`, j=.SD[1L]), list(data.table(a=1L)))
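
The tests in this file wrap option-dependent cases in a save-and-restore idiom (`old = options(...)` followed by `on.exit(options(old))`). A minimal base R sketch of the same idiom follows, using a plain function instead of `local()` and a hypothetical option name `demo.flag` that is not a data.table option.

```r
# Hypothetical option 'demo.flag'; the function overrides it only while it runs.
read_flag_with_override <- function() {
  old <- options(demo.flag = FALSE)  # options() returns the prior values as a list
  on.exit(options(old))              # restored when the function exits, even on error
  getOption("demo.flag")
}

before <- getOption("demo.flag")
inside <- read_flag_with_override()
after <- getOption("demo.flag")
```

Scoping the override this way lets the old-behavior and new-behavior tests sit side by side without one block's setting affecting the other.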

man/data.table-options.Rd

Lines changed: 9 additions & 0 deletions
@@ -108,6 +108,15 @@
   }
 }
 
+\section{Back-compatibility Options}{
+  \describe{
+    \item{\code{datatable.old.matrix.autoname}}{Logical, default \code{TRUE}. Governs how the output of
+      expressions like \code{data.table(x=1, cbind(1))} will be named. When \code{TRUE}, it will be named
+      \code{V1}, otherwise it will be named \code{V2}.
+    }
+  }
+}
+
 \seealso{
   \code{\link[base]{options}},
   \code{\link[base]{getOption}},
