Commit 20be6f3

Merge branch 'master' into issue7171
2 parents f5bb7a7 + a080833 commit 20be6f3

9 files changed: +258 -197 lines changed

NEWS.md

Lines changed: 6 additions & 1 deletion
@@ -4,6 +4,10 @@
 
 ## data.table [v1.17.99](https://github.com/Rdatatable/data.table/milestone/35) (in development)
 
+### NOTICE OF INTENDED FUTURE POTENTIAL BREAKING CHANGES
+
+1. `data.table(x=1, <expr>)`, where `<expr>` is an expression resulting in a 1-column matrix without column names, will eventually have names `x` and `V2`, not `x` and `V1`, consistent with `data.table(x=1, <expr>)` where `<expr>` results in an atomic vector, for example `data.table(x=1, cbind(1))` and `data.table(x=1, 1)` will both have columns named `x` and `V2`. In this release, the matrix case continues to be named `V1`, but the new behavior can be activated by setting `options(datatable.old.matrix.autoname)` to `FALSE`. See point 5 under Bug Fixes for more context; this change will provide more internal consistency as well as more consistency with `data.frame()`.
+
 ### NEW FEATURES
 
 1. New `sort_by()` method for data.tables, [#6662](https://github.com/Rdatatable/data.table/issues/6662). It uses `forder()` to improve upon the data.frame method and also match `DT[order(...)]` behavior with respect to locale. Thanks @rikivillalba for the suggestion and PR.
@@ -94,7 +98,7 @@
 
 4. In rare cases, `data.table` failed to expand ALTREP columns when assigning a full column by reference. This could result in the target column getting modified unintentionally if the next call to the data.table was a modification by reference of the source column. E.g. in `DT[, b := as.character(a)]` the string conversion gets deferred and subsequent modification of column `a` would also modify column `b`, [#5400](https://github.com/Rdatatable/data.table/issues/5400). Thanks to @aquasync for the report and Václav Tlapák for the PR.
 
-5. `data.table()` function is now more aligned with `data.frame()` with respect to the names of the output when one of its inputs is a single-column matrix object, [#4124](https://github.com/Rdatatable/data.table/issues/4124). Thanks @PavoDive for the report, @jangorecki for the PR, and @MichaelChirico for a follow-up for back-compatibility.
+5. `data.table()` function is now more aligned with `data.frame()` with respect to the names of the output when one of its inputs is a single-column matrix object, [#4124](https://github.com/Rdatatable/data.table/issues/4124), [#3193](https://github.com/Rdatatable/data.table/issues/3193), and [#5367](https://github.com/Rdatatable/data.table/issues/5367). Thanks @PavoDive for the report, @jangorecki for the PR, and @MichaelChirico for a follow-up for back-compatibility.
 
 6. Including an `ITime` object as a named input to `data.frame()` respects the provided name, i.e. `data.frame(a = as.ITime(...))` will have column `a`, [#4673](https://github.com/Rdatatable/data.table/issues/4673). Thanks @shrektan for the report and @MichaelChirico for the fix.
 
@@ -129,6 +133,7 @@
     + On non-Windows systems, `fread()` now prints the reason why the file couldn't be opened, which could also be due to it being too large to map.
     + With `verbose=TRUE`, file sizes are now printed using correct binary SI prefixes (the sizes have always been reported as bytes denominated in powers of `2^10`, so e.g. `1024*1024` bytes was reported as `1 MB` where `1 MiB` or `1.05 MB` is correct).
 
+4. The default `format_list_item()` method (and hence `print.data.table()`) annotates truncated list items with their length, [#605](https://github.com/Rdatatable/data.table/issues/605). Thanks Matt Dowle for the original report (2012!) and @MichaelChirico for the fix.
 
 # data.table [v1.17.8](https://github.com/Rdatatable/data.table/milestone/41) (6 July 2025)
 

R/as.data.table.R

Lines changed: 2 additions & 2 deletions
@@ -50,7 +50,7 @@ as.data.table.matrix = function(x, keep.rownames=FALSE, key=NULL, ...) {
     ans = data.table(rn=rownames(x), x, keep.rownames=FALSE)
     # auto-inferred name 'x' is not back-compatible & inconsistent, #7145
     if (ncol(x) == 1L && is.null(colnames(x)))
-      setnames(ans, 'x', 'V1')
+      setnames(ans, 'x', 'V1', skip_absent=TRUE)
     if (is.character(keep.rownames))
       setnames(ans, 'rn', keep.rownames[1L])
     return(ans)
@@ -162,7 +162,7 @@ as.data.table.list = function(x,
       xi = x[[i]] = as.POSIXct(xi)
     } else if (is.matrix(xi) || is.data.frame(xi)) {
       if (!is.data.table(xi)) {
-        if (is.matrix(xi) && NCOL(xi)<=1L && is.null(colnames(xi))) { # 1 column matrix naming #4124
+        if (is.matrix(xi) && NCOL(xi)==1L && is.null(colnames(xi)) && isFALSE(getOption('datatable.old.matrix.autoname'))) { # 1 column matrix naming #4124
           xi = x[[i]] = c(xi)
         } else {
           xi = x[[i]] = as.data.table(xi, keep.rownames=keep.rownames) # we will never allow a matrix to be a column; always unpack the columns
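
The option gate added to `as.data.table.list` can be read in isolation. The sketch below copies the condition from the added line into a hypothetical helper, `flattens_matrix()`, which is not part of data.table and exists only for illustration. It uses base R alone, so it shows when the new branch would be taken rather than the resulting column names.

```r
# Hypothetical helper (not part of data.table): TRUE when a list element would
# be flattened to a plain vector, using the condition from the added line.
flattens_matrix = function(xi) {
  is.matrix(xi) && NCOL(xi) == 1L && is.null(colnames(xi)) &&
    isFALSE(getOption("datatable.old.matrix.autoname"))
}

m = matrix(1)  # 1-column matrix without column names

old = options(datatable.old.matrix.autoname = TRUE)  # the default registered in R/onLoad.R
kept_old_path = !flattens_matrix(m)                  # old path: goes through as.data.table()

options(datatable.old.matrix.autoname = FALSE)       # opt in to the future naming
new_path = flattens_matrix(m)                        # new path: flattened with c()

# A 1-column matrix that already has a column name is never flattened.
named_skipped = !flattens_matrix(matrix(1, dimnames = list(NULL, "a")))

options(old)  # restore whatever was set before
```

Because `isFALSE(NULL)` is `FALSE`, an unset option also keeps the old path in this sketch, which matches the back-compatible default described in NEWS.md.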

R/onLoad.R

Lines changed: 7 additions & 4 deletions
@@ -73,7 +73,8 @@
   # In fread and fwrite we have moved back to using getOption's default argument since it is unlikely fread and fwrite will be called in a loop many times, plus they
   # are relatively heavy functions where the overhead in getOption() would not be noticed. It's only really [.data.table where getOption default bit.
   # Improvement to base::getOption() now submitted (100x; 5s down to 0.05s): https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17394
-  opts = c("datatable.verbose"="FALSE", # datatable.<argument name>
+  opts = c(
+    "datatable.verbose"="FALSE", # datatable.<argument name>
     "datatable.optimize"="Inf", # datatable.<argument name>
     "datatable.print.nrows"="100L", # datatable.<argument name>
     "datatable.print.topn"="5L", # datatable.<argument name>
@@ -85,12 +86,14 @@
     "datatable.show.indices"="FALSE", # for print.data.table
     "datatable.allow.cartesian"="FALSE", # datatable.<argument name>
     "datatable.join.many"="TRUE", # mergelist, [.data.table #4383 #914
-    "datatable.dfdispatchwarn"="TRUE", # not a function argument
-    "datatable.warnredundantby"="TRUE", # not a function argument
+    "datatable.dfdispatchwarn"="TRUE", # not a function argument
+    "datatable.warnredundantby"="TRUE", # not a function argument
     "datatable.alloccol"="1024L", # argument 'n' of alloc.col. Over-allocate 1024 spare column slots
     "datatable.auto.index"="TRUE", # DT[col=="val"] to auto add index so 2nd time faster
     "datatable.use.index"="TRUE", # global switch to address #1422
-    "datatable.prettyprint.char" = NULL # FR #1091
+    "datatable.prettyprint.char" = NULL, # FR #1091
+    "datatable.old.matrix.autoname"="TRUE", # #7145: how data.table(x=1, matrix(1)) is auto-named set to change
+    NULL
   )
   for (i in setdiff(names(opts),names(options()))) {
     eval(parse(text=paste0("options(",i,"=",opts[i],")")))
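
Two idioms in this hunk are easy to miss: the trailing `NULL` inside `c(...)`, and the loop that only registers defaults for options the user has not already set. The sketch below reproduces both with hypothetical option names (`demo.alpha`, `demo.beta`) that are assumptions made for this example, not options of data.table.

```r
# Hypothetical option names, used only for this illustration.
opts = c(
  "demo.alpha"="TRUE",
  "demo.beta"="5L",
  NULL   # dropped by c(), so every real entry above can end with a comma
)

options(demo.alpha = NULL)  # make sure demo.alpha starts out unset
options(demo.beta = 10L)    # pretend the user set this before the package loaded

# Same loop shape as in the hunk: only options that are not yet set get a default.
for (i in setdiff(names(opts), names(options()))) {
  eval(parse(text=paste0("options(", i, "=", opts[i], ")")))
}

n_defaults = length(opts)           # the trailing NULL adds no element
alpha = getOption("demo.alpha")     # filled in from the defaults
beta  = getOption("demo.beta")      # the user's value is left alone
```

With the trailing `NULL` in place, adding a new default such as `datatable.old.matrix.autoname` only touches one line, since no existing entry has to gain or lose a comma.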

R/print.data.table.R

Lines changed: 1 addition & 1 deletion
@@ -227,7 +227,7 @@ format_list_item.default = function(x, ...) {
   if (is.null(x)) # NULL item in a list column
     "[NULL]" # not '' or 'NULL' to distinguish from those "common" string values in data
   else if (is.atomic(x) || inherits(x, "formula")) # FR #2591 - format.data.table issue with columns of class "formula"
-    paste(c(format(head(x, 6L), ...), if (length(x) > 6L) "..."), collapse=",") # fix for #5435 and #37 - format has to be added here...
+    paste(c(format(head(x, 6L), ...), if (length(x) > 6L) sprintf("...[%d]", length(x))), collapse=",") # fix for #5435, #37, and #605 - format has to be added here...
   else if (has_format_method(x) && length(formatted<-format(x, ...))==1L) {
     # the column's class does not have a format method (otherwise it would have been used by format_col and this
     # format_list_item would not be reached) but this particular list item does have a format method so use it
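
To see what the new annotation looks like, the changed expression can be run on its own. The function name `format_item()` below is a hypothetical stand-in for the atomic branch of `format_list_item.default()`; only the `paste(...)` expression is taken from the diff.

```r
# Hypothetical stand-in for the atomic branch of format_list_item.default().
format_item = function(x, ...) {
  paste(c(format(head(x, 6L), ...),
          if (length(x) > 6L) sprintf("...[%d]", length(x))),  # NULL (dropped) when x is short
        collapse=",")
}

short_item = format_item(1:3)   # fits within six elements, so no annotation
long_item  = format_item(1:10)  # truncated to six elements, annotated with the full length
```

Here `short_item` is `"1,2,3"` and `long_item` is `"1,2,3,4,5,6,...[10]"`, so a reader of the printed table can tell how many elements were cut.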

inst/tests/benchmark.Rraw

Lines changed: 18 additions & 14 deletions
@@ -190,20 +190,24 @@ DT = data.table(A=1:10,B=rnorm(10),C=paste("a",1:100010,sep=""))
 test(301.1, nrow(DT[,sum(B),by=C])==100010)
 
 # Test := by key, and that := to the key by key unsets the key. Make it non-trivial in size too.
-options(datatable.optimize=0L)
-set.seed(1)
-DT = data.table(a=sample(1:100,1e6,replace=TRUE),b=sample(1:1000,1e6,replace=TRUE),key="a")
-test(637.1, DT[,m:=sum(b),by=a][1:3], data.table(a=1L,b=c(156L,808L,848L),m=DT[J(1),sum(b)],key="a"))
-test(637.2, key(DT[J(43L),a:=99L]), NULL)
-setkey(DT,a)
-test(637.3, key(DT[,a:=99L,by=a]), NULL)
-options(datatable.optimize=2L)
-set.seed(1)
-DT = data.table(a=sample(1:100,1e6,replace=TRUE),b=sample(1:1000,1e6,replace=TRUE),key="a")
-test(638.1, DT[,m:=sum(b),by=a][1:3], data.table(a=1L,b=c(156L,808L,848L),m=DT[J(1),sum(b)],key="a"))
-test(638.2, key(DT[J(43L),a:=99L]), NULL)
-setkey(DT,a)
-test(638.3, key(DT[,a:=99L,by=a]), NULL)
+local({
+  old = options(datatable.optimize=0L); on.exit(options(old))
+  set.seed(1)
+  DT = data.table(a=sample(1:100, 1e6, replace=TRUE), b=sample(1:1000, 1e6, replace=TRUE), key="a")
+  test(637.1, DT[, m:=sum(b), by=a][1:3], data.table(a=1L, b=c(156L, 808L, 848L), m=DT[J(1), sum(b)], key="a"))
+  test(637.2, key(DT[J(43L), a:=99L]), NULL)
+  setkey(DT, a)
+  test(637.3, key(DT[, a:=99L, by=a]), NULL)
+})
+local({
+  old = options(datatable.optimize=2L); on.exit(options(old))
+  set.seed(1)
+  DT = data.table(a=sample(1:100, 1e6, replace=TRUE), b=sample(1:1000, 1e6, replace=TRUE), key="a")
+  test(638.1, DT[, m:=sum(b), by=a][1:3], data.table(a=1L, b=c(156L, 808L, 848L), m=DT[J(1), sum(b)], key="a"))
+  test(638.2, key(DT[J(43L), a:=99L]), NULL)
+  setkey(DT,a)
+  test(638.3, key(DT[, a:=99L, by=a]), NULL)
+})
 
 # Test X[Y] slowdown, #2216
 # Many minutes in 1.8.2! Now well under 1s, but 10s for very wide tolerance for CRAN. We'd like CRAN to tell us if any changes
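
The `local({ old = options(...); on.exit(options(old)) ... })` pattern introduced in these tests scopes an option change to one block, so the previous value comes back even if a statement in the block fails. A minimal base R sketch follows, using a hypothetical option `demo.optimize` in place of `datatable.optimize`.

```r
options(demo.optimize = 2L)  # hypothetical option, standing in for datatable.optimize

inside = local({
  # options() returns the previous values, which on.exit() puts back when the block exits
  old = options(demo.optimize = 0L); on.exit(options(old))
  getOption("demo.optimize")  # the value that code inside the block sees
})

after = getOption("demo.optimize")  # the value once the block has exited
```

In this sketch `inside` is `0L` and `after` is back to `2L`. Each block must capture its own `old`, because the variable is local to the block in which it is assigned.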

inst/tests/nafill.Rraw

Lines changed: 64 additions & 63 deletions
@@ -160,14 +160,15 @@ names(dt) <- NULL
 test(4.36, colnamesInt(dt, "a"), error="has no names")
 
 # verbose
-dt = data.table(a=c(1L, 2L, NA_integer_), b=c(1, 2, NA_real_))
-old=options(datatable.verbose=TRUE)
-test(5.01, nafill(dt, "locf"), output="nafillInteger: took.*nafillDouble: took.*nafillR.*took")
-test(5.02, setnafill(dt, "locf"), output="nafillInteger: took.*nafillDouble: took.*nafillR.*took")
-if (test_bit64) {
-  test(5.03, nafill(as.integer64(c(NA,2,NA,3)), "locf"), as.integer64(c(NA,2,2,3)), output="nafillInteger64: took.*nafillR.*took")
-}
-options(old)
+local({
+  dt = data.table(a=c(1L, 2L, NA_integer_), b=c(1, 2, NA_real_))
+  old = options(datatable.verbose=TRUE); on.exit(options(old))
+  test(5.01, nafill(dt, "locf"), output="nafillInteger: took.*nafillDouble: took.*nafillR.*took")
+  test(5.02, setnafill(dt, "locf"), output="nafillInteger: took.*nafillDouble: took.*nafillR.*took")
+  if (test_bit64) {
+    test(5.03, nafill(as.integer64(c(NA,2,NA,3)), "locf"), as.integer64(c(NA,2,2,3)), output="nafillInteger64: took.*nafillR.*took")
+  }
+})
 
 # coerceAs int/numeric/int64 as used in nafill
 if (test_bit64) {
@@ -250,59 +251,61 @@ if (test_bit64) {
 }
 
 # coerceAs verbose
-options(datatable.verbose=2L)
-input = 1
-# use levels= explicitly to avoid locale-related sorting of letters
-xy_factor = factor(c("x", "y"), levels=c("x", "y"))
-test(10.01, ans<-coerceAs(input, 1), 1, output="double[numeric] into double[numeric]")
-test(10.02, address(input)!=address(ans))
-test(10.03, ans<-coerceAs(input, 1, copy=FALSE), 1, output="copy=false and input already of expected type and class double[numeric]")
-test(10.04, address(input), address(ans))
-test(10.05, ans<-coerceAs(input, 1L), 1L, output="double[numeric] into integer[integer]")
-test(10.06, address(input)!=address(ans))
-test(10.07, ans<-coerceAs(input, 1L, copy=FALSE), 1L, output="double[numeric] into integer[integer]", notOutput="copy=false")
-test(10.08, address(input)!=address(ans))
-test(10.09, coerceAs("1", 1L), 1L, output="character[character] into integer[integer]", warning="Coercing.*character.*integer")
-test(10.10, coerceAs("1", 1), 1, output="character[character] into double[numeric]", warning="Coercing.*character.*double")
-test(10.11, coerceAs("a", factor("x")), factor("a", levels=c("x","a")), output="character[character] into integer[factor]") ## levels of 'as' are retained!
-test(10.12, coerceAs("a", factor()), factor("a"), output="character[character] into integer[factor]")
-test(10.13, coerceAs(1, factor("x")), factor("x"), output="double[numeric] into integer[factor]")
-test(10.14, coerceAs(1, factor("x", levels=c("x","y"))), factor("x", levels=c("x","y")), output="double[numeric] into integer[factor]")
-test(10.15, coerceAs(2, factor("x", levels=c("x","y"))), factor("y", levels=c("x","y")), output="double[numeric] into integer[factor]")
-test(10.16, coerceAs(1:2, xy_factor), xy_factor, output="integer[integer] into integer[factor]")
-test(10.17, coerceAs(1:3, xy_factor), output="integer[integer] into integer[factor]", error="factor numbers.*3 is outside the level range")
-test(10.18, coerceAs(c(1,2,3), xy_factor), output="double[numeric] into integer[factor]", error="factor numbers.*3.000000 is outside the level range")
-test(10.19, coerceAs(factor("x"), xy_factor), factor("x", levels=c("x","y")), output="integer[factor] into integer[factor]")
-test(10.20, coerceAs(factor("x"), xy_factor, copy=FALSE), factor("x", levels=c("x","y")), output="input already of expected type and class") ## copy=F has copyMostAttrib
-a = structure("a", class="a")
-b = structure("b", class="b")
-test(10.21, coerceAs(a, b), structure("a", class="b"), output="character[a] into character[b]")
-a = structure(1L, class="a")
-b = structure(2L, class="b")
-test(10.22, coerceAs(a, b), structure(1L, class="b"), output="integer[a] into integer[b]")
-a = structure(1, class="a")
-b = structure(2, class="b")
-test(10.23, coerceAs(a, b), structure(1, class="b"), output="double[a] into double[b]")
-a = structure(1, class="a")
-b = structure(2L, class="b")
-test(10.24, coerceAs(a, b), structure(1L, class="b"), output="double[a] into integer[b]")
-if (test_bit64) {
-  x = as.integer64(1L)
-  test(10.81, coerceAs(x, 1), 1, output="double[integer64] into double[numeric]")
-  test(10.82, coerceAs(x, 1L), 1L, output="double[integer64] into integer[integer]")
-  test(10.83, coerceAs(x, "1"), "1", output="double[integer64] into character[character]")
-  test(10.84, coerceAs(1, x), x, output="double[numeric] into double[integer64]")
-  test(10.85, coerceAs(1L, x), x, output="integer[integer] into double[integer64]")
-  test(10.86, coerceAs("1", x), x, output="character[character] into double[integer64]", warning="Coercing.*character")
-  options(datatable.verbose=3L)
-  test(10.87, coerceAs(x, 1L), 1L, output=c("double[integer64] into integer[integer]","Zero-copy coerce when assigning 'integer64' to 'integer'"))
-  test(10.88, coerceAs(1L, x), x, output=c("integer[integer] into double[integer64]","Zero-copy coerce when assigning 'integer' to 'integer64'"))
-  options(datatable.verbose=2L)
-  test(10.89, coerceAs(-2147483649, x), as.integer64(-2147483649), output="double[numeric] into double[integer64]")
-}
-# 10.91 tested nanotime moved to other.Rraw 27.21, #6139
+local({
+  old = options(datatable.verbose=2L); on.exit(options(old))
+  input = 1
+  # use levels= explicitly to avoid locale-related sorting of letters
+  xy_factor = factor(c("x", "y"), levels=c("x", "y"))
+  test(10.01, ans<-coerceAs(input, 1), 1, output="double[numeric] into double[numeric]")
+  test(10.02, address(input)!=address(ans))
+  test(10.03, ans<-coerceAs(input, 1, copy=FALSE), 1, output="copy=false and input already of expected type and class double[numeric]")
+  test(10.04, address(input), address(ans))
+  test(10.05, ans<-coerceAs(input, 1L), 1L, output="double[numeric] into integer[integer]")
+  test(10.06, address(input)!=address(ans))
+  test(10.07, ans<-coerceAs(input, 1L, copy=FALSE), 1L, output="double[numeric] into integer[integer]", notOutput="copy=false")
+  test(10.08, address(input)!=address(ans))
+  test(10.09, coerceAs("1", 1L), 1L, output="character[character] into integer[integer]", warning="Coercing.*character.*integer")
+  test(10.10, coerceAs("1", 1), 1, output="character[character] into double[numeric]", warning="Coercing.*character.*double")
+  test(10.11, coerceAs("a", factor("x")), factor("a", levels=c("x","a")), output="character[character] into integer[factor]") ## levels of 'as' are retained!
+  test(10.12, coerceAs("a", factor()), factor("a"), output="character[character] into integer[factor]")
+  test(10.13, coerceAs(1, factor("x")), factor("x"), output="double[numeric] into integer[factor]")
+  test(10.14, coerceAs(1, factor("x", levels=c("x","y"))), factor("x", levels=c("x","y")), output="double[numeric] into integer[factor]")
+  test(10.15, coerceAs(2, factor("x", levels=c("x","y"))), factor("y", levels=c("x","y")), output="double[numeric] into integer[factor]")
+  test(10.16, coerceAs(1:2, xy_factor), xy_factor, output="integer[integer] into integer[factor]")
+  test(10.17, coerceAs(1:3, xy_factor), output="integer[integer] into integer[factor]", error="factor numbers.*3 is outside the level range")
+  test(10.18, coerceAs(c(1,2,3), xy_factor), output="double[numeric] into integer[factor]", error="factor numbers.*3.000000 is outside the level range")
+  test(10.19, coerceAs(factor("x"), xy_factor), factor("x", levels=c("x","y")), output="integer[factor] into integer[factor]")
+  test(10.20, coerceAs(factor("x"), xy_factor, copy=FALSE), factor("x", levels=c("x","y")), output="input already of expected type and class") ## copy=F has copyMostAttrib
+  a = structure("a", class="a")
+  b = structure("b", class="b")
+  test(10.21, coerceAs(a, b), structure("a", class="b"), output="character[a] into character[b]")
+  a = structure(1L, class="a")
+  b = structure(2L, class="b")
+  test(10.22, coerceAs(a, b), structure(1L, class="b"), output="integer[a] into integer[b]")
+  a = structure(1, class="a")
+  b = structure(2, class="b")
+  test(10.23, coerceAs(a, b), structure(1, class="b"), output="double[a] into double[b]")
+  a = structure(1, class="a")
+  b = structure(2L, class="b")
+  test(10.24, coerceAs(a, b), structure(1L, class="b"), output="double[a] into integer[b]")
+  if (test_bit64) {
+    x = as.integer64(1L)
+    test(10.81, coerceAs(x, 1), 1, output="double[integer64] into double[numeric]")
+    test(10.82, coerceAs(x, 1L), 1L, output="double[integer64] into integer[integer]")
+    test(10.83, coerceAs(x, "1"), "1", output="double[integer64] into character[character]")
+    test(10.84, coerceAs(1, x), x, output="double[numeric] into double[integer64]")
+    test(10.85, coerceAs(1L, x), x, output="integer[integer] into double[integer64]")
+    test(10.86, coerceAs("1", x), x, output="character[character] into double[integer64]", warning="Coercing.*character")
+    test(10.87, options=c(datatable.verbose=3L),
+      coerceAs(x, 1L), 1L, output=c("double[integer64] into integer[integer]", "Zero-copy coerce when assigning 'integer64' to 'integer'"))
+    test(10.88, options=c(datatable.verbose=3L),
+      coerceAs(1L, x), x, output=c("integer[integer] into double[integer64]", "Zero-copy coerce when assigning 'integer' to 'integer64'"))
+    test(10.89, options=c(datatable.verbose=2L),
+      coerceAs(-2147483649, x), as.integer64(-2147483649), output="double[numeric] into double[integer64]")
+  }
+  # 10.91 tested nanotime moved to other.Rraw 27.21, #6139
+})
 
-options(datatable.verbose=FALSE)
 test(11.01, coerceAs(list(a=1), 1), error="is not atomic")
 test(11.02, coerceAs(1, list(a=1)), list(1))
 test(11.03, coerceAs(sum, 1), error="is not atomic")
@@ -328,6 +331,4 @@ test(11.09, coerceAs(1L, a), error="must not be matrix or array")
 test(99.1, data.table(a=1,b=2)[1,1, verbose=1], error="verbose must be logical or integer")
 test(99.2, data.table(a=1,b=2)[1,1, verbose=1:2], error="verbose must be length 1 non-NA")
 test(99.3, data.table(a=1,b=2)[1,1, verbose=NA], error="verbose must be length 1 non-NA")
-options(datatable.verbose=1)
-test(99.4, coerceAs(1, 2L), error="verbose option must be length 1 non-NA logical or integer")
-options(datatable.verbose=FALSE)
+test(99.4, options=c(datatable.verbose=1), coerceAs(1, 2L), error="verbose option must be length 1 non-NA logical or integer")
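
Several tests in this file now pass `options=` to `test()` instead of toggling `options()` before and after the call. The diff does not show how `test()` handles that argument, so the sketch below is only an assumption about the general idea: a hypothetical helper `with_opts()` (not part of data.table) that sets options for the duration of one expression and then restores them.

```r
# Hypothetical helper; data.table's own test() is not reproduced here.
with_opts = function(opts, expr) {
  old = do.call(options, as.list(opts)); on.exit(options(old))
  expr  # lazily evaluated: the promise is forced here, after the options are set
}

options(demo.verbose = FALSE)  # hypothetical option name, standing in for datatable.verbose

seen = with_opts(c(demo.verbose=3L), getOption("demo.verbose"))  # value during the call
restored = getOption("demo.verbose")                             # value after the call
```

In this sketch `seen` is `3L` while `restored` is `FALSE` again. Scoping the option to a single call removes the need for the paired `options(datatable.verbose=...)` lines deleted in this hunk, and a failing test cannot leave the option changed for the tests that follow.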
