Merge branch 'master' into issue7171

venom1204 · web-flow · commit 20be6f3e829f · 2025-07-15T16:22:25.000+05:30
diff --git a/NEWS.md b/NEWS.md
@@ -4,6 +4,10 @@
 
 ## data.table [v1.17.99](https://github.com/Rdatatable/data.table/milestone/35)  (in development)
 
+### NOTICE OF INTENDED FUTURE POTENTIAL BREAKING CHANGES
+
+1. `data.table(x=1, <expr>)`, where `<expr>` is an expression resulting in a 1-column matrix without column names, will eventually have names `x` and `V2`, not `x` and `V1`. This matches the case where `<expr>` results in an atomic vector: `data.table(x=1, cbind(1))` and `data.table(x=1, 1)` will both have columns named `x` and `V2`. In this release, the matrix case continues to be named `V1`, but the new behavior can be activated by setting `options(datatable.old.matrix.autoname=FALSE)`. See point 5 under Bug Fixes for more context; this change will provide more internal consistency as well as more consistency with `data.frame()`.
+
 ### NEW FEATURES
 
 1. New `sort_by()` method for data.tables, [#6662](https://github.com/Rdatatable/data.table/issues/6662). It uses `forder()` to improve upon the data.frame method and also match `DT[order(...)]` behavior with respect to locale. Thanks @rikivillalba for the suggestion and PR.
@@ -94,7 +98,7 @@
 
 4. In rare cases, `data.table` failed to expand ALTREP columns when assigning a full column by reference. This could result in the target column getting modified unintentionally if the next call to the data.table was a modification by reference of the source column. E.g. in `DT[, b := as.character(a)]` the string conversion gets deferred and subsequent modification of column `a` would also modify column `b`, [#5400](https://github.com/Rdatatable/data.table/issues/5400). Thanks to @aquasync for the report and Václav Tlapák for the PR.
 
-5. `data.table()` function is now more aligned with `data.frame()` with respect to the names of the output when one of its inputs is a single-column matrix object, [#4124](https://github.com/Rdatatable/data.table/issues/4124). Thanks @PavoDive for the report, @jangorecki for the PR, and @MichaelChirico for a follow-up for back-compatibility.
+5. `data.table()` function is now more aligned with `data.frame()` with respect to the names of the output when one of its inputs is a single-column matrix object, [#4124](https://github.com/Rdatatable/data.table/issues/4124), [#3193](https://github.com/Rdatatable/data.table/issues/3193), and [#5367](https://github.com/Rdatatable/data.table/issues/5367). Thanks @PavoDive for the report, @jangorecki for the PR, and @MichaelChirico for a follow-up for back-compatibility.
 
 6. Including an `ITime` object as a named input to `data.frame()` respects the provided name, i.e. `data.frame(a = as.ITime(...))` will have column `a`, [#4673](https://github.com/Rdatatable/data.table/issues/4673). Thanks @shrektan for the report and @MichaelChirico for the fix.
 
@@ -129,6 +133,7 @@
    + On non-Windows systems, `fread()` now prints the reason why the file couldn't be opened, which could also be due to it being too large to map.
    + With `verbose=TRUE`, file sizes are now printed using correct binary SI prefixes (the sizes have always been reported as bytes denominated in powers of `2^10`, so e.g. `1024*1024` bytes was reported as `1 MB` where `1 MiB` or `1.05 MB` is correct).
 
+4. The default `format_list_item()` method (and hence `print.data.table()`) annotates truncated list items with their length, [#605](https://github.com/Rdatatable/data.table/issues/605). Thanks Matt Dowle for the original report (2012!) and @MichaelChirico for the fix.
 
 # data.table [v1.17.8](https://github.com/Rdatatable/data.table/milestone/41) (6 July 2025)
 
diff --git a/R/as.data.table.R b/R/as.data.table.R
@@ -50,7 +50,7 @@ as.data.table.matrix = function(x, keep.rownames=FALSE, key=NULL, ...) {
     ans = data.table(rn=rownames(x), x, keep.rownames=FALSE)
     # auto-inferred name 'x' is not back-compatible & inconsistent, #7145
     if (ncol(x) == 1L && is.null(colnames(x)))
-      setnames(ans, 'x', 'V1')
+      setnames(ans, 'x', 'V1', skip_absent=TRUE)
     if (is.character(keep.rownames))
       setnames(ans, 'rn', keep.rownames[1L])
     return(ans)
@@ -162,7 +162,7 @@ as.data.table.list = function(x,
       xi = x[[i]] = as.POSIXct(xi)
     } else if (is.matrix(xi) || is.data.frame(xi)) {
       if (!is.data.table(xi)) {
-        if (is.matrix(xi) && NCOL(xi)<=1L && is.null(colnames(xi))) { # 1 column matrix naming #4124
+        if (is.matrix(xi) && NCOL(xi)==1L && is.null(colnames(xi)) && isFALSE(getOption('datatable.old.matrix.autoname'))) { # 1 column matrix naming #4124
           xi = x[[i]] = c(xi)
         } else {
           xi = x[[i]] = as.data.table(xi, keep.rownames=keep.rownames)  # we will never allow a matrix to be a column; always unpack the columns
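
The new condition in this hunk can be read in isolation. The following is a minimal base-R sketch, not data.table's code: `flattens_to_vector()` is a hypothetical helper that copies only the changed condition, to show which inputs would take the `c(xi)` branch and how the option gates it.

```r
# Hypothetical helper mirroring the new condition in as.data.table.list();
# illustrative only, not part of data.table.
flattens_to_vector = function(xi) {
  is.matrix(xi) && NCOL(xi) == 1L && is.null(colnames(xi)) &&
    isFALSE(getOption("datatable.old.matrix.autoname"))
}

old = options(datatable.old.matrix.autoname = FALSE)  # opt in to the new naming
new_unnamed = flattens_to_vector(cbind(1))             # unnamed 1-column matrix
new_named   = flattens_to_vector(cbind(a = 1))         # matrix with column names
new_empty   = flattens_to_vector(matrix(numeric(), nrow = 1L, ncol = 0L))  # 0 columns

options(datatable.old.matrix.autoname = TRUE)          # the default set in onLoad.R
old_unnamed = flattens_to_vector(cbind(1))
options(old)                                           # restore the previous setting
```

Only `new_unnamed` is `TRUE`. The zero-column matrix passed the old `NCOL(xi)<=1L` check but no longer passes `NCOL(xi)==1L`, and because the gate uses `isFALSE()`, an unset option behaves like `TRUE`.
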
diff --git a/R/onLoad.R b/R/onLoad.R
@@ -73,7 +73,8 @@
   # In fread and fwrite we have moved back to using getOption's default argument since it is unlikely fread and fread will be called in a loop many times, plus they
   # are relatively heavy functions where the overhead in getOption() would not be noticed.  It's only really [.data.table where getOption default bit.
   # Improvement to base::getOption() now submitted (100x; 5s down to 0.05s):  https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17394
-  opts = c("datatable.verbose"="FALSE",        # datatable.<argument name>
+  opts = c(
+       "datatable.verbose"="FALSE",            # datatable.<argument name>
        "datatable.optimize"="Inf",             # datatable.<argument name>
        "datatable.print.nrows"="100L",         # datatable.<argument name>
        "datatable.print.topn"="5L",            # datatable.<argument name>
@@ -85,12 +86,14 @@
        "datatable.show.indices"="FALSE",       # for print.data.table
        "datatable.allow.cartesian"="FALSE",    # datatable.<argument name>
        "datatable.join.many"="TRUE",           # mergelist, [.data.table #4383 #914
-       "datatable.dfdispatchwarn"="TRUE",                   # not a function argument
-       "datatable.warnredundantby"="TRUE",                  # not a function argument
+       "datatable.dfdispatchwarn"="TRUE",      # not a function argument
+       "datatable.warnredundantby"="TRUE",     # not a function argument
        "datatable.alloccol"="1024L",           # argument 'n' of alloc.col. Over-allocate 1024 spare column slots
        "datatable.auto.index"="TRUE",          # DT[col=="val"] to auto add index so 2nd time faster
        "datatable.use.index"="TRUE",           # global switch to address #1422
-       "datatable.prettyprint.char" = NULL     # FR #1091
+       "datatable.prettyprint.char" = NULL,    # FR #1091
+       "datatable.old.matrix.autoname"="TRUE", # #7145: auto-naming of data.table(x=1, matrix(1)) is set to change
+       NULL
        )
   for (i in setdiff(names(opts),names(options()))) {
     eval(parse(text=paste0("options(",i,"=",opts[i],")")))
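
The trailing `NULL` added to `opts` works because `c()` drops `NULL` arguments, so every real entry can end with a comma. A minimal base-R sketch of that idiom and of the "only set what is not already set" loop follows; the option names `demo.alpha` and `demo.beta` are made up for illustration.

```r
# c() silently drops NULL arguments, so a trailing NULL is harmless.
opts = c(
  "demo.alpha" = "TRUE",
  "demo.beta"  = "5L",
  NULL
)
names(opts)   # "demo.alpha" "demo.beta"

# Apply defaults only for options that are not already set.
old = options(demo.alpha = FALSE)   # pretend the user set this earlier
for (nm in setdiff(names(opts), names(options()))) {
  # the defaults are stored as strings; parse them so "5L" becomes integer 5L
  value = eval(parse(text = opts[[nm]]))
  options(structure(list(value), names = nm))
}
alpha = getOption("demo.alpha")     # the user's value is kept
beta  = getOption("demo.beta")      # the default was applied
options(old); options(demo.beta = NULL)   # clean up
```

The same `NULL`-dropping behaviour of `c()` applies to any entry whose value is `NULL`, so such an entry contributes nothing to the vector.
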
diff --git a/R/print.data.table.R b/R/print.data.table.R
@@ -227,7 +227,7 @@ format_list_item.default = function(x, ...) {
   if (is.null(x))  # NULL item in a list column
     "[NULL]" # not '' or 'NULL' to distinguish from those "common" string values in data
   else if (is.atomic(x) || inherits(x, "formula")) # FR #2591 - format.data.table issue with columns of class "formula"
-    paste(c(format(head(x, 6L), ...), if (length(x) > 6L) "..."), collapse=",") # fix for #5435 and #37 - format has to be added here...
+    paste(c(format(head(x, 6L), ...), if (length(x) > 6L) sprintf("...[%d]", length(x))), collapse=",") # fix for #5435, #37, and #605 - format has to be added here...
   else if (has_format_method(x) && length(formatted<-format(x, ...))==1L) {
     # the column's class does not have a format method (otherwise it would have been used by format_col and this
     # format_list_item would not be reached) but this particular list item does have a format method so use it
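
To see what the changed expression produces, here is a stand-alone base-R copy of it for atomic inputs. `summarise_item()` is a hypothetical name used only for this sketch; the `...` pass-through to `format()` is omitted.

```r
# Stand-alone copy of the changed expression; not a data.table function.
summarise_item = function(x) {
  paste(
    c(format(utils::head(x, 6L)),
      if (length(x) > 6L) sprintf("...[%d]", length(x))),  # NULL when not truncated
    collapse = ","
  )
}
short_item = summarise_item(1:3)    # "1,2,3"
long_item  = summarise_item(1:10)   # "1,2,3,4,5,6,...[10]"
```

When the item has six or fewer elements the `if` yields `NULL`, which `c()` drops, so no marker is appended.
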
diff --git a/inst/tests/benchmark.Rraw b/inst/tests/benchmark.Rraw
@@ -190,20 +190,24 @@ DT = data.table(A=1:10,B=rnorm(10),C=paste("a",1:100010,sep=""))
 test(301.1, nrow(DT[,sum(B),by=C])==100010)
 
 # Test := by key, and that := to the key by key unsets the key. Make it non-trivial in size too.
-options(datatable.optimize=0L)
-set.seed(1)
-DT = data.table(a=sample(1:100,1e6,replace=TRUE),b=sample(1:1000,1e6,replace=TRUE),key="a")
-test(637.1, DT[,m:=sum(b),by=a][1:3], data.table(a=1L,b=c(156L,808L,848L),m=DT[J(1),sum(b)],key="a"))
-test(637.2, key(DT[J(43L),a:=99L]), NULL)
-setkey(DT,a)
-test(637.3, key(DT[,a:=99L,by=a]), NULL)
-options(datatable.optimize=2L)
-set.seed(1)
-DT = data.table(a=sample(1:100,1e6,replace=TRUE),b=sample(1:1000,1e6,replace=TRUE),key="a")
-test(638.1, DT[,m:=sum(b),by=a][1:3], data.table(a=1L,b=c(156L,808L,848L),m=DT[J(1),sum(b)],key="a"))
-test(638.2, key(DT[J(43L),a:=99L]), NULL)
-setkey(DT,a)
-test(638.3, key(DT[,a:=99L,by=a]), NULL)
+local({
+  old = options(datatable.optimize=0L); on.exit(options(old))
+  set.seed(1)
+  DT = data.table(a=sample(1:100, 1e6, replace=TRUE), b=sample(1:1000, 1e6, replace=TRUE), key="a")
+  test(637.1, DT[, m:=sum(b), by=a][1:3], data.table(a=1L, b=c(156L, 808L, 848L), m=DT[J(1), sum(b)], key="a"))
+  test(637.2, key(DT[J(43L), a:=99L]), NULL)
+  setkey(DT, a)
+  test(637.3, key(DT[, a:=99L, by=a]), NULL)
+})
+local({
+  old = options(datatable.optimize=2L); on.exit(options(old))
+  set.seed(1)
+  DT = data.table(a=sample(1:100, 1e6, replace=TRUE), b=sample(1:1000, 1e6, replace=TRUE), key="a")
+  test(638.1, DT[, m:=sum(b), by=a][1:3], data.table(a=1L, b=c(156L, 808L, 848L), m=DT[J(1), sum(b)], key="a"))
+  test(638.2, key(DT[J(43L), a:=99L]), NULL)
+  setkey(DT, a)
+  test(638.3, key(DT[, a:=99L, by=a]), NULL)
+})
 
 # Test X[Y] slowdown, #2216
 # Many minutes in 1.8.2!  Now well under 1s, but 10s for very wide tolerance for CRAN. We'd like CRAN to tell us if any changes
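
The `local({ old = options(...); on.exit(options(old)) })` pattern used in these tests can be tried in plain R. This is a minimal sketch with a made-up option name, `demo.level`. Each block has to capture its own `old`, because a variable assigned inside one `local()` call is not visible in a later one.

```r
# Base-R sketch of the scoping pattern; `demo.level` is a made-up option.
options(demo.level = 1L)

inside = local({
  old = options(demo.level = 2L); on.exit(options(old))
  getOption("demo.level")        # value seen while the block runs
})
after = getOption("demo.level")  # value once the block has exited

options(demo.level = NULL)       # clean up
```

Here `inside` is `2L` and `after` is `1L`: the option is restored when the block exits, including when the block exits through an error.
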
diff --git a/inst/tests/nafill.Rraw b/inst/tests/nafill.Rraw
@@ -160,14 +160,15 @@ names(dt) <- NULL
 test(4.36, colnamesInt(dt, "a"), error="has no names")
 
 # verbose
-dt = data.table(a=c(1L, 2L, NA_integer_), b=c(1, 2, NA_real_))
-old=options(datatable.verbose=TRUE)
-test(5.01, nafill(dt, "locf"), output="nafillInteger: took.*nafillDouble: took.*nafillR.*took")
-test(5.02, setnafill(dt, "locf"), output="nafillInteger: took.*nafillDouble: took.*nafillR.*took")
-if (test_bit64) {
-  test(5.03, nafill(as.integer64(c(NA,2,NA,3)), "locf"), as.integer64(c(NA,2,2,3)), output="nafillInteger64: took.*nafillR.*took")
-}
-options(old)
+local({
+  dt = data.table(a=c(1L, 2L, NA_integer_), b=c(1, 2, NA_real_))
+  old = options(datatable.verbose=TRUE); on.exit(options(old))
+  test(5.01, nafill(dt, "locf"), output="nafillInteger: took.*nafillDouble: took.*nafillR.*took")
+  test(5.02, setnafill(dt, "locf"), output="nafillInteger: took.*nafillDouble: took.*nafillR.*took")
+  if (test_bit64) {
+    test(5.03, nafill(as.integer64(c(NA,2,NA,3)), "locf"), as.integer64(c(NA,2,2,3)), output="nafillInteger64: took.*nafillR.*took")
+  }
+})
 
 # coerceAs int/numeric/int64 as used in nafill
 if (test_bit64) {
@@ -250,59 +251,61 @@ if (test_bit64) {
 }
 
 # coerceAs verbose
-options(datatable.verbose=2L)
-input = 1
-# use levels= explicitly to avoid locale-related sorting of letters
-xy_factor = factor(c("x", "y"), levels=c("x", "y"))
-test(10.01, ans<-coerceAs(input, 1), 1, output="double[numeric] into double[numeric]")
-test(10.02, address(input)!=address(ans))
-test(10.03, ans<-coerceAs(input, 1, copy=FALSE), 1, output="copy=false and input already of expected type and class double[numeric]")
-test(10.04, address(input), address(ans))
-test(10.05, ans<-coerceAs(input, 1L), 1L, output="double[numeric] into integer[integer]")
-test(10.06, address(input)!=address(ans))
-test(10.07, ans<-coerceAs(input, 1L, copy=FALSE), 1L, output="double[numeric] into integer[integer]", notOutput="copy=false")
-test(10.08, address(input)!=address(ans))
-test(10.09, coerceAs("1", 1L), 1L, output="character[character] into integer[integer]", warning="Coercing.*character.*integer")
-test(10.10, coerceAs("1", 1), 1, output="character[character] into double[numeric]", warning="Coercing.*character.*double")
-test(10.11, coerceAs("a", factor("x")), factor("a", levels=c("x","a")), output="character[character] into integer[factor]") ## levels of 'as' are retained!
-test(10.12, coerceAs("a", factor()), factor("a"), output="character[character] into integer[factor]")
-test(10.13, coerceAs(1, factor("x")), factor("x"), output="double[numeric] into integer[factor]")
-test(10.14, coerceAs(1, factor("x", levels=c("x","y"))), factor("x", levels=c("x","y")), output="double[numeric] into integer[factor]")
-test(10.15, coerceAs(2, factor("x", levels=c("x","y"))), factor("y", levels=c("x","y")), output="double[numeric] into integer[factor]")
-test(10.16, coerceAs(1:2, xy_factor), xy_factor, output="integer[integer] into integer[factor]")
-test(10.17, coerceAs(1:3, xy_factor), output="integer[integer] into integer[factor]", error="factor numbers.*3 is outside the level range")
-test(10.18, coerceAs(c(1,2,3), xy_factor), output="double[numeric] into integer[factor]", error="factor numbers.*3.000000 is outside the level range")
-test(10.19, coerceAs(factor("x"), xy_factor), factor("x", levels=c("x","y")), output="integer[factor] into integer[factor]")
-test(10.20, coerceAs(factor("x"), xy_factor, copy=FALSE), factor("x", levels=c("x","y")), output="input already of expected type and class") ## copy=F has copyMostAttrib
-a = structure("a", class="a")
-b = structure("b", class="b")
-test(10.21, coerceAs(a, b), structure("a", class="b"), output="character[a] into character[b]")
-a = structure(1L, class="a")
-b = structure(2L, class="b")
-test(10.22, coerceAs(a, b), structure(1L, class="b"), output="integer[a] into integer[b]")
-a = structure(1, class="a")
-b = structure(2, class="b")
-test(10.23, coerceAs(a, b), structure(1, class="b"), output="double[a] into double[b]")
-a = structure(1, class="a")
-b = structure(2L, class="b")
-test(10.24, coerceAs(a, b), structure(1L, class="b"), output="double[a] into integer[b]")
-if (test_bit64) {
-  x = as.integer64(1L)
-  test(10.81, coerceAs(x, 1), 1, output="double[integer64] into double[numeric]")
-  test(10.82, coerceAs(x, 1L), 1L, output="double[integer64] into integer[integer]")
-  test(10.83, coerceAs(x, "1"), "1", output="double[integer64] into character[character]")
-  test(10.84, coerceAs(1, x), x, output="double[numeric] into double[integer64]")
-  test(10.85, coerceAs(1L, x), x, output="integer[integer] into double[integer64]")
-  test(10.86, coerceAs("1", x), x, output="character[character] into double[integer64]", warning="Coercing.*character")
-  options(datatable.verbose=3L)
-  test(10.87, coerceAs(x, 1L), 1L, output=c("double[integer64] into integer[integer]","Zero-copy coerce when assigning 'integer64' to 'integer'"))
-  test(10.88, coerceAs(1L, x), x, output=c("integer[integer] into double[integer64]","Zero-copy coerce when assigning 'integer' to 'integer64'"))
-  options(datatable.verbose=2L)
-  test(10.89, coerceAs(-2147483649, x), as.integer64(-2147483649), output="double[numeric] into double[integer64]")
-}
-# 10.91 tested nanotime moved to other.Rraw 27.21, #6139
+local({
+  old = options(datatable.verbose=2L); on.exit(options(old))
+  input = 1
+  # use levels= explicitly to avoid locale-related sorting of letters
+  xy_factor = factor(c("x", "y"), levels=c("x", "y"))
+  test(10.01, ans<-coerceAs(input, 1), 1, output="double[numeric] into double[numeric]")
+  test(10.02, address(input)!=address(ans))
+  test(10.03, ans<-coerceAs(input, 1, copy=FALSE), 1, output="copy=false and input already of expected type and class double[numeric]")
+  test(10.04, address(input), address(ans))
+  test(10.05, ans<-coerceAs(input, 1L), 1L, output="double[numeric] into integer[integer]")
+  test(10.06, address(input)!=address(ans))
+  test(10.07, ans<-coerceAs(input, 1L, copy=FALSE), 1L, output="double[numeric] into integer[integer]", notOutput="copy=false")
+  test(10.08, address(input)!=address(ans))
+  test(10.09, coerceAs("1", 1L), 1L, output="character[character] into integer[integer]", warning="Coercing.*character.*integer")
+  test(10.10, coerceAs("1", 1), 1, output="character[character] into double[numeric]", warning="Coercing.*character.*double")
+  test(10.11, coerceAs("a", factor("x")), factor("a", levels=c("x","a")), output="character[character] into integer[factor]") ## levels of 'as' are retained!
+  test(10.12, coerceAs("a", factor()), factor("a"), output="character[character] into integer[factor]")
+  test(10.13, coerceAs(1, factor("x")), factor("x"), output="double[numeric] into integer[factor]")
+  test(10.14, coerceAs(1, factor("x", levels=c("x","y"))), factor("x", levels=c("x","y")), output="double[numeric] into integer[factor]")
+  test(10.15, coerceAs(2, factor("x", levels=c("x","y"))), factor("y", levels=c("x","y")), output="double[numeric] into integer[factor]")
+  test(10.16, coerceAs(1:2, xy_factor), xy_factor, output="integer[integer] into integer[factor]")
+  test(10.17, coerceAs(1:3, xy_factor), output="integer[integer] into integer[factor]", error="factor numbers.*3 is outside the level range")
+  test(10.18, coerceAs(c(1,2,3), xy_factor), output="double[numeric] into integer[factor]", error="factor numbers.*3.000000 is outside the level range")
+  test(10.19, coerceAs(factor("x"), xy_factor), factor("x", levels=c("x","y")), output="integer[factor] into integer[factor]")
+  test(10.20, coerceAs(factor("x"), xy_factor, copy=FALSE), factor("x", levels=c("x","y")), output="input already of expected type and class") ## copy=F has copyMostAttrib
+  a = structure("a", class="a")
+  b = structure("b", class="b")
+  test(10.21, coerceAs(a, b), structure("a", class="b"), output="character[a] into character[b]")
+  a = structure(1L, class="a")
+  b = structure(2L, class="b")
+  test(10.22, coerceAs(a, b), structure(1L, class="b"), output="integer[a] into integer[b]")
+  a = structure(1, class="a")
+  b = structure(2, class="b")
+  test(10.23, coerceAs(a, b), structure(1, class="b"), output="double[a] into double[b]")
+  a = structure(1, class="a")
+  b = structure(2L, class="b")
+  test(10.24, coerceAs(a, b), structure(1L, class="b"), output="double[a] into integer[b]")
+  if (test_bit64) {
+    x = as.integer64(1L)
+    test(10.81, coerceAs(x, 1), 1, output="double[integer64] into double[numeric]")
+    test(10.82, coerceAs(x, 1L), 1L, output="double[integer64] into integer[integer]")
+    test(10.83, coerceAs(x, "1"), "1", output="double[integer64] into character[character]")
+    test(10.84, coerceAs(1, x), x, output="double[numeric] into double[integer64]")
+    test(10.85, coerceAs(1L, x), x, output="integer[integer] into double[integer64]")
+    test(10.86, coerceAs("1", x), x, output="character[character] into double[integer64]", warning="Coercing.*character")
+    test(10.87, options=c(datatable.verbose=3L),
+         coerceAs(x, 1L), 1L, output=c("double[integer64] into integer[integer]", "Zero-copy coerce when assigning 'integer64' to 'integer'"))
+    test(10.88, options=c(datatable.verbose=3L),
+         coerceAs(1L, x), x, output=c("integer[integer] into double[integer64]", "Zero-copy coerce when assigning 'integer' to 'integer64'"))
+    test(10.89, options=c(datatable.verbose=2L),
+         coerceAs(-2147483649, x), as.integer64(-2147483649), output="double[numeric] into double[integer64]")
+  }
+  # 10.91 tested nanotime moved to other.Rraw 27.21, #6139
+})
 
-options(datatable.verbose=FALSE)
 test(11.01, coerceAs(list(a=1), 1), error="is not atomic")
 test(11.02, coerceAs(1, list(a=1)), list(1))
 test(11.03, coerceAs(sum, 1), error="is not atomic")
@@ -328,6 +331,4 @@ test(11.09, coerceAs(1L, a), error="must not be matrix or array")
 test(99.1, data.table(a=1,b=2)[1,1, verbose=1], error="verbose must be logical or integer")
 test(99.2, data.table(a=1,b=2)[1,1, verbose=1:2], error="verbose must be length 1 non-NA")
 test(99.3, data.table(a=1,b=2)[1,1, verbose=NA], error="verbose must be length 1 non-NA")
-options(datatable.verbose=1)
-test(99.4, coerceAs(1, 2L), error="verbose option must be length 1 non-NA logical or integer")
-options(datatable.verbose=FALSE)
+test(99.4, options=c(datatable.verbose=1), coerceAs(1, 2L), error="verbose option must be length 1 non-NA logical or integer")
diff --git a/inst/tests/other.Rraw b/inst/tests/other.Rraw
diff --git a/inst/tests/tests.Rraw b/inst/tests/tests.Rraw
diff --git a/man/data.table-options.Rd b/man/data.table-options.Rd