diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
Lines changed: 1 addition & 0 deletions
diff --git a/DESCRIPTION b/DESCRIPTION
Lines changed: 3 additions & 1 deletion
diff --git a/NAMESPACE b/NAMESPACE
Lines changed: 2 additions & 0 deletions
diff --git a/NEWS.md b/NEWS.md
Lines changed: 17 additions & 5 deletions
diff --git a/R/bmerge.R b/R/bmerge.R
Lines changed: 90 additions & 54 deletions
diff --git a/R/data.table.R b/R/data.table.R
Lines changed: 3 additions & 2 deletions
diff --git a/R/onLoad.R b/R/onLoad.R
Lines changed: 9 additions & 0 deletions
@@ -194,6 +194,7 @@ test-lin-dev-clang-cran:
     - echo 'CFLAGS=-g -O2 -fno-common -Wall -Wvla -pedantic -fstack-protector-strong -D_FORTIFY_SOURCE=2' > ~/.R/Makevars
     - echo 'CXXFLAGS=-g -O2 -fno-common -Wall -Wvla -pedantic -fstack-protector-strong -D_FORTIFY_SOURCE=2' >> ~/.R/Makevars
     - *install-deps
+    - clang-tidy -extra-arg=-I/usr/local/lib/R/include -checks='readability-inconsistent-declaration-parameter' src/*.c -- -std=c99
     - R CMD check --as-cran $(ls -1t data.table_*.tar.gz | head -n 1)
     - (! grep "warning:" data.table.Rcheck/00install.out)
     - >-
 
@@ -98,5 +98,7 @@ Authors@R: c(
   person("Christian", "Wia",       role="ctb"),
   person("Elise", "Maigné",        role="ctb"),
   person("Vincent", "Rocher",      role="ctb"),
-  person("Vijay", "Lulla",         role="ctb")
+  person("Vijay", "Lulla",         role="ctb"),
+  person("Aljaž", "Sluga",         role="ctb"),
+  person("Bill", "Evans",          role="ctb")
   )
@@ -104,6 +104,7 @@ if (getRversion() >= "4.0.0") {
   # version of R (and that is checked in .onLoad with error if not).
   export(.rbind.data.table) # only export in R<4.0.0 where it is still used; R-devel now detects it is missing doc, #5600
 }
+if (getRversion() >= "3.6.0") S3method(knitr::knit_print, data.table) # else manual delayed registration from the onLoad hook
 S3method(dim, data.table)
 S3method(dimnames, data.table)
 S3method("dimnames<-", data.table)
@@ -201,6 +202,7 @@ S3method(format_col, POSIXct)
 S3method(format_col, expression)
 export(format_list_item)
 S3method(format_list_item, default)
+S3method(format_list_item, data.frame)
 
 export(fdroplevels, setdroplevels)
 S3method(droplevels, data.table)
@@ -65,6 +65,8 @@ rowwiseDT(
 
 4. `patterns()` in `melt()` combines correctly with user-defined `cols=`, which can be useful to specify a subset of columns to reshape without having to use a regex, for example `patterns("2", cols=c("y1", "y2"))` will only give `y2` even if there are other columns in the input matching `2`, [#6498](https://github.com/Rdatatable/data.table/issues/6498). Thanks to @hongyuanjia for the report, and to @tdhock for the PR.
 
+5. `setcolorder()` gains `skip_absent` to ignore unrecognized columns (i.e. columns included in `neworder` but not present in the data), [#6044, #6068](https://github.com/Rdatatable/data.table/pull/6044). Default behavior (`skip_absent=FALSE`) remains unchanged, i.e. unrecognized columns result in an error. Thanks to @sluga for the suggestion and @sluga & @Nj221102 for the PRs.
+
 ## BUG FIXES
 
 1. `fwrite()` respects `dec=','` for timestamp columns (`POSIXct` or `nanotime`) with sub-second accuracy, [#6446](https://github.com/Rdatatable/data.table/issues/6446). Thanks @kav2k for pointing out the inconsistency and @MichaelChirico for the PR.
@@ -107,17 +109,27 @@ rowwiseDT(
 
 11. `tables()` now returns the correct size for data.tables over 2GiB, [#6607](https://github.com/Rdatatable/data.table/issues/6607). Thanks to @vlulla for the report and the PR.
 
+12. Joins on multiple columns, such as `x[y, on=c("x1==y1", "x2==y1")]`, could fail during implicit type coercions if `x1` and `x2` had different but still compatible types, [#6602](https://github.com/Rdatatable/data.table/issues/6602). This was particularly unexpected when columns `x1`, `x2`, and `y1` were all of the same class, e.g. `Date`, but differed in their underlying storage types. Thanks to Benjamin Schwendinger for the report and the fix.
+
+13. `rbindlist(l, use.names=TRUE)` can now handle different encodings for the column names in different entries of `l`, [#5452](https://github.com/Rdatatable/data.table/issues/5452). Thanks to @MEO265 for the report, and Benjamin Schwendinger for the fix.
+
+14. Added a `data.frame` method for `format_list_item()` to fix error printing data.tables with columns containing 1-column data.frames, [#6592](https://github.com/Rdatatable/data.table/issues/6592). Thanks to @r2evans for the bug report and fix.
+
+15. The auto-printing suppression in `knitr` documents is now done by implementing a method for `knit_print` instead of looking up the call stack, [#6589](https://github.com/Rdatatable/data.table/pull/6589). Thanks to @jangorecki for the report [#6509](https://github.com/Rdatatable/data.table/issues/6509) and @aitap for the fix.
+
 ## NOTES
 
-1. Tests run again when some Suggests packages are missing, [#6411](https://github.com/Rdatatable/data.table/issues/6411). Thanks @aadler for the note and @MichaelChirico for the fix.
+1. There is a new vignette on joins! See `vignette("datatable-joins")`. Thanks to Angel Feliz for authoring it! Feedback welcome. This vignette has been highly requested since 2017: [#2181](https://github.com/Rdatatable/data.table/issues/2181).
+
+2. Tests run again when some Suggests packages are missing, [#6411](https://github.com/Rdatatable/data.table/issues/6411). Thanks @aadler for the note and @MichaelChirico for the fix.
 
-2. Some grouping operations run much faster under `verbose=TRUE`, [#6286](https://github.com/Rdatatable/data.table/issues/6286). Thanks @joshhwuu for the report and fix. This overhead was not present on Windows. As a rule, users should expect `verbose=TRUE` operations to run more slowly, as extra statistics might be calculated as part of the report; here was a case where the overhead was particularly high and the fix was particularly easy.
+3. Some grouping operations run much faster under `verbose=TRUE`, [#6286](https://github.com/Rdatatable/data.table/issues/6286). Thanks @joshhwuu for the report and fix. This overhead was not present on Windows. As a rule, users should expect `verbose=TRUE` operations to run more slowly, as extra statistics might be calculated as part of the report; here was a case where the overhead was particularly high and the fix was particularly easy.
 
-3. `set()` and `:=` now provide some extra guidance for common incorrect approaches to assigning `NULL` to some rows of a list column. The correct way is to put `list(list(NULL))` on the RHS of `:=` (or `.(.(NULL))` for short). Thanks to @MichaelChirico for the suggestion and @Nj221102 for the implementation.
+4. `set()` and `:=` now provide some extra guidance for common incorrect approaches to assigning `NULL` to some rows of a list column. The correct way is to put `list(list(NULL))` on the RHS of `:=` (or `.(.(NULL))` for short). Thanks to @MichaelChirico for the suggestion and @Nj221102 for the implementation.
 
-4. Improved the error message when trying to write code like `DT[, ":="(a := b, c := d)]` (which should be `DT[, ":="(a = b, c = d)]`), [#5296](https://github.com/Rdatatable/data.table/issues/5296). Thanks @MichaelChirico for the suggestion & fix.
+5. Improved the error message when trying to write code like `DT[, ":="(a := b, c := d)]` (which should be `DT[, ":="(a = b, c = d)]`), [#5296](https://github.com/Rdatatable/data.table/issues/5296). Thanks @MichaelChirico for the suggestion & fix.
 
-5. `measurev()` was implemented and documented in v1.15.0, for use within `melt()`, and it is now exported (dependent packages can now use without a NOTE from CRAN check).
+6. `measurev()` was implemented and documented in v1.15.0, for use within `melt()`, and it is now exported (dependent packages can now use without a NOTE from CRAN check).
 
 # data.table [v1.16.2](https://github.com/Rdatatable/data.table/milestone/35) (9 October 2024)
 
 
@@ -1,4 +1,25 @@
 
+
+mergeType = function(x) {
+  ans = typeof(x)
+  if      (ans=="integer") { if (is.factor(x))             ans = "factor"    }
+  else if (ans=="double")  { if (inherits(x, "integer64")) ans = "integer64" }
+  # do not call isReallyReal(x) yet because i) if both types are double we don't need to coerce even if one or both sides
+  # are int-as-double, and ii) to save calling it until we really need it
+  ans
+}
+
+cast_with_atts = function(x, as.f) {
+  ans = as.f(x)
+  if (!is.null(attributes(x))) attributes(ans) = attributes(x)
+  ans
+}
+
+coerce_col = function(dt, col, from_type, to_type, from_name, to_name, verbose_msg=NULL) {
+  if (!is.null(verbose_msg)) catf(verbose_msg, from_type, from_name, to_type, to_name, domain=NULL)
+  set(dt, j=col, value=cast_with_atts(dt[[col]], match.fun(paste0("as.", to_type))))
+}
+
 bmerge = function(i, x, icols, xcols, roll, rollends, nomatch, mult, ops, verbose)
 {
   callersi = i
@@ -25,95 +46,110 @@ bmerge = function(i, x, icols, xcols, roll, rollends, nomatch, mult, ops, verbos
 
   supported = c(ORDERING_TYPES, "factor", "integer64")
 
-  getClass = function(x) {
-    ans = typeof(x)
-    if      (ans=="integer") { if (is.factor(x))             ans = "factor"    }
-    else if (ans=="double")  { if (inherits(x, "integer64")) ans = "integer64" }
-    # do not call isReallyReal(x) yet because i) if both types are double we don't need to coerce even if one or both sides
-    # are int-as-double, and ii) to save calling it until we really need it
-    ans
-  }
-
   if (nrow(i)) for (a in seq_along(icols)) {
     # - check that join columns have compatible types
     # - do type coercions if necessary on just the shallow local copies for the purpose of join
     # - handle factor columns appropriately
     # Note that if i is keyed, if this coerces i's key gets dropped by set()
-    ic = icols[a]
-    xc = xcols[a]
-    xclass = getClass(x[[xc]])
-    iclass = getClass(i[[ic]])
-    xname = paste0("x.", names(x)[xc])
-    iname = paste0("i.", names(i)[ic])
-    if (!xclass %chin% supported) stopf("%s is type %s which is not supported by data.table join", xname, xclass)
-    if (!iclass %chin% supported) stopf("%s is type %s which is not supported by data.table join", iname, iclass)
-    if (xclass=="factor" || iclass=="factor") {
+    icol = icols[a]
+    xcol = xcols[a]
+    x_merge_type = mergeType(x[[xcol]])
+    i_merge_type = mergeType(i[[icol]])
+    xname = paste0("x.", names(x)[xcol])
+    iname = paste0("i.", names(i)[icol])
+    if (!x_merge_type %chin% supported) stopf("%s is type %s which is not supported by data.table join", xname, x_merge_type)
+    if (!i_merge_type %chin% supported) stopf("%s is type %s which is not supported by data.table join", iname, i_merge_type)
+    if (x_merge_type=="factor" || i_merge_type=="factor") {
       if (roll!=0.0 && a==length(icols))
         stopf("Attempting roll join on factor column when joining %s to %s. Only integer, double or character columns may be roll joined.", xname, iname)
-      if (xclass=="factor" && iclass=="factor") {
+      if (x_merge_type=="factor" && i_merge_type=="factor") {
         if (verbose) catf("Matching %s factor levels to %s factor levels.\n", iname, xname)
-        set(i, j=ic, value=chmatch(levels(i[[ic]]), levels(x[[xc]]), nomatch=0L)[i[[ic]]])  # nomatch=0L otherwise a level that is missing would match to NA values
+        set(i, j=icol, value=chmatch(levels(i[[icol]]), levels(x[[xcol]]), nomatch=0L)[i[[icol]]])  # nomatch=0L otherwise a level that is missing would match to NA values
         next
       } else {
-        if (xclass=="character") {
+        if (x_merge_type=="character") {
           if (verbose) catf("Coercing factor column %s to type character to match type of %s.\n", iname, xname)
-          set(i, j=ic, value=val<-as.character(i[[ic]]))
-          set(callersi, j=ic, value=val)  # factor in i joining to character in x will return character and not keep x's factor; e.g. for antaresRead #3581
+          set(i, j=icol, value=val<-as.character(i[[icol]]))
+          set(callersi, j=icol, value=val)  # factor in i joining to character in x will return character and not keep x's factor; e.g. for antaresRead #3581
           next
-        } else if (iclass=="character") {
+        } else if (i_merge_type=="character") {
           if (verbose) catf("Matching character column %s to factor levels in %s.\n", iname, xname)
-          newvalue = chmatch(i[[ic]], levels(x[[xc]]), nomatch=0L)
-          if (anyNA(i[[ic]])) newvalue[is.na(i[[ic]])] = NA_integer_  # NA_character_ should match to NA in factor, #3809
-          set(i, j=ic, value=newvalue)
+          newvalue = chmatch(i[[icol]], levels(x[[xcol]]), nomatch=0L)
+          if (anyNA(i[[icol]])) newvalue[is.na(i[[icol]])] = NA_integer_  # NA_character_ should match to NA in factor, #3809
+          set(i, j=icol, value=newvalue)
           next
         }
       }
-      stopf("Incompatible join types: %s (%s) and %s (%s). Factor columns must join to factor or character columns.", xname, xclass, iname, iclass)
+      stopf("Incompatible join types: %s (%s) and %s (%s). Factor columns must join to factor or character columns.", xname, x_merge_type, iname, i_merge_type)
     }
-    if (xclass == iclass) {
-      if (verbose) catf("%s has same type (%s) as %s. No coercion needed.\n", iname, xclass, xname)
+    # we check factors first to cater for the case when trying to do rolling joins on factors
+    if (x_merge_type == i_merge_type) {
+      if (verbose) catf("%s has same type (%s) as %s. No coercion needed.\n", iname, x_merge_type, xname)
       next
     }
-    if (xclass=="character" || iclass=="character" ||
-        xclass=="logical" || iclass=="logical" ||
-        xclass=="factor" || iclass=="factor") {
-      if (anyNA(i[[ic]]) && allNA(i[[ic]])) {
-        if (verbose) catf("Coercing all-NA %s (%s) to type %s to match type of %s.\n", iname, iclass, xclass, xname)
-        set(i, j=ic, value=match.fun(paste0("as.", xclass))(i[[ic]]))
+    cfl = c("character", "logical", "factor")
+    if (x_merge_type %chin% cfl || i_merge_type %chin% cfl) {
+      msg = if(verbose) gettext("Coercing all-NA %s column %s to type %s to match type of %s.\n") else NULL
+      if (anyNA(i[[icol]]) && allNA(i[[icol]])) {
+        coerce_col(i, icol, i_merge_type, x_merge_type, iname, xname, msg)
         next
       }
-      else if (anyNA(x[[xc]]) && allNA(x[[xc]])) {
-        if (verbose) catf("Coercing all-NA %s (%s) to type %s to match type of %s.\n", xname, xclass, iclass, iname)
-        set(x, j=xc, value=match.fun(paste0("as.", iclass))(x[[xc]]))
+      if (anyNA(x[[xcol]]) && allNA(x[[xcol]])) {
+        coerce_col(x, xcol, x_merge_type, i_merge_type, xname, iname, msg)
         next
       }
-      stopf("Incompatible join types: %s (%s) and %s (%s)", xname, xclass, iname, iclass)
+      stopf("Incompatible join types: %s (%s) and %s (%s)", xname, x_merge_type, iname, i_merge_type)
     }
-    if (xclass=="integer64" || iclass=="integer64") {
+    if (x_merge_type=="integer64" || i_merge_type=="integer64") {
       nm = c(iname, xname)
-      if (xclass=="integer64") { w=i; wc=ic; wclass=iclass; } else { w=x; wc=xc; wclass=xclass; nm=rev(nm) }  # w is which to coerce
+      if (x_merge_type=="integer64") { w=i; wc=icol; wclass=i_merge_type; } else { w=x; wc=xcol; wclass=x_merge_type; nm=rev(nm) }  # w is which to coerce
       if (wclass=="integer" || (wclass=="double" && !isReallyReal(w[[wc]]))) {
         if (verbose) catf("Coercing %s column %s%s to type integer64 to match type of %s.\n", wclass, nm[1L], if (wclass=="double") " (which contains no fractions)" else "", nm[2L])
         set(w, j=wc, value=bit64::as.integer64(w[[wc]]))
       } else stopf("Incompatible join types: %s is type integer64 but %s is type double and contains fractions", nm[2L], nm[1L])
     } else {
       # just integer and double left
-      if (iclass=="double") {
-        if (!isReallyReal(i[[ic]])) {
+      ic_idx = which(icol == icols) # check if on is joined on multiple conditions, #6602
+      if (i_merge_type=="double") {
+        coerce_x = FALSE
+        if (!isReallyReal(i[[icol]])) {
+          coerce_x = TRUE
           # common case of ad hoc user-typed integers missing L postfix joining to correct integer keys
           # we've always coerced to int and returned int, for convenience.
-          if (verbose) catf("Coercing double column %s (which contains no fractions) to type integer to match type of %s.\n", iname, xname)
-          val = as.integer(i[[ic]])
-          if (!is.null(attributes(i[[ic]]))) attributes(val) = attributes(i[[ic]])  # to retain Date for example; 3679
-          set(i, j=ic, value=val)
-          set(callersi, j=ic, value=val)       # change the shallow copy of i up in [.data.table to reflect in the result, too.
-        } else {
-          if (verbose) catf("Coercing integer column %s to type double to match type of %s which contains fractions.\n", xname, iname)
-          set(x, j=xc, value=as.double(x[[xc]]))
+          if (length(ic_idx)>1L) {
+            xc_idx = xcols[ic_idx]
+            for (xb in xc_idx[which(vapply_1c(.shallow(x, xc_idx), mergeType) == "double")]) {
+              if (isReallyReal(x[[xb]])) {
+                coerce_x = FALSE
+                break
+              }
+            }
+          }
+          if (coerce_x) {
+            msg = if (verbose) gettext("Coercing %s column %s (which contains no fractions) to type %s to match type of %s.\n") else NULL
+            coerce_col(i, icol, "double", "integer", iname, xname, msg)
+            set(callersi, j=icol, value=i[[icol]])       # change the shallow copy of i up in [.data.table to reflect in the result, too.
+            if (length(ic_idx)>1L) {
+              xc_idx = xcols[ic_idx]
+              for (xb in xc_idx[which(vapply_1c(.shallow(x, xc_idx), mergeType) == "double")]) {
+                coerce_col(x, xb, "double", "integer", paste0("x.", names(x)[xb]), xname, msg)
+              }
+            }
+          }
+        }
+        if (!coerce_x) {
+          msg = if (verbose) gettext("Coercing %s column %s to type %s to match type of %s which contains fractions.\n") else NULL
+          coerce_col(x, xcol, "integer", "double", xname, iname, msg)
         }
       } else {
-        if (verbose) catf("Coercing integer column %s to type double for join to match type of %s.\n", iname, xname)
-        set(i, j=ic, value=as.double(i[[ic]]))
+        msg = if (verbose) gettext("Coercing %s column %s to type %s for join to match type of %s.\n") else NULL
+        coerce_col(i, icol, "integer", "double", iname, xname, msg)
+        if (length(ic_idx)>1L) {
+          xc_idx = xcols[ic_idx]
+          for (xb in xc_idx[which(vapply_1c(.shallow(x, xc_idx), mergeType) == "integer")]) {
+            coerce_col(x, xb, "integer", "double", paste0("x.", names(x)[xb]), xname, msg)
+          }
+        }
       }
     }
   }
 
@@ -2736,15 +2736,16 @@ setnames = function(x,old,new,skip_absent=FALSE) {
   invisible(x)
 }
 
-setcolorder = function(x, neworder=key(x), before=NULL, after=NULL)  # before/after #4358
+setcolorder = function(x, neworder=key(x), before=NULL, after=NULL, skip_absent=FALSE)  # before/after #4358
 {
   if (is.character(neworder))
     check_duplicate_names(x)
   if (!is.null(before) && !is.null(after))
     stopf("Provide either before= or after= but not both")
   if (length(before)>1L || length(after)>1L)
     stopf("before=/after= accept a single column name or number, not more than one")
-  neworder = colnamesInt(x, neworder, check_dups=FALSE)  # dups are now checked inside Csetcolorder below
+  neworder = colnamesInt(x, neworder, check_dups=FALSE, skip_absent=skip_absent)  # dups are now checked inside Csetcolorder below
+  neworder = neworder[neworder != 0] # tests 498.11, 498.13 fail w/o this
   if (length(before))
     neworder = c(setdiff(seq_len(colnamesInt(x, before) - 1L), neworder), neworder)
   if (length(after))
 
@@ -66,6 +66,15 @@
       lockBinding("rbind.data.frame",baseenv())
     }
   }
+  if (session_r_version < "3.6.0") { # corresponds to S3method() directive in NAMESPACE
+    # no delayed registration support for NAMESPACE; perform it manually
+    if (isNamespaceLoaded("knitr")) {
+      registerS3method("knit_print", "data.table", knit_print.data.table, envir = asNamespace("knitr"))
+    }
+    setHook(packageEvent("knitr", "onLoad"), function(...) {
+      registerS3method("knit_print", "data.table", knit_print.data.table, envir = asNamespace("knitr"))
+    })
+  }
 
   # Set options for the speed boost in v1.8.0 by avoiding 'default' arg of getOption(,default=)
   # In fread and fwrite we have moved back to using getOption's default argument since it is unlikely fread and fwrite will be called in a loop many times, plus they
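
Note on the `R/bmerge.R` hunks: the NEWS entry for #6602 says that join columns of the same class (e.g. `Date`) can still differ in storage type, and the new helpers `mergeType()` and `cast_with_atts()` deal with exactly that. The base-R sketch below illustrates the idea outside of data.table. The names `merge_type_sketch` and `cast_keep_attrs` are hypothetical stand-ins written for this note, not the package's own functions, and the sample dates are made up.

```r
# Hypothetical stand-in mirroring mergeType() from the diff:
# report the storage type, except for factor and integer64 columns.
merge_type_sketch = function(x) {
  ans = typeof(x)
  if (ans == "integer" && is.factor(x)) ans = "factor"
  else if (ans == "double" && inherits(x, "integer64")) ans = "integer64"
  ans
}

# Hypothetical stand-in mirroring cast_with_atts() from the diff:
# coerce with as_f, then restore the attributes that the coercion dropped.
cast_keep_attrs = function(x, as_f) {
  ans = as_f(x)
  if (!is.null(attributes(x))) attributes(ans) = attributes(x)
  ans
}

d_double = as.Date("2024-10-09")              # Date stored as double
d_int    = structure(19000L, class = "Date")  # Date stored as integer

# Same class, different storage type: the situation described in #6602.
identical(class(d_double), class(d_int))  # TRUE
merge_type_sketch(d_double)               # "double"
merge_type_sketch(d_int)                  # "integer"

# A plain coercion strips the class; the attribute-preserving cast keeps it.
plain = as.integer(d_double)
kept  = cast_keep_attrs(d_double, as.integer)
class(plain)   # "integer"
class(kept)    # "Date"
typeof(kept)   # "integer"
```

Comparing storage types rather than classes is what lets the join code decide whether a coercion is needed at all, and restoring attributes afterwards keeps the result a `Date` instead of a bare integer.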