Merge branch 'master' into ReformattingAndCleanup

badasahog · web-flow · commit 064387a02a9c · 2025-07-22T15:44:34.000-04:00
diff --git a/NEWS.md b/NEWS.md
@@ -122,21 +122,21 @@
 
 5. A GitHub Actions workflow is now in place to warn the entire maintainer team, as well as any contributor following the GitHub repository, when the package is at risk of archival on CRAN [#7008](https://github.com/Rdatatable/data.table/issues/7008). Thanks @tdhock for the original report and @Bisaloo and @TysonStanley for the fix.
 
-# data.table [v1.17.8](https://github.com/Rdatatable/data.table/milestone/41) (6 July 2025)
+## data.table [v1.17.8](https://github.com/Rdatatable/data.table/milestone/41) (6 July 2025)
 
 1. Internal functions used to signal errors are now marked as non-returning, silencing a compiler warning about potentially unchecked allocation failure. Thanks to Prof. Brian D. Ripley for the report and @aitap for the fix, [#7070](https://github.com/Rdatatable/data.table/pull/7070).
 
-# data.table [v1.17.6](https://github.com/Rdatatable/data.table/milestone/40) (15 June 2025)
+## data.table [v1.17.6](https://github.com/Rdatatable/data.table/milestone/40) (15 June 2025)
 
 1. On a heavily loaded machine, a `forder` thread could try to perform a zero-length copy from a null pointer, which was de-facto harmless but is against the C standard and was caught by additional CRAN checks, [#7051](https://github.com/Rdatatable/data.table/issues/7051). Thanks to @helske for the report and @aitap for the PR.
 
-# data.table [v1.17.4](https://github.com/Rdatatable/data.table/milestone/39) (25 May 2025)
+## data.table [v1.17.4](https://github.com/Rdatatable/data.table/milestone/39) (25 May 2025)
 
 1. The C code now avoids passing invalid data pointers from 0-length vectors to `memcpy()`, which previously caused undefined behaviour. Thanks to Prof. Brian D. Ripley for the report and Michael Chirico for the fix, [#6911](https://github.com/Rdatatable/data.table/pull/6911).
 
-# data.table [v1.17.2](https://github.com/Rdatatable/data.table/milestone/38) (7 May 2025)
+## data.table [v1.17.2](https://github.com/Rdatatable/data.table/milestone/38) (7 May 2025)
 
-## BUG FIXES
+### BUG FIXES
 
 1. `fwrite(compress="gzip")` once again produces a gzip header when the column names are missing or disabled, [#6852](https://github.com/Rdatatable/data.table/issues/6852). Thanks @maxscheiber for the report and @aitap for the fix.
 
@@ -152,7 +152,7 @@
 
 7. `as.data.table()` now properly handles keys: specifying keys sets them, omitting keys preserves existing ones, and setting `key=NULL` clears them, [#6859](https://github.com/Rdatatable/data.table/issues/6859). Thanks @brookslogan for the report and @Mukulyadav2004 for the fix.
 
-## NOTES
+### NOTES
 
 1. Continued work to remove non-API C functions, [#6180](https://github.com/Rdatatable/data.table/issues/6180). Thanks Ivan Krylov for the PRs and for writing a clear and concise guide about the R API: https://aitap.codeberg.page/R-api/.
 
diff --git a/R/between.R b/R/between.R
@@ -30,8 +30,8 @@ between = function(x, lower, upper, incbounds=TRUE, NAbounds=TRUE, check=FALSE,
   }
   if (is.i64(x)) {
     if (!requireNamespace("bit64", quietly=TRUE)) stopf("trying to use integer64 class when 'bit64' package is not installed") # nocov
-    if (!is.i64(lower) && is.numeric(lower)) lower = bit64::as.integer64(lower)
-    if (!is.i64(upper) && is.numeric(upper)) upper = bit64::as.integer64(upper)
+    if (!is.i64(lower) && (is.integer(lower) || fitsInInt64(lower))) lower = bit64::as.integer64(lower)
+    if (!is.i64(upper) && (is.integer(upper) || fitsInInt64(upper))) upper = bit64::as.integer64(upper)
   }
   is.supported = function(x) is.numeric(x) || is.character(x) || is.px(x)
   if (is.supported(x) && is.supported(lower) && is.supported(upper)) {
diff --git a/inst/tests/tests.Rraw b/inst/tests/tests.Rraw
@@ -15068,7 +15068,7 @@ if (test_bit64) {
   as.i64 = bit64::as.integer64
   test(2039.01, between(1:10, as.i64(3), as.i64(6)),  error="x is not integer64 but.*Please align classes")
   test(2039.02, between(1:10, 3, as.i64(6)),          error="x is not integer64 but.*Please align classes")
-  test(2039.03, between(as.i64(1:3), "2", as.i64(4)), error="x is integer64 but lower and/or upper are not")
+  test(2039.03, between(as.i64(1:3), "2", as.i64(4)), error="x is integer64 but lower is not.*Please align classes")
   old = options("datatable.verbose"=TRUE)
   x = as.i64(1:10)
   ans36 = c(FALSE,FALSE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE)
@@ -15095,6 +15095,10 @@ if (test_bit64) {
   test(2039.19, between(x+maxint, 3+maxint, NA, incbounds=FALSE), c(head(ans36open, -5L), rep(TRUE, 5)), output="between parallel processing of integer64 took")
   test(2039.20, between(x+maxint, rep(NA, 10L), rep(6+maxint, 10L)), c(TRUE, TRUE, tail(ans36, -2L)), output="between parallel processing of integer64 took")
   test(2039.21, between(x+maxint, rep(3+maxint, 10L), rep(NA, 10L), incbounds=FALSE), c(head(ans36open, -5L), rep(TRUE, 5)), output="between parallel processing of integer64 took")
+  # must not blindly read integer64 values as doubles when the latter fit into int32, #7164
+  test(2039.22, between(42L, structure(41., class="integer64"), structure(43., class="integer64")), error="x is not integer64 but.*Please align classes")
+  # must not blindly convert numeric bounds to integer64, #7164
+  test(2039.23, between(as.i64(42), 41, -2^98), error="x is integer64 but upper is not.*Please align classes")
   options(old)
 }
 
diff --git a/src/between.c b/src/between.c
@@ -29,6 +29,19 @@ SEXP between(SEXP x, SEXP lower, SEXP upper, SEXP incbounds, SEXP NAboundsArg, S
   const bool check = LOGICAL(checkArg)[0];
   const bool verbose = GetVerbose();
 
+  // check before potential coercion which ignores methods, #7164
+  if (INHERITS(x, char_integer64)) {
+    if (!INHERITS(lower, char_integer64))
+      error(_("x is integer64 but %s is not. Please align classes."), "lower"); // e.g. between(int64, character, character)
+    if (!INHERITS(upper, char_integer64))
+      error(_("x is integer64 but %s is not. Please align classes."), "upper"); // e.g. between(int64, character, character)
+  } else {
+    if (INHERITS(lower, char_integer64))
+      error(_("x is not integer64 but %s is. Please align classes."), "lower");
+    if (INHERITS(upper, char_integer64))
+      error(_("x is not integer64 but %s is. Please align classes."), "upper");
+  }
+
   if (isInteger(x)) {
     if ((isInteger(lower) || fitsInInt32(lower)) &&
         (isInteger(upper) || fitsInInt32(upper))) { // #3517 coerce to num to int when possible
@@ -90,8 +103,6 @@ SEXP between(SEXP x, SEXP lower, SEXP upper, SEXP incbounds, SEXP NAboundsArg, S
 
   case REALSXP:
     if (INHERITS(x, char_integer64)) {
-      if (!INHERITS(lower, char_integer64) || !INHERITS(upper, char_integer64))
-        error(_("x is integer64 but lower and/or upper are not.")); // e.g. between(int64, character, character)
       const int64_t *lp = (int64_t *)REAL(lower);
       const int64_t *up = (int64_t *)REAL(upper);
       const int64_t *xp = (int64_t *)REAL(x);
@@ -117,8 +128,6 @@ SEXP between(SEXP x, SEXP lower, SEXP upper, SEXP incbounds, SEXP NAboundsArg, S
       }
       if (verbose) Rprintf(_("between parallel processing of integer64 took %8.3fs\n"), omp_get_wtime()-tic);
     } else {
-      if (INHERITS(lower, char_integer64) || INHERITS(upper, char_integer64))
-        error(_("x is not integer64 but lower and/or upper is integer64. Please align classes."));
       const double *lp = REAL(lower);
       const double *up = REAL(upper);
       const double *xp = REAL(x);
diff --git a/src/fread.c b/src/fread.c
@@ -927,10 +927,13 @@ static void parse_double_hexadecimal(FieldParseContext *ctx)
   if (neg) ch++;
   else if (*ch == '+') ch++;
 
-  const bool subnormal = ch[2] == '0';
+  bool subnormal = false;
 
+  // Important!
+  // Keep in mind that only ch[0] is guaranteed to be mapped.
+  // Rearranging these checks (e.g. to make 'subnormal' const) will lead to segfaults in rare cases.
   if (ch[0] == '0' && (ch[1] == 'x' || ch[1] == 'X') &&
-      (ch[2] == '1' || (subnormal)) && ch[3] == '.') {
+      (ch[2] == '1' || (subnormal = (ch[2] == '0'))) && ch[3] == '.') {
     ch += 4;
     uint64_t acc = 0;
     uint8_t digit;
diff --git a/src/utils.c b/src/utils.c
@@ -14,7 +14,7 @@ bool within_int64_repres(double x) {
 // used to error if not passed type double but this needed extra is.double() calls in calling R code
 // which needed a repeat of the argument. Hence simpler and more robust to return false when not type double.
 bool fitsInInt32(SEXP x) {
-  if (!isReal(x))
+  if (!isReal(x) || INHERITS(x, char_integer64))
     return false;
   R_xlen_t n=xlength(x), i=0;
   const double *dx = REAL(x);
@@ -31,7 +31,7 @@ SEXP fitsInInt32R(SEXP x) {
 }
 
 bool fitsInInt64(SEXP x) {
-  if (!isReal(x))
+  if (!isReal(x) || INHERITS(x, char_integer64))
     return false;
   R_xlen_t n=xlength(x), i=0;
   const double *dx = REAL(x);
diff --git a/vignettes/datatable-joins.Rmd b/vignettes/datatable-joins.Rmd
@@ -117,7 +117,7 @@ x[i, on, nomatch]
 \____ secondary data.table
 ```
 
-Note: Please keep in mind that the standard argument order in `data.table` is `dt[i, j, by]`. For join operations, it is recommended to pass the `on` and `nomatch` arguments by name to avoid using `j` and `by` when they are not needed.
+**Note**: Please keep in mind that the standard argument order in `data.table` is `dt[i, j, by]`. For join operations, it is recommended to pass the `on` and `nomatch` arguments by name to avoid using `j` and `by` when they are not needed.
 
 ## 3. Equi joins
 
@@ -439,7 +439,7 @@ ProductReceived[ProductSales,
                 allow.cartesian = TRUE]
 ```
 
-Note: `allow.cartesian` is defaulted to FALSE as this is seldom what the user wants, and such a cross join can lead to a very large number of rows in the result. For example, if Table A has 100 rows and Table B has 50 rows, their Cartesian product would result in 5000 rows (100 * 50). This can quickly become memory-intensive for large datasets.
+**Note**: `allow.cartesian` is defaulted to FALSE as this is seldom what the user wants, and such a cross join can lead to a very large number of rows in the result. For example, if Table A has 100 rows and Table B has 50 rows, their Cartesian product would result in 5000 rows (100 * 50). This can quickly become memory-intensive for large datasets.
 
 
 #### 3.6.1. Selecting one match
diff --git a/vignettes/fr/datatable-joins.Rmd b/vignettes/fr/datatable-joins.Rmd
@@ -117,7 +117,7 @@ x[i, on, nomatch]
 \____ data.table secondaire
 ```
 
-Note : n'oubliez pas que l'ordre standard des arguments dans data.table est `dt[i, j, by]`. Pour les opérations de jointure, il est recommandé de passer les arguments `on` et `nomatch` par nom pour éviter d'utiliser `j` et `by` quand ce n'est pas nécessaire.
+**Note** : n'oubliez pas que l'ordre standard des arguments dans data.table est `dt[i, j, by]`. Pour les opérations de jointure, il est recommandé de passer les arguments `on` et `nomatch` par nom pour éviter d'utiliser `j` et `by` quand ce n'est pas nécessaire.
 
 ## 3. Jointures équilibrées
 
@@ -439,7 +439,7 @@ ProductReceived[ProductSales,
                 allow.cartesian = TRUE]
 ```
 
-Note : `allow.cartesian` vaut par défaut FALSE car c'est ce que l'utilisateur a souhaité, et une telle jointure croisée peut conduire à un très grand nombre de lignes dans le résultat. Par exemple, si Table A possède 100 lignes et Table
+**Note** : `allow.cartesian` vaut par défaut FALSE car c'est ce que l'utilisateur a souhaité, et une telle jointure croisée peut conduire à un très grand nombre de lignes dans le résultat. Par exemple, si Table A possède 100 lignes et Table
 B en a 50, leur produit cartésien sera de 5000 lignes (100 * 50). Ce qui peut rapidement accroître la mémoire occupée pour de grands ensembles de données.
 
 #### 3.6.1. Selection d'une seule correspondance
