Allow double-integer64 joins when double is in (integer32 , integer64] range (#6626)
* Allow double-integer64 joins when double is in (integer32 , integer64] range
* rename R-side argument for readability?
* Totally drop isReallyReal, just use isRealReallyInt with flavors for 32/64
* Error: flip result when changing to isRealReallyInt*
* Further simplify -- first* helpers not needed if we just return bool
* logical inversion
* Subtle difference vs isReallyReal (type check)
* Same subtle difference in .prepareFastSubset
* .prepareFastSubset fix
* fix test output
* Add duplicate bug number to NEWS
* amend a new call site for isReallyReal
* fix botched merge
* add codecov test
* add non exported function
* isRealReallyInt -> fitsInInt
---------
Co-authored-by: Benjamin Schwendinger <[email protected]>
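
For readers skimming the commit, the difference between the 32-bit and 64-bit flavors of the check can be sketched in standalone C. This is a minimal illustration under stated assumptions, not the code added in this commit: the names `all_whole_within`, `fits_in_int32_sketch` and `fits_in_int64_sketch` are hypothetical, NaN is assumed to stand in for missing values and to be treated as coercible, and the open-interval bounds are an assumption chosen so that the minimum value of each integer type (used as the NA sentinel by R's integer and by bit64's integer64) is excluded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper (not data.table's implementation): true when every
// non-NaN element is a whole number strictly inside (lo, hi).
// Callers must keep lo >= -2^63 and hi <= 2^63 so the int64_t cast is defined.
static bool all_whole_within(const double *x, size_t n, double lo, double hi) {
  assert(x != NULL || n == 0);
  for (size_t i = 0; i < n; i++) {
    double v = x[i];
    if (v != v) continue;                       // NaN: assumed coercible (missing value)
    if (!(v > lo && v < hi)) return false;      // out of range, also rejects +/-Inf
    if ((double)(int64_t)v != v) return false;  // has a fractional part
  }
  return true;
}

// 32-bit flavor: open interval (-2^31, 2^31).
static bool fits_in_int32_sketch(const double *x, size_t n) {
  return all_whole_within(x, n, -2147483648.0, 2147483648.0);
}

// 64-bit flavor: open interval (-2^63, 2^63); both bounds are exact doubles.
static bool fits_in_int64_sketch(const double *x, size_t n) {
  return all_whole_within(x, n, -9223372036854775808.0, 9223372036854775808.0);
}
```

Under these assumptions, a column holding 3000000000 fails the 32-bit check but passes the 64-bit one, which is the case this commit newly allows when joining a double column against integer64.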
NEWS.md: 2 additions & 0 deletions
@@ -117,6 +117,8 @@ rowwiseDT(
 
 15. The auto-printing suppression in `knitr` documents is now done by implementing a method for `knit_print` instead of looking up the call stack, [#6589](https://github.com/Rdatatable/data.table/pull/6589). Thanks to @jangorecki for the report [#6509](https://github.com/Rdatatable/data.table/issues/6509) and @aitap for the fix.
 
+16. Joins of `integer64` and `double` columns succeed when the `double` column has lossless `integer64` representation, [#4167](https://github.com/Rdatatable/data.table/issues/4167) and [#6625](https://github.com/Rdatatable/data.table/issues/6625). Previously, this only worked when the double column had lossless _32-bit_ integer representation. Thanks @MichaelChirico for the reports and fix.
+
 ## NOTES
 
 1. There is a new vignette on joins! See `vignette("datatable-joins")`. Thanks to Angel Feliz for authoring it! Feedback welcome. This vignette has been highly requested since 2017: [#2181](https://github.com/Rdatatable/data.table/issues/2181).

 if (x_merge_type=="integer64" || i_merge_type=="integer64") {
   nm = c(iname, xname)
   if (x_merge_type=="integer64") { w=i; wc=icol; wclass=i_merge_type; } else { w=x; wc=xcol; wclass=x_merge_type; nm=rev(nm) } # w is which to coerce
-  if (wclass=="integer" || (wclass=="double" && !isReallyReal(w[[wc]]))) {
-    if (verbose) catf("Coercing %s column %s%s to type integer64 to match type of %s.\n", wclass, nm[1L], if (wclass=="double") " (which contains no fractions)" else "", nm[2L])
+  if (wclass=="integer" || (wclass=="double" && fitsInInt64(w[[wc]]))) {
+    if (verbose) catf("Coercing %s column %s%s to type integer64 to match type of %s.\n", wclass, nm[1L], if (wclass=="double") " (which has integer64 representation, e.g. no fractions)" else "", nm[2L])
     set(w, j=wc, value=bit64::as.integer64(w[[wc]]))
-  } else stopf("Incompatible join types: %s is type integer64 but %s is type double and contains fractions", nm[2L], nm[1L])
+  } else stopf("Incompatible join types: %s is type integer64 but %s is type double and cannot be coerced to integer64 (e.g. has fractions)", nm[2L], nm[1L])
 } else {
   # just integer and double left
   ic_idx = which(icol == icols) # check if on is joined on multiple conditions, #6602
   if (i_merge_type=="double") {
     coerce_x = FALSE
-    if (!isReallyReal(i[[icol]])) {
+    if (fitsInInt32(i[[icol]])) {
       coerce_x = TRUE
       # common case of ad hoc user-typed integers missing L postfix joining to correct integer keys
       # we've always coerced to int and returned int, for convenience.
       if (length(ic_idx)>1L) {
         xc_idx = xcols[ic_idx]
         for (xb in xc_idx[which(vapply_1c(.shallow(x, xc_idx), mergeType) == "double")]) {

 if (length(RHS) != nrow(x)) stopf("RHS of %s is length %d which is not 1 or nrow (%d). For robustness, no recycling is allowed (other than of length 1 RHS). Consider %%in%% instead.", operator, length(RHS), nrow(x))
-if ( mode(x[[col]]) != mode(RHS) || # mode() so that doubleLHS/integerRHS and integerLHS/doubleRHS!isReallyReal are optimized (both sides mode 'numeric')
-     is.factor(x[[col]])+is.factor(RHS) == 1L || # but factor is also mode 'numeric' so treat that separately
-     is.integer(x[[col]]) && isReallyReal(RHS) ) { # and if RHS contains fractions then don't optimize that as bmerge truncates the fractions to match to the target integer type
+if ( (mode(x[[col]]) != mode(RHS)) || # mode() so that doubleLHS/integerRHS and integerLHS/doubleRHS&fitsInInt32 are optimized (both sides mode 'numeric')
+     (is.factor(x[[col]])+is.factor(RHS) == 1L) || # but factor is also mode 'numeric' so treat that separately
+     (is.integer(x[[col]]) && is.double(RHS) && !fitsInInt32(RHS)) ) { # and if RHS contains fractions then don't optimize that as bmerge truncates the fractions to match to the target integer type
   # re-direct non-matching type cases to base R, as data.table's binary
   # search based join is strict in types. #957, #961 and #1361
   # the mode() checks also deals with NULL since mode(NULL)=="NULL" and causes this return, as one CRAN package (eplusr 0.9.1) relies on

-if (verbose && INHERITS(x, char_Date) && INTEGER(isReallyReal(x))[0]==0) {
-  Rprintf(_("\n*** Column %d passed to forder is a date stored as an 8 byte double but no fractions are present. Please consider a 4 byte integer date such as IDate to save space and time.\n"), col+1);
-  // Note the (slightly expensive) isReallyReal will only run when verbose is true. Prefix '***' just to make it stand out in verbose output
+if (verbose && INHERITS(x, char_Date) && fitsInInt32(x)) {
+  // Note the (slightly expensive) fitsInInt32 will only run when verbose is true. Prefix '***' just to make it stand out in verbose output
   // In future this could be upgraded to option warning. But I figured that's what we use verbose to do (to trace problems and look for efficiencies).
   // If an automatic coerce is desired (see discussion in #1738) then this is the point to do that in this file. Move the INTSXP case above to be
   // next, do the coerce of Date to integer now to a tmp, and then let this case fall through to INTSXP in the same way as CPLXSXP falls through to REALSXP.
+  Rprintf(_("\n*** Column %d passed to forder is a date stored as an 8 byte double but no fractions are present. Please consider a 4 byte integer date such as IDate to save space and time.\n"), col+1);