NEWS.md
37 additions & 1 deletion
@@ -18,7 +18,31 @@
6. `between()` gains the argument `ignore_tzone=FALSE`. Normally, a difference in time zone between `lower` and `upper` will produce an error, and a difference in time zone between `x` and either of the others will produce a message. Setting `ignore_tzone=TRUE` bypasses the checks, allowing both comparisons to proceed without error or message about time zones.
- 7. New `frev(x)` as a faster analogue to `base::rev()` for atomic vectors/lists, [#5885](https://github.com/Rdatatable/data.table/issues/5885). Twice as fast as `base::rev()` on large inputs, and faster with more threads. Thanks to Benjamin Schwendinger for suggesting and implementing.
+ 7. New helper function `fctr` as an extended version of `factor()`, [#4837](https://github.com/Rdatatable/data.table/issues/4837). Most notably, it supports (1) retaining input level ordering by default, i.e. `levels=unique(x)` as opposed to `levels = sort(unique(x))`; (2) `rev=` to reverse the levels; and (3) `sort=` to allow more feature parity with `factor()`. The choice of default is motivated by convenience in the common case when the order of elements needs to be preserved, for example when using `dcast` or adding a legend to a plot. This also matches the default sort ordering of groups in `by=`.
9. New `frev(x)` as a faster analogue to `base::rev()` for atomic vectors/lists, [#5885](https://github.com/Rdatatable/data.table/issues/5885). Twice as fast as `base::rev()` on large inputs, and faster with more threads. Thanks to Benjamin Schwendinger for suggesting and implementing.
### BUG FIXES
@@ -46,6 +70,18 @@
12. Internal functions used to signal errors are now marked as non-returning, silencing a compiler warning about potentially unchecked allocation failure. Thanks to Prof. Brian D. Ripley for the report and @aitap for the fix, [#7070](https://github.com/Rdatatable/data.table/pull/7070).
+ 13. In rare cases, `data.table` failed to expand ALTREP columns when assigning a full column by reference. This could result in the target column getting modified unintentionally if the next call to the data.table was a modification by reference of the source column. E.g. in `DT[, b := as.character(a)]` the string conversion gets deferred and subsequent modification of column `a` would also modify column `b`, [#5400](https://github.com/Rdatatable/data.table/issues/5400). Thanks to @aquasync for the report and Václav Tlapák for the PR.
+
+ 14. `data.table()` function is now more aligned with `data.frame()` with respect to the names of the output when one of its inputs is a single-column matrix object, [#4124](https://github.com/Rdatatable/data.table/issues/4124). Thanks @PavoDive for the report and @jangorecki for the PR.
+
+ 15. Including an `ITime` object as a named input to `data.frame()` respects the provided name, i.e. `data.frame(a = as.ITime(...))` will have column `a`, [#4673](https://github.com/Rdatatable/data.table/issues/4673). Thanks @shrektan for the report and @MichaelChirico for the fix.
+
+ 16. `fread()` now handles the `na.strings` argument for quoted text columns, making it possible to specify `na.strings = '""'` and read empty quoted strings as `NA`s, [#6974](https://github.com/Rdatatable/data.table/issues/6974). Thanks to @AngelFelizR for the report and @aitap for the PR.
+
+ 17. A data.table with a column of class `vctrs_list_of` (from package {vctrs}) prints as expected, [#5948](https://github.com/Rdatatable/data.table/issues/5948). Before, they could be printed messily, e.g. printing every entry in a nested data.frame. Thanks @jesse-smith for the report, @DavisVaughan and @r2evans for contributing, and @MichaelChirico for the PR.
+
+ 18. Fixed incorrect sorting of merges where the first column of a key is a factor with non-`sort()`-ed levels (e.g. `factor(1:2, 2:1)`) and it is joined to a character column, [#5361](https://github.com/Rdatatable/data.table/issues/5361). Thanks to @gbrunick for the report and Benjamin Schwendinger for the fix.
+
### NOTES
1. Continued work to remove non-API C functions, [#6180](https://github.com/Rdatatable/data.table/issues/6180). Thanks Ivan Krylov for the PRs and for writing a clear and concise guide about the R API: https://aitap.codeberg.page/R-api/.
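The level-ordering difference described in NEWS item 7 for `fctr` can be illustrated in base R. The helper below, `fctr_sketch()`, is hypothetical: it only mimics the `levels=unique(x)` default and the `sort=` and `rev=` options as the entry describes them. It is not data.table's implementation, and the real `fctr()` signature may differ.

```r
# Hypothetical helper, NOT data.table::fctr(): it only mimics the default
# described in NEWS (levels in order of first appearance) plus sort= and rev=.
fctr_sketch <- function(x, sort = FALSE, rev = FALSE) {
  lev <- unique(x)                     # first-appearance order, i.e. levels=unique(x)
  if (sort) lev <- base::sort(lev)     # factor()-style order, i.e. levels=sort(unique(x))
  if (rev) lev <- base::rev(lev)       # reverse whichever order was chosen
  factor(x, levels = lev)
}

x <- c("b", "c", "a", "b")
levels(factor(x))                      # "a" "b" "c"  (factor() sorts the levels)
levels(fctr_sketch(x))                 # "b" "c" "a"  (order of first appearance)
levels(fctr_sketch(x, rev = TRUE))     # "a" "c" "b"
levels(fctr_sketch(x, sort = TRUE))    # "a" "b" "c"
```

Only the level order changes; the values are untouched, so `as.character(fctr_sketch(x))` is identical to `x`.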
if (is.factor(x[[last_x_idx]]) || is.factor(i[[last_i_idx]]))
+ stopf("Attempting roll join on factor column when joining x.%s to i.%s. Only integer, double or character columns may be roll joined.", names(x)[last_x_idx], names(i)[last_i_idx])
+ }
+
callersi = i
i = shallow(i)
# Just before the call to bmerge() in [.data.table there is a shallow() copy of i to prevent coercions here
if (!x_merge_type %chin% supported) stopf("%s is type %s which is not supported by data.table join", xname, x_merge_type)
if (!i_merge_type %chin% supported) stopf("%s is type %s which is not supported by data.table join", iname, i_merge_type)
+ # we check factors first because they might have different levels
if (x_merge_type=="factor" || i_merge_type=="factor") {
- if (roll!=0.0 && a==length(icols))
- stopf("Attempting roll join on factor column when joining %s to %s. Only integer, double or character columns may be roll joined.", xname, iname)
if (x_merge_type=="factor" && i_merge_type=="factor") {
if (verbose) catf("Matching %s factor levels to %s factor levels.\n", iname, xname)
set(i, j=icol, value=chmatch(levels(i[[icol]]), levels(x[[xcol]]), nomatch=0L)[i[[icol]]]) # nomatch=0L otherwise a level that is missing would match to NA values
stopf("Incompatible join types: %s (%s) and %s (%s). Factor columns must join to factor or character columns.", xname, x_merge_type, iname, i_merge_type)
}
- # we check factors first to cater for the case when trying to do rolling joins on factors
if (x_merge_type==i_merge_type) {
if (verbose) catf("%s has same type (%s) as %s. No coercion needed.\n", iname, x_merge_type, xname)
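The join code above handles factors before anything else because a factor's integer codes follow its level order, not the sorted order of its labels. The base-R illustration below uses the `factor(1:2, 2:1)` example from NEWS item 18 (#5361) to show that sorting by codes and sorting by labels disagree, which is one plausible way a join between a factor key and a character column can come out misordered. Base `match()` stands in for data.table's `chmatch()` here; the variable names are hypothetical.

```r
# A factor whose levels are not in sort() order: levels are "2", "1".
f <- factor(1:2, levels = 2:1)

as.integer(f)            # underlying codes: 2 1
as.character(sort(f))    # sorting a factor follows the codes: "2" "1"
sort(as.character(f))    # sorting the labels as strings:      "1" "2"

# Re-map the codes onto positions in a character vector, in the spirit of the
# chmatch(levels(...), ...)[codes] step above (base match() used instead).
chr <- c("1", "2")
matched <- match(levels(f), chr)[as.integer(f)]
matched                  # position of each value of f within chr: 1 2
```

After the re-mapping, the factor's values are expressed in the other column's terms, so ordering no longer depends on the factor's level order.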
} else as.double(nrow(x)+nrow(i))) # rows in i might not match to x so old max(nrow(x),nrow(i)) wasn't enough. But this limit now only applies when there are duplicates present so the reason now for nrow(x)+nrow(i) is just to nail it down and be bigger than max(nrow(x),nrow(i)).
+ if (allLen1) {
+ irows = f__
+ } else {
+ join.many = isTRUE(getOption("datatable.join.many")) # #914, default TRUE for backward compatibility
+ anyDups = !notjoin &&
+ (
+ # #698. When notjoin=TRUE, ignore allow.cartesian. Rows in answer will never be > nrow(x).
+ (join.many && !allow.cartesian) ||
+ # special case of scalar i match to const duplicated x, not handled by anyDuplicate: data.table(x=c(1L,1L))[data.table(x=1L), on="x"]
limit = if (anyDups) { # #742. If 'i' has no duplicates, ignore
+ if (!join.many) stopf("Joining resulted in many-to-many join. Perform quality check on your data, use mult!='all', or set 'datatable.join.many' option to TRUE to allow rows explosion.")
+ if (allow.cartesian) internal_error("checking allow.cartesian and join.many, unexpected else branch reached") # nocov
+ as.double(nrow(x)+nrow(i)) # rows in i might not match to x so old max(nrow(x),nrow(i)) wasn't enough. But this limit now only applies when there are duplicates present so the reason now for nrow(x)+nrow(i) is just to nail it down and be bigger than max(nrow(x),nrow(i)).
+ }
+ irows = vecseq(f__, len__, limit)
+ }
if (verbose) {cat(timetaken(last.started.at),"\n"); flush.console()}
# Fix for #1092 and #1074
# TODO: implement better version of "any"/"all"/"which" to avoid
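When `allLen1` is FALSE, the code above expands the per-row match starts `f__` and match lengths `len__` into row numbers with the internal `vecseq(f__, len__, limit)`. The function below is a hypothetical base-R sketch of that expansion, assuming `vecseq` emits `f__[k], f__[k]+1, ..., f__[k]+len__[k]-1` for each `k` and refuses to produce more than `limit` rows, with `limit = NULL` meaning no cap. The real C implementation, its argument handling and its error messages may differ.

```r
# Hypothetical stand-in for data.table's internal vecseq(); assumption:
# for each k it emits f[k], f[k] + 1, ..., f[k] + len[k] - 1.
vecseq_sketch <- function(f, len, limit = NULL) {
  stopifnot(length(f) == length(len))
  total <- sum(as.double(len))          # double avoids integer overflow on huge joins
  if (!is.null(limit) && total > limit)
    stop(sprintf("join would produce %.0f rows, more than the limit of %.0f",
                 total, as.double(limit)))
  unlist(mapply(function(start, n) seq.int(start, length.out = n),
                f, len, SIMPLIFY = FALSE), use.names = FALSE)
}

f__   <- c(1L, 4L, 6L)   # first matching row in x for each row of i
len__ <- c(1L, 2L, 3L)   # number of matching rows in x for each row of i
vecseq_sketch(f__, len__)              # 1 4 5 6 7 8
try(vecseq_sketch(f__, len__, limit = 5))  # errors: 6 rows exceed the limit of 5
```

In the diff, `limit` is set to `as.double(nrow(x)+nrow(i))` when `anyDups` is TRUE, so a duplicate-driven expansion beyond that size stops instead of silently exploding the row count.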
if ( (keylen>len || chk) && !.Call(CisOrderedSubset, irows, nrow(x))) {
- keylen = if (!chk) len else 0L # fix for #1268
- }
- ## check key on i as well!
- ichk = is.data.table(i) && haskey(i) &&
- identical(head(key(i), length(leftcols)), names_i[leftcols]) # i has the correct key, #3061
- if (keylen && (ichk || is.logical(i) || (.Call(CisOrderedSubset, irows, nrow(x)) && ((roll==FALSE) || length(irows)==1L)))) # see #1010. don't set key when i has no key, but irows is ordered and roll != FALSE
- setattr(ans, "sorted", head(key(x), keylen))
- }
+ # NB: could be NULL
+ setattr(ans, "sorted", .join_result_key(x, i, ans, if (!missing(on)) names(on), ansvars, leftcols, rightcols, names_i, irows, roll))
setattr(ans, "class", class(x)) # retain class that inherits from data.table, #64
stopf("External command failed with exit code %d. This can happen when the disk is full in the temporary directory ('%s'). See ?fread for the tmpdir argument.", status, tmpdir)
stopf("To read %s files directly, fread() requires 'R.utils' package which cannot be found. Please install 'R.utils' using 'install.packages('R.utils')'.", if (w<=2L || gzsig) "gz" else "bz2") # nocov
FUN = if (w<=2L || gzsig) gzfile else bzfile
- R.utils::decompressFile(file, decompFile <- tempfile(tmpdir=tmpdir), ext=NULL, FUN=FUN, remove=FALSE) # ext is not used by decompressFile when destname is supplied, but isn't optional
- file = decompFile # don't use 'tmpFile' symbol again, as tmpFile might be the http://domain.org/file.csv.gz download
+ decompFile = tempfile(tmpdir=tmpdir)
on.exit(unlink(decompFile), add=TRUE)
+ tryCatch({
+ R.utils::decompressFile(file, decompFile, ext=NULL, FUN=FUN, remove=FALSE) # ext is not used by decompressFile when destname is supplied, but isn't optional
+ }, error=function(e) {
+ stopf("R.utils::decompressFile failed to decompress file '%s':\n %s\n. This can happen when the disk is full in the temporary directory ('%s'). See ?fread for the tmpdir argument.", file, conditionMessage(e), tmpdir)
+ })
+ file = decompFile # don't use 'tmpFile' symbol again, as tmpFile might be the http://domain.org/file.csv.gz download
}
file = enc2native(file) # CfreadR cannot handle UTF-8 if that is not the native encoding, see #3078.
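The fread change wraps `R.utils::decompressFile()` in `tryCatch()` so that a failure is re-signalled with the file name and a hint about `tmpdir`. The sketch below shows the same pattern using base R only: `decompress_or_stop()` and `gunzip_lines()` are hypothetical helpers, and base `gzfile()` stands in for R.utils so the example runs without extra packages.

```r
# Hypothetical helper mirroring the tryCatch() pattern in the fread diff:
# run a decompression function and re-signal any error with more context.
decompress_or_stop <- function(file, tmpdir = tempdir(), decompress) {
  dest <- tempfile(tmpdir = tmpdir)
  tryCatch(
    decompress(file, dest),
    error = function(e) {
      unlink(dest)  # don't leave a partial output file behind
      stop(sprintf(
        "failed to decompress file '%s':\n  %s\nThis can happen when the disk is full in the temporary directory ('%s').",
        file, conditionMessage(e), tmpdir), call. = FALSE)
    })
  dest
}

# Base-R gunzip used as the 'decompress' function (stand-in for R.utils).
gunzip_lines <- function(src, dest) {
  con <- gzfile(src, "r")
  on.exit(close(con))
  writeLines(readLines(con), dest)
}

gz <- tempfile(fileext = ".gz")
con <- gzfile(gz, "w"); writeLines(c("a,b", "1,2"), con); close(con)
out <- decompress_or_stop(gz, decompress = gunzip_lines)
readLines(out)  # "a,b" "1,2"

# A failing decompressor surfaces the re-signalled, more informative message.
msg <- tryCatch(
  decompress_or_stop("missing.gz", decompress = function(src, dest) stop("boom")),
  error = conditionMessage)
```

Keeping the original `conditionMessage(e)` inside the new message preserves the low-level cause while adding the context a user needs to act on it.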