NEWS.md (6 additions, 0 deletions)
@@ -19,14 +19,20 @@
2. `fread()` can now read a remote compressed file in one step; `fread("https://domain.org/file.csv.bz2")`. The `file=` argument now supports `.gz` and `.bz2` too; i.e. `fread(file="file.csv.gz")` works now where only `fread("file.csv.gz")` worked in 1.11.8.
+3. `nomatch=NULL` now does the same as `nomatch=0L`; i.e. discards missing values silently (inner join). The default is still `nomatch=NA` (outer join) for statistical safety so that missing values are retained by default. You have to explicitly write `nomatch=NULL` to indicate to the reader of your code that you intend to discard missing values silently. After several years have elapsed, we will start to deprecate `0L`; please start using `NULL`. TO DO: `nomatch=.(0)` to fill with `0` instead of `NA`, [#857](https://github.com/Rdatatable/data.table/issues/857), and `nomatch="error"`.
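A minimal sketch of the difference between the default `nomatch=NA` and the new `nomatch=NULL`, assuming data.table 1.12.0 or later is installed. The table `DT` and its columns are made up for illustration.

```r
library(data.table)

# Hypothetical lookup table; "z" below has no matching row in it.
DT <- data.table(x = c("a", "b", "c"), v = 1:3)

# Default nomatch=NA: the unmatched "z" is kept, with v filled as NA.
kept <- DT[.(c("b", "z")), on = "x"]

# nomatch=NULL: the unmatched "z" row is dropped (inner join).
dropped <- DT[.(c("b", "z")), on = "x", nomatch = NULL]

# nomatch=0L remains accepted for backwards compatibility.
legacy <- DT[.(c("b", "z")), on = "x", nomatch = 0L]

print(kept)
print(dropped)
```

Writing `nomatch = NULL` at the call site makes the intent to drop unmatched rows visible to whoever reads the join later.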
+
#### BUG FIXES
1. Providing an `i` subset expression when attempting to delete a column correctly failed with helpful error, but when the column was missing too created a new column full of `NULL` values, [#3089](https://github.com/Rdatatable/data.table/issues/3089). Thanks to Michael Chirico for reporting.
+2. Column names that look like expressions (e.g. `"a<=colB"`) caused an error when used in `on=` even when wrapped with backticks, [#3092](https://github.com/Rdatatable/data.table/issues/3092). Additionally, `on=` now supports white spaces around operators; e.g. `on = "colA == colB"`. Thanks to @mt1022 for reporting and to @MarkusBonsch for fixing.
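As a rough illustration of the whitespace support described above, the sketch below joins two made-up tables with and without spaces around `==`. It assumes data.table 1.12.0 or later; `X`, `Y` and their columns are hypothetical.

```r
library(data.table)

# Hypothetical tables with differently named join columns.
X <- data.table(colA = 1:3, v = c("a", "b", "c"))
Y <- data.table(colB = 2:4, w = c("p", "q", "r"))

# The same join written three ways.
spaced  <- X[Y, on = "colA == colB"]   # whitespace around the operator
compact <- X[Y, on = "colA==colB"]     # no whitespace
named   <- X[Y, on = c(colA = "colB")] # named-vector form

print(spaced)
```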
+
#### NOTES
1. When data.table first loads it now checks the DLL's MD5. This is to detect installation issues on Windows when you upgrade and i) the DLL is in use by another R session and ii) the CRAN source version > the CRAN binary version, which happens just after a new release (R prompts users to install from source until the CRAN binary is available). This situation can lead to a state where the package's new R code calls old C code in the old DLL; [R#17478](https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17478), [#3056](https://github.com/Rdatatable/data.table/issues/3056). This broken state can persist until, hopefully, you experience a strange error caused by the mismatch. Otherwise, wrong results may occur silently. This situation applies to any R package with compiled code not just data.table, is Windows-only, and is long-standing. It has only recently been understood as it typically only occurs during the few days after each new release until binaries are available on CRAN. Thanks to Gabor Csardi for the suggestion to use `tools::checkMD5sums()`.
+2. When `on=` is provided but not `i=`, a helpful error is now produced rather than silently ignoring `on=`. Thanks to Dirk Eddelbuettel for the idea.
 if (length(rollends)==1L) rollends=rep.int(rollends,2L)
 # TO DO (document/faq/example). Removed for now ... if ((roll || rolltolast) && missing(mult)) mult="last" # for when there is exact match to mult. This does not control cases where the roll is mult, that is always the last one.
 missingnomatch = missing(nomatch)
-if (!is.na(nomatch) && nomatch!=0L) stop("nomatch must either be NA or 0, or (ideally) NA_integer_ or 0L")
+if (is.null(nomatch)) nomatch=0L  # allow nomatch=NULL API already now, part of: https://github.com/Rdatatable/data.table/issues/857
+if (!is.na(nomatch) && nomatch!=0L) stop("nomatch= must be either NA or NULL (or 0 for backwards compatibility which is the same as NULL)")
 nomatch = as.integer(nomatch)
-if (!is.logical(which) || length(which)>1L) stop("'which' must be a logical vector length 1. Either FALSE, TRUE or NA.")
-if ((isTRUE(which)||is.na(which)) && !missing(j)) stop("'which' is ",which," (meaning return row numbers) but 'j' is also supplied. Either you need row numbers or the result of j, but only one type of result can be returned.")
+if (!is.logical(which) || length(which)>1L) stop("which= must be a logical vector length 1. Either FALSE, TRUE or NA.")
+if ((isTRUE(which)||is.na(which)) && !missing(j)) stop("which==",which," (meaning return row numbers) but j is also supplied. Either you need row numbers or the result of j, but only one type of result can be returned.")
 if (!is.na(nomatch) && is.na(which)) stop("which=NA with nomatch=0 would always return an empty vector. Please change or remove either which or nomatch.")
 .global$print=""
 if (missing(i) && missing(j)) {
   # ...[] == oops at console, forgot print(...)
   # or some kind of dynamic construction that has edge case of no contents inside [...]
+  if (nargs()>2L)  # 2 is minimum: 1) method name, 2) x
+    stop("When i and j are both missing, no other argument should be used. Empty [] is useful after := to have the result printed.")
   return(x)
 }
+if (!with && missing(j)) stop("j must be provided when with=FALSE")
+if (missing(i) && !missing(on)) stop("i must be provided when on= is provided")
 if (!missing(keyby)) {
   if (!missing(by)) stop("Provide either 'by' or 'keyby' but not both")
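The `nomatch` handling added in this hunk can be read in isolation. The sketch below lifts those three lines into a standalone base R function so the accepted and rejected values are easy to try out. The name `normalize_nomatch` is hypothetical and is not part of data.table.

```r
# Hypothetical helper mirroring the nomatch checks in the hunk above.
normalize_nomatch <- function(nomatch = NA) {
  # NULL is mapped to 0L first, so later code only has to handle NA and 0.
  if (is.null(nomatch)) nomatch <- 0L
  if (!is.na(nomatch) && nomatch != 0L)
    stop("nomatch= must be either NA or NULL (or 0 for backwards compatibility which is the same as NULL)")
  as.integer(nomatch)
}

from_null <- normalize_nomatch(NULL)  # treated like 0
from_na   <- normalize_nomatch(NA)    # default: keep unmatched rows
from_zero <- normalize_nomatch(0)     # legacy spelling, still accepted

# Any other value is rejected with the new error message.
rejected <- tryCatch(normalize_nomatch(5), error = function(e) conditionMessage(e))
print(rejected)
```

Mapping `NULL` to `0L` before validation means the existing `NA`/`0` check and the later `as.integer()` call need no further changes.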
 test(1948.8, DT[i, on = c( id = "idi", "sameName", "`counts(a>=0)`==` weirdName>=`")], DT[i, on = "id==idi", c("id", "counts(a>=0)", "sameName")])
+## testing 'eval' in on clause
+test(1948.9, DT[i, on = eval(eval("id<=idi"))], DT[i, on = "id<=idi"])
+## testing for errors
+test(1948.11, DT[i, on = ""], error = "'on' contains no column name: . Each 'on' clause must contain one or two column names.")
+test(1948.12, DT[i, on = "id>=idi>=1"], error = "Found more than one operator in one 'on' statement: id>=idi>=1. Please specify a single operator.")
+test(1948.13, DT[i, on = "`id``idi`<=id"], error = "'on' contains more than 2 column names: `id``idi`<=id. Each 'on' clause must contain one or two column names.")
+test(1948.14, DT[i, on = "id != idi"], error = "Invalid operators !=. Only allowed operators are ==<=<>=>.")
+test(1948.15, DT[i, on = 1L], error = "'on' argument should be a named atomic vector of column names indicating which columns in 'i' should be joined with which columns in 'x'.")
+
+# helpful error when on= is provided but not i, rather than silently ignoring on=
+test(1949.1, DT[,,on=A], error="When i and j are both missing, no other argument should be used.")
+test(1949.2, DT[,1,on=A], error="i must be provided when on= is provided")
+test(1949.3, DT[1,,with=FALSE], error="j must be provided when with=FALSE")
+
+if (test_bit64) {
+  # explicit coverage of 2-column real case in uniqlist. Keeps coming up in codecov checks in PRs that don't touch uniqlist.c
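Outside the test harness, the malformed `on=` clauses exercised by tests 1948.12 and 1948.14 can be tried interactively. The sketch below assumes data.table 1.12.0 or later. The tables and the helper `join_error` are hypothetical, and the exact error wording may differ between data.table versions, so the helper only reports whether an error occurred and what it said.

```r
library(data.table)

# Hypothetical tables standing in for the DT and i used by the test suite.
DT <- data.table(id = 1:3, v = c("a", "b", "c"))
i  <- data.table(idi = 2:4)

# Returns the error message raised by the join, or NA if it succeeded.
join_error <- function(clause) {
  tryCatch({
    DT[i, on = clause]
    NA_character_
  }, error = function(e) conditionMessage(e))
}

clauses <- c("id == idi", "id != idi", "id>=idi>=1")
results <- vapply(clauses, join_error, character(1))

# A valid clause yields NA; the other two yield an error message.
print(results)
```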
When \code{j} is a character vector of column names, a numeric vector of column positions to select or of the form \code{startcol:endcol}, the value returned is always a \code{data.table}. \code{with=FALSE} is not necessary anymore to select columns dynamically. Note that \code{x[, cols]} is equivalent to \code{x[, ..cols]} and to \code{x[, cols, with=FALSE]} and to \code{x[, .SD, .SDcols=cols]}.}

-\item{nomatch}{ Same as \code{nomatch} in \code{\link{match}}. When a row in \code{i} has no match to \code{x}, \code{nomatch=NA} (default) means \code{NA} is returned. \code{0} means no rows will be returned for that row of \code{i}. Use \code{options(datatable.nomatch=0)} to change the default value (used when \code{nomatch} is not supplied).}
+\item{nomatch}{ Same as \code{nomatch} in \code{\link{match}}. When a row in \code{i} has no match to \code{x}, \code{nomatch=NA} (default) means \code{NA} is returned. \code{NULL} (or \code{0} for backward compatibility) means no rows will be returned for that row of \code{i}. Use \code{options(datatable.nomatch=NULL)} to change the default value (used when \code{nomatch} is not supplied).}

 \item{mult}{ When \code{i} is a \code{list} (or \code{data.frame} or \code{data.table}) and \emph{multiple} rows in \code{x} match to the row in \code{i}, \code{mult} controls which are returned: \code{"all"} (default), \code{"first"} or \code{"last"}.}

@@ -289,7 +289,7 @@ DT[x!="b" | y!=3] # not yet optimized, currently vector scan subset
 DT[.("b", 3), on=c("x", "y")] # join on columns x,y of DT; uses binary search (fast)
 DT[.("b", 3), on=.(x, y)] # same, but using on=.()
 DT[.("b", 1:2), on=c("x", "y")] # no match returns NA
-DT[.("b", 1:2), on=.(x, y), nomatch=0] # no match row is not returned
+DT[.("b", 1:2), on=.(x, y), nomatch=NULL] # no match row is not returned