NEWS.md
Lines changed: 7 additions & 1 deletion
@@ -98,7 +98,13 @@
98
98
99
99
15. `dcast()` now issues a warning when `fun.aggregate` is used but not provided by the user. `fun.aggregate` defaults to `length` in this case. Previously, only a message was issued. However, relying on this default often signals unexpected duplicates in the data. Therefore, a stricter class of signal was deemed more appropriate, [#5386](https://github.com/Rdatatable/data.table/issues/5386). The warning is classed as `dt_missing_fun_aggregate_warning`, allowing for more targeted handling in user code. Thanks @MichaelChirico for the suggestion and @Nj221102 for the fix.
100
100
101
-
16. Assigning `list(NULL)` to a list column now replaces the column with `list(NULL)`, instead of deleting the column [#5558](https://github.com/Rdatatable/data.table/issues/5558). This behavior is now consistent with base `data.frame`. Thanks @tdhock for the report and @joshhwuu for the fix.
101
+
16. `print.data.table` gains a new argument `show.indices` and option `datatable.show.indices` that allow the user to print a `data.table`'s indices as columns without having to modify the `data.table` itself. Thanks @MichaelChirico for the report and @joshhwuu for the PR.
102
+
103
+
17. Assigning `list(NULL)` to a list column now replaces the column with `list(NULL)`, instead of deleting the column [#5558](https://github.com/Rdatatable/data.table/issues/5558). This behavior is now consistent with base `data.frame`. Thanks @tdhock for the report and @joshhwuu for the fix.
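As a rough illustration of the "more targeted handling" mentioned in item 15, the sketch below catches only the classed `dcast()` warning and muffles it, leaving any other warning untouched. The table `DT` and its column names are made up for this example. On versions that predate this change the handler is never triggered and `dcast()` emits its message as before.

```r
library(data.table)

# Hypothetical data with a duplicated (id, grp) pair, so dcast() must aggregate.
DT <- data.table(
  id    = c(1L, 1L, 2L),
  grp   = c("a", "a", "b"),
  value = c(10, 20, 30)
)

# Handle only the classed warning; other warnings propagate as usual.
res <- withCallingHandlers(
  dcast(DT, id ~ grp, value.var = "value"),
  dt_missing_fun_aggregate_warning = function(w) {
    message("dcast() fell back to length(): ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# Each cell now holds a count of rows, not a value.
print(res)
```

Passing `fun.aggregate` explicitly (for example `fun.aggregate = sum`) avoids the warning altogether and is usually the clearer choice once the duplicates are understood.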
104
+
105
+
## TRANSLATIONS
106
+
107
+
1. Fix a typo in a Mandarin translation of an error message that was hiding the actual error message, [#6172](https://github.com/Rdatatable/data.table/issues/6172). Thanks @trafficfan for the report and @MichaelChirico for the fix.
102
108
103
109
# data.table [v1.15.0](https://github.com/Rdatatable/data.table/milestone/29) (30 Jan 2024)
man/assign.Rd
Lines changed: 16 additions & 1 deletion
@@ -87,7 +87,7 @@ While in most cases standard and functional form of \code{:=} are interchangeabl
87
87
Since \code{[.data.table} incurs overhead to check the existence and type of arguments (for example), \code{set()} provides direct (but less flexible) assignment by reference with low overhead, appropriate for use inside a \code{for} loop. See examples. \code{:=} is more powerful and flexible than \code{set()} because \code{:=} is intended to be combined with \code{i} and \code{by} in single queries on large datasets.
88
88
}
89
89
\note{
90
-
\code{DT[a > 4, b := c]} is different from \code{DT[a > 4][, b := c]}. The first expression updates (or adds) column \code{b} with the value \code{c} on those rows where \code{a > 4} evaluates to \code{TRUE}. \code{X} is updated \emph{by reference}, therefore no assignment needed.
90
+
\code{DT[a > 4, b := c]} is different from \code{DT[a > 4][, b := c]}. The first expression updates (or adds) column \code{b} with the value \code{c} on those rows where \code{a > 4} evaluates to \code{TRUE}. \code{DT} is updated \emph{by reference}, therefore no assignment is needed. Note that this does not apply when \code{i} is missing, i.e. \code{DT[]}.
91
91
92
92
The second expression on the other hand updates a \emph{new} \code{data.table} that's returned by the subset operation. Since the subsetted data.table is ephemeral (it is not assigned to a symbol), the result would be lost; unless the result is assigned, for example, as follows: \code{ans <- DT[a > 4][, b := c]}.
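A minimal sketch of the difference described in this note, using a small made-up table (the column `b2` is introduced only to keep the two cases apart):

```r
library(data.table)

DT <- data.table(a = 1:6, c = 11:16)

# Sub-assignment by reference: only rows with a > 4 receive a value,
# the other rows of the new column b are filled with NA.
DT[a > 4, b := c]

# Chained form: `:=` acts on the temporary subset, so DT itself gains no b2.
DT[a > 4][, b2 := c]

# To keep the chained result it has to be assigned to a symbol.
ans <- DT[a > 4][, b2 := c]

print(DT)
print(ans)
```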
vignettes/datatable-intro.Rmd
Lines changed: 44 additions & 1 deletion
@@ -250,6 +250,49 @@ The function `length()` requires an input argument. We just need to compute the
250
250
251
251
This type of operation occurs quite frequently, especially while grouping (as we will see in the next section), to the point where `data.table` provides a *special symbol* `.N` for it.
252
252
253
+
### g) Handle non-existing elements in `i`
254
+
255
+
#### -- What happens when querying for non-existing elements?
256
+
257
+
When querying a `data.table` for elements that do not exist, the behavior differs based on the method used.
258
+
259
+
```r
260
+
setkeyv(flights, "origin")
261
+
```
262
+
263
+
* **Key-based subsetting: `dt["d"]`**
264
+
265
+
This performs a right join on the key column (`origin` in the example below), returning a single row that holds the queried value in the key column and `NA` in every other column. `setkeyv` sorts the table by the specified key columns and marks it as sorted, which enables binary search for efficient subsetting.
266
+
267
+
```r
268
+
flights["XYZ"]
269
+
# Returns:
270
+
# origin year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum ...
271
+
# 1: XYZ NA NA NA NA NA NA NA NA NA NA NA NA ...
272
+
```
273
+
274
+
* **Logical subsetting: `dt[x == "d"]`**
275
+
276
+
This performs a standard subset operation that does not find any matching rows and thus returns an empty `data.table`.
277
+
278
+
```r
279
+
flights[origin=="XYZ"]
280
+
# Returns:
281
+
# Empty data.table (0 rows and 19 cols): year,month,day,dep_time,sched_dep_time,dep_delay,arr_time,sched_arr_time,arr_delay,...
282
+
```
283
+
284
+
* **Exact match using `nomatch=NULL`**
285
+
286
+
To return only the keys that actually match, with no `NA` row for non-existing elements, use `nomatch=NULL`:
287
+
288
+
```r
289
+
flights["XYZ", nomatch=NULL]
290
+
# Returns:
291
+
# Empty data.table (0 rows and 19 cols): year,month,day,dep_time,sched_dep_time,dep_delay,arr_time,sched_arr_time,arr_delay,...
292
+
```
293
+
294
+
Understanding these behaviors can help prevent confusion when dealing with non-existing elements in your data.
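The three cases can also be reproduced on a tiny self-contained table, which mirrors the `dt["d"]` notation used in the bullet titles above. The table `dt` and its columns `x` and `v` are made up for this sketch.

```r
library(data.table)

# Toy table keyed on x; "d" is not among the key values.
dt <- data.table(x = c("a", "b", "c"), v = 1:3, key = "x")

keyed   <- dt["d"]                  # join on the key: one row, v is NA
logical <- dt[x == "d"]             # logical subset: zero rows
exact   <- dt["d", nomatch = NULL]  # join on the key, unmatched rows dropped

print(keyed)
print(logical)
print(exact)
```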
295
+
253
296
#### Special symbol `.N`: {#special-N}
254
297
255
298
`.N` is a special built-in variable that holds the number of observations _in the current group_. It is particularly useful when combined with `by` as we'll see in the next section. In the absence of group by operations, it simply returns the number of rows in the subset.
@@ -269,7 +312,7 @@ ans
269
312
270
313
We could have accomplished the same operation by doing `nrow(flights[origin == "JFK" & month == 6L])`. However, it would have to subset the entire `data.table` first corresponding to the *row indices* in `i` *and then* count the rows using `nrow()`, which is unnecessary and inefficient. We will cover this and other optimisation aspects in detail under the *`data.table` design* vignette.
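As a small check that the two forms give the same count, here is a sketch on a made-up four-row table (not the `flights` data used in this vignette):

```r
library(data.table)

DT <- data.table(
  origin = c("JFK", "JFK", "LGA", "JFK"),
  month  = c(6L, 6L, 6L, 7L)
)

# Count matching rows directly in j with the special symbol .N.
n_direct <- DT[origin == "JFK" & month == 6L, .N]

# Same count, but the full subset is materialised first.
n_subset <- nrow(DT[origin == "JFK" & month == 6L])

# .N combined with by gives the number of rows per group.
per_origin <- DT[, .N, by = origin]
print(per_origin)
```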
271
314
272
-
### g) Great! But how can I refer to columns by names in `j` (like in a `data.frame`)? {#refer_j}
315
+
### h) Great! But how can I refer to columns by names in `j` (like in a `data.frame`)? {#refer_j}
273
316
274
317
If you're writing out the column names explicitly, there's no difference compared to a `data.frame` (since v1.9.8).
vignettes/datatable-reference-semantics.Rmd
Lines changed: 2 additions & 0 deletions
@@ -78,6 +78,8 @@ A *shallow* copy is just a copy of the vector of column pointers (corresponding
78
78
79
79
A *deep* copy on the other hand copies the entire data to another location in memory.
80
80
81
+
When subsetting a *data.table* using `i` (e.g., `DT[1:10]`), a *deep* copy is made. However, when `i` is not provided or equals `TRUE`, a *shallow* copy is made.
82
+
81
83
#
82
84
With *data.table's* `:=` operator, absolutely no copies are made in *both* (1) and (2), irrespective of the R version you are using. This is because the `:=` operator updates *data.table* columns *in-place* (by reference).
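A brief sketch of what these semantics mean in practice, on a made-up table: a second name bound with `<-` still refers to the same object, so `:=` through either name is visible through both, whereas a subset taken with `i` holds its own data. The shallow-copy case for a missing `i` is not exercised here.

```r
library(data.table)

DT <- data.table(x = 1:5)

# Plain assignment does not copy: DT2 and DT refer to the same data.table.
DT2 <- DT
DT2[, y := x * 2L]          # by reference, so DT gains column y as well

# Subsetting with i yields independent data.
sub <- DT[1:3]
sub[, x := 0L]              # changes sub only

# An explicit deep copy is also independent of the original.
DT3 <- copy(DT)
DT3[, z := 1L]

print(names(DT))
print(DT$x)
```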