\item or of the form \code{startcol:endcol}: e.g., \code{DT[, sum(a), by=x:z]}
}
-
\emph{Advanced:} When \code{i} is a \code{list} (or \code{data.frame} or \code{data.table}), \code{DT[i, j, by=.EACHI]} evaluates \code{j} for the groups in \code{DT} that each row in \code{i} joins to. That is, you can join (in \code{i}) and aggregate (in \code{j}) simultaneously. We call this \emph{grouping by each i}. See \href{https://stackoverflow.com/a/27004566/559784}{this StackOverflow answer} for a more detailed explanation until we \href{https://github.com/Rdatatable/data.table/issues/944}{roll out vignettes}.
+
\emph{Advanced:} When \code{i} is a \code{list} (or \code{data.frame} or \code{data.table}), \code{DT[i, j, by=.EACHI]} evaluates \code{j} for the groups in \code{DT} that each row in \code{i} joins to. That is, you can join (in \code{i}) and aggregate (in \code{j}) simultaneously. We call this \emph{grouping by each i}. Note that for rows in \code{i} with no match, the group of matching rows in \code{x} is empty. Special symbols that operate on rows (e.g., \code{.I} or \code{.N}) will therefore evaluate to \code{0} for such groups. This differs from selecting a column from \code{x} (e.g., \code{x$col}), which results in \code{NA} as governed by the \code{nomatch} argument. See \href{https://stackoverflow.com/a/27004566/559784}{this StackOverflow answer} for a more detailed explanation until we \href{https://github.com/Rdatatable/data.table/issues/944}{roll out vignettes}.
\emph{Advanced:} In the \code{X[Y, j]} form of grouping, the \code{j} expression sees variables in \code{X} first, then \code{Y}. We call this \emph{join inherited scope}. If the variable is not in \code{X} or \code{Y} then the calling frame is searched, its calling frame, and so on in the usual way up to and including the global environment.}
@@ -320,6 +320,13 @@ DT[!"a", sum(v), by=.EACHI, on="x"] # same, but using subsets-as-joins
DT[c("b","c"), sum(v), by=.EACHI, on="x"] # same
DT[c("b","c"), sum(v), by=.EACHI, on=.(x)] # same, using on=.()
+
# Why .I is 0 for non-matching rows with by=.EACHI:
vignettes/datatable-joins.Rmd (36 additions, 0 deletions)
@@ -259,6 +259,42 @@ dt2 = ProductReceived[
identical(dt1, dt2)
```
+
##### Understanding `j` Evaluation with `by=.EACHI` for Non-Matches
+
+
A common point of confusion arises when using special symbols like `.I` in `j` with `by=.EACHI`. The behavior for non-matching rows differs from what you might expect when selecting a regular column.
In the first case (Case 1), we select a column from `x` (`d1`). Non-matching rows from `i` (`d2`) produce `NA`, which is the standard behavior governed by `nomatch = NA`.
+
```{r}
+
d1[d2, on = .(v), .(i_col), by = .EACHI]
+
```
+
+
For the rows `D` and `G` in `d2`, there is no matching row in `d1`, so the value of `i_col` is missing (`NA`).
+
+
*Case 2: Evaluating the special symbol `.I`*
+
+
However, when we use the special symbol `.I`, non-matching rows evaluate to `0`.
+
```{r}
+
d1[d2, on = .(v), .I, by = .EACHI]
+
```
+
+
The difference comes from what `j` does in each case:
+
- In Case 1, we are performing a value lookup. A failed lookup results in a missing value (`NA`).
+
- In Case 2, we are performing an evaluation. The symbol `.I` is defined as "the row indices in `x` for the current group". For non-matching rows like `D`, the group of matching rows in `d1` is empty, so the set of indices is `integer(0)`. `data.table` represents this zero-length result as a single `0` in the output.
+
+
This logic is consistent with other special symbols like `.N` (the number of rows in a group), which also correctly evaluates to `0` for non-matching groups.
+
+
```{r}
+
d1[d2, on = .(v), .N, by = .EACHI]
+
```
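
The contrast between a value lookup and an evaluation over an empty group can be reproduced with a minimal sketch. The tables `left_dt` and `right_dt` below are hypothetical stand-ins invented for illustration (they are not the `d1` and `d2` defined earlier); the key `D` deliberately has no match.

```r
library(data.table)

# Hypothetical tables for illustration only
left_dt  = data.table(v = c("A", "B", "C"), i_col = 1:3)
right_dt = data.table(v = c("A", "D"))  # "D" has no match in left_dt

# Value lookup: the unmatched key yields NA
lookup = left_dt[right_dt, on = .(v), .(i_col), by = .EACHI]

# Evaluation over the group: the unmatched key has an empty group, so .N is 0
counts = left_dt[right_dt, on = .(v), .N, by = .EACHI]

print(lookup)
print(counts)
```

Under these assumptions, the `D` row should show `NA` in the lookup result and `0` in the count result, mirroring the two cases described above.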
+
#### 3.1.4. Joining based on several columns
So far we have just joined `data.table`s based on 1 column, but it's important to know that the package can join tables matching several columns.
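
As a rough sketch of what a multi-column join looks like, the example below passes two column names to `on`. The tables `orders` and `prices` and their columns are hypothetical, invented here for illustration rather than taken from this vignette's data.

```r
library(data.table)

# Hypothetical tables for illustration only
orders = data.table(store   = c("S1", "S1", "S2"),
                    product = c("p1", "p2", "p1"),
                    qty     = c(5L, 3L, 7L))
prices = data.table(store   = c("S1", "S2"),
                    product = c("p1", "p1"),
                    price   = c(2.5, 3.0))

# Rows match only when both store and product agree
joined = orders[prices, on = .(store, product)]
print(joined)
```

Here only the `(S1, p1)` and `(S2, p1)` combinations appear in both tables, so the result has one row for each of them.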