|
316 | 316 |
|
317 | 317 | We could have accomplished the same operation by doing `nrow(flights[origin == "JFK" & month == 6L])`. However, it would have to subset the entire `data.table` first corresponding to the *row indices* in `i` *and then* return the number of rows using `nrow()`, which is unnecessary and inefficient. We will cover this and other optimisation aspects in detail under the *`data.table` design* vignette. |
318 | 318 |
|
319 | | -### h) Great! But how can I refer to columns by names in `j` (like in a `data.frame`)? {#refer_j} |
| 319 | +### h) Great! But how can I refer to columns by names in `j` (like in a `data.frame`)? {#refer-j} |
320 | 320 |
|
321 | 321 | If you're writing out the column names explicitly, there's no difference compared to a `data.frame` (since v1.9.8). |
322 | 322 |
|
|
422 | 422 |
|
423 | 423 | We'll use this convenient form wherever applicable hereafter. |
424 | 424 |
|
425 | | -#### -- How can we calculate the number of trips for each origin airport for carrier code `"AA"`? {#origin-.N} |
| 425 | +#### -- How can we calculate the number of trips for each origin airport for carrier code `"AA"`? {#origin-N} |
426 | 426 |
|
427 | 427 | The unique carrier code `"AA"` corresponds to *American Airlines Inc.* |
428 | 428 |
|
|
435 | 435 |
|
436 | 436 | * Using those *row indices*, we obtain the number of rows while grouped by `origin`. Once again no columns are actually materialised here, because the `j-expression` does not require any columns to be actually subsetted and is therefore fast and memory efficient. |
437 | 437 |
|
438 | | -#### -- How can we get the total number of trips for each `origin, dest` pair for carrier code `"AA"`? {#origin-dest-.N} |
| 438 | +#### -- How can we get the total number of trips for each `origin, dest` pair for carrier code `"AA"`? {#origin-dest-N} |
439 | 439 |
|
440 | 440 | ```{r} |
441 | 441 | ans <- flights[carrier == "AA", .N, by = .(origin, dest)] |
@@ -483,7 +483,7 @@ We'll learn more about `keys` in the [`vignette("datatable-keys-fast-subset", pa |
483 | 483 |
|
484 | 484 | ### c) Chaining |
485 | 485 |
|
486 | | -Let's reconsider the task of [getting the total number of trips for each `origin, dest` pair for carrier *"AA"*](#origin-dest-.N). |
| 486 | +Let's reconsider the task of [getting the total number of trips for each `origin, dest` pair for carrier *"AA"*](#origin-dest-N). |
487 | 487 |
|
488 | 488 | ```{r} |
489 | 489 | ans <- flights[carrier == "AA", .N, by = .(origin, dest)] |
@@ -583,7 +583,7 @@ We are almost there. There is one little thing left to address. In our `flights` |
583 | 583 |
|
584 | 584 | Using the argument `.SDcols`. It accepts either column names or column indices. For example, `.SDcols = c("arr_delay", "dep_delay")` ensures that `.SD` contains only these two columns for each group. |
585 | 585 |
|
586 | | -Similar to [part g)](#refer_j), you can also specify the columns to remove instead of columns to keep using `-` or `!`. Additionally, you can select consecutive columns as `colA:colB` and deselect them as `!(colA:colB)` or `-(colA:colB)`. |
| 586 | +Similar to [part g)](#refer-j), you can also specify the columns to remove instead of columns to keep using `-` or `!`. Additionally, you can select consecutive columns as `colA:colB` and deselect them as `!(colA:colB)` or `-(colA:colB)`. |
587 | 587 |
|
588 | 588 | Now let us try to use `.SD` along with `.SDcols` to get the `mean()` of `arr_delay` and `dep_delay` columns grouped by `origin`, `dest` and `month`. |
589 | 589 |
|
|