vignettes/datatable-faq.Rmd
+4 -4 (4 additions & 4 deletions)
Original file line number | Diff line number | Diff line change
@@ -26,6 +26,10 @@ h2 {
26
26
}
27
27
</style>
28
28
29
+
```{r echo=FALSE, file='_translation_links.R'}
30
+
```
31
+
`r .write.translation.links("Translations of this document are available in: %s")`
32
+
29
33
```{r, echo = FALSE, message = FALSE}
30
34
library(data.table)
31
35
knitr::opts_chunk$set(
@@ -37,10 +41,6 @@ knitr::opts_chunk$set(
37
41
.old.th = setDTthreads(1)
38
42
```
39
43
40
-
```{r echo=FALSE, file='_translation_links.R'}
41
-
```
42
-
`r .write.translation.links("Translations of this document are available in: %s")`
43
-
44
44
The first section, Beginner FAQs, is intended to be read in order, from start to finish. It's just written in a FAQ style to be digested more easily. It isn't really the most frequently asked questions. A better measure for that is looking on Stack Overflow.
45
45
46
46
This FAQ is required reading and considered core documentation. Please do not ask questions on Stack Overflow or raise issues on GitHub until you have read it. We can all tell when you ask that you haven't read it. So if you do ask and haven't read it, don't use your real name.
vignettes/datatable-intro.Rmd
+29 -9 (29 additions & 9 deletions)
@@ -9,6 +9,10 @@ vignette: >
9
9
\usepackage[utf8]{inputenc}
10
10
---
11
11
12
+
```{r echo=FALSE, file='_translation_links.R'}
13
+
```
14
+
`r .write.translation.links("Translations of this document are available in: %s")`
15
+
12
16
```{r, echo = FALSE, message = FALSE}
13
17
require(data.table)
14
18
knitr::opts_chunk$set(
@@ -21,10 +25,6 @@ knitr::opts_chunk$set(
21
25
.old.th = setDTthreads(1)
22
26
```
23
27
24
-
```{r echo=FALSE, file='_translation_links.R'}
25
-
```
26
-
`r .write.translation.links("Translations of this document are available in: %s")`
27
-
28
28
This vignette introduces the `data.table` syntax, its general form, how to *subset* rows, *select and compute* on columns, and perform aggregations *by group*. Familiarity with the `data.frame` data structure from base R is useful, but not essential to follow this vignette.
29
29
30
30
***
@@ -316,7 +316,7 @@ ans
316
316
317
317
We could have accomplished the same operation by doing `nrow(flights[origin == "JFK" & month == 6L])`. However, it would have to subset the entire `data.table` first corresponding to the *row indices* in `i` *and then* return the rows using `nrow()`, which is unnecessary and inefficient. We will cover this and other optimisation aspects in detail under the *`data.table` design* vignette.
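The two approaches compared above can be sketched on a small table. This is a minimal illustration, not the vignette's own code: `toy` and its values are hypothetical stand-ins for `flights`, and only the result is compared, not the timing.

```r
library(data.table)

# Hypothetical toy data standing in for `flights` (assumption, not the real dataset)
toy <- data.table(
  origin = c("JFK", "JFK", "LGA", "JFK", "EWR"),
  month  = c(6L, 6L, 6L, 7L, 6L),
  delay  = c(10L, -3L, 25L, 4L, 0L)
)

# Count in `j`: .N is the number of rows selected by `i`
n_j <- toy[origin == "JFK" & month == 6L, .N]

# Subset first, then count: builds the full subset with all columns before counting
n_sub <- nrow(toy[origin == "JFK" & month == 6L])

# Both give the same count; they differ only in how much work is done
identical(n_j, n_sub)
```

Both calls return the same integer, so the choice between them is about avoiding the intermediate subset rather than about the answer.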
318
318
319
-
### h) Great! But how can I refer to columns by names in `j` (like in a `data.frame`)? {#refer_j}
319
+
### h) Great! But how can I refer to columns by names in `j` (like in a `data.frame`)? {#refer-j}
320
320
321
321
If you're writing out the column names explicitly, there's no difference compared to a `data.frame` (since v1.9.8).
322
322
@@ -422,7 +422,7 @@ ans
422
422
423
423
We'll use this convenient form wherever applicable hereafter.
424
424
425
-
#### -- How can we calculate the number of trips for each origin airport for carrier code `"AA"`? {#origin-.N}
425
+
#### -- How can we calculate the number of trips for each origin airport for carrier code `"AA"`? {#origin-N}
426
426
427
427
The unique carrier code `"AA"` corresponds to *American Airlines Inc.*
428
428
@@ -435,7 +435,7 @@ ans
435
435
436
436
* Using those *row indices*, we obtain the number of rows while grouped by `origin`. Once again no columns are actually materialised here, because the `j-expression` does not require any columns to be actually subsetted and is therefore fast and memory efficient.
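As a rough sketch of the pattern described in these bullets (filter in `i`, count in `j`, group with `by`), assuming a hypothetical `toy` table in place of `flights`:

```r
library(data.table)

# Hypothetical toy data (assumption); column names mirror those used in the vignette
toy <- data.table(
  carrier = c("AA", "AA", "UA", "AA", "DL", "AA"),
  origin  = c("JFK", "LGA", "JFK", "JFK", "EWR", "LGA"),
  dest    = c("LAX", "ORD", "LAX", "LAX", "ATL", "ORD")
)

# `i` keeps the rows for carrier "AA"; .N then counts rows within each origin group
by_origin <- toy[carrier == "AA", .N, by = origin]
by_origin
```

Groups appear in order of first occurrence in the filtered rows, so `"JFK"` comes before `"LGA"` here.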
437
437
438
-
#### -- How can we get the total number of trips for each `origin, dest` pair for carrier code `"AA"`? {#origin-dest-.N}
438
+
#### -- How can we get the total number of trips for each `origin, dest` pair for carrier code `"AA"`? {#origin-dest-N}
439
439
440
440
```{r}
441
441
ans <- flights[carrier == "AA", .N, by = .(origin, dest)]
@@ -483,7 +483,7 @@ We'll learn more about `keys` in the [`vignette("datatable-keys-fast-subset", pa
483
483
484
484
### c) Chaining
485
485
486
-
Let's reconsider the task of [getting the total number of trips for each `origin, dest` pair for carrier *"AA"*](#origin-dest-.N).
486
+
Let's reconsider the task of [getting the total number of trips for each `origin, dest` pair for carrier *"AA"*](#origin-dest-N).
487
487
488
488
```{r}
489
489
ans <- flights[carrier == "AA", .N, by = .(origin, dest)]
@@ -583,7 +583,7 @@ We are almost there. There is one little thing left to address. In our `flights`
583
583
584
584
Using the argument `.SDcols`. It accepts either column names or column indices. For example, `.SDcols = c("arr_delay", "dep_delay")` ensures that `.SD` contains only these two columns for each group.
585
585
586
-
Similar to [part g)](#refer_j), you can also specify the columns to remove instead of columns to keep using `-` or `!`. Additionally, you can select consecutive columns as `colA:colB` and deselect them as `!(colA:colB)` or `-(colA:colB)`.
586
+
Similar to [part g)](#refer-j), you can also specify the columns to remove instead of columns to keep using `-` or `!`. Additionally, you can select consecutive columns as `colA:colB` and deselect them as `!(colA:colB)` or `-(colA:colB)`.
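The `.SDcols` forms mentioned above (by name, by consecutive range, and by negation) might look like this on a small table. The table `toy` and its values are hypothetical and only serve the illustration.

```r
library(data.table)

# Hypothetical toy data (assumption, not the `flights` dataset)
toy <- data.table(
  origin    = c("JFK", "JFK", "LGA", "LGA"),
  arr_delay = c(10, 20, -5, 15),
  dep_delay = c(4, 8, 0, 6),
  distance  = c(100, 200, 300, 400)
)

# Keep columns by name: .SD holds only arr_delay and dep_delay for each group
means <- toy[, lapply(.SD, mean), by = origin,
             .SDcols = c("arr_delay", "dep_delay")]

# Keep a run of consecutive columns using colA:colB
means_rng <- toy[, lapply(.SD, mean), by = origin,
                 .SDcols = arr_delay:dep_delay]

# Remove a column instead of listing the ones to keep
means_neg <- toy[, lapply(.SD, mean), by = origin, .SDcols = !"distance"]

means
```

In this sketch the first two forms select the same two columns, so they should produce the same per-group means.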
587
587
588
588
Now let us try to use `.SD` along with `.SDcols` to get the `mean()` of `arr_delay` and `dep_delay` columns grouped by `origin`, `dest` and `month`.
In (1), for each group, a vector is returned, with length = 6,4,2 here. However, (2) returns a list of length 1 for each group, with its first element holding vectors of length 6,4,2. Therefore, (1) results in a length of ` 6+4+2 = `r 6+4+2``, whereas (2) returns `1+1+1=`r 1+1+1``.
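The expressions labelled (1) and (2) are not shown in this hunk, so the following is only an assumed reconstruction of the contrast being described: returning a plain vector per group versus wrapping that vector in `list()`. The table `DT` and its values are hypothetical, chosen so the groups have lengths 6, 4 and 2.

```r
library(data.table)

# Hypothetical data (assumption): groups "b", "a", "c" have 3, 2 and 1 rows
DT <- data.table(
  ID = c("b", "b", "b", "a", "a", "c"),
  a  = 1:6,
  b  = 7:12
)

# Vector per group: each element becomes its own row
res1 <- DT[, .(val = c(a, b)), by = ID]

# list() around the vector: one row per group, the vector stored as a list element
res2 <- DT[, .(val = list(c(a, b))), by = ID]

nrow(res1)         # 12 rows in total (6 + 4 + 2)
nrow(res2)         # 3 rows, one per group
lengths(res2$val)  # 6 4 2
```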
645
645
646
+
Flexibility of j allows us to store any list object as an element of data.table. For example, when statistical models are fit to groups, these models can be stored in a data.table. Code is concise and easy to understand.
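A minimal sketch of the idea, separate from the vignette's own example: fit one `lm()` per group and keep each fitted model in a list column. The table `toy`, the column names `g`, `x`, `y`, `fit` and `slope`, and the data values are all hypothetical.

```r
library(data.table)

# Hypothetical toy data (assumption): two groups with known linear relationships
toy <- data.table(
  g = rep(c("a", "b"), each = 4L),
  x = rep(1:4, times = 2L)
)
toy[, y := ifelse(g == "a", 2 * x + 1, -x)]

# Wrapping the fit in list() stores one model object per group in a list column
models <- toy[, .(fit = list(lm(y ~ x, data = .SD))), by = g]

# The stored models can be used later, for example to pull out each slope
models[, slope := sapply(fit, function(m) coef(m)[["x"]])]
models[, .(g, slope)]
```

Keeping the model objects next to their group keys means later steps such as extracting coefficients stay ordinary `j` expressions.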
647
+
648
+
```{r}
649
+
## Do long distance flights cover up departure delay more than short distance flights?