Rdatatable
diff --git a/vignettes/datatable-faq.Rmd b/vignettes/datatable-faq.Rmd
Lines changed: 4 additions & 4 deletions
@@ -26,6 +26,10 @@ h2 {
 }
 </style>
 
+```{r echo=FALSE, file='_translation_links.R'}
+```
+`r .write.translation.links("Translations of this document are available in: %s")`
+
 ```{r, echo = FALSE, message = FALSE}
 library(data.table)
 knitr::opts_chunk$set(
@@ -37,10 +41,6 @@ knitr::opts_chunk$set(
 .old.th = setDTthreads(1)
 ```
 
-```{r echo=FALSE, file='_translation_links.R'}
-```
-`r .write.translation.links("Translations of this document are available in: %s")`
-
 The first section, Beginner FAQs, is intended to be read in order, from start to finish.  It's just written in a FAQ style to be digested more easily. It isn't really the most frequently asked questions. A better measure for that is looking on Stack Overflow.
 
 This FAQ is required reading and considered core documentation. Please do not ask questions on Stack Overflow or raise issues on GitHub until you have read it. We can all tell when you ask that you haven't read it. So if you do ask and haven't read it, don't use your real name.
 
diff --git a/vignettes/datatable-intro.Rmd b/vignettes/datatable-intro.Rmd
Lines changed: 29 additions & 9 deletions
@@ -9,6 +9,10 @@ vignette: >
   \usepackage[utf8]{inputenc}
 ---
 
+```{r echo=FALSE, file='_translation_links.R'}
+```
+`r .write.translation.links("Translations of this document are available in: %s")`
+
 ```{r, echo = FALSE, message = FALSE}
 require(data.table)
 knitr::opts_chunk$set(
@@ -21,10 +25,6 @@ knitr::opts_chunk$set(
 .old.th = setDTthreads(1)
 ```
 
-```{r echo=FALSE, file='_translation_links.R'}
-```
-`r .write.translation.links("Translations of this document are available in: %s")`
-
 This vignette introduces the `data.table` syntax, its general form, how to *subset* rows, *select and compute* on columns, and perform aggregations *by group*. Familiarity with the `data.frame` data structure from base R is useful, but not essential to follow this vignette.
 
 ***
@@ -316,7 +316,7 @@ ans
 
 We could have accomplished the same operation by doing `nrow(flights[origin == "JFK" & month == 6L])`. However, it would have to subset the entire `data.table` first corresponding to the *row indices* in `i` *and then* return the rows using `nrow()`, which is unnecessary and inefficient. We will cover this and other optimisation aspects in detail under the *`data.table` design* vignette.
 
-### h) Great! But how can I refer to columns by names in `j` (like in a `data.frame`)? {#refer_j}
+### h) Great! But how can I refer to columns by names in `j` (like in a `data.frame`)? {#refer-j}
 
 If you're writing out the column names explicitly, there's no difference compared to a `data.frame` (since v1.9.8).
 
@@ -422,7 +422,7 @@ ans
 
     We'll use this convenient form wherever applicable hereafter.
 
-#### -- How can we calculate the number of trips for each origin airport for carrier code `"AA"`? {#origin-.N}
+#### -- How can we calculate the number of trips for each origin airport for carrier code `"AA"`? {#origin-N}
 
 The unique carrier code `"AA"` corresponds to *American Airlines Inc.*
 
@@ -435,7 +435,7 @@ ans
 
 * Using those *row indices*, we obtain the number of rows while grouped by `origin`. Once again no columns are actually materialised here, because the `j-expression` does not require any columns to be actually subsetted and is therefore fast and memory efficient.
 
-#### -- How can we get the total number of trips for each `origin, dest` pair for carrier code `"AA"`? {#origin-dest-.N}
+#### -- How can we get the total number of trips for each `origin, dest` pair for carrier code `"AA"`? {#origin-dest-N}
 
 ```{r}
 ans <- flights[carrier == "AA", .N, by = .(origin, dest)]
@@ -483,7 +483,7 @@ We'll learn more about `keys` in the [`vignette("datatable-keys-fast-subset", pa
 
 ### c) Chaining
 
-Let's reconsider the task of [getting the total number of trips for each `origin, dest` pair for carrier *"AA"*](#origin-dest-.N).
+Let's reconsider the task of [getting the total number of trips for each `origin, dest` pair for carrier *"AA"*](#origin-dest-N).
 
 ```{r}
 ans <- flights[carrier == "AA", .N, by = .(origin, dest)]
@@ -583,7 +583,7 @@ We are almost there. There is one little thing left to address. In our `flights`
 
 Using the argument `.SDcols`. It accepts either column names or column indices. For example, `.SDcols = c("arr_delay", "dep_delay")` ensures that `.SD` contains only these two columns for each group.
 
-Similar to [part g)](#refer_j), you can also specify the columns to remove instead of columns to keep using `-` or `!`. Additionally, you can select consecutive columns as `colA:colB` and deselect them as `!(colA:colB)` or `-(colA:colB)`.
+Similar to [part h)](#refer-j), you can also specify the columns to remove instead of columns to keep using `-` or `!`. Additionally, you can select consecutive columns as `colA:colB` and deselect them as `!(colA:colB)` or `-(colA:colB)`.
 
 Now let us try to use `.SD` along with `.SDcols` to get the `mean()` of `arr_delay` and `dep_delay` columns grouped by `origin`, `dest` and `month`.
 
@@ -643,6 +643,26 @@ DT[, print(list(c(a,b))), by = ID] # (2)
 
 In (1), for each group, a vector is returned, with length = 6,4,2 here. However, (2) returns a list of length 1 for each group, with its first element holding vectors of length 6,4,2. Therefore, (1) results in a length of ` 6+4+2 = `r 6+4+2``, whereas (2) returns `1+1+1=`r 1+1+1``.
 
+The flexibility of `j` allows us to store any list object as an element of a `data.table`. For example, when a statistical model is fit to each group, the fitted models themselves can be stored in a `data.table` list column, which keeps the code concise and easy to follow.
+
+```{r}
+## Do long-distance flights make up more departure delay than short-distance flights?
+## Does the amount of delay made up vary by month?
+flights[, `:=`(makeup = dep_delay - arr_delay)]
+
+makeup.models <- flights[, .(fit = list(lm(makeup ~ distance))), by = .(month)]
+makeup.models[, .(coefdist = coef(fit[[1]])[2], rsq = summary(fit[[1]])$r.squared), by = .(month)]
+```
+With a `data.frame`, obtaining the same result requires more involved code:
+```{r}
+setDF(flights)
+flights.split <- split(flights, f = flights$month)
+makeup.models.list <- lapply(flights.split, function(df) c(month = df$month[1], fit = list(lm(makeup ~ distance, data = df))))
+makeup.models.df <- do.call(rbind, makeup.models.list)
+data.frame(t(sapply(makeup.models.df[, "fit"], function(model) c(coefdist = coef(model)[[2]], rsq = summary(model)$r.squared))))
+setDT(flights)
+```
+
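A self-contained sketch of the same list-column pattern, assuming only that `data.table` is installed. It uses the built-in `mtcars` data in place of `flights` so it runs without the flights file; the names `mt`, `models` and `summ` are hypothetical. Because each row of `models` holds exactly one fitted model, `sapply()` over the `fit` column extracts per-model statistics without grouping a second time.

```r
library(data.table)

# Assumption: the built-in mtcars data stands in for the vignette's flights data.
mt <- as.data.table(mtcars)

# One row per cylinder group; the list column `fit` holds one lm object per row.
models <- mt[, .(fit = list(lm(mpg ~ wt))), by = cyl]

# Each row holds a single model, so sapply() over the list column extracts
# the slope on `wt` and the R-squared without a second `by`.
summ <- models[, .(
  cyl,
  slope = sapply(fit, function(m) coef(m)[["wt"]]),
  rsq   = sapply(fit, function(m) summary(m)$r.squared)
)]
print(summ)
```

Iterating over the list column and regrouping with `by`, as the chunk above does with `fit[[1]]`, give the same numbers; the `sapply()` form simply avoids a second grouping pass when each row already holds one model.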
 ## Summary
 
 The general form of `data.table` syntax is: