Using hyperlinks and vignette() calls for readability (#6617)

Anirban166 · web-flow · commit 546259ddaba0 · 2024-11-20T15:32:24.000-05:00
diff --git a/vignettes/datatable-intro.Rmd b/vignettes/datatable-intro.Rmd
@@ -101,7 +101,7 @@ You can also convert existing objects to a `data.table` using `setDT()` (for `da
     getOption("datatable.print.nrows")
     ```
 
-* `data.table` doesn't set or use *row names*, ever. We will see why in the *"Keys and fast binary search based subset"* vignette.
+* `data.table` doesn't set or use *row names*, ever. We will see why in the [`vignette("datatable-keys-fast-subset", package="data.table")`](datatable-keys-fast-subset.html) vignette.
 
 ### b) General form - in what way is a `data.table` *enhanced*? {#enhanced-1b}
 
@@ -479,7 +479,7 @@ ans
 
 **Keys:** Actually `keyby` does a little more than *just ordering*. It also *sets a key* after ordering by setting an `attribute` called `sorted`. 
 
-We'll learn more about `keys` in the `vignette("datatable-keys-fast-subset", package="data.table")`; for now, all you have to know is that you can use `keyby` to automatically order the result by the columns specified in `by`.
+We'll learn more about `keys` in the [`vignette("datatable-keys-fast-subset", package="data.table")`](datatable-keys-fast-subset.html) vignette; for now, all you have to know is that you can use `keyby` to automatically order the result by the columns specified in `by`.
 
 ### c) Chaining
 
@@ -659,7 +659,7 @@ We have seen so far that,
 
 * We can also sort a `data.table` using `order()`, which internally uses data.table's fast order for better performance.
 
-We can do much more in `i` by keying a `data.table`, which allows for blazing fast subsets and joins. We will see this in the `vignette("datatable-keys-fast-subset", package="data.table")` and the `vignette("datatable-joins", package="data.table")`.
+We can do much more in `i` by keying a `data.table`, which allows for blazing fast subsets and joins. We will see this in the vignettes [`vignette("datatable-keys-fast-subset", package="data.table")`](datatable-keys-fast-subset.html) and [`vignette("datatable-joins", package="data.table")`](datatable-joins.html).
 
 #### Using `j`:
 
@@ -693,7 +693,7 @@ We can do much more in `i` by keying a `data.table`, which allows for blazing fa
 
 As long as `j` returns a `list`, each element of the list will become a column in the resulting `data.table`.
 
-We will see how to *add/update/delete* columns *by reference* and how to combine them with `i` and `by` in the next vignette (`vignette("datatable-reference-semantics", package="data.table")`).
+We will see how to *add/update/delete* columns *by reference* and how to combine them with `i` and `by` in the [next vignette (`vignette("datatable-reference-semantics", package="data.table")`)](datatable-reference-semantics.html).
 
 ***
 
diff --git a/vignettes/datatable-joins.Rmd b/vignettes/datatable-joins.Rmd
@@ -26,9 +26,9 @@ In this vignette you will learn how to perform any join operation using resource
 
 It assumes familiarity with the `data.table` syntax. If that is not the case, please read the following vignettes:
 
-- `vignette("datatable-intro", package="data.table")`
-- `vignette("datatable-reference-semantics", package="data.table")`
-- `vignette("datatable-keys-fast-subset", package="data.table")`
+- [`vignette("datatable-intro", package="data.table")`](datatable-intro.html)
+- [`vignette("datatable-reference-semantics", package="data.table")`](datatable-reference-semantics.html)
+- [`vignette("datatable-keys-fast-subset", package="data.table")`](datatable-keys-fast-subset.html)
 
 ***
 
diff --git a/vignettes/datatable-keys-fast-subset.Rmd b/vignettes/datatable-keys-fast-subset.Rmd
@@ -24,13 +24,13 @@ knitr::opts_chunk$set(
 .old.th = setDTthreads(1)
 ```
 
-This vignette is aimed at those who are already familiar with *data.table* syntax, its general form, how to subset rows in `i`, select and compute on columns, add/modify/delete columns *by reference* in `j` and group by using `by`. If you're not familiar with these concepts, please read the `vignette("datatable-intro", package="data.table")` and the `vignette("datatable-reference-semantics", package="data.table")` first.
+This vignette is aimed at those who are already familiar with *data.table* syntax, its general form, how to subset rows in `i`, select and compute on columns, add/modify/delete columns *by reference* in `j` and group by using `by`. If you're not familiar with these concepts, please read the vignettes [`vignette("datatable-intro", package="data.table")`](datatable-intro.html) and [`vignette("datatable-reference-semantics", package="data.table")`](datatable-reference-semantics.html) first.
 
 ***
 
 ## Data {#data}
 
-We will use the same `flights` data as in the `vignette("datatable-intro", package="data.table")`.
+We will use the same `flights` data as in the [`vignette("datatable-intro", package="data.table")`](datatable-intro.html) vignette.
 
 ```{r echo = FALSE}
 options(width = 100L)
@@ -58,7 +58,7 @@ In this vignette, we will
 
 ### a) What is a *key*?
 
-In the `vignette("datatable-intro", package="data.table")`, we saw how to subset rows in `i` using logical expressions, row numbers and using `order()`. In this section, we will look at another way of subsetting incredibly fast - using *keys*.
+In the [`vignette("datatable-intro", package="data.table")`](datatable-intro.html) vignette, we saw how to subset rows in `i` using logical expressions, row numbers and using `order()`. In this section, we will look at another way of subsetting incredibly fast - using *keys*.
 
 But first, let's start by looking at *data.frames*. All *data.frames* have a row names attribute. Consider the *data.frame* `DF` below.
 
@@ -143,7 +143,7 @@ head(flights)
 
 * Alternatively you can pass a character vector of column names to the function `setkeyv()`. This is particularly useful while designing functions to pass columns to set key on as function arguments.
 
-* Note that we did not have to assign the result back to a variable. This is because like the `:=` function we saw in the `vignette("datatable-reference-semantics", package="data.table")`, `setkey()` and `setkeyv()` modify the input *data.table* *by reference*. They return the result invisibly.
+* Note that we did not have to assign the result back to a variable. This is because like the `:=` function we saw in the [`vignette("datatable-reference-semantics", package="data.table")`](datatable-reference-semantics.html) vignette, `setkey()` and `setkeyv()` modify the input *data.table* *by reference*. They return the result invisibly.
 
 * The *data.table* is now reordered (or sorted) by the column we provided - `origin`. Since we reorder by reference, we only require additional memory of one column of length equal to the number of rows in the *data.table*, and is therefore very memory efficient.
 
@@ -262,7 +262,7 @@ flights[.("LGA", "TPA"), .(arr_delay)]
 
 * The *row indices* corresponding to `origin == "LGA"` and `dest == "TPA"` are obtained using *key based subset*.
 
-* Once we have the row indices, we look at `j` which requires only the `arr_delay` column. So we simply select the column `arr_delay` for those *row indices* in the exact same way as we have seen in `vignette("datatable-intro", package="data.table")`.
+* Once we have the row indices, we look at `j` which requires only the `arr_delay` column. So we simply select the column `arr_delay` for those *row indices* in the exact same way as we have seen in the [`vignette("datatable-intro", package="data.table")`](datatable-intro.html) vignette.
 
 * We could have returned the result by using `with = FALSE` as well.
 
@@ -290,7 +290,7 @@ flights[.("LGA", "TPA"), max(arr_delay)]
 
 ### d) *sub-assign* by reference using `:=` in `j`
 
-We have seen this example already in the `vignette("datatable-reference-semantics", package="data.table")`. Let's take a look at all the `hours` available in the `flights` *data.table*:
+We have seen this example already in the [`vignette("datatable-reference-semantics", package="data.table")`](datatable-reference-semantics.html) vignette. Let's take a look at all the `hours` available in the `flights` *data.table*:
 
 ```{r}
 # get all 'hours' in flights
@@ -498,7 +498,7 @@ In this vignette, we have learnt another method to subset rows in `i` by keying
 
 * combine key based subsets with `j` and `by`. Note that the `j` and `by` operations are exactly the same as before.
 
-Key based subsets are **incredibly fast** and are particularly useful when the task involves *repeated subsetting*. But it may not be always desirable to set key and physically reorder the *data.table*. In the next `vignette("datatable-secondary-indices-and-auto-indexing", package="data.table")`, we will address this using a *new* feature -- *secondary indexes*.
+Key based subsets are **incredibly fast** and are particularly useful when the task involves *repeated subsetting*. But it may not be always desirable to set key and physically reorder the *data.table*. In the [next vignette (`vignette("datatable-secondary-indices-and-auto-indexing", package="data.table")`)](datatable-secondary-indices-and-auto-indexing.html), we will address this using a *new* feature -- *secondary indexes*.
 
 
 ```{r, echo=FALSE}
diff --git a/vignettes/datatable-reference-semantics.Rmd b/vignettes/datatable-reference-semantics.Rmd
@@ -23,13 +23,13 @@ knitr::opts_chunk$set(
  collapse = TRUE)
 .old.th = setDTthreads(1)
 ```
-This vignette discusses *data.table*'s reference semantics which allows to *add/update/delete* columns of a *data.table by reference*, and also combine them with `i` and `by`. It is aimed at those who are already familiar with *data.table* syntax, its general form, how to subset rows in `i`, select and compute on columns, and perform aggregations by group. If you're not familiar with these concepts, please read the `vignette("datatable-intro", package="data.table")` first.
+This vignette discusses *data.table*'s reference semantics, which allow you to *add/update/delete* columns of a *data.table by reference*, and also combine them with `i` and `by`. It is aimed at those who are already familiar with *data.table* syntax, its general form, how to subset rows in `i`, select and compute on columns, and perform aggregations by group. If you're not familiar with these concepts, please read the [`vignette("datatable-intro", package="data.table")`](datatable-intro.html) vignette first.
 
 ***
 
 ## Data {#data}
 
-We will use the same `flights` data as in the `vignette("datatable-intro", package="data.table")`.
+We will use the same `flights` data as in the [`vignette("datatable-intro", package="data.table")`](datatable-intro.html) vignette.
 
 ```{r echo = FALSE}
 options(width = 100L)
@@ -169,7 +169,7 @@ We see that there are totally `25` unique values in the data. Both *0* and *24*
 flights[hour == 24L, hour := 0L]
 ```
 
-* We can use `i` along with `:=` in `j` the very same way as we have already seen in the `vignette("datatable-intro", package="data.table")`.
+* We can use `i` along with `:=` in `j` the very same way as we have already seen in the [`vignette("datatable-intro", package="data.table")`](datatable-intro.html) vignette.
 
 * Column `hour` is replaced with `0` only on those *row indices* where the condition `hour == 24L` specified in `i` evaluates to `TRUE`.
 
@@ -234,7 +234,7 @@ head(flights)
 
 * We provide the columns to group by the same way as shown in the *Introduction to data.table* vignette. For each group, `max(speed)` is computed, which returns a single value. That value is recycled to fit the length of the group. Once again, no copies are being made at all. `flights` *data.table* is modified *in-place*.
 
-* We could have also provided `by` with a *character vector* as we saw in the `vignette("datatable-intro", package="data.table")`, e.g., `by = c("origin", "dest")`.
+* We could have also provided `by` with a *character vector* as we saw in the [`vignette("datatable-intro", package="data.table")`](datatable-intro.html) vignette, e.g., `by = c("origin", "dest")`.
 
 #
 
@@ -253,7 +253,7 @@ head(flights)
 
 * Note that since we allow assignment by reference without quoting column names when there is only one column as explained in [Section 2c](#delete-convenience), we can not do `out_cols := lapply(.SD, max)`. That would result in adding one new column named `out_cols`. Instead we should do either `c(out_cols)` or simply `(out_cols)`. Wrapping the variable name with `(` is enough to differentiate between the two cases.
 
-* The `LHS := RHS` form allows us to operate on multiple columns. In the RHS, to compute the `max` on columns specified in `.SDcols`, we make use of the base function `lapply()` along with `.SD` in the same way as we have seen before in the `vignette("datatable-intro", package="data.table")`. It returns a list of two elements, containing the maximum value corresponding to `dep_delay` and `arr_delay` for each group.
+* The `LHS := RHS` form allows us to operate on multiple columns. In the RHS, to compute the `max` on columns specified in `.SDcols`, we make use of the base function `lapply()` along with `.SD` in the same way as we have seen before in the [`vignette("datatable-intro", package="data.table")`](datatable-intro.html) vignette. It returns a list of two elements, containing the maximum value corresponding to `dep_delay` and `arr_delay` for each group.
 
 #
 Before moving on to the next section, let's clean up the newly created columns `speed`, `max_speed`, `max_dep_delay` and `max_arr_delay`.
@@ -369,7 +369,7 @@ However we could improve this functionality further by *shallow* copying instead
 
 * It is used to *add/update/delete* columns by reference.
 
-* We have also seen how to use `:=` along with `i` and `by` the same way as we have seen in the `vignette("datatable-intro", package="data.table")`. We can in the same way use `keyby`, chain operations together, and pass expressions to `by` as well all in the same way. The syntax is *consistent*.
+* We have also seen how to use `:=` along with `i` and `by` the same way as we have seen in the [`vignette("datatable-intro", package="data.table")`](datatable-intro.html) vignette. We can in the same way use `keyby`, chain operations together, and pass expressions to `by` as well all in the same way. The syntax is *consistent*.
 
 * We can use `:=` for its side effect or use `copy()` to not modify the original object while updating by reference.
 
@@ -379,6 +379,6 @@ setDTthreads(.old.th)
 
 #
 
-So far we have seen a whole lot in `j`, and how to combine it with `by` and little of `i`. Let's turn our attention back to `i` in the next vignette `vignette("datatable-keys-fast-subset", package="data.table")` to perform *blazing fast subsets* by *keying data.tables*.
+So far we have seen a whole lot in `j`, and how to combine it with `by` and little of `i`. Let's turn our attention back to `i` in the [next vignette (`vignette("datatable-keys-fast-subset", package="data.table")`)](datatable-keys-fast-subset.html) to perform *blazing fast subsets* by *keying data.tables*.
 
 ***
diff --git a/vignettes/datatable-sd-usage.Rmd b/vignettes/datatable-sd-usage.Rmd
@@ -124,7 +124,7 @@ head(unique(Teams[[fkt[1L]]]))
 Note: 
 
 
-1. The `:=` is an assignment operator to update the `data.table` in place without making a copy. See `vignette("datatable-reference-semantics", package="data.table")` for more.
+1. The `:=` is an assignment operator to update the `data.table` in place without making a copy. See [`vignette("datatable-reference-semantics", package="data.table")`](datatable-reference-semantics.html) for more.
 2. The LHS, `names(.SD)`, indicates which columns we are updating - in this case we update the entire `.SD`.
 3. The RHS, `lapply()`, loops through each column of the `.SD` and converts the column to a factor.
 4. We use the `.SDcols` to only select columns that have pattern of `teamID`.
diff --git a/vignettes/datatable-secondary-indices-and-auto-indexing.Rmd b/vignettes/datatable-secondary-indices-and-auto-indexing.Rmd
@@ -24,13 +24,13 @@ knitr::opts_chunk$set(
 .old.th = setDTthreads(1)
 ```
 
-This vignette assumes that the reader is familiar with data.table's `[i, j, by]` syntax, and how to perform fast key based subsets. If you're not familiar with these concepts, please read the *"Introduction to data.table"*,  *"Reference semantics"* and *"Keys and fast binary search based subset"* vignettes first.
+This vignette assumes that the reader is familiar with data.table's `[i, j, by]` syntax, and how to perform fast key based subsets. If you're not familiar with these concepts, please read the [`vignette("datatable-intro", package="data.table")`](datatable-intro.html), [`vignette("datatable-reference-semantics", package="data.table")`](datatable-reference-semantics.html), and [`vignette("datatable-keys-fast-subset", package="data.table")`](datatable-keys-fast-subset.html) vignettes first.
 
 ***
 
 ## Data {#data}
 
-We will use the same `flights` data as in the `vignette("datatable-intro", package="data.table")`.
+We will use the same `flights` data as in the [`vignette("datatable-intro", package="data.table")`](datatable-intro.html) vignette.
 
 ```{r echo = FALSE}
 options(width = 100L)
@@ -193,7 +193,7 @@ flights[.("JFK", "LAX"), on = c("origin", "dest")][1:5]
 
 ### b) Select in `j`
 
-All the operations we will discuss below are no different to the ones we already saw in the `vignette("datatable-keys-fast-subset", package="data.table")`. Except we'll be using the `on` argument instead of setting keys.
+All the operations we will discuss below are no different to the ones we already saw in the [`vignette("datatable-keys-fast-subset", package="data.table")`](datatable-keys-fast-subset.html) vignette. Except we'll be using the `on` argument instead of setting keys.
 
 #### -- Return `arr_delay` column alone as a data.table corresponding to `origin = "LGA"` and `dest = "TPA"`
 
@@ -219,7 +219,7 @@ flights[.("LGA", "TPA"), max(arr_delay), on = c("origin", "dest")]
 
 ### e) *sub-assign* by reference using `:=` in `j`
 
-We have seen this example already in the `vignette("datatable-reference-semantics", package="data.table")` and the `vignette("datatable-keys-fast-subset", package="data.table")`. Let's take a look at all the `hours` available in the `flights` *data.table*:
+We have seen this example already in the vignettes [`vignette("datatable-reference-semantics", package="data.table")`](datatable-reference-semantics.html) and [`vignette("datatable-keys-fast-subset", package="data.table")`](datatable-keys-fast-subset.html). Let's take a look at all the `hours` available in the `flights` *data.table*:
 
 ```{r}
 # get all 'hours' in flights
@@ -253,7 +253,7 @@ head(ans)
 
 ### g) The *mult* argument
 
-The other arguments including `mult` work exactly the same way as we saw in the `vignette("datatable-keys-fast-subset", package="data.table")`. The default value for `mult` is "all". We can choose, instead only the "first" or "last" matching rows should be returned.
+The other arguments including `mult` work exactly the same way as we saw in the [`vignette("datatable-keys-fast-subset", package="data.table")`](datatable-keys-fast-subset.html) vignette. The default value for `mult` is "all". We can choose instead to return only the "first" or "last" matching rows.
 
 #### -- Subset only the first matching row where `dest` matches *"BOS"* and *"DAY"*
 
@@ -327,7 +327,7 @@ system.time(dt[x %in% 1989:2012])
 
 In recent version we extended auto indexing to expressions involving more than one column (combined with `&` operator). In the future, we plan to extend binary search to work with more binary operators like `<`, `<=`, `>` and `>=`.
 
-We will discuss fast *subsets* using keys and secondary indices to *joins* in the next vignette, `vignette("datatable-joins", package="data.table")`.
+We will discuss fast *subsets* using keys and secondary indices to *joins* in the [next vignette (`vignette("datatable-joins", package="data.table")`)](datatable-joins.html).
 
 ***