Commit 546259d

Using hyperlinks and vignette() calls for readability (#6617)
1 parent e9a511d commit 546259d

6 files changed: 28 additions, 28 deletions


vignettes/datatable-intro.Rmd

Lines changed: 4 additions & 4 deletions
@@ -101,7 +101,7 @@ You can also convert existing objects to a `data.table` using `setDT()` (for `da
 getOption("datatable.print.nrows")
 ```

-* `data.table` doesn't set or use *row names*, ever. We will see why in the *"Keys and fast binary search based subset"* vignette.
+* `data.table` doesn't set or use *row names*, ever. We will see why in the [`vignette("datatable-keys-fast-subset", package="data.table")`](datatable-keys-fast-subset.html) vignette.

 ### b) General form - in what way is a `data.table` *enhanced*? {#enhanced-1b}

@@ -479,7 +479,7 @@ ans

 **Keys:** Actually `keyby` does a little more than *just ordering*. It also *sets a key* after ordering by setting an `attribute` called `sorted`.

-We'll learn more about `keys` in the `vignette("datatable-keys-fast-subset", package="data.table")`; for now, all you have to know is that you can use `keyby` to automatically order the result by the columns specified in `by`.
+We'll learn more about `keys` in the [`vignette("datatable-keys-fast-subset", package="data.table")`](datatable-keys-fast-subset.html) vignette; for now, all you have to know is that you can use `keyby` to automatically order the result by the columns specified in `by`.

 ### c) Chaining

@@ -659,7 +659,7 @@ We have seen so far that,

 * We can also sort a `data.table` using `order()`, which internally uses data.table's fast order for better performance.

-We can do much more in `i` by keying a `data.table`, which allows for blazing fast subsets and joins. We will see this in the `vignette("datatable-keys-fast-subset", package="data.table")` and the `vignette("datatable-joins", package="data.table")`.
+We can do much more in `i` by keying a `data.table`, which allows for blazing fast subsets and joins. We will see this in the vignettes [`vignette("datatable-keys-fast-subset", package="data.table")`](datatable-keys-fast-subset.html) and [`vignette("datatable-joins", package="data.table")`](datatable-joins.html).

 #### Using `j`:

@@ -693,7 +693,7 @@ We can do much more in `i` by keying a `data.table`, which allows for blazing fa

 As long as `j` returns a `list`, each element of the list will become a column in the resulting `data.table`.

-We will see how to *add/update/delete* columns *by reference* and how to combine them with `i` and `by` in the next vignette (`vignette("datatable-reference-semantics", package="data.table")`).
+We will see how to *add/update/delete* columns *by reference* and how to combine them with `i` and `by` in the [next vignette (`vignette("datatable-reference-semantics", package="data.table")`)](datatable-reference-semantics.html).

 ***

vignettes/datatable-joins.Rmd

Lines changed: 3 additions & 3 deletions
@@ -26,9 +26,9 @@ In this vignette you will learn how to perform any join operation using resource

 It assumes familiarity with the `data.table` syntax. If that is not the case, please read the following vignettes:

-- `vignette("datatable-intro", package="data.table")`
-- `vignette("datatable-reference-semantics", package="data.table")`
-- `vignette("datatable-keys-fast-subset", package="data.table")`
+- [`vignette("datatable-intro", package="data.table")`](datatable-intro.html)
+- [`vignette("datatable-reference-semantics", package="data.table")`](datatable-reference-semantics.html)
+- [`vignette("datatable-keys-fast-subset", package="data.table")`](datatable-keys-fast-subset.html)

 ***

vignettes/datatable-keys-fast-subset.Rmd

Lines changed: 7 additions & 7 deletions
@@ -24,13 +24,13 @@ knitr::opts_chunk$set(
 .old.th = setDTthreads(1)
 ```

-This vignette is aimed at those who are already familiar with *data.table* syntax, its general form, how to subset rows in `i`, select and compute on columns, add/modify/delete columns *by reference* in `j` and group by using `by`. If you're not familiar with these concepts, please read the `vignette("datatable-intro", package="data.table")` and the `vignette("datatable-reference-semantics", package="data.table")` first.
+This vignette is aimed at those who are already familiar with *data.table* syntax, its general form, how to subset rows in `i`, select and compute on columns, add/modify/delete columns *by reference* in `j` and group by using `by`. If you're not familiar with these concepts, please read the vignettes [`vignette("datatable-intro", package="data.table")`](datatable-intro.html) and [`vignette("datatable-reference-semantics", package="data.table")`](datatable-reference-semantics.html) first.

 ***

 ## Data {#data}

-We will use the same `flights` data as in the `vignette("datatable-intro", package="data.table")`.
+We will use the same `flights` data as in the [`vignette("datatable-intro", package="data.table")`](datatable-intro.html) vignette.

 ```{r echo = FALSE}
 options(width = 100L)
@@ -58,7 +58,7 @@ In this vignette, we will

 ### a) What is a *key*?

-In the `vignette("datatable-intro", package="data.table")`, we saw how to subset rows in `i` using logical expressions, row numbers and using `order()`. In this section, we will look at another way of subsetting incredibly fast - using *keys*.
+In the [`vignette("datatable-intro", package="data.table")`](datatable-intro.html) vignette, we saw how to subset rows in `i` using logical expressions, row numbers and using `order()`. In this section, we will look at another way of subsetting incredibly fast - using *keys*.

 But first, let's start by looking at *data.frames*. All *data.frames* have a row names attribute. Consider the *data.frame* `DF` below.

@@ -143,7 +143,7 @@ head(flights)

 * Alternatively you can pass a character vector of column names to the function `setkeyv()`. This is particularly useful while designing functions to pass columns to set key on as function arguments.

-* Note that we did not have to assign the result back to a variable. This is because like the `:=` function we saw in the `vignette("datatable-reference-semantics", package="data.table")`, `setkey()` and `setkeyv()` modify the input *data.table* *by reference*. They return the result invisibly.
+* Note that we did not have to assign the result back to a variable. This is because like the `:=` function we saw in the [`vignette("datatable-reference-semantics", package="data.table")`](datatable-reference-semantics.html) vignette, `setkey()` and `setkeyv()` modify the input *data.table* *by reference*. They return the result invisibly.

 * The *data.table* is now reordered (or sorted) by the column we provided - `origin`. Since we reorder by reference, we only require additional memory of one column of length equal to the number of rows in the *data.table*, and is therefore very memory efficient.

@@ -262,7 +262,7 @@ flights[.("LGA", "TPA"), .(arr_delay)]

 * The *row indices* corresponding to `origin == "LGA"` and `dest == "TPA"` are obtained using *key based subset*.

-* Once we have the row indices, we look at `j` which requires only the `arr_delay` column. So we simply select the column `arr_delay` for those *row indices* in the exact same way as we have seen in `vignette("datatable-intro", package="data.table")`.
+* Once we have the row indices, we look at `j` which requires only the `arr_delay` column. So we simply select the column `arr_delay` for those *row indices* in the exact same way as we have seen in the [`vignette("datatable-intro", package="data.table")`](datatable-intro.html) vignette.

 * We could have returned the result by using `with = FALSE` as well.

@@ -290,7 +290,7 @@ flights[.("LGA", "TPA"), max(arr_delay)]

 ### d) *sub-assign* by reference using `:=` in `j`

-We have seen this example already in the `vignette("datatable-reference-semantics", package="data.table")`. Let's take a look at all the `hours` available in the `flights` *data.table*:
+We have seen this example already in the [`vignette("datatable-reference-semantics", package="data.table")`](datatable-reference-semantics.html) vignette. Let's take a look at all the `hours` available in the `flights` *data.table*:

 ```{r}
 # get all 'hours' in flights
@@ -498,7 +498,7 @@ In this vignette, we have learnt another method to subset rows in `i` by keying

 * combine key based subsets with `j` and `by`. Note that the `j` and `by` operations are exactly the same as before.

-Key based subsets are **incredibly fast** and are particularly useful when the task involves *repeated subsetting*. But it may not be always desirable to set key and physically reorder the *data.table*. In the next `vignette("datatable-secondary-indices-and-auto-indexing", package="data.table")`, we will address this using a *new* feature -- *secondary indexes*.
+Key based subsets are **incredibly fast** and are particularly useful when the task involves *repeated subsetting*. But it may not be always desirable to set key and physically reorder the *data.table*. In the next [next vignette (`vignette("datatable-secondary-indices-and-auto-indexing", package="data.table")`)](datatable-secondary-indices-and-auto-indexing.html), we will address this using a *new* feature -- *secondary indexes*.


 ```{r, echo=FALSE}

vignettes/datatable-reference-semantics.Rmd

Lines changed: 7 additions & 7 deletions
@@ -23,13 +23,13 @@ knitr::opts_chunk$set(
 collapse = TRUE)
 .old.th = setDTthreads(1)
 ```
-This vignette discusses *data.table*'s reference semantics which allows to *add/update/delete* columns of a *data.table by reference*, and also combine them with `i` and `by`. It is aimed at those who are already familiar with *data.table* syntax, its general form, how to subset rows in `i`, select and compute on columns, and perform aggregations by group. If you're not familiar with these concepts, please read the `vignette("datatable-intro", package="data.table")` first.
+This vignette discusses *data.table*'s reference semantics which allows to *add/update/delete* columns of a *data.table by reference*, and also combine them with `i` and `by`. It is aimed at those who are already familiar with *data.table* syntax, its general form, how to subset rows in `i`, select and compute on columns, and perform aggregations by group. If you're not familiar with these concepts, please read the [`vignette("datatable-intro", package="data.table")`](datatable-intro.html) vignette first.

 ***

 ## Data {#data}

-We will use the same `flights` data as in the `vignette("datatable-intro", package="data.table")`.
+We will use the same `flights` data as in the [`vignette("datatable-intro", package="data.table")`](datatable-intro.html) vignette.

 ```{r echo = FALSE}
 options(width = 100L)
@@ -169,7 +169,7 @@ We see that there are totally `25` unique values in the data. Both *0* and *24*
 flights[hour == 24L, hour := 0L]
 ```

-* We can use `i` along with `:=` in `j` the very same way as we have already seen in the `vignette("datatable-intro", package="data.table")`.
+* We can use `i` along with `:=` in `j` the very same way as we have already seen in the [`vignette("datatable-intro", package="data.table")`](datatable-intro.html) vignette.

 * Column `hour` is replaced with `0` only on those *row indices* where the condition `hour == 24L` specified in `i` evaluates to `TRUE`.

@@ -234,7 +234,7 @@ head(flights)

 * We provide the columns to group by the same way as shown in the *Introduction to data.table* vignette. For each group, `max(speed)` is computed, which returns a single value. That value is recycled to fit the length of the group. Once again, no copies are being made at all. `flights` *data.table* is modified *in-place*.

-* We could have also provided `by` with a *character vector* as we saw in the `vignette("datatable-intro", package="data.table")`, e.g., `by = c("origin", "dest")`.
+* We could have also provided `by` with a *character vector* as we saw in the [`vignette("datatable-intro", package="data.table")`](datatable-intro.html) vignette, e.g., `by = c("origin", "dest")`.

 #

@@ -253,7 +253,7 @@ head(flights)

 * Note that since we allow assignment by reference without quoting column names when there is only one column as explained in [Section 2c](#delete-convenience), we can not do `out_cols := lapply(.SD, max)`. That would result in adding one new column named `out_cols`. Instead we should do either `c(out_cols)` or simply `(out_cols)`. Wrapping the variable name with `(` is enough to differentiate between the two cases.

-* The `LHS := RHS` form allows us to operate on multiple columns. In the RHS, to compute the `max` on columns specified in `.SDcols`, we make use of the base function `lapply()` along with `.SD` in the same way as we have seen before in the `vignette("datatable-intro", package="data.table")`. It returns a list of two elements, containing the maximum value corresponding to `dep_delay` and `arr_delay` for each group.
+* The `LHS := RHS` form allows us to operate on multiple columns. In the RHS, to compute the `max` on columns specified in `.SDcols`, we make use of the base function `lapply()` along with `.SD` in the same way as we have seen before in the [`vignette("datatable-intro", package="data.table")`](datatable-intro.html) vignette. It returns a list of two elements, containing the maximum value corresponding to `dep_delay` and `arr_delay` for each group.

 #
 Before moving on to the next section, let's clean up the newly created columns `speed`, `max_speed`, `max_dep_delay` and `max_arr_delay`.
@@ -369,7 +369,7 @@ However we could improve this functionality further by *shallow* copying instead

 * It is used to *add/update/delete* columns by reference.

-* We have also seen how to use `:=` along with `i` and `by` the same way as we have seen in the `vignette("datatable-intro", package="data.table")`. We can in the same way use `keyby`, chain operations together, and pass expressions to `by` as well all in the same way. The syntax is *consistent*.
+* We have also seen how to use `:=` along with `i` and `by` the same way as we have seen in the [`vignette("datatable-intro", package="data.table")`](datatable-intro.html) vignette. We can in the same way use `keyby`, chain operations together, and pass expressions to `by` as well all in the same way. The syntax is *consistent*.

 * We can use `:=` for its side effect or use `copy()` to not modify the original object while updating by reference.

@@ -379,6 +379,6 @@ setDTthreads(.old.th)

 #

-So far we have seen a whole lot in `j`, and how to combine it with `by` and little of `i`. Let's turn our attention back to `i` in the next vignette `vignette("datatable-keys-fast-subset", package="data.table")` to perform *blazing fast subsets* by *keying data.tables*.
+So far we have seen a whole lot in `j`, and how to combine it with `by` and little of `i`. Let's turn our attention back to `i` in the [next vignette (`vignette("datatable-keys-fast-subset", package="data.table")`)](datatable-keys-fast-subset.html) to perform *blazing fast subsets* by *keying data.tables*.

 ***

vignettes/datatable-sd-usage.Rmd

Lines changed: 1 addition & 1 deletion
@@ -124,7 +124,7 @@ head(unique(Teams[[fkt[1L]]]))
 Note:


-1. The `:=` is an assignment operator to update the `data.table` in place without making a copy. See `vignette("datatable-reference-semantics", package="data.table")` for more.
+1. The `:=` is an assignment operator to update the `data.table` in place without making a copy. See [`vignette("datatable-reference-semantics", package="data.table")`](datatable-reference-semantics.html) for more.
 2. The LHS, `names(.SD)`, indicates which columns we are updating - in this case we update the entire `.SD`.
 3. The RHS, `lapply()`, loops through each column of the `.SD` and converts the column to a factor.
 4. We use the `.SDcols` to only select columns that have pattern of `teamID`.

vignettes/datatable-secondary-indices-and-auto-indexing.Rmd

Lines changed: 6 additions & 6 deletions
@@ -24,13 +24,13 @@ knitr::opts_chunk$set(
 .old.th = setDTthreads(1)
 ```

-This vignette assumes that the reader is familiar with data.table's `[i, j, by]` syntax, and how to perform fast key based subsets. If you're not familiar with these concepts, please read the *"Introduction to data.table"*, *"Reference semantics"* and *"Keys and fast binary search based subset"* vignettes first.
+This vignette assumes that the reader is familiar with data.table's `[i, j, by]` syntax, and how to perform fast key based subsets. If you're not familiar with these concepts, please read the [`vignette("datatable-intro", package="data.table")`](datatable-intro.html), [`vignette("datatable-reference-semantics", package="data.table")`](datatable-reference-semantics.html), and [`vignette("datatable-keys-fast-subset", package="data.table")`](datatable-keys-fast-subset.html) vignettes first.

 ***

 ## Data {#data}

-We will use the same `flights` data as in the `vignette("datatable-intro", package="data.table")`.
+We will use the same `flights` data as in the [`vignette("datatable-intro", package="data.table")`](datatable-intro.html) vignette.

 ```{r echo = FALSE}
 options(width = 100L)
@@ -193,7 +193,7 @@ flights[.("JFK", "LAX"), on = c("origin", "dest")][1:5]

 ### b) Select in `j`

-All the operations we will discuss below are no different to the ones we already saw in the `vignette("datatable-keys-fast-subset", package="data.table")`. Except we'll be using the `on` argument instead of setting keys.
+All the operations we will discuss below are no different to the ones we already saw in the [`vignette("datatable-keys-fast-subset", package="data.table")`](datatable-keys-fast-subset.html) vignette. Except we'll be using the `on` argument instead of setting keys.

 #### -- Return `arr_delay` column alone as a data.table corresponding to `origin = "LGA"` and `dest = "TPA"`

@@ -219,7 +219,7 @@ flights[.("LGA", "TPA"), max(arr_delay), on = c("origin", "dest")]

 ### e) *sub-assign* by reference using `:=` in `j`

-We have seen this example already in the `vignette("datatable-reference-semantics", package="data.table")` and the `vignette("datatable-keys-fast-subset", package="data.table")`. Let's take a look at all the `hours` available in the `flights` *data.table*:
+We have seen this example already in the vignettes [`vignette("datatable-reference-semantics", package="data.table")`](datatable-reference-semantics.html) and [`vignette("datatable-keys-fast-subset", package="data.table")`](datatable-keys-fast-subset.html). Let's take a look at all the `hours` available in the `flights` *data.table*:

 ```{r}
 # get all 'hours' in flights
@@ -253,7 +253,7 @@ head(ans)

 ### g) The *mult* argument

-The other arguments including `mult` work exactly the same way as we saw in the `vignette("datatable-keys-fast-subset", package="data.table")`. The default value for `mult` is "all". We can choose, instead only the "first" or "last" matching rows should be returned.
+The other arguments including `mult` work exactly the same way as we saw in the [`vignette("datatable-keys-fast-subset", package="data.table")`](datatable-keys-fast-subset.html) vignette. The default value for `mult` is "all". We can choose, instead only the "first" or "last" matching rows should be returned.

 #### -- Subset only the first matching row where `dest` matches *"BOS"* and *"DAY"*

@@ -327,7 +327,7 @@ system.time(dt[x %in% 1989:2012])

 In recent version we extended auto indexing to expressions involving more than one column (combined with `&` operator). In the future, we plan to extend binary search to work with more binary operators like `<`, `<=`, `>` and `>=`.

-We will discuss fast *subsets* using keys and secondary indices to *joins* in the next vignette, `vignette("datatable-joins", package="data.table")`.
+We will discuss fast *subsets* using keys and secondary indices to *joins* in the [next vignette (`vignette("datatable-joins", package="data.table")`)](datatable-joins.html).

 ***
