
Commit ee3edfb

updated docs
1 parent 5bbc4d5

2 files changed: +66 -0 lines

man/setkey.Rd

Lines changed: 23 additions & 0 deletions
@@ -39,6 +39,20 @@ ordering vector of the dataset's rows according to the provided columns. This or
 is stored as an attribute of the \code{data.table} and the dataset retains the original order
 of rows in memory. See the \href{../doc/datatable-secondary-indices-and-auto-indexing.html}{\code{vignette("datatable-secondary-indices-and-auto-indexing")}} for more details.

+\subsection{Key vs. Index Subsetting}{
+When using \code{setkey}:
+\itemize{
+\item Join-style subsetting can omit \code{on}; the key columns are used (e.g., \code{DT[.(value)]})
+\item Rows are physically reordered in RAM by the key columns
+}
+When using \code{setindex}:
+\itemize{
+\item Join-style subsetting must specify \code{on} (e.g., \code{DT[.(value), on = "col"]})
+\item Multiple indices can coexist; several can be set at once with \code{setindexv(x, list(cols1, cols2))}
+\item Original row order is preserved
+}
+}
+
 \code{key} returns the \code{data.table}'s key if it exists; \code{NULL} if none exists.

 \code{haskey} returns \code{TRUE}/\code{FALSE} if the \code{data.table} has a key.
@@ -142,6 +156,15 @@ setindex(DT, B)
 indices(DT) # get indices single vector
 indices(DT, vectors = TRUE) # get indices list

+# Keyed subsetting (no 'on' needed)
+setkey(DT, B)
+DT[.("e")] # Returns the row where B == "e"
+
+# Indexed subsetting (requires 'on')
+setkey(DT, NULL); setindex(DT, B) # Drop the key so only the index remains
+DT[.("e"), on = "B"] # Returns the same row via the index
+# DT[.("e")] # Would error: DT has no key, so 'on' is required
+
 # Setting multiple indices at once
 DT = data.table(A = 5:1, B = letters[5:1], C = 10:6)
 setindexv(DT, list(c("A", "B"), c("B", "C")))
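
The `setkey.Rd` text above contrasts keyed and indexed subsetting. A minimal R sketch of that contrast follows, assuming *data.table* is installed; the variable names (`keyed_row`, `indexed_row`, `no_on`, and so on) are illustrative only. One detail worth noting: `setindex()` leaves an existing key in place, so the key has to be removed with `setkey(DT, NULL)` before a join without `on` actually fails.

```r
library(data.table)

DT <- data.table(A = 5:1, B = letters[5:1])

# Keyed: setkey() sorts DT by B in place and records B as the key
setkey(DT, B)
key_after_setkey <- key(DT)          # "B"
keyed_row <- DT[.("e")]              # joins on the key column B; no 'on' needed

# setindex() does not remove an existing key, so drop the key first
# (setkey(DT, NULL) also clears any existing secondary indices)
setkey(DT, NULL)
setindex(DT, B)
key_after_reset <- key(DT)           # NULL: no key any more
index_names <- indices(DT)           # "B": the secondary index
indexed_row <- DT[.("e"), on = "B"]  # 'on' names the join column explicitly

# With neither a key nor 'on', data.table has no join columns and stops
no_on <- tryCatch(DT[.("e")], error = function(e) "error")
```

Wrapping the last join in `tryCatch()` keeps the sketch runnable while still showing that it fails; at the prompt the call would simply stop with *data.table*'s error message.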

vignettes/datatable-secondary-indices-and-auto-indexing.Rmd

Lines changed: 43 additions & 0 deletions
@@ -62,6 +62,49 @@ Secondary indices are similar to `keys` in *data.table*, except for two major di

 * There can be more than one secondary index for a data.table (as we will see below).

+#### 1.1 Keyed vs. Indexed Subsetting
+
+While both **keys** and **indices** enable fast binary-search subsetting, they differ in how each subset specifies the columns to match:
+
+**Keyed subsetting** (columns matched implicitly through the key)
+```{r}
+DT <- data.table(a = c(TRUE, FALSE), b = 1:2)
+setkey(DT, a) # Set key (reorders rows by a)
+DT[.(TRUE)] # No 'on' needed: the key column a is used
+```
+
+**Indexed subsetting** (columns named explicitly with `on`)
+
+```{r}
+DT <- data.table(a = c(TRUE, FALSE), b = 1:2)
+setindex(DT, a) # Set index only (no reorder)
+DT[.(TRUE), on = "a"] # Must specify 'on'
+```
+
+Always use `on` with indices to avoid errors. A key records default join columns on the table itself, whereas an index only speeds up lookups, so each join against an indexed table must name its columns with `on`.
+
+#### Hierarchy of Lookup Operations
+
+When `on` is specified, *data.table* looks for an ordering of the `on` columns in this order:
+
+- An existing key matching the `on` columns
+
+- A pre-built secondary index matching the `on` columns
+
+- A temporary ordering computed on the fly (if neither exists)
+
+**Example:** omitting `on` when the table has an index but no key
+
+```{r}
+DT <- data.table(a = c(TRUE, FALSE), b = 1:2)
+setindex(DT, a)
+# DT[.(TRUE)] # Uncommenting this line raises an error: DT has no key, so 'on' is required
+```
+
+- Use `DT[.(value)]` when the table is keyed by the relevant column(s).
+
+- Use `DT[.(value), on = "col"]` when using secondary indices or when no key is set.
+
 ### b) Set and get secondary indices

 #### -- How can we set the column `origin` as a secondary index in the *data.table* `flights`?
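
The vignette text says `on` works whether a matching key, a matching secondary index, or neither is present, with only the lookup path changing. A minimal sketch of that claim, assuming *data.table* is installed: the same join runs against a plain, a keyed, and an indexed copy of one small table. The character column `a` and the table names are hypothetical stand-ins for the logical column used in the vignette chunks.

```r
library(data.table)

plain   <- data.table(a = c("x", "y"), b = 1:2)
keyed   <- copy(plain); setkey(keyed, a)      # rows sorted by a, key = "a"
indexed <- copy(plain); setindex(indexed, a)  # original row order, index on a

# The same 'on' join works on all three tables. Per the lookup order described
# in the vignette, a matching key or index is reused when present; otherwise
# the ordering is computed for this call alone.
r_plain   <- plain[.("x"), on = "a"]
r_keyed   <- keyed[.("x"), on = "a"]
r_indexed <- indexed[.("x"), on = "a"]

# Without 'on', only the keyed table has default join columns
r_keyed_no_on <- keyed[.("x")]
msg <- tryCatch(indexed[.("x")], error = function(e) conditionMessage(e))
```

Comparing only the `b` values keeps the check independent of attributes such as a key that a result may or may not carry.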
