Commit aa0ecc0

committed
updated
1 parent ee3edfb commit aa0ecc0

2 files changed

+28
-32
lines changed

man/setkey.Rd

Lines changed: 22 additions & 14 deletions
Original file line number | Diff line number | Diff line change
@@ -39,20 +39,6 @@ ordering vector of the dataset's rows according to the provided columns. This or
3939
is stored as an attribute of the \code{data.table} and the dataset retains the original order
4040
of rows in memory. See the \href{../doc/datatable-secondary-indices-and-auto-indexing.html}{\code{vignette("datatable-secondary-indices-and-auto-indexing")}} for more details.
4141

42-
\subsection{Key vs. Index Subsetting}{
43-
When using \code{setkey}:
44-
\itemize{
45-
\item Subsetting can omit \code{on} (e.g., \code{DT[.(value)]})
46-
\item Data is physically reordered in RAM
47-
}
48-
When using \code{setindex}:
49-
\itemize{
50-
\item Must specify \code{on} (e.g., \code{DT[.(value), on = "col"]})
51-
\item Multiple indices can coexist via \code{setindexv(x, list(cols))}
52-
\item Original row order is preserved
53-
}
54-
}
55-
5642
\code{key} returns the \code{data.table}'s key if it exists; \code{NULL} if none exists.
5743
5844
\code{haskey} returns \code{TRUE}/\code{FALSE} if the \code{data.table} has a key.
@@ -88,6 +74,20 @@ The sort is \emph{stable}; i.e., the order of ties (if any) is preserved.
8874
For character vectors, \code{data.table} takes advantage of R's internal global string cache, also exported as \code{\link{chorder}}.
8975
}
9076

77+
\section{Key vs. Index Subsetting}{
78+
When using \code{setkey}:
79+
\itemize{
80+
\item Subsetting can omit \code{on} (e.g., \code{DT[.(value)]})
81+
\item Data is physically reordered in RAM
82+
}
83+
When using \code{setindex}:
84+
\itemize{
85+
\item Must specify \code{on} (e.g., \code{DT[.(value), on = "col"]})
86+
\item Multiple indices can coexist via \code{setindexv(x, list(cols))}
87+
\item Original row order is preserved
88+
}
89+
}
90+
9191
\section{Good practice}{
9292
In general, it's good practice to use column names rather than numbers. This is
9393
why \code{setkey} and \code{setkeyv} only accept column names.
@@ -157,14 +157,22 @@ indices(DT) # get indices single vector
157157
indices(DT, vectors = TRUE) # get indices list
158158

159159
# Keyed subsetting (no 'on' needed)
160+
DT = data.table(A=5:1,B=letters[5:1])
160161
setkey(DT, B)
161162
DT[.("e")] # Returns row where B = "e"
162163

163164
# Indexed subsetting (requires 'on')
165+
DT = data.table(A=5:1,B=letters[5:1])
164166
setindex(DT, B)
165167
DT[.("e"), on = "B"] # Works
166168
# DT[.("e")] # Would error (missing 'on')
167169

170+
DT = data.table(a = c(1, 2, 1, 2), b = c("x", "x", "y", "y"), val = 1:4)
171+
# Set an index on a single column
172+
setindex(DT, a)
173+
# Set multiple indices at once using setindexv
174+
setindexv(DT, list("a", c("a", "b")))
175+
168176
# Setting multiple indices at once
169177
DT = data.table(A = 5:1, B = letters[5:1], C = 10:6)
170178
setindexv(DT, list(c("A", "B"), c("B", "C")))

vignettes/datatable-secondary-indices-and-auto-indexing.Rmd

Lines changed: 6 additions & 18 deletions
Original file line number | Diff line number | Diff line change
@@ -64,46 +64,34 @@ Secondary indices are similar to `keys` in *data.table*, except for two major di
6464

6565
#### 1.1 Keyed vs. Indexed Subsetting
6666

67-
While both **keys** and **indices** enable fast binary search subsetting, they differ critically in usage:
67+
While both **keys** and **indices** enable fast binary search subsetting, they differ significantly in usage:
6868

6969
**Keyed subsetting** (implicit column matching)
7070
```{r}
71-
DT <- data.table(a = c(TRUE, FALSE), b = 1:2)
71+
DT = data.table(a = c(TRUE, FALSE), b = 1:2)
7272
setkey(DT, a) # Set key
7373
DT[.(TRUE)] # No 'on' needed - key is used
7474
```
7575

7676
**Indexed subsetting** (explicit column specification)
7777

7878
```{r}
79-
DT <- data.table(a = c(TRUE, FALSE), b = 1:2)
79+
DT = data.table(a = c(TRUE, FALSE), b = 1:2)
8080
setindex(DT, a) # Set index only (no reorder)
8181
DT[.(TRUE), on = "a"] # Must specify 'on'
8282
```
8383

84-
Always use `on`with indices to avoid errors. Keys establish permanent implicit matching, while indices require explicit direction for each operation.
84+
For join-like operations such as `DT[.(values)]` on an indexed table, the `on=` argument is essential to specify the join column(s). `setindex` does not establish the implicit matching that `setkey` does, so the join columns must be named explicitly in each call.
8585

86-
#### **Hierarchy of Lookup Operations**
86+
#### Hierarchy of Lookup Operations
8787

8888
When `on` is specified, `data.table` uses this lookup order:
8989

9090
- Existing keys matching `on` columns
9191

9292
- Pre-built secondary indices matching `on` columns
9393

94-
- Temporary auto-generated index (if none exist)
95-
96-
**Example**: Error When Omitting on with Indices
97-
98-
```{r}
99-
DT <- data.table(a = c(TRUE, FALSE), b = 1:2)
100-
setindex(DT, a)
101-
# DT[.(TRUE)] # Uncommenting this line causes: Error - must specify 'on'
102-
```
103-
104-
- Use `DT[.(value)]` when the table is keyed by the relevant column(s).
105-
106-
- Use `DT[.(value), on = "col"]` when using secondary indices or when no key is set.
94+
- Auto-generated secondary index (created if none exist; its optimized ordering is used internally for efficient joins)
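
The lookup order above can be explored with a small sketch. The table and the object names (`keyed`, `indexed`, `plain`) are hypothetical and chosen only for illustration. The `verbose = TRUE` argument asks `data.table` to report what it does internally; the exact messages vary between versions, so none are reproduced here.

```{r}
library(data.table)

# Hypothetical toy table; column names are illustrative only.
DT = data.table(a = c(2L, 1L, 2L, 1L), b = c("x", "y", "z", "w"))

# Case 1: a key on 'a' exists, so rows are physically sorted by 'a'.
keyed = copy(DT)
setkey(keyed, a)
res_key = keyed[.(2L), on = "a"]

# Case 2: only a secondary index on 'a' exists; row order is unchanged.
indexed = copy(DT)
setindex(indexed, a)
res_idx = indexed[.(2L), on = "a", verbose = TRUE]

# Case 3: neither a key nor an index exists on 'a'.
plain = copy(DT)
res_plain = plain[.(2L), on = "a"]

# All three calls select the same rows; only the lookup path differs.
res_key$b
res_idx$b
res_plain$b
```

Comparing `key(indexed)` (which is `NULL`) with `indices(indexed)` confirms that the second table carries an index but no key, which is why `on` has to be supplied there.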
10795

10896
### b) Set and get secondary indices
10997
