Commit 4d17949

Merge branch 'macroRemoval' of https://github.com/Rdatatable/data.table into macroRemoval
2 parents 2a4cbe4 + 4cf8743 commit 4d17949

6 files changed: +38 -1 lines changed

NEWS.md

Lines changed: 2 additions & 0 deletions
@@ -38,6 +38,8 @@

 9. Joins to extended data.frames, e.g. `x[i, col := x.col1 + i.col2]` where `i` is a `tbl`, can use the `x.` and `i.` prefix forms, [#6998](https://github.com/Rdatatable/data.table/issues/6998). Thanks @MichaelChirico for the bug and PR.

+10. On a heavily loaded machine, a `forder` thread could try to perform a zero-length copy from a null pointer, which was de-facto harmless but is against the C standard and was caught by additional CRAN checks, [#7051](https://github.com/Rdatatable/data.table/issues/7051). Thanks to @helske for the report and @aitap for the PR.
+
 ### NOTES

 1. Continued work to remove non-API C functions, [#6180](https://github.com/Rdatatable/data.table/issues/6180). Thanks Ivan Krylov for the PRs and for writing a clear and concise guide about the R API: https://aitap.codeberg.page/R-api/.

man/data.table.Rd

Lines changed: 3 additions & 1 deletion
@@ -174,7 +174,7 @@ data.table(\dots, keep.rownames=FALSE, check.names=FALSE, key=NULL, stringsAsFac
 \item For convenience during interactive scenarios, it is also possible to use \code{.()} syntax as \code{X[Y, on=.(a, b)]}.
 \item From v1.9.8, (non-equi) joins using binary operators \code{>=, >, <=, <} are also possible, e.g., \code{X[Y, on=c("x>=a", "y<=b")]}, or for interactive use as \code{X[Y, on=.(x>=a, y<=b)]}.
 }
-See examples as well as \href{../doc/datatable-secondary-indices-and-auto-indexing.html}{\code{vignette("datatable-secondary-indices-and-auto-indexing")}}.
+Note that providing \code{on} is \emph{required} for \code{X[Y]} joins when \code{X} is unkeyed. See examples as well as \href{../doc/datatable-secondary-indices-and-auto-indexing.html}{\code{vignette("datatable-secondary-indices-and-auto-indexing")}}.
 }

 \item{env}{ List or an environment, passed to \code{\link{substitute2}} for substitution of parameters in \code{i}, \code{j} and \code{by} (or \code{keyby}). Use \code{verbose} to preview constructed expressions. For more details see \href{../doc/datatable-programming.html}{\code{vignette("datatable-programming")}}. }
@@ -298,7 +298,9 @@ DT[, sum(v), by=x][order(x)] # same but by chaining expressions together

 # fast ad hoc row subsets (subsets as joins)
 DT["a", on="x"] # same as x == "a" but uses binary search (fast)
+# NB: requires DT to be keyed!
 DT["a", on=.(x)] # same, for convenience, no need to quote every column
+# NB: works regardless of whether or not DT is keyed!
 DT[.("a"), on="x"] # same
 DT[x=="a"] # same, single "==" internally optimised to use binary search (fast)
 DT[x!="b" | y!=3] # not yet optimized, currently vector scan subset

man/last.Rd

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -18,6 +18,9 @@ of \code{xts::first} is deployed. }
1818
\item{\dots}{ Not applicable for \code{data.table} first/last. Any arguments here
1919
are passed through to \code{xts}'s first/last. }
2020
}
21+
\details{
22+
Note: For zero-length vectors, \code{first(x)} and \code{last(x)} mimic \code{head(x, 1)} and \code{tail(x, 1)} by returning an empty vector instead of \code{NA}. However, unlike \code{head()}/\code{tail()} and base R subsetting (e.g., \code{x[1]}), they do not preserve attributes like names.
23+
}
2124
\value{
2225
If no other arguments are supplied it depends on the type of \code{x}. The first/last item
2326
of a vector or list. The first/last row of a \code{data.frame} or \code{data.table}.

man/setkey.Rd

Lines changed: 8 additions & 0 deletions
@@ -74,6 +74,14 @@ The sort is \emph{stable}; i.e., the order of ties (if any) is preserved.
 For character vectors, \code{data.table} takes advantage of R's internal global string cache, also exported as \code{\link{chorder}}.
 }

+\section{Keys vs. Indices}{
+Setting a key (with \code{setkey}) and an index (with \code{setindex}) are similar, but have very important distinctions.
+
+Setting a key physically reorders the data in RAM.
+
+Setting an index computes the sort order, but instead of applying the reordering, simply \emph{stores} this computed ordering. That means that multiple indices can coexist, and that the original row order is preserved.
+}
+
 \section{Good practice}{
 In general, it's good practice to use column names rather than numbers. This is
 why \code{setkey} and \code{setkeyv} only accept column names.
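
The "Keys vs. Indices" distinction added in the hunk above can be illustrated outside of data.table. What follows is only a minimal sketch in C, not data.table's implementation: `sort_in_place` and `compute_order` are hypothetical names, and a simple insertion sort stands in for whatever ordering routine the package really uses. The first function reorders the data itself (key-like); the second leaves the data untouched and only stores the ordering (index-like).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Three-way comparison of two ints for qsort. */
static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Key-like (hypothetical helper): physically reorder the data.
   The original row order is lost afterwards. */
void sort_in_place(int *x, size_t n) {
    assert(x != NULL || n == 0);
    qsort(x, n, sizeof *x, cmp_int);
}

/* Index-like (hypothetical helper): leave x untouched and fill ord[]
   with 0-based positions so that x[ord[0]] <= x[ord[1]] <= ...
   Insertion sort on positions; ties keep their original order. */
void compute_order(const int *x, size_t *ord, size_t n) {
    assert((x != NULL && ord != NULL) || n == 0);
    for (size_t i = 0; i < n; i++) {
        size_t j = i;
        /* shift positions whose value is strictly greater than x[i] */
        while (j > 0 && x[ord[j - 1]] > x[i]) {
            ord[j] = ord[j - 1];
            j--;
        }
        ord[j] = i;
    }
}
```

Because `compute_order` never writes to `x`, several order vectors (one per column, say) could be kept side by side, which is the property the new section attributes to indices.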

src/forder.c

Lines changed: 2 additions & 0 deletions
@@ -128,6 +128,8 @@ static void flush(void) {
   if (!retgrp) return;
   int me = omp_get_thread_num();
   int n = gs_thread_n[me];
+  // normally doesn't happen, can be encountered under heavy load, #7051
+  if (!n) return; // # nocov
   int newn = gs_n + n;
   if (gs_alloc < newn) {
     gs_alloc = (newn < nrow/3) ? (1+(newn*2)/4096)*4096 : nrow;
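
The reason for the early return in the hunk above can be shown in isolation. In C, passing a null pointer to `memcpy` is undefined behaviour even when the length is zero, which is presumably what the additional checks flagged. The following is a minimal sketch, assuming a hypothetical helper `append_ints` that is not part of data.table; it only demonstrates the guard pattern, not the actual `flush()` logic.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (illustration only): append n ints from src to
   dst, which already holds len elements. Returns the new length.
   The caller must ensure dst has room for len + n elements. */
size_t append_ints(int *dst, size_t len, const int *src, size_t n) {
    /* Nothing to copy: return before memcpy can see a null pointer. */
    if (n == 0) return len;
    assert(dst != NULL && src != NULL);
    memcpy(dst + len, src, n * sizeof *src);
    return len + n;
}
```

With the guard in place, a call such as `append_ints(buf, 0, NULL, 0)` is well defined and simply returns the unchanged length.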

vignettes/datatable-secondary-indices-and-auto-indexing.Rmd

Lines changed: 20 additions & 0 deletions
@@ -62,6 +62,26 @@ Secondary indices are similar to `keys` in *data.table*, except for two major di

 * There can be more than one secondary index for a data.table (as we will see below).

+#### Keyed vs. Indexed Subsetting
+
+While both **keys** and **indices** enable fast binary search subsetting, they differ significantly in usage:
+
+**Keyed subsetting** (implicit column matching)
+
+```{r keyed_operations}
+DT = data.table(a = c(TRUE, FALSE), b = 1:2)
+setkey(DT, a) # Set key, reordering DT
+DT[.(TRUE)] # 'on' is optional; if omitted, the key is used
+```
+
+**Indexed subsetting** (explicit column specification)
+
+```{r unkeyed_operations}
+DT = data.table(a = c(TRUE, FALSE), b = 1:2)
+setindex(DT, a) # Set index only (no reorder)
+DT[.(TRUE), on = "a"] # 'on' is required
+```
+
 ### b) Set and get secondary indices

 #### -- How can we set the column `origin` as a secondary index in the *data.table* `flights`?
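
The "Keyed vs. Indexed Subsetting" text added in the hunk above says that both keys and indices enable binary search. As a rough illustration of why, here is a minimal sketch in C, assuming a hypothetical function `lower_bound` that is not data.table code: the same binary search runs either directly over physically sorted data (key-like, `ord == NULL`) or through a stored order vector over unsorted data (index-like).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (illustration only): return the first sorted
   position whose value is >= target, or n if there is none.
   ord == NULL : x itself is sorted (key-like lookup).
   ord != NULL : x is unsorted and ord holds the 0-based positions of
                 x in ascending order (index-like lookup). */
size_t lower_bound(const int *x, const size_t *ord, size_t n, int target) {
    assert(x != NULL || n == 0);
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int v = ord ? x[ord[mid]] : x[mid];
        if (v < target) lo = mid + 1;
        else hi = mid;
    }
    return lo;
}
```

In the index-like case the returned position refers to `ord`, so the matching row in the original data is `ord[pos]`; the data keeps its original order throughout.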
