Commit c76a16c

clarify env substitution of functions
1 parent 3a9c185 commit c76a16c

2 files changed (+30, -17 lines)

man/data.table.Rd

Lines changed: 3 additions & 1 deletion
```diff
@@ -420,7 +420,9 @@ DT[, list(MySum=sum(v),
 MyMax=max(v)),
 by=.(x, y\%\%2)] # by 2 expressions
 
-DT[, .(f=f(a)), by=grp_var, env=list(f="sum", grp_var="x")] # programmatic query with env=
+# programmatic query with env=
+DT[, .(funvar = fun(var)), by=grp_var,
+env = list(fun="sum", var="a", funvar="sum_a_by_x", grp_var="x")]
 
 DT[, .(a = .(a), b = .(b)), by=x] # list columns
 DT[, .(seq = min(a):max(b)), by=x] # j is not limited to just aggregations
```
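The new Rd example substitutes four names at once: the function, the input column, the output column name and the grouping column. A minimal sketch of running it, assuming a data.table version whose `[` method accepts `env=` is installed; the table `DT` and its values are hypothetical, made up for illustration:

```r
library(data.table)

# Hypothetical example data: grouping column `x`, value column `a`
DT <- data.table(x = c("p", "p", "q", "q"), a = 1:4)

# Same template as the Rd example: every name listed in `env` is replaced
# before evaluation, including the output column name `funvar`
res <- DT[, .(funvar = fun(var)), by = grp_var,
          env = list(fun = "sum", var = "a", funvar = "sum_a_by_x", grp_var = "x")]

# The substituted query is equivalent to DT[, .(sum_a_by_x = sum(a)), by = x]
print(res)
```

With these inputs `res` should have one row per group (`p`, `q`) and the columns `x` and `sum_a_by_x`.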

vignettes/datatable-programming.Rmd

Lines changed: 27 additions & 16 deletions
````diff
@@ -156,22 +156,6 @@ substitute2(
 
 We can see in the output that both the functions names, as well as the names of the variables passed to those functions, have been replaced. We used `substitute2` for convenience. In this simple case, base R's `substitute` could have been used as well, though it would've required usage of `lapply(env, as.name)`.
 
-Note that `outer = sqrt`, i.e., providing the _symbol_ instead of the function as a _character string_ is supported, and sometimes required (e.g. for namespace-qualified names like `pkg::foo`), but that there are some subtle differences between the two approaches. To get a hint of this, observe:
-
-```{r hypotenuse_substitute2_symbol}
-substitute2(
-outer(inner(var1) + inner(var2)),
-env = list(
-outer = sqrt,
-inner = "square",
-var1 = "a",
-var2 = "b"
-)
-)
-```
-
-The implications of this are beyond the scope of this vignette. We only note here that similar considerations apply for whether to specify `what=` in `do.call()` as a string or a symbol.
-
 Now, to use substitution inside `[.data.table`, we don't need to call the `substitute2` function. As it is now being used internally, all we have to do is to provide `env` argument, the same way as we've provided it to the `substitute2` function in the example above. Substitution can be applied to the `i`, `j` and `by` (or `keyby`) arguments of the `[.data.table` method. Note that setting the `verbose` argument to `TRUE` can be used to print expressions after substitution is applied. This is very useful for debugging.
 
 Let's use the `iris` data set as a demonstration. Just as an example, let's pretend we want to compute the `Sepal.Hypotenuse`, treating the sepal width and length as if they were legs of a right triangle.
````
````diff
@@ -220,6 +204,33 @@ DT[filter_col %in% filter_val,
 )]
 ```
 
+### Substitute functions
+
+A short clarification may be useful on how to substitute a function name in an expression.
+Note that providing `outer="sqrt"` (string) vs. `outer=sqrt` (function object) is very different:
+```{r substitute_fun1, results='hide'}
+DT[, outer(Sepal.Length), env = list(outer="sqrt"), verbose=TRUE]
+#Argument 'j' after substitute: sqrt(Sepal.Length)
+## DT[, sqrt(Sepal.Length)]
+
+DT[, outer(Sepal.Length), env = list(outer=sqrt), verbose=TRUE]
+#Argument 'j' after substitute: .Primitive("sqrt")(Sepal.Length)
+## DT[, .Primitive("sqrt")(Sepal.Length)]
+```
+And while `.Primitive("sqrt")(Sepal.Length)` still works, it is almost never the desired form.
+
+More importantly, if the function object itself is meant to be used, then it can, and should, be written directly in the expression, as no substitution is needed:
+```{r substitute_fun2, results='hide'}
+DT[, sqrt(Sepal.Length)]
+```
+
+If the function name to be substituted needs to be namespace-qualified, then the namespace and the function name can be substituted like any other symbol in the expression:
+```{r substitute_fun3, results='hide'}
+DT[, ns::fun(Sepal.Length), env = list(ns="base", fun="sqrt"), verbose=TRUE]
+#Argument 'j' after substitute: base::sqrt(Sepal.Length)
+## DT[, base::sqrt(Sepal.Length)]
+```
+
 ### Substitute variables and character values
 
 In the above example, we have seen a convenient feature of `substitute2`: automatic conversion from strings into names/symbols. An obvious question arises: what if we actually want to substitute a parameter with a *character* value, so as to have base R `substitute` behaviour. We provide a mechanism to escape automatic conversion by wrapping the elements into base R `I()` call. The `I` function marks an object as *AsIs*, preventing its arguments from character-to-symbol automatic conversion. (Read the `?AsIs` documentation for more details.) If base R behaviour is desired for the whole `env` argument, then it's best to wrap the whole argument in `I()`. Alternatively, each list element can be wrapped in `I()` individually. Let's explore both cases below.
````
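The string-versus-function distinction in the new "Substitute functions" section can be reproduced without data.table. The sketch below uses only base R `substitute()` through `do.call()`; it assumes, as the vignette text describes, that a string such as `"sqrt"` in `env` is converted to a symbol (here written explicitly with `as.name()`), while an unquoted `sqrt` inlines the function object itself. The template `outer(x)` and the variable `x` are illustrative only.

```r
# Base R sketch: substitute a call template with (a) a symbol and (b) a function object.
# The template plays the role of the `j` expression in the vignette examples.
tmpl <- quote(outer(x))

# (a) Symbol: what a string such as "sqrt" is assumed to become under env=
by_name <- do.call(substitute, list(tmpl, list(outer = as.name("sqrt"))))

# (b) Function object: what passing unquoted `sqrt` inlines into the call
by_value <- do.call(substitute, list(tmpl, list(outer = sqrt)))

print(by_name)   # sqrt(x)
print(by_value)  # .Primitive("sqrt")(x)

# Namespace-qualified form: `ns` and `fun` are ordinary symbols in the template
qualified <- do.call(substitute, list(quote(ns::fun(x)),
                                      list(ns = as.name("base"), fun = as.name("sqrt"))))
print(qualified) # base::sqrt(x)

# All three calls evaluate to the same values; only the deparsed form differs
x <- c(4, 9)
stopifnot(isTRUE(all.equal(eval(by_name), c(2, 3))),
          isTRUE(all.equal(eval(by_value), c(2, 3))),
          isTRUE(all.equal(eval(qualified), c(2, 3))))
```

The symbol form keeps the call readable and lets the name be resolved at evaluation time, which is why it is usually the form one wants when substituting.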
