File: vignettes/datatable-programming.Rmd
Lines changed: 41 additions & 39 deletions
@@ -46,45 +46,6 @@ subset(iris, Species == "setosa")
 
 Here, `subset` takes the second argument and evaluates it within the scope of the `data.frame` given as its first argument. This removes the need for variable repetition, making it less prone to errors, and makes the code more readable.
 
-### Dynamic Grouping and Naming Syntax
-
-Besides the programmatic use with `env`, `data.table` offers some powerful and concise syntax for interactive use, especially in the `by` argument.
-
-```{r by_syntax_setup_concise}
-d = data.table(x = 1:4, y = 2:5)
-```
-
-#### Grouping by Expressions in `by`
-
-For convenience, `data.table` allows you to group by a single expression directly without `list()` or `.()`. To name the resulting grouping column, you have two options:
-
-```{r by_syntax_naming_concise}
-# 1. The canonical way: a named list (required for multiple expressions)
-d[, sum(y), by = .(grp = x %% 2)]
-
-# 2. A concise shortcut: parentheses (for a single expression)
-d[, sum(y), by = (grp = x %% 2)]
-```
-
-The `(grp = ...)` syntax is a base R feature that `data.table` leverages to see the intended column name.
-
-#### Important Contrast: Naming in `j` vs. `by`
-
-This parentheses shortcut for naming does **not** work in `j`. In `j`, you must use the canonical `.(new_name = ...)` syntax to create a named column.
-
-```{r by_syntax_j_concise}
-# Correct way to name a new column in `j`
-d[, .(sum_y = sum(y)), by = .(grp = x %% 2)]
-
-# This will not create a column named 'sum_y'
-d[, (sum_y = sum(y)), by = .(grp = x %% 2)]
-```
-In the second case, the parentheses cause base R to evaluate the expression, returning only the final value. `data.table` then gives this unnamed result a default column name (`V1`).
-
-**Takeaway:**
-* In `by`, `(name = expr)` is a valid shortcut for `.(name = expr)`.
-* In `j`, you must always use `.(name = expr)` to create a named column.
-
 ## Problem description
 
 The problem with this kind of interface is that we cannot easily parameterize the code that uses it. This is because the expressions passed to those functions are substituted before being evaluated.
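
The parameterization problem described in the last context line can be sketched in base R, reusing `subset` and `iris` from the vignette text. The wrapper `my_subset` and its arguments `col` and `value` are hypothetical names introduced only for this illustration; they do not appear in the vignette.

```r
# Direct, interactive use works: `Species` is looked up inside `iris`.
direct = subset(iris, Species == "setosa")

# Hypothetical wrapper that tries to pass the column name as a parameter.
my_subset = function(data, col, value) {
  # `subset` substitutes its second argument, so it receives the expression
  # `col == value`. `col` is not a column of `data`, so it resolves to the
  # character string held in the function argument instead.
  subset(data, col == value)
}

# This compares the string "Species" with "setosa", which is FALSE,
# so no rows are selected.
wrapped = my_subset(iris, "Species", "setosa")

nrow(direct)   # 50
nrow(wrapped)  # 0
```

The wrapper fails silently rather than with an error, which is part of what makes this kind of interface hard to program against.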
@@ -160,6 +121,47 @@ There are third party packages that can achieve what base R computing on the lan
 
 Though these can be helpful, we will be discussing a `data.table`-unique approach here.
 
+### Dynamic Grouping and Naming Syntax
+
+Besides the programmatic use with `env`, data.table offers some powerful and concise syntax for interactive use, especially in the `by` argument.
+
+```{r}
+d = data.table(x = 1:4, y = 2:5)
+```
+
+#### Grouping by Expressions in `by`
+
+For convenience, data.table allows you to group by a single expression directly without `list()` or `.()`. To name the resulting grouping column, you have two options:
+
+```{r}
+# 1. The canonical way: a named list (required for multiple expressions)
+d[, sum(y), by = .(grp = x %% 2)]
+
+# 2. A concise shortcut: parentheses (for a single expression)
+d[, sum(y), by = (grp = x %% 2)]
+```
+
+The `(grp = ...)` syntax is a base R feature that data.table leverages to see the intended column name.
+
+#### Important Contrast: Naming in `j` vs. `by`
+
+This parentheses shortcut for naming does **not** work in `j`. In `j`, you must use the canonical `.(new_name = ...)` syntax to create a named column.
+
+```{r}
+# Correct way to name a new column in `j`
+d[, .(sum_y = sum(y)), by = .(grp = x %% 2)]
+
+# This will not create a column named 'sum_y'
+d[, (sum_y = sum(y)), by = .(grp = x %% 2)]
+```
+
+In the second case, the parentheses cause base R to evaluate the expression, returning only the final value. data.table then gives this unnamed result a default column name (`V1`).
+
+**Takeaway:**
+* In `by`, `(name = expr)` is a valid shortcut for `.(name = expr)`.
+* In `j`, you must always use `.(name = expr)` to create a named column.
+
+
 ## Programming on data.table
 
 Now that we've established the proper way to parameterize code that uses *lazy evaluation*, we can move on to the main subject of this vignette, *programming on data.table*.
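
The `j` versus `by` contrast in the moved section rests on how base R treats a parenthesized assignment. The following is a minimal base R sketch of that behavior only; it does not call data.table, and the variable names `paren_result` and `list_result` are hypothetical.

```r
# Parentheses around an assignment evaluate it and return the bare value.
# The name on the left-hand side is not attached to the result
# (as a side effect, a variable `sum_y` is created in the calling scope).
paren_result = (sum_y = sum(2:5))
names(paren_result)   # NULL

# `list()`, which data.table aliases as `.()` inside `[`, keeps the name.
list_result = list(sum_y = sum(2:5))
names(list_result)    # "sum_y"
```

Because the value returned from the parenthesized form carries no name, there is nothing for `j` to use as a column name, which is consistent with the default `V1` mentioned in the diff.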