
Commit cf73711

V16.3 (#129)
* parsing bugfixes
* - Updated documentation on new preferred method of interpolation using `@eval` and `$`
  - Added documentation on using other macros inside of TidierData macros
1 parent bc2c1b1 commit cf73711


6 files changed, +134 -5 lines changed

NEWS.md

Lines changed: 5 additions & 2 deletions
@@ -1,7 +1,10 @@
 # TidierData.jl updates
 
-## v16.3
-- Bugfix: `@summary` no longer errors with non-numeric columns. Instead, it only reports non-numeric summary stats on non-numeric columns. Minor changes to summary column names to be lowercase and snakecase.
+## v0.16.3 - 2024-12-28
+- Bugfix: `@summary` no longer errors with non-numeric columns. Instead, it only reports non-numeric summary stats on non-numeric columns. Minor changes to summary column names to be snake_case.
+- Bugfix: Reverted a bug introduced in v0.13.4, which escaped all macros. Now, string macros remain escaped (i.e., keeping it possible to work with Unitful units, e.g. `u"psi"`), but other macros are *not* escaped to allow for those macros to refer to column names within arguments.
+- Updated documentation on new preferred method of interpolation using `@eval` and `$`
+- Added documentation on using other macros inside of TidierData macros
 
 ## v0.16.2 - 2024-09-03
 - Bugfix: `@slice_min` and `@slice_max` respect the `n` argument

docs/Project.toml

Lines changed: 3 additions & 2 deletions
@@ -1,12 +1,13 @@
 [deps]
+BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
 CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
 Chain = "8be319e6-bccf-4806-a6f7-6fae938471bc"
 DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
 Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
 DocumenterMarkdown = "997ab1e6-3595-5248-9280-8efb232c3433"
 Literate = "98b081ad-f1c9-55d3-8b20-4c87d4299306"
-Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
 RDatasets = "ce6b1742-4840-55fa-b093-852dadbb1d8b"
+Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
 StableRNGs = "860ef19b-820b-49d6-a774-d7a799459cd3"
 TidierData = "fe2206b3-d496-4ee9-a338-6a095c4ece80"
-BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
+Unitful = "1986cc42-f94f-5a68-af5c-568840ba703d"

docs/examples/UserGuide/interpolation.jl

Lines changed: 71 additions & 0 deletions
@@ -1,3 +1,74 @@
+# ## Native (and preferred) method of interpolating using `@eval` and `$`
+
+# TidierData relies on "non-standard evaluation," which has the side effect of making interpolation slightly more complicated. For example, in the expression `@mutate(df, a = b + 1)`, the `df` refers to a data frame, while `a` and `b` refer to column names within the data frame. What would happen if you created a variable `var` that contains the value `:a`? Would this interpolated expression work?
+
+# ```julia
+# using TidierData
+# df = DataFrame(a = 1:5, b = 6:10)
+#
+# var = :a
+# @mutate(df, $var = b + 1)
+# ```
+
+# Unfortunately, this does not work because it produces `@mutate(df, :a = b + 1)`. Since TidierData uses bare variables (and not symbols) to refer to column names, this will result in an error. However, there is a slight modification we can apply to make this code work: prefixing it with an `@eval`.
+
+using TidierData
+df = DataFrame(a = 1:5, b = 6:10, c = 11:15)
+
+var = :a
+@eval @mutate(df, $var = b + 1)
+
+# ### Why does adding an `@eval` to the beginning of the expression make interpolation work?
+
+# Adding `@eval` to the beginning causes the interpolated expressions to be evaluated prior to being interpolated. So `$var`, which contains the value `:a`, is evaluated to `a`, which produces the desired expression `@mutate(df, a = b + 1)`. The need for `@eval` here, then, is primarily because TidierData expects an `a` rather than an `:a` to refer to the column "a" in a data frame.
+
+# ### How can I use `@eval` with a chained set of expressions?
+
+# The answer is simple: use `@eval @chain` instead of `@chain`.
+
+var = :a
+
+@eval @chain df begin
+    @select($var)
+    @mutate($var = $var + 1)
+end
+
+# If you want to select multiple variables, just use a `...` to splat the vector (or tuple) of variables.
+
+vars = [:a, :b]
+
+@eval @chain df begin
+    @select($vars...)
+end
+
+# The `@eval`-based interpolation syntax is highly flexible in that it should work anywhere you might need it across the entire package.
+
+@eval @chain df begin
+    @summarize(across($vars..., mean))
+end
+
+# ### Does `@eval` work inside of user-defined functions?
+
+# Yes. Here's an example of how you could write a new `select_new()` function that wraps the `@select` macro.
+
+function select_new(df, columns...)
+    @eval @select(df, $columns...)
+end
+
+select_new(df, :a, :c)
+
+# Here's another example of an `add_one()` function that adds one to all numeric columns and returns the result in a new set of columns.
+
+function add_one(df)
+    @eval @mutate(df, across(where(is_number), x -> x .+ 1))
+end
+
+add_one(df)
+
+# ## Note: the below documentation is included here only for historical reasons. It will be removed in the future.
+
+# ## Superseded method of interpolating using the `!!` ("bang bang") operator
+
 # The `!!` ("bang bang") operator can be used to interpolate values of variables from the parent environment into your code. This operator is borrowed from the R `rlang` package. At some point, we may switch to using native Julia interpolation, but for a variety of reasons that introduce some complexity with native interpolation, we plan to continue to support `!!` interpolation.
 
 # To interpolate multiple variables, the `rlang` R package uses the `!!!` "triple bang" operator. However, in `TidierData.jl`, the `!!` "bang bang" operator can be used to interpolate either single or multiple values as shown in the examples below.
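
The splice-then-evaluate behavior described in the added interpolation docs can be checked in plain Julia, without TidierData. The following is a minimal sketch under stated assumptions, not part of the commit: it uses Base only, and `global_lookup` and `local_interp` are hypothetical names chosen for illustration. It shows that interpolating a `Symbol` into a quoted expression yields a bare identifier, and that `@eval` evaluates in the module's global scope, so inside a function only `$`-interpolated names pick up local values.

```julia
# Base Julia only; all names below are hypothetical and for illustration.
var = :a
vars = [:a, :b]

# Interpolating a Symbol into a quoted expression splices in a bare identifier.
@assert :($var = b + 1) == :(a = b + 1)

# Splatting a vector of Symbols splices in several identifiers at once.
@assert :(f($(vars...))) == :(f(a, b))

x = 1  # global binding

function global_lookup(x)
    # `x` is not interpolated, so `@eval` resolves it in global scope.
    @eval x + 0
end

function local_interp(x)
    # `$x` splices the value of the local argument into the expression.
    @eval $x + 0
end

global_lookup(10)  # uses the global `x`
local_interp(10)   # uses the function argument
```

In this sketch `global_lookup(10)` returns `1` and `local_interp(10)` returns `10`. This suggests that a wrapper such as `select_new` would read a global `df` unless the data frame itself is interpolated. Whether TidierData macros accept an interpolated `$df` is not confirmed by this commit.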

docs/examples/UserGuide/macros.jl

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+# While TidierData relies heavily on macros, you may occasionally find yourself needing to use other macros *within* TidierData macros. This can result in errors that may be hard to interpret. This page is intended to demonstrate two common situations when working with macros: string macros and macros with columns as arguments.
+
+# ## String macros
+
+# You may not think of string macros as macros because we often work with them in the form of prefixes (or suffixes) attached to string literals, such as `prefix"string goes here"`. However, string macros are indeed macros, and thankfully, these should work without any modification.
+
+using TidierData
+
+# Let's add the `psi` unit to the `a` column using the Unitful package.
+
+using Unitful
+df = DataFrame(a = 1:5, b = 6:10)
+@chain df @mutate(a = a * u"psi")
+
+# It just works!
+
+# ## Macros with columns as arguments
+
+# Occasionally, you may want to work with macros that operate on columns of a data frame. You may want to apply syntax that looks like this:
+
+# ```julia
+# using Printf
+# df = DataFrame(a = [0.11, 0.21, 0.12, 0.22])
+#
+# @chain df begin
+#     @mutate(a_label = @sprintf("Var = %.1f", a))
+# end
+# ```
+
+# However, this will not work! Why not? Well, there are two reasons: it is difficult to escape a macro but not its arguments, and macros cannot be vectorized by adding a period to the end (unlike functions).
+
+# The easiest way to fix both issues is to wrap the macro inside of an anonymous function. Thus, `@example_macro(a)` turns into `(x -> @example_macro(x))(a)`. What is happening here is that an anonymous function is being defined, and then that function is immediately being called with an argument `a` referring to the column name `a`.
+
+# Here is what this looks like for `@sprintf`:
+
+using Printf
+df = DataFrame(a = [0.11, 0.21, 0.12, 0.22])
+
+@chain df begin
+    @mutate(a_label = (x -> @sprintf("Var = %.1f", x))(a))
+end
+
+# This works!
+
+# Even though TidierData cannot dot-vectorize `@sprintf`, it can vectorize the anonymous function in which `@sprintf` is wrapped, converting the expression to `a_label = (x -> @sprintf("Var = %.1f", x)).(a)` before it is run. Notice that TidierData adds a period before the `(a)` to vectorize the function before passing this expression to DataFrames.jl.
+
+# Lastly, one caveat to know is that the above anonymous wrapper function syntax currently only works for **macros** and *not* for functions. It should not be needed for functions, but we mention it here for awareness.
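
The vectorized form that the page says TidierData generates can be tried on a plain vector, outside of any data frame. This is a small sketch assuming only the `Printf` standard library; the vector `a` stands in for the column and `labels` is a hypothetical name.

```julia
using Printf  # standard library

# Stand-in for the `a` column of the data frame in the docs.
a = [0.11, 0.21, 0.12, 0.22]

# The anonymous function wrapping the macro call is an ordinary function,
# so it can be broadcast over `a` with the usual dot syntax.
labels = (x -> @sprintf("Var = %.1f", x)).(a)
```

Here `labels` is a `Vector{String}` with one formatted entry per element of `a`.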

docs/mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -138,6 +138,7 @@ nav:
 - "Column names": "examples/generated/UserGuide/column_names.md"
 - "Interpolation" : "examples/generated/UserGuide/interpolation.md"
 - "Auto-vectorization" : "examples/generated/UserGuide/autovec.md"
+- "Using macros": "examples/generated/UserGuide/macros.md"
 # - "Benchmarking" : "examples/generated/UserGuide/benchmarking.md"
 - "Comparison to DF.jl" : "examples/generated/UserGuide/comparisons.md"
 - "Contribute" : "examples/generated/Contributors/Howto.md"

src/parsing.jl

Lines changed: 7 additions & 1 deletion
@@ -428,7 +428,13 @@ function parse_escape_function(rhs_expr::Union{Expr,Symbol})
             return x
         end
     elseif @capture(x, @mac_(args__))
-        return esc(Expr(:macrocall, mac, LineNumberNode, args...))
+        if endswith(string(mac), "_str")
+            # String macros remain escaped, making it possible to work with Unitful units inside of `@mutate` (e.g. `u"psi"`)
+            return esc(Expr(:macrocall, mac, LineNumberNode, args...))
+        else
+            # Other macros that may reference variables referring to column names should *not* be escaped
+            return x
+        end
     end
     return x
 end
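
The `_str` suffix test works because Julia's parser turns a prefixed string literal such as `u"psi"` into a call to a macro named `@u_str`. The sketch below illustrates this with `Meta.parse` alone. `is_string_macro` is a hypothetical helper written for this note, not a TidierData function, and it inspects the raw `Expr` rather than using MacroTools' `@capture` as the diff does.

```julia
# Hypothetical helper; mirrors the `endswith(string(mac), "_str")` test in the diff.
function is_string_macro(ex)
    ex isa Expr && ex.head == :macrocall && endswith(string(ex.args[1]), "_str")
end

# Parsing alone does not require Unitful or Printf to be loaded.
unit_call = Meta.parse("u\"psi\"")
fmt_call = Meta.parse("@sprintf(\"%.1f\", a)")

is_string_macro(unit_call)  # string macro: the macro name is `@u_str`
is_string_macro(fmt_call)   # ordinary macro: the macro name is `@sprintf`
```

Under this check the Unitful literal is classified as a string macro and stays escaped, while `@sprintf` falls through to the unescaped branch.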
