Commit f2f7ca0

finish article
1 parent bc143a0 commit f2f7ca0

2 files changed: 100 additions & 2 deletions

content/post/2023-01-30-r-basic-advanceds-variables-and-names-in-dplyr/index.Rmd

Lines changed: 51 additions & 1 deletion
@@ -136,4 +136,54 @@ iris %>%
136136
mutate(across(all_of(c("Sepal.Length", "Sepal.Width")), ~ .x - Petal.Length))
137137
```
138138

139-
However, contrary to most languages, in R **symbols can be treated as objects themselves**. This allows dplyr to even perform such simplifications. The details are irrelevant now
139+
Elegant, isn't it? Now, let's proceed by encapsulating this logic within a function where column names are passed as strings:
140+
141+
```{r}
142+
do_magic <- function(data, special, others) {
143+
data %>%
144+
mutate(across(all_of(others), ~ .x - all_of(special)))
145+
}
146+
147+
# won't work:
148+
# do_magic(iris, special = "Petal.Length", others = c("Sepal.Length", "Sepal.Width"))
149+
```
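
The commented-out call above can be run safely by wrapping it in `tryCatch()`. This is only a minimal sketch that assumes dplyr is installed; the exact error text depends on the installed dplyr and tidyselect versions, so it is captured here rather than quoted.

```r
library(dplyr)

do_magic <- function(data, special, others) {
  data %>%
    mutate(across(all_of(others), ~ .x - all_of(special)))
}

# Capture the error instead of stopping. The lambda body is not a
# selecting context, so all_of() cannot resolve a column there.
failure <- tryCatch(
  do_magic(iris, special = "Petal.Length", others = c("Sepal.Length", "Sepal.Width")),
  error = function(e) conditionMessage(e)
)

# `failure` holds an error message (character), not a data frame
is.character(failure)
```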
150+
151+
Surprisingly, it fails! The formula passed to `across()` is evaluated with data masking, not with the tidyselect rules (the ones that make `all_of()` possible), so `all_of()` cannot be used inside it. But we're not defeated; let's try embracing:
152+
153+
```{r}
154+
do_magic_but_better <- function(data, special, others) {
155+
data %>%
156+
mutate(across(all_of(others), ~ .x - {{special}}))
157+
}
158+
159+
do_magic_but_better(iris, special = Petal.Length, others = c("Sepal.Length", "Sepal.Width"))
160+
```
161+
162+
With this approach, `special` must be passed as a symbol. It also looks inconsistent: one parameter is a symbol while the other is a character vector. **We should always aim to be consistent**: either all column-like parameters should be symbols, or all should be character strings. There are pros and cons to both ways. Let's say that we want to stick to strings only. How can we do it?
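
For comparison, before returning to the strings-only question, the other consistent option (symbols everywhere) could look like the sketch below. The function name `do_magic_with_symbols` is hypothetical and not part of the original post; it assumes that `others` is embraced in the same way as `special`.

```r
library(dplyr)

# Hypothetical variant: both `special` and `others` are passed unquoted
do_magic_with_symbols <- function(data, special, others) {
  data %>%
    mutate(across({{ others }}, ~ .x - {{ special }}))
}

result <- do_magic_with_symbols(
  iris,
  special = Petal.Length,
  others = c(Sepal.Length, Sepal.Width)
)
head(result)
```

The trade-off is that callers can no longer pass column names stored in variables without extra work, which is one reason to prefer strings in programmatic code.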
163+
164+
#### Tip: when `all_of()` does not work, use `.data`
165+
166+
There's a workaround for this conundrum:
167+
168+
```{r}
169+
do_magic_but_in_other_way <- function(data, special, others) {
170+
data %>%
171+
mutate(across(all_of(others), ~ .x - .data[[special]]))
172+
}
173+
174+
do_magic_but_in_other_way(iris, special = "Petal.Length", others = c("Sepal.Length", "Sepal.Width"))
175+
```
176+
177+
When you need to refer to a column by a name stored in a string inside a function, the `.data` pronoun comes to the rescue. As demonstrated, `.data[[special]]` returns the same column as writing its name directly.
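
The same pronoun works outside `across()` as well. The sketch below uses a hypothetical helper, `column_mean()`, which is not from the original post, to compare a string-based lookup with the directly written column name.

```r
library(dplyr)

# Hypothetical helper: `column` is a single column name given as a string
column_mean <- function(data, column) {
  data %>%
    summarise(mean_value = mean(.data[[column]]))
}

by_string <- column_mean(iris, "Petal.Length")

# Same computation with the column name written directly
by_symbol <- iris %>%
  summarise(mean_value = mean(Petal.Length))

# Compare the two results
all.equal(by_string, by_symbol)
```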
178+
179+
## Summary & Next Steps
180+
181+
Throughout this post, we ventured deep into some of the intricacies of dplyr. We've unraveled how the package strives to keep our code expressive and concise while simplifying complex operations. The power of symbols and the utility of tools like `all_of()` and `.data` demonstrate just how dynamic and adaptable dplyr can be, especially when working with variable column names. While we've covered much ground, the world of dplyr is vast and constantly evolving. We are aware that all these *embracing* and *tidyselect* rules might be intimidating, but we will continue to explore more facets of the tidyverse in future posts of "basic advanceds", aiming to empower you with advanced techniques that enhance your data analysis journey.
182+
183+
If you've found this post enlightening and wish to delve deeper, or if you have any questions or insights, we'd love to hear from you! You can contact us directly via [X](https://twitter.com/Rturtletopia). Alternatively, for those who prefer a more open-source avenue, feel free to open an issue on our [GitHub](https://github.com/turtletopia/turtletopia.github.io/issues) repository. Your feedback and insights not only help us improve, but they also contribute to the broader data science community.
184+
185+
Until next time, keep exploring, learning, and sharing!
186+
187+
## Dive Deeper: Resources for the Curious Minds
188+
189+
For those wishing to delve further or who may have lingering questions, see the [official dplyr programming guide](https://dplyr.tidyverse.org/articles/programming.html).

content/post/2023-01-30-r-basic-advanceds-variables-and-names-in-dplyr/index.html

Lines changed: 49 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -112,5 +112,53 @@ <h1>Problem 3: dynamic columns in purrr formulas in <code>across</code></h1>
112112
<p>For custom, unnamed functions, the <em>purrr formula syntax</em> (<code>~ expression</code> with <code>.x</code>) is beneficial. In our case (without enclosing it in a function yet), it could look like:</p>
113113
<pre class="r"><code>iris %&gt;%
114114
mutate(across(all_of(c(&quot;Sepal.Length&quot;, &quot;Sepal.Width&quot;)), ~ .x - Petal.Length))</code></pre>
115-
<p>However, contrary to most languages, in R <strong>symbols can be treated as objects themselves</strong>. This allows dplyr to even perform such simplifications. The details are irrelevant now</p>
115+
<p>Elegant, isn’t it? Now, let’s proceed by encapsulating this logic within a function where column names are passed as strings:</p>
116+
<pre class="r"><code>do_magic &lt;- function(data, special, others) {
117+
data %&gt;%
118+
mutate(across(all_of(others), ~ .x - all_of(special)))
119+
}
120+
121+
# won&#39;t work:
122+
# do_magic(iris, special = &quot;Petal.Length&quot;, others = c(&quot;Sepal.Length&quot;, &quot;Sepal.Width&quot;))</code></pre>
123+
<p>Surprisingly, it fails! The formula passed to <code>across()</code> is evaluated with data masking, not with the tidyselect rules (the ones that make <code>all_of()</code> possible), so <code>all_of()</code> cannot be used inside it. But we’re not defeated; let’s try embracing:</p>
124+
<pre class="r"><code>do_magic_but_better &lt;- function(data, special, others) {
125+
data %&gt;%
126+
mutate(across(all_of(others), ~ .x - {{special}}))
127+
}
128+
129+
do_magic_but_better(iris, special = Petal.Length, others = c(&quot;Sepal.Length&quot;, &quot;Sepal.Width&quot;))</code></pre>
130+
<pre><code>## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
131+
## 1 3.7 2.1 1.4 0.2 setosa
132+
## 2 3.5 1.6 1.4 0.2 setosa
133+
## 3 3.4 1.9 1.3 0.2 setosa
134+
## 4 3.1 1.6 1.5 0.2 setosa
135+
## 5 3.6 2.2 1.4 0.2 setosa</code></pre>
136+
<p>With this approach, <code>special</code> must be passed as a symbol. It also looks inconsistent: one parameter is a symbol while the other is a character vector. <strong>We should always aim to be consistent</strong>: either all column-like parameters should be symbols, or all should be character strings. There are pros and cons to both ways. Let’s say that we want to stick to strings only. How can we do it?</p>
137+
<div id="tip-when-all_of-does-not-work-use-.data" class="section level4">
138+
<h4>Tip: when <code>all_of()</code> does not work, use <code>.data</code></h4>
139+
<p>There’s a workaround for this conundrum:</p>
140+
<pre class="r"><code>do_magic_but_in_other_way &lt;- function(data, special, others) {
141+
data %&gt;%
142+
mutate(across(all_of(others), ~ .x - .data[[special]]))
143+
}
144+
145+
do_magic_but_in_other_way(iris, special = &quot;Petal.Length&quot;, others = c(&quot;Sepal.Length&quot;, &quot;Sepal.Width&quot;))</code></pre>
146+
<pre><code>## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
147+
## 1 3.7 2.1 1.4 0.2 setosa
148+
## 2 3.5 1.6 1.4 0.2 setosa
149+
## 3 3.4 1.9 1.3 0.2 setosa
150+
## 4 3.1 1.6 1.5 0.2 setosa
151+
## 5 3.6 2.2 1.4 0.2 setosa</code></pre>
152+
<p>When you need to refer to a column by a name stored in a string inside a function, the <code>.data</code> pronoun comes to the rescue. As demonstrated, <code>.data[[special]]</code> returns the same column as writing its name directly.</p>
153+
</div>
154+
<div id="summary-next-steps" class="section level2">
155+
<h2>Summary &amp; Next Steps</h2>
156+
<p>Throughout this post, we ventured deep into some of the intricacies of dplyr. We’ve unraveled how the package strives to keep our code expressive and concise while simplifying complex operations. The power of symbols and the utility of tools like <code>all_of()</code> and <code>.data</code> demonstrate just how dynamic and adaptable dplyr can be, especially when working with variable column names. While we’ve covered much ground, the world of dplyr is vast and constantly evolving. We are aware that all these <em>embracing</em> and <em>tidyselect</em> rules might be intimidating, but we will continue to explore more facets of the tidyverse in future posts of “basic advanceds”, aiming to empower you with advanced techniques that enhance your data analysis journey.</p>
157+
<p>If you’ve found this post enlightening and wish to delve deeper, or if you have any questions or insights, we’d love to hear from you! You can contact us directly via <a href="https://twitter.com/Rturtletopia">X</a>. Alternatively, for those who prefer a more open-source avenue, feel free to open an issue on our <a href="https://github.com/turtletopia/turtletopia.github.io/issues">GitHub</a> repository. Your feedback and insights not only help us improve, but they also contribute to the broader data science community.</p>
158+
<p>Until next time, keep exploring, learning, and sharing!</p>
159+
</div>
160+
<div id="dive-deeper-resources-for-the-curious-minds" class="section level2">
161+
<h2>Dive Deeper: Resources for the Curious Minds</h2>
162+
<p>For those wishing to delve further or who may have lingering questions, see the <a href="https://dplyr.tidyverse.org/articles/programming.html">official dplyr programming guide</a>.</p>
163+
</div>
116164
</div>
