Hello everyone! After an extended hiatus for various reasons (from graduating college to navigating job changes and legal challenges), we're back and eager to breathe new life into this blog. Given my deep interest in the fundamentals of advanced methods, today we're delving into an essential topic every dplyr user will eventually face.
@@ -23,7 +27,7 @@ library(dplyr)
iris <- iris %>% slice(1:5)
```
-
# Problem 1: Symbols vs. strings with names
+
## Problem 1: Symbols vs. strings with names
Let's compare how we select columns in a data frame using base R versus dplyr:
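
A minimal sketch of that comparison, assuming the built-in `iris` data. The column choice is illustrative, and the names `base_result`, `dplyr_result` and `string_result` are hypothetical rather than taken from the original post:

```r
library(dplyr)

# Base R: columns are referred to by strings
base_result <- iris[, c("Sepal.Length", "Species")]

# dplyr: columns are referred to by bare symbols
dplyr_result <- iris %>% select(Sepal.Length, Species)

# dplyr's select() also accepts strings for the same columns
string_result <- iris %>% select("Sepal.Length", "Species")
```

All three calls pick the same two columns; they differ only in how the columns are named.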
This tells dplyr that `my_var_as_symbol` has to be passed on exactly as the user provided it. We can think of embracing as a cut-and-paste operation. We tell dplyr: "Take what the user provided in place of `my_var_as_symbol` in the function call and plug it directly into `select`, without creating any intermediate variables." In effect, the call to `my_subset_with_symbols()` is replaced with what lies inside it.
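
A short sketch of embracing in action. The names `my_subset_with_symbols` and `my_var_as_symbol` come from the text above; the function signature (data frame first) and the body are assumptions made for illustration:

```r
library(dplyr)

# Assumed signature: a data frame plus one column given as a bare symbol
my_subset_with_symbols <- function(data, my_var_as_symbol) {
  # {{ }} forwards the user's expression to select() unchanged
  data %>% select({{ my_var_as_symbol }})
}

embraced <- my_subset_with_symbols(iris, Species)

# The direct call that the embraced version stands in for
direct <- iris %>% select(Species)
```

Under these assumptions, `embraced` and `direct` should hold the same single-column data frame.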
-
# Problem 3: dynamic columns in purrr formulas in `across`
+
## Problem 3: dynamic columns in purrr formulas in `across`
While the above solutions work seamlessly with functions like `dplyr::select()`, challenges arise when operations grow complex. Suppose we wish to craft a function, `do_magic`, that takes data, a special `column`, and several `others` columns. This function should add the special column to all others.
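
One possible shape for such a function, sketched under the assumption that `column` and `others` arrive as character strings. The body is an illustration, not necessarily the implementation this post goes on to build, and `small_iris` and `result` are hypothetical names:

```r
library(dplyr)

# Hypothetical sketch: `column` is a single column name (string),
# `others` is a character vector of column names
do_magic <- function(data, column, others) {
  data %>%
    # all_of() selects the target columns by name; the .data pronoun
    # looks up the special column by name inside the purrr-style formula
    mutate(across(all_of(others), ~ .x + .data[[column]]))
}

small_iris <- head(iris, 5)
result <- do_magic(small_iris, "Petal.Length", c("Sepal.Length", "Sepal.Width"))
```

Here each of the two sepal columns in `result` is the original value plus `Petal.Length`, while the remaining columns are left untouched.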
<li><a href="#problem-1-symbols-vs.-strings-with-names">Problem 1: Symbols vs. strings with names</a></li>
+
<li><a href="#problem-2-passing-column-names-as-arguments-to-custom-functions">Problem 2: Passing column names as arguments to custom functions</a></li>
+
<li><a href="#problem-3-dynamic-columns-in-purrr-formulas-in-across">Problem 3: dynamic columns in purrr formulas in <code>across</code></a></li>
+
<li><a href="#summary-next-steps">Summary & Next Steps</a></li>
+
<li><a href="#dive-deeper-resources-for-the-curious-minds">Dive Deeper: Resources for the Curious Minds:</a></li>
+
</ul>
+
</div>
-
<div id="intro" class="section level1">
-
<h1>Intro</h1>
+
<div id="intro" class="section level2">
+
<h2>Intro</h2>
<p>Hello everyone! After an extended hiatus for various reasons (from graduating college to navigating job changes and legal challenges), we’re back and eager to breathe new life into this blog. Given my deep interest in the fundamentals of advanced methods, today we’re delving into an essential topic every dplyr user will eventually face.</p>
<p>dplyr is meticulously designed with the primary goal of making code workflows read as naturally and as close to plain language as possible. This design philosophy manifests in two critical dimensions: <em>semantic</em> and <em>syntactic</em>.</p>
<p>Semantically, the emphasis is on <strong>employing words with intuitive and easily understood meanings</strong>. For instance, dplyr and its friends adhere to a robust naming convention where function names typically take on verb forms, elucidating the action they perform.</p>
<p>Syntactically, the <strong>arrangement and combination of these descriptive words is paramount</strong>. Arguably, this is even more critical to the user experience. One of the most evident manifestations of this syntactic approach is the tidyverse’s hallmark feature: <strong>the pipe operator</strong>. But we are not going to tackle that today. Instead, I will look into the caveats of another essential and intuitive syntactic feature: the <strong>use of symbols instead of strings to refer to variables within datasets</strong>. This offers a more natural-feeling mode of interaction but, as I have found out over many years of using R, it can also lead to some problems.</p>
<h2>Problem 2: Passing column names as arguments to custom functions</h2>
<p>Differentiating between passing a variable name and passing a symbol becomes trickier when constructing functions that internally use dplyr verbs. Consider:</p>
@@ -100,9 +115,8 @@ <h2>Problem 2: Passing column names as arguments to custom functions</h2>
<p>This tells dplyr that <code>my_var_as_symbol</code> has to be passed on exactly as the user provided it. We can think of embracing as a cut-and-paste operation. We tell dplyr: “Take what the user provided in place of <code>my_var_as_symbol</code> in the function call and plug it directly into <code>select</code>, without creating any intermediate variables.” In effect, the call to <code>my_subset_with_symbols()</code> is replaced with what lies inside it.</p>
<h2>Problem 3: dynamic columns in purrr formulas in <code>across</code></h2>
<p>While the above solutions work seamlessly with functions like <code>dplyr::select()</code>, challenges arise when operations grow complex. Suppose we wish to craft a function, <code>do_magic</code>, that takes data, a special <code>column</code>, and several <code>others</code> columns. This function should add the special column to all others.</p>
<p>Leveraging <code>dplyr::mutate(dplyr::across())</code> can achieve this. Its syntax is:</p>
@@ -148,6 +162,7 @@ <h4>Tip: when <code>all_of()</code> does not work, use <code>.data</code></h4>
## 5 3.6 2.2 1.4 0.2 setosa</code></pre>
<p>When you need to reference the underlying data within the context of functions, the <code>.data</code> pronoun comes to the rescue. As demonstrated, it operates similarly to directly accessing the data.</p>
<p>Throughout this post, we ventured deep into some of the intricacies of dplyr. We’ve unraveled how the package strives to make our code read naturally, both semantically and syntactically, all while simplifying complex operations. The power of symbols and the utility of helpers like <code>all_of()</code> and <code>.data</code> demonstrate just how dynamic and adaptable dplyr can be, especially when working with variable column names. While we’ve covered much ground, the world of dplyr is vast and constantly evolving. We are aware that all these <em>embracing</em> and <em>tidyselect</em> rules might be intimidating, but we will continue to explore more facets of the tidyverse in future posts of “basic advanceds”, aiming to empower you with advanced techniques that enhance your data analysis journey.</p>
@@ -158,4 +173,3 @@ <h2>Summary & Next Steps</h2>
<h2>Dive Deeper: Resources for the Curious Minds:</h2>
<p>For those wishing to delve further or who may have lingering questions: <a href="https://dplyr.tidyverse.org/articles/programming.html">Dplyr official programming guide</a></p>