content/post/2023-01-30-r-basic-advanceds-variables-and-names-in-dplyr/index.Rmd
+6 -6 (6 additions & 6 deletions)
@@ -15,7 +15,7 @@ output:
Hello everyone! After an extended hiatus for various reasons (from graduating college to navigating job changes and legal challenges), we're back and eager to breathe new life into this blog. Given my deep interest in the fundamentals of advanced methods, today we're delving into an essential topic every dplyr user will eventually face.
-dplyr is meticulously designed with the primary goal of making code workflows read as naturally and close to plain language as possible. This design philosophy manifests in two critical dimensions: *semantic* and *syntactic*.
+dplyr is meticulously designed with the primary goal of making code workflows read possibly close to natural languages. This design philosophy manifests in two critical dimensions: *semantic* and *syntactic*.
Semantically, the emphasis is on **employing words with intuitive and easily understood meanings**. For instance, dplyr and its friends adhere to a robust naming convention where function names typically take on verb forms, elucidating the action they perform.
@@ -72,7 +72,7 @@ iris %>%
  select(my_variables)
```
-This generates a warning. Given the tidyverse's informative error messages, it's wise to pay heed. Directly supplying can be ambiguous —- imagine having a column named "my_variables". Which should be selected if we have both the column and the external variable?
+This generates a warning. Given the tidyverse's informative error messages, it's wise to pay heed. Directly supplying can be ambiguous -- imagine having a column named `my_variable`. Which should be selected if we have both the column and the external variable?
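The ambiguity described in this paragraph can be made concrete with a small sketch. It assumes that `my_variables` is a character vector of two `iris` column names (the excerpt does not show its definition), and `iris_clash` is a hypothetical data frame built only to force the name collision:

```r
library(dplyr)

# Assumption: an external character vector of column names.
my_variables <- c("Sepal.Length", "Sepal.Width")

# Hypothetical data frame with a column that shares the vector's name.
iris_clash <- iris %>% mutate(my_variables = 1)

# A bare name is matched against the columns first,
# so this selects the column called `my_variables`.
picked_column <- iris_clash %>% select(my_variables)

# all_of() states explicitly that the names come from the external vector.
picked_external <- iris_clash %>% select(all_of(my_variables))

names(picked_column)
names(picked_external)
```

With `all_of()` the result no longer depends on whether a column of the same name happens to exist in the data.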
-If we want to use symbols, just like directly in dplyr functions (mostly when those columns to use are predefined, in our internal functions or analyses), we have to *embrace* the variable:
+If we want to use symbols, just like directly in dplyr functions (mostly when those columns to use are predefined, be it in our internal functions or in analyses), we have to *embrace* the variable:
In this way we let dplyr know that `my_var_as_symbol` has to be passed on exactly as the user provided it. We can think of embracing as a cut-and-paste operation. We tell dplyr: "Take what the user provided in place of `my_var_as_symbol` in the function call and plug it directly into `select`, without creating any intermediate variables." The call to `my_subset_with_symbols()` is basically replaced with what lies inside of it.
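A minimal sketch of this cut-and-paste behaviour follows. The body of `my_subset_with_symbols()` is not shown in this excerpt, so the definition here is a hypothetical reconstruction that only assumes the function forwards its argument to `select()`:

```r
library(dplyr)

# Hypothetical reconstruction: the argument is embraced with {{ }}
# so that select() sees exactly what the caller typed.
my_subset_with_symbols <- function(data, my_var_as_symbol) {
  data %>% select({{ my_var_as_symbol }})
}

# Calling the wrapper with a bare symbol...
via_function <- my_subset_with_symbols(iris, Sepal.Length)

# ...behaves like writing the symbol inside select() directly.
direct <- iris %>% select(Sepal.Length)

identical(via_function, direct)
```

Because the expression is forwarded as typed, the same wrapper should also accept richer selections such as `c(Sepal.Length, Species)`.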
-## Problem 3: dynamic columns in purrr formulas in `across`
+## Problem 3: Dynamic columns in purrr formulas in `across`
While the above solutions work seamlessly with functions like `dplyr::select()`, challenges arise when operations grow complex. Suppose we wish to craft a function, `do_magic`, that takes data, a special `column`, and several `others` columns. This function should add the special column to all others.
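One possible shape for such a function is sketched below. It is not necessarily the solution the post arrives at: it assumes that `column` and `others` are passed as strings, selects the targets with `all_of()`, and refers to the special column through the `.data` pronoun inside the purrr-style formula:

```r
library(dplyr)

# Sketch under assumptions: `column` is a single column name (string),
# `others` is a character vector of column names.
do_magic <- function(data, column, others) {
  data %>%
    mutate(
      # .x is the current column from `others`;
      # .data[[column]] looks the special column up by its name.
      across(all_of(others), ~ .x + .data[[column]])
    )
}

result <- do_magic(iris, "Sepal.Length", c("Sepal.Width", "Petal.Length"))
head(result)
```

Note that the special column should not itself appear in `others` here, since it would be overwritten partway through and later columns would be added to the modified values.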
@@ -182,12 +182,12 @@ When you need to reference the underlying data within the context of functions,
## Summary & Next Steps
-Throughout this post, we ventured deep into some of the intricacies of dplyr. We've unraveled how the package strives to make our code both semantic and syntactic, all while simplifying complex operations. The power of symbols and the utility of functions like `all_of()` and `.data` demonstrate just how dynamic and adaptable dplyr can be, especially when working with variable column names. While we've covered much ground, the world of dplyr is vast and constantly evolving. We are aware that all this *embracing* and *tidyselect* rules might be intimidating, but we will be continue to explore more facets of the tidyverse in future posts of "basic advanceds", aiming to empower you with advanced techniques that enhance your data analysis journey.
+Throughout this post, we ventured deep into some of the intricacies of dplyr. We've unraveled how the package strives to make our code both semantic and syntactic, all while simplifying complex operations. The power of symbols and the utility of functions like `all_of()` and `.data` demonstrate just how dynamic and adaptable dplyr can be, especially when working with variable column names. While we've covered much ground, the world of dplyr is vast and constantly evolving. We are aware that all this *embracing* and *tidyselect* rules might be intimidating, but we will continue to explore more facets of the tidyverse in future posts of "basic advanceds", aiming to empower you with advanced techniques that enhance your data analysis journey.
If you've found this post enlightening and wish to delve deeper, or if you have any questions or insights, we'd love to hear from you! You can contact us directly via [X](https://twitter.com/Rturtletopia). Alternatively, for those who prefer a more open-source avenue, feel free to open an issue on our [GitHub](https://github.com/turtletopia/turtletopia.github.io/issues) repository. Your feedback and insights not only help us improve, but they also contribute to the broader data science community.
Until next time, keep exploring, learning, and sharing!
## Dive Deeper: Resources for the Curious Minds:
-For those wishing to delve further or who may have lingering questions: [Dplyr official programming guide](https://dplyr.tidyverse.org/articles/programming.html)
+For those wishing to delve further or who may have lingering questions: [dplyr official programming guide](https://dplyr.tidyverse.org/articles/programming.html)
content/post/2023-01-30-r-basic-advanceds-variables-and-names-in-dplyr/index.html
+7 -7 (7 additions & 7 deletions)
@@ -17,7 +17,7 @@
<li><a href="#intro">Intro</a></li>
<li><a href="#problem-1-symbols-vs.-strings-with-names">Problem 1: Symbols vs. strings with names</a></li>
<li><a href="#problem-2-passing-column-names-as-arguments-to-custom-functions">Problem 2: Passing column names as arguments to custom functions</a></li>
-<li><a href="#problem-3-dynamic-columns-in-purrr-formulas-in-across">Problem 3: dynamic columns in purrr formulas in <code>across</code></a></li>
+<li><a href="#problem-3-dynamic-columns-in-purrr-formulas-in-across">Problem 3: Dynamic columns in purrr formulas in <code>across</code></a></li>
<li><a href="#summary-next-steps">Summary & Next Steps</a></li>
<li><a href="#dive-deeper-resources-for-the-curious-minds">Dive Deeper: Resources for the Curious Minds:</a></li>
</ul>
@@ -26,7 +26,7 @@
<div id="intro" class="section level2">
<h2>Intro</h2>
<p>Hello everyone! After an extended hiatus for various reasons (from graduating college to navigating job changes and legal challenges), we’re back and eager to breathe new life into this blog. Given my deep interest in the fundamentals of advanced methods, today we’re delving into an essential topic every dplyr user will eventually face.</p>
-<p>dplyr is meticulously designed with the primary goal of making code workflows read as naturally and close to plain language as possible. This design philosophy manifests in two critical dimensions: <em>semantic</em> and <em>syntactic</em>.</p>
+<p>dplyr is meticulously designed with the primary goal of making code workflows read possibly close to natural languages. This design philosophy manifests in two critical dimensions: <em>semantic</em> and <em>syntactic</em>.</p>
<p>Semantically, the emphasis is on <strong>employing words with intuitive and easily understood meanings</strong>. For instance, dplyr and its friends adhere to a robust naming convention where function names typically take on verb forms, elucidating the action they perform.</p>
<p>Syntactically, the <strong>arrangement and combination of these descriptive words is paramount</strong>. Arguably, this is even more critical to the user experience. One of the most evident manifestations of this syntactical approach is the tidyverse’s hallmark feature: <strong>the pipe operator</strong>. But we are not going to tackle this today. I will look into caveats of another essential and intuitive syntactic feature: the <strong>use of symbols instead of strings to refer to variables within datasets</strong>. This offers a more natural-feeling mode of interaction but, as I have found out over many years of using R, this feature can lead to some problems.</p>
</div>
@@ -77,7 +77,7 @@ <h2>Problem 1: Symbols vs. strings with names</h2>
## 3 4.7 3.2
## 4 4.6 3.1
## 5 5.0 3.6</code></pre>
-<p>This generates a warning. Given the tidyverse’s informative error messages, it’s wise to pay heed. Directly supplying can be ambiguous —- imagine having a column named “my_variables”. Which should be selected if we have both the column and the external variable?</p>
+<p>This generates a warning. Given the tidyverse’s informative error messages, it’s wise to pay heed. Directly supplying can be ambiguous – imagine having a column named <code>my_variable</code>. Which should be selected if we have both the column and the external variable?</p>
<p><img src="images/selection-ambiguity.png" alt="Diagram showing the dilemma that dplyr is faced with when we torment it with ambiguous selections." />
To ensure clarity, dplyr authors suggest using dplyr::all_of(), which explicitly converts a name vector into symbols, resolving any ambiguities.</p>
<pre class="r"><code>iris %>%
@@ -104,7 +104,7 @@ <h2>Problem 2: Passing column names as arguments to custom functions</h2>
-<p>If we want to use symbols, just like directly in dplyr functions (mostly when those columns to use are predefined, in our internal functions or analyses), we have to <em>embrace</em> the variable:</p>
+<p>If we want to use symbols, just like directly in dplyr functions (mostly when those columns to use are predefined, be it in our internal functions or in analyses), we have to <em>embrace</em> the variable:</p>
@@ -116,7 +116,7 @@ <h2>Problem 2: Passing column names as arguments to custom functions</h2>
<p>In this way we let dplyr know that <code>my_var_as_symbol</code> has to be passed on exactly as the user provided it. We can think of embracing as a cut-and-paste operation. We tell dplyr: “Take what the user provided in place of <code>my_var_as_symbol</code> in the function call and plug it directly into <code>select</code>, without creating any intermediate variables.” The call to <code>my_subset_with_symbols()</code> is basically replaced with what lies inside of it.</p>
-<h2>Problem 3: dynamic columns in purrr formulas in <code>across</code></h2>
+<h2>Problem 3: Dynamic columns in purrr formulas in <code>across</code></h2>
<p>While the above solutions work seamlessly with functions like <code>dplyr::select()</code>, challenges arise when operations grow complex. Suppose we wish to craft a function, <code>do_magic</code>, that takes data, a special <code>column</code>, and several <code>others</code> columns. This function should add the special column to all others.</p>
<p>Leveraging <code>dplyr::mutate(dplyr::across())</code> can achieve this. Its syntax is:</p>
-<p>Throughout this post, we ventured deep into some of the intricacies of dplyr. We’ve unraveled how the package strives to make our code both semantic and syntactic, all while simplifying complex operations. The power of symbols and the utility of functions like <code>all_of()</code> and <code>.data</code> demonstrate just how dynamic and adaptable dplyr can be, especially when working with variable column names. While we’ve covered much ground, the world of dplyr is vast and constantly evolving. We are aware that all this <em>embracing</em> and <em>tidyselect</em> rules might be intimidating, but we will be continue to explore more facets of the tidyverse in future posts of “basic advanceds”, aiming to empower you with advanced techniques that enhance your data analysis journey.</p>
+<p>Throughout this post, we ventured deep into some of the intricacies of dplyr. We’ve unraveled how the package strives to make our code both semantic and syntactic, all while simplifying complex operations. The power of symbols and the utility of functions like <code>all_of()</code> and <code>.data</code> demonstrate just how dynamic and adaptable dplyr can be, especially when working with variable column names. While we’ve covered much ground, the world of dplyr is vast and constantly evolving. We are aware that all this <em>embracing</em> and <em>tidyselect</em> rules might be intimidating, but we will continue to explore more facets of the tidyverse in future posts of “basic advanceds”, aiming to empower you with advanced techniques that enhance your data analysis journey.</p>
<p>If you’ve found this post enlightening and wish to delve deeper, or if you have any questions or insights, we’d love to hear from you! You can contact us directly via <a href="https://twitter.com/Rturtletopia">X</a>. Alternatively, for those who prefer a more open-source avenue, feel free to open an issue on our <a href="https://github.com/turtletopia/turtletopia.github.io/issues">GitHub</a> repository. Your feedback and insights not only help us improve, but they also contribute to the broader data science community.</p>
<p>Until next time, keep exploring, learning, and sharing!</p>
<h2>Dive Deeper: Resources for the Curious Minds:</h2>
-<p>For those wishing to delve further or who may have lingering questions: <a href="https://dplyr.tidyverse.org/articles/programming.html">Dplyr official programming guide</a></p>
+<p>For those wishing to delve further or who may have lingering questions: <a href="https://dplyr.tidyverse.org/articles/programming.html">dplyr official programming guide</a></p>