Commit 239acf8

proofreading
1 parent 627c236 commit 239acf8

2 files changed (+13 -13 lines)

content/post/2023-01-30-r-basic-advanceds-variables-and-names-in-dplyr/index.Rmd

Lines changed: 6 additions & 6 deletions
@@ -15,7 +15,7 @@ output:
1515

1616
Hello everyone! After an extended hiatus for various reasons (from graduating college to navigating job changes and legal challenges), we're back and eager to breathe new life into this blog. Given my deep interest in the fundamentals of advanced methods, today we're delving into an essential topic every dplyr user will eventually face.
1717

18-
dplyr is meticulously designed with the primary goal of making code workflows read as naturally and close to plain language as possible. This design philosophy manifests in two critical dimensions: *semantic* and *syntactic*.
18+
dplyr is meticulously designed with the primary goal of making code workflows read as close to natural language as possible. This design philosophy manifests in two critical dimensions: *semantic* and *syntactic*.
1919

2020
Semantically, the emphasis is on **employing words with intuitive and easily understood meanings**. For instance, dplyr and its friends adhere to a robust naming convention where function names typically take on verb forms, elucidating the action they perform.
2121

@@ -72,7 +72,7 @@ iris %>%
7272
select(my_variables)
7373
```
7474

75-
This generates a warning. Given the tidyverse's informative error messages, it's wise to pay heed. Directly supplying can be ambiguous —- imagine having a column named "my_variables". Which should be selected if we have both the column and the external variable?
75+
This generates a warning. Given how informative tidyverse messages tend to be, it's wise to pay heed. Supplying an external vector directly can be ambiguous: imagine having a column named `my_variables`. Which should be selected if we have both the column and the external variable?
7676

7777

7878
![Diagram showing the dilemma dplyr faces when we torment it with ambiguous selections.](images/selection-ambiguity.png)
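A minimal sketch of the clash described above, assuming dplyr is installed; `iris_clash` is a hypothetical copy of `iris` with an extra `my_variables` column added purely for illustration. When a bare name matches both a data column and an external variable, tidyselect resolves it to the data column, whereas wrapping the vector in `all_of()`, as the dplyr authors recommend, makes the intent explicit:

```r
library(dplyr)

# External character vector holding column names
my_variables <- c("Sepal.Length", "Sepal.Width")

# Hypothetical data frame that ALSO has a column called `my_variables`
iris_clash <- iris
iris_clash$my_variables <- 1

# Bare name: the data column takes precedence, so only it is selected
ambiguous <- iris_clash %>%
  select(my_variables)

# all_of(): explicitly treated as an external vector of column names
explicit <- iris_clash %>%
  select(all_of(my_variables))

names(ambiguous)
names(explicit)
```

With `all_of()`, the result no longer depends on whether the data happens to contain a column with the same name as our variable.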
@@ -108,7 +108,7 @@ my_subset_with_strings <- function(data, my_var_as_string) {
108108
my_subset_with_strings(iris, c("Sepal.Length", "Sepal.Width"))
109109
```
110110

111-
If we want to use symbols, just like directly in dplyr functions (mostly when those columns to use are predefined, in our internal functions or analyses), we have to *embrace* the variable:
111+
If we want to use symbols, just as we would directly in dplyr functions (typically when the columns are predefined, be it in our internal functions or in analyses), we have to *embrace* the variable with `{{ }}`:
112112

113113
```{r, eval=FALSE}
114114
my_subset_with_symbols <- function(data, my_var_as_symbol) {
@@ -123,7 +123,7 @@ my_subset_with_symbols(iris, Petal.Length, Sepal.Width)
123123

124124
In this way, we let dplyr know that `my_var_as_symbol` has to be passed on exactly as the user provided it. We can think of embracing as a cut-and-paste operation. We tell dplyr: "Take whatever the user provided in place of `my_var_as_symbol` in the function call and plug it directly into `select()`, without creating any intermediate variables." The call to `my_subset_with_symbols()` is effectively replaced with what lies inside it.
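To make the cut-and-paste analogy concrete, here is a small sketch comparing a call to `my_subset_with_symbols()` with the same selection written by hand. Since the function has a single embraced argument, the sketch assumes several columns are passed wrapped in `c()`; `via_function` and `by_hand` are illustrative names only:

```r
library(dplyr)

# Same definition as in the post: {{ }} forwards the caller's expression
my_subset_with_symbols <- function(data, my_var_as_symbol) {
  data %>%
    select({{ my_var_as_symbol }})
}

# Several columns travel through the single argument as one c() expression
via_function <- my_subset_with_symbols(iris, c(Petal.Length, Sepal.Width))

# The "pasted" equivalent: the argument written straight into select()
by_hand <- iris %>%
  select(c(Petal.Length, Sepal.Width))

identical(via_function, by_hand)
```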
125125

126-
## Problem 3: dynamic columns in purrr formulas in `across`
126+
## Problem 3: Dynamic columns in purrr formulas in `across`
127127

128128
While the above solutions work seamlessly with functions like `dplyr::select()`, challenges arise when operations grow more complex. Suppose we wish to write a function, `do_magic`, that takes a data frame, a special `column`, and several `others` columns, and adds the special column to each of the others.
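One possible shape of `do_magic()`, sketched under a few assumptions: dplyr 1.1 or later, and `column` and `others` passed as strings. `all_of()` works in the selection part of `across()`, while inside the purrr-style formula the special column is looked up through the `.data` pronoun. This is an illustration, not necessarily the exact solution the post arrives at:

```r
library(dplyr)

# Hypothetical implementation: `column` and `others` are character strings
do_magic <- function(data, column, others) {
  data %>%
    mutate(across(
      all_of(others),          # tidyselect context: all_of() is fine here
      ~ .x + .data[[column]]   # inside the formula: reach `column` via .data
    ))
}

result <- do_magic(iris, "Sepal.Length", c("Sepal.Width", "Petal.Length"))
head(result, 3)
```

Wrapping `column` in `all_of()` inside the formula would not help, because the formula body is ordinary data-masked code rather than a selection context.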
129129

@@ -182,12 +182,12 @@ When you need to reference the underlying data within the context of functions,
182182

183183
## Summary & Next Steps
184184

185-
Throughout this post, we ventured deep into some of the intricacies of dplyr. We've unraveled how the package strives to make our code both semantic and syntactic, all while simplifying complex operations. The power of symbols and the utility of functions like `all_of()` and `.data` demonstrate just how dynamic and adaptable dplyr can be, especially when working with variable column names. While we've covered much ground, the world of dplyr is vast and constantly evolving. We are aware that all this *embracing* and *tidyselect* rules might be intimidating, but we will be continue to explore more facets of the tidyverse in future posts of "basic advanceds", aiming to empower you with advanced techniques that enhance your data analysis journey.
185+
Throughout this post, we ventured deep into some of the intricacies of dplyr. We've unraveled how the package strives to make our code read naturally, both semantically and syntactically, all while simplifying complex operations. The power of symbols and the utility of functions like `all_of()` and `.data` demonstrate just how dynamic and adaptable dplyr can be, especially when working with variable column names. While we've covered much ground, the world of dplyr is vast and constantly evolving. We are aware that all these *embracing* and *tidyselect* rules might be intimidating, but we will continue to explore more facets of the tidyverse in future "basic advanceds" posts, aiming to empower you with advanced techniques that enhance your data analysis journey.
186186

187187
If you've found this post enlightening and wish to delve deeper, or if you have any questions or insights, we'd love to hear from you! You can contact us directly via [X](https://twitter.com/Rturtletopia). Alternatively, for those who prefer a more open-source avenue, feel free to open an issue on our [GitHub](https://github.com/turtletopia/turtletopia.github.io/issues) repository. Your feedback and insights not only help us improve, but they also contribute to the broader data science community.
188188

189189
Until next time, keep exploring, learning, and sharing!
190190

191191
## Dive Deeper: Resources for the Curious Minds
192192

193-
For those wishing to delve further or who may have lingering questions: [Dplyr official programming guide](https://dplyr.tidyverse.org/articles/programming.html)
193+
For those wishing to delve further or who may have lingering questions: [the official dplyr programming guide](https://dplyr.tidyverse.org/articles/programming.html)

content/post/2023-01-30-r-basic-advanceds-variables-and-names-in-dplyr/index.html

Lines changed: 7 additions & 7 deletions
@@ -17,7 +17,7 @@
1717
<li><a href="#intro">Intro</a></li>
1818
<li><a href="#problem-1-symbols-vs.-strings-with-names">Problem 1: Symbols vs. strings with names</a></li>
1919
<li><a href="#problem-2-passing-column-names-as-arguments-to-custom-functions">Problem 2: Passing column names as arguments to custom functions</a></li>
20-
<li><a href="#problem-3-dynamic-columns-in-purrr-formulas-in-across">Problem 3: dynamic columns in purrr formulas in <code>across</code></a></li>
20+
<li><a href="#problem-3-dynamic-columns-in-purrr-formulas-in-across">Problem 3: Dynamic columns in purrr formulas in <code>across</code></a></li>
2121
<li><a href="#summary-next-steps">Summary &amp; Next Steps</a></li>
2222
<li><a href="#dive-deeper-resources-for-the-curious-minds">Dive Deeper: Resources for the Curious Minds</a></li>
2323
</ul>
@@ -26,7 +26,7 @@
2626
<div id="intro" class="section level2">
2727
<h2>Intro</h2>
2828
<p>Hello everyone! After an extended hiatus for various reasons (from graduating college to navigating job changes and legal challenges), we’re back and eager to breathe new life into this blog. Given my deep interest in the fundamentals of advanced methods, today we’re delving into an essential topic every dplyr user will eventually face.</p>
29-
<p>dplyr is meticulously designed with the primary goal of making code workflows read as naturally and close to plain language as possible. This design philosophy manifests in two critical dimensions: <em>semantic</em> and <em>syntactic</em>.</p>
29+
<p>dplyr is meticulously designed with the primary goal of making code workflows read as close to natural language as possible. This design philosophy manifests in two critical dimensions: <em>semantic</em> and <em>syntactic</em>.</p>
3030
<p>Semantically, the emphasis is on <strong>employing words with intuitive and easily understood meanings</strong>. For instance, dplyr and its friends adhere to a robust naming convention where function names typically take on verb forms, elucidating the action they perform.</p>
3131
<p>Syntactically, the <strong>arrangement and combination of these descriptive words is paramount</strong>. Arguably, this is even more critical to the user experience. One of the most evident manifestations of this syntactic approach is the tidyverse’s hallmark feature: <strong>the pipe operator</strong>. But we are not going to tackle that today. Instead, I will look into the caveats of another essential and intuitive syntactic feature: the <strong>use of symbols instead of strings to refer to variables within datasets</strong>. This offers a more natural-feeling mode of interaction but, as I have found out over many years of using R, this feature can lead to some problems.</p>
3232
</div>
@@ -77,7 +77,7 @@ <h2>Problem 1: Symbols vs. strings with names</h2>
7777
## 3 4.7 3.2
7878
## 4 4.6 3.1
7979
## 5 5.0 3.6</code></pre>
80-
<p>This generates a warning. Given the tidyverse’s informative error messages, it’s wise to pay heed. Directly supplying can be ambiguous —- imagine having a column named “my_variables”. Which should be selected if we have both the column and the external variable?</p>
80+
<p>This generates a warning. Given how informative tidyverse messages tend to be, it’s wise to pay heed. Supplying an external vector directly can be ambiguous: imagine having a column named <code>my_variables</code>. Which should be selected if we have both the column and the external variable?</p>
8181
<p><img src="images/selection-ambiguity.png" alt="Diagram showing the dilemma dplyr faces when we torment it with ambiguous selections." />
8282
To ensure clarity, the dplyr authors suggest using <code>dplyr::all_of()</code>, which makes explicit that the argument is an external vector of column names, resolving any ambiguity.</p>
8383
<pre class="r"><code>iris %&gt;%
@@ -104,7 +104,7 @@ <h2>Problem 2: Passing column names as arguments to custom functions</h2>
104104
}
105105

106106
my_subset_with_strings(iris, c(&quot;Sepal.Length&quot;, &quot;Sepal.Width&quot;))</code></pre>
107-
<p>If we want to use symbols, just like directly in dplyr functions (mostly when those columns to use are predefined, in our internal functions or analyses), we have to <em>embrace</em> the variable:</p>
107+
<p>If we want to use symbols, just as we would directly in dplyr functions (typically when the columns are predefined, be it in our internal functions or in analyses), we have to <em>embrace</em> the variable with <code>{{ }}</code>:</p>
108108
<pre class="r"><code>my_subset_with_symbols &lt;- function(data, my_var_as_symbol) {
109109
data %&gt;%
110110
select({{ my_var_as_symbol }})
@@ -116,7 +116,7 @@ <h2>Problem 2: Passing column names as arguments to custom functions</h2>
116116
<p>In this way, we let dplyr know that <code>my_var_as_symbol</code> has to be passed on exactly as the user provided it. We can think of embracing as a cut-and-paste operation. We tell dplyr: “Take whatever the user provided in place of <code>my_var_as_symbol</code> in the function call and plug it directly into <code>select()</code>, without creating any intermediate variables.” The call to <code>my_subset_with_symbols()</code> is effectively replaced with what lies inside it.</p>
117117
</div>
118118
<div id="problem-3-dynamic-columns-in-purrr-formulas-in-across" class="section level2">
119-
<h2>Problem 3: dynamic columns in purrr formulas in <code>across</code></h2>
119+
<h2>Problem 3: Dynamic columns in purrr formulas in <code>across</code></h2>
120120
<p>While the above solutions work seamlessly with functions like <code>dplyr::select()</code>, challenges arise when operations grow more complex. Suppose we wish to write a function, <code>do_magic</code>, that takes a data frame, a special <code>column</code>, and several <code>others</code> columns, and adds the special column to each of the others.</p>
121121
<p>We can achieve this with <code>dplyr::mutate(dplyr::across())</code>, whose general form is:</p>
122122
<pre class="r"><code>mutate(across(columns_to_mutate, function_to_apply))</code></pre>
@@ -165,11 +165,11 @@ <h4>Tip: when <code>all_of()</code> does not work, use <code>.data</code></h4>
165165
</div>
166166
<div id="summary-next-steps" class="section level2">
167167
<h2>Summary &amp; Next Steps</h2>
168-
<p>Throughout this post, we ventured deep into some of the intricacies of dplyr. We’ve unraveled how the package strives to make our code both semantic and syntactic, all while simplifying complex operations. The power of symbols and the utility of functions like <code>all_of()</code> and <code>.data</code> demonstrate just how dynamic and adaptable dplyr can be, especially when working with variable column names. While we’ve covered much ground, the world of dplyr is vast and constantly evolving. We are aware that all this <em>embracing</em> and <em>tidyselect</em> rules might be intimidating, but we will be continue to explore more facets of the tidyverse in future posts of “basic advanceds”, aiming to empower you with advanced techniques that enhance your data analysis journey.</p>
168+
<p>Throughout this post, we ventured deep into some of the intricacies of dplyr. We’ve unraveled how the package strives to make our code read naturally, both semantically and syntactically, all while simplifying complex operations. The power of symbols and the utility of functions like <code>all_of()</code> and <code>.data</code> demonstrate just how dynamic and adaptable dplyr can be, especially when working with variable column names. While we’ve covered much ground, the world of dplyr is vast and constantly evolving. We are aware that all these <em>embracing</em> and <em>tidyselect</em> rules might be intimidating, but we will continue to explore more facets of the tidyverse in future “basic advanceds” posts, aiming to empower you with advanced techniques that enhance your data analysis journey.</p>
169169
<p>If you’ve found this post enlightening and wish to delve deeper, or if you have any questions or insights, we’d love to hear from you! You can contact us directly via <a href="https://twitter.com/Rturtletopia">X</a>. Alternatively, for those who prefer a more open-source avenue, feel free to open an issue on our <a href="https://github.com/turtletopia/turtletopia.github.io/issues">GitHub</a> repository. Your feedback and insights not only help us improve, but they also contribute to the broader data science community.</p>
170170
<p>Until next time, keep exploring, learning, and sharing!</p>
171171
</div>
172172
<div id="dive-deeper-resources-for-the-curious-minds" class="section level2">
173173
<h2>Dive Deeper: Resources for the Curious Minds</h2>
174-
<p>For those wishing to delve further or who may have lingering questions: <a href="https://dplyr.tidyverse.org/articles/programming.html">Dplyr official programming guide</a></p>
174+
<p>For those wishing to delve further or who may have lingering questions: <a href="https://dplyr.tidyverse.org/articles/programming.html">the official dplyr programming guide</a></p>
175175
</div>
