@@ -616,26 +616,38 @@ response for us. So `dbplyr` does all the hard work of translating from R to SQL
 we can just stick with R!
 
 With our `lang_db` table reference for the 2016 Canadian Census data in hand, we
-can mostly continue onward as if it were a regular data frame. For example,
-we can use the `filter` function
-to obtain only certain rows. Below we filter the data to include only Aboriginal languages.
+can mostly continue onward as if it were a regular data frame. For example, let's do the same exercise
+from Chapter \@ref(intro): we will obtain only those rows corresponding to Aboriginal languages, and keep only
+the `language` and `mother_tongue` columns.
+We can use the `filter` function to obtain only certain rows. Below we filter the data to include only Aboriginal languages.
 
 ```{r}
 aboriginal_lang_db <- filter(lang_db, category == "Aboriginal languages")
 aboriginal_lang_db
 ```
 
 Above you can again see the hints that this data is not actually stored in R yet:
-the source is a `lazy query [?? x 6]` and the output says `... with more rows` at the end
+the source is `SQL [?? x 6]` and the output says `... more rows` at the end
 (both indicating that R does not know how many rows there are in total!),
-and a database type `sqlite 3.36.0` is listed.
+and a database type `sqlite` is listed.
+We didn't use the `collect` function because we are not ready to bring the data into R yet. \index{collect}
+We can still use the database to do some work to obtain *only* the small amount of data we want to work with locally
+in R. Let's add the second part of our database query: selecting only the `language` and `mother_tongue` columns
+using the `select` function.
+
+```{r}
+aboriginal_lang_selected_db <- select(aboriginal_lang_db, language, mother_tongue)
+aboriginal_lang_selected_db
+```
+
+Now you can see that the database will return only the two columns we asked for with the `select` function.
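One way to check that the filtering and column selection really happen in the database, rather than in R, is to print the SQL that `dbplyr` generates. The sketch below is not part of the change itself. It assumes the `DBI`, `RSQLite`, `dplyr`, and `dbplyr` packages are installed, and it uses a hypothetical in-memory table (`toy_lang`, with made-up values) in place of the census database.

```r
library(DBI)
library(dplyr)
library(dbplyr)

# Hypothetical toy data standing in for the census table; values are made up.
toy_lang <- data.frame(
  category = c("Aboriginal languages", "Aboriginal languages", "Official languages"),
  language = c("Language A", "Language B", "Language C"),
  mother_tongue = c(10, 20, 30)
)

# An in-memory SQLite database, so nothing is written to disk.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "lang", toy_lang)
toy_lang_db <- tbl(con, "lang")

# Build the query lazily, in the same two steps as the text.
toy_filtered_db <- filter(toy_lang_db, category == "Aboriginal languages")
toy_selected_db <- select(toy_filtered_db, language, mother_tongue)

# Print the SQL that would be sent to the database; no rows are fetched here.
show_query(toy_selected_db)

# Only now is the (small) result pulled into R as a regular data frame.
toy_result <- collect(toy_selected_db)

dbDisconnect(con)
```

The printed query should mention only the two selected columns and carry the filter condition in a `WHERE` clause, which suggests that the database does the work before any data reaches R.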
 In order to actually retrieve this data in R as a data frame,
 we use the `collect` function. \index{filter}
 Below you will see that after running `collect`, R knows that the retrieved
 data has 67 rows, and there is no database listed any more.
 
 ```{r}
-aboriginal_lang_data <- collect(aboriginal_lang_db)
+aboriginal_lang_data <- collect(aboriginal_lang_selected_db)
 aboriginal_lang_data
 ```
 
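If only the number of rows is needed, the count can also be computed inside the database instead of collecting everything first. The following is a minimal sketch under the same assumptions as the text (an SQLite database accessed through `dbplyr`). The table `toy` and all object names are hypothetical.

```r
library(DBI)
library(dplyr)
library(dbplyr)

# Hypothetical five-row table in an in-memory SQLite database.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "toy", data.frame(id = 1:5))
toy_db <- tbl(con, "toy")

# On a database reference, the row count is not known to R.
lazy_rows <- nrow(toy_db)

# Let the database count the rows, then collect the one-row result.
db_count <- collect(summarize(toy_db, n = n()))

# After collecting the full table, nrow works as on any data frame.
local_rows <- nrow(collect(toy_db))

dbDisconnect(con)
```

Here `lazy_rows` is `NA`, while `db_count$n` and `local_rows` both equal 5. Counting in the database transfers a single number rather than the whole table.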
@@ -649,14 +661,14 @@ For example, look what happens when we try to use `nrow` to count rows
 in a data frame: \index{nrow}
 
 ```{r}
-nrow(aboriginal_lang_db)
+nrow(aboriginal_lang_selected_db)
 ```
 
 or `tail` to preview the last six rows of a data frame:
 \index{tail}
 
 ```{r, eval = FALSE}
-tail(aboriginal_lang_db)
+tail(aboriginal_lang_selected_db)
 ```
 
 ```