Commit 22baaf4

Merge pull request #559 from UBC-DSCI/db-create-column
Updates to DB section to match Python
2 parents c9b277a + b4df504 commit 22baaf4

1 file changed

+20
-8
lines changed

source/reading.Rmd

Lines changed: 20 additions & 8 deletions
@@ -616,26 +616,38 @@ response for us. So `dbplyr` does all the hard work of translating from R to SQL
 we can just stick with R!
 
 With our `lang_db` table reference for the 2016 Canadian Census data in hand, we
-can mostly continue onward as if it were a regular data frame. For example,
-we can use the `filter` function
-to obtain only certain rows. Below we filter the data to include only Aboriginal languages.
+can mostly continue onward as if it were a regular data frame. For example, let's do the same exercise
+from Chapter \@ref(intro): we will obtain only those rows corresponding to Aboriginal languages, and keep only
+the `language` and `mother_tongue` columns.
+We can use the `filter` function to obtain only certain rows. Below we filter the data to include only Aboriginal languages.
 
 ```{r}
 aboriginal_lang_db <- filter(lang_db, category == "Aboriginal languages")
 aboriginal_lang_db
 ```
 
 Above you can again see the hints that this data is not actually stored in R yet:
-the source is a `lazy query [?? x 6]` and the output says `... with more rows` at the end
+the source is `SQL [?? x 6]` and the output says `... more rows` at the end
 (both indicating that R does not know how many rows there are in total!),
-and a database type `sqlite 3.36.0` is listed.
+and a database type `sqlite` is listed.
+We didn't use the `collect` function because we are not ready to bring the data into R yet. \index{collect}
+We can still use the database to do some work to obtain *only* the small amount of data we want to work with locally
+in R. Let's add the second part of our database query: selecting only the `language` and `mother_tongue` columns
+using the `select` function.
+
+```{r}
+aboriginal_lang_selected_db <- select(aboriginal_lang_db, language, mother_tongue)
+aboriginal_lang_selected_db
+```
+
+Now you can see that the database will return only the two columns we asked for with the `select` function.
 In order to actually retrieve this data in R as a data frame,
 we use the `collect` function. \index{filter}
 Below you will see that after running `collect`, R knows that the retrieved
 data has 67 rows, and there is no database listed any more.
 
 ```{r}
-aboriginal_lang_data <- collect(aboriginal_lang_db)
+aboriginal_lang_data <- collect(aboriginal_lang_selected_db)
 aboriginal_lang_data
 ```
 
@@ -649,14 +661,14 @@ For example, look what happens when we try to use `nrow` to count rows
 in a data frame: \index{nrow}
 
 ```{r}
-nrow(aboriginal_lang_db)
+nrow(aboriginal_lang_selected_db)
 ```
 
 or `tail` to preview the last six rows of a data frame:
 \index{tail}
 
 ```{r, eval = FALSE}
-tail(aboriginal_lang_db)
+tail(aboriginal_lang_selected_db)
 ```
 
 ```
