Commit bc659ef

Merge pull request #556 from UBC-DSCI/path-gps
Better section on paths in reading chapter
2 parents 178d846 + 66a36b9 commit bc659ef

1 file changed: +39 -23 lines changed

source/reading.Rmd

Lines changed: 39 additions & 23 deletions
@@ -27,7 +27,7 @@ print_html_nodes <- function(html_nodes_object) {
 ## Overview
 
 In this chapter, you’ll learn to read tabular data of various formats into R
-from your local device (e.g., your laptop) and the web. Reading (or loading)
+from your local device (e.g., your laptop) and the web. "Reading" (or "loading")
 \index{loading|see{reading}}\index{reading!definition} is the process of
 converting data (stored as plain text, a database, HTML, etc.) into an object
 (e.g., a data frame) that R can easily access and manipulate. Thus reading data
@@ -81,37 +81,52 @@ could live on your computer (*local*)
 \index{location|see{path}} \index{path!local, remote, relative, absolute}
 or somewhere on the internet (*remote*).
 
-The place where the file lives on your computer is called the "path". You can
+The place where the file lives on your computer is referred to as its "path". You can
 think of the path as directions to the file. There are two kinds of paths:
-*relative* paths and *absolute* paths. A relative path is where the file is
-with respect to where you currently are on the computer (e.g., where the file
-you're working in is). On the other hand, an absolute path is where the file is
-in respect to the computer's filesystem base (or root) folder.
+*relative* paths and *absolute* paths. A relative path indicates where the file is
+with respect to your *working directory* (i.e., "where you are currently") on the computer.
+On the other hand, an absolute path indicates where the file is
+with respect to the computer's filesystem base (or *root*) folder, regardless of where you are working.
 
 Suppose our computer's filesystem looks like the picture in Figure
-\@ref(fig:file-system-for-export-to-intro-datascience), and we are working in a
-file titled `worksheet_02.ipynb`. If we want to
-read the `.csv` file named `happiness_report.csv` into R, we could do this
-using either a relative or an absolute path. We show both choices
-below.\index{Happiness Report}
+\@ref(fig:file-system-for-export-to-intro-datascience). We are working in a
+file titled `worksheet_02.ipynb`, and our current working directory is `worksheet_02`;
+typically, as is the case here, the working directory is the directory containing the file you are currently
+working on.\index{Happiness Report}
 
 ```{r file-system-for-export-to-intro-datascience, echo = FALSE, message = FALSE, warning = FALSE, fig.cap = "Example file system.", fig.retina = 2, out.width="100%"}
 knitr::include_graphics("img/reading/filesystem.jpeg")
 ```
 
-**Reading `happiness_report.csv` using a relative path:**
-
+Let's say we wanted to open the `happiness_report.csv` file. We have two options to indicate
+where the file is: using a relative path, or using an absolute path.
+The absolute path of the file always starts with a slash `/`&mdash;representing the root folder on the computer&mdash;and
+proceeds by listing out the sequence of folders you would have to enter to reach the file, each separated by another slash `/`.
+So in this case, `happiness_report.csv` would be reached by starting at the root, and entering the `home` folder,
+then the `dsci-100` folder, then the `worksheet_02` folder, and then finally the `data` folder. So its absolute
+path would be `/home/dsci-100/worksheet_02/data/happiness_report.csv`. We can load the file using its absolute path
+as a string passed to the `read_csv` function.
 ```{r eval = FALSE}
-happy_data <- read_csv("data/happiness_report.csv")
+happy_data <- read_csv("/home/dsci-100/worksheet_02/data/happiness_report.csv")
 ```
-
-**Reading `happiness_report.csv` using an absolute path:**
-
+If we instead wanted to use a relative path, we would need to list out the sequence of steps needed to get from our current
+working directory to the file, with slashes `/` separating each step. Since we are currently in the `worksheet_02` folder,
+we just need to enter the `data` folder to reach our desired file. Hence the relative path is `data/happiness_report.csv`,
+and we can load the file using its relative path as a string passed to `read_csv`.
 ```{r eval = FALSE}
-happy_data <- read_csv("/home/dsci-100/worksheet_02/data/happiness_report.csv")
+happy_data <- read_csv("data/happiness_report.csv")
 ```
+Note that there is no forward slash at the beginning of a relative path; if we accidentally typed `"/data/happiness_report.csv"`,
+R would look for a folder named `data` in the root folder of the computer&mdash;but that doesn't exist!
+
+Aside from specifying places to go in a path using folder names (like `data` and `worksheet_02`), we can also specify two additional
+special places: the *current directory* and the *previous directory*. We indicate the current working directory with a single dot `.`, and
+the previous directory with two dots `..`. So for instance, if we wanted to reach the `bike_share.csv` file from the `worksheet_02` folder, we could
+use the relative path `../tutorial_01/bike_share.csv`. We can even combine these two; for example, we could reach the `bike_share.csv` file using
+the (very silly) path `../tutorial_01/../tutorial_01/./bike_share.csv` with quite a few redundant directions: it says to go back a folder, then open `tutorial_01`,
+then go back a folder again, then open `tutorial_01` again, then stay in the current directory, then finally get to `bike_share.csv`. Whew, what a long trip!
 
-So which one should you use? Generally speaking, you should use relative paths.
+So which kind of path should you use: relative, or absolute? Generally speaking, you should use relative paths.
 Using a relative path helps ensure that your code can be run
 on a different computer (and as an added bonus, relative paths are often shorter&mdash;easier to type!).
 This is because a file's relative path is often the same across different computers, while a
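
Note on this hunk: the path rules described in the added lines can be checked with a short base R sketch. It assumes a scratch directory tree created under `tempdir()` that mimics the `worksheet_02` and `tutorial_01` folders from the figure; the files are empty placeholders, not the real data sets, and the variable names are illustrative only.

```r
# Build a scratch copy of the example layout (empty placeholder files only).
root <- file.path(tempdir(), "dsci-100")
dir.create(file.path(root, "worksheet_02", "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "tutorial_01"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(root, "worksheet_02", "data", "happiness_report.csv"))
file.create(file.path(root, "tutorial_01", "bike_share.csv"))

# Pretend we are working inside worksheet_02; setwd() returns the previous directory.
old_wd <- setwd(file.path(root, "worksheet_02"))

# A relative path is resolved against the working directory.
rel_exists <- file.exists("data/happiness_report.csv")

# A leading slash makes the path absolute, so R looks under the filesystem root instead.
abs_typo_exists <- file.exists("/data/happiness_report.csv")

# `..` steps up one folder and `.` stays put, so both spellings reach the same file.
short_path <- normalizePath("../tutorial_01/bike_share.csv")
silly_path <- normalizePath("../tutorial_01/../tutorial_01/./bike_share.csv")
same_file <- identical(short_path, silly_path)

# Restore the original working directory.
setwd(old_wd)
```

Here `rel_exists` and `same_file` should both be `TRUE`, while `abs_typo_exists` is expected to be `FALSE` on a machine with no `/data/happiness_report.csv` at its root.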
@@ -139,10 +154,11 @@ difference between absolute and relative paths. You can also check out the
 in R.
 
 Beyond files stored on your computer (i.e., locally), we also need a way to locate resources
-stored elsewhere on the internet (i.e., remotely). For this purpose we use a *Uniform Resource Locator (URL)*,
-i.e., a web address that looks something like https://datasciencebook.ca/. \index{URL}
-URLs indicate the location of a resource on the internet and
-help us retrieve that resource.
+stored elsewhere on the internet (i.e., remotely). For this purpose we use a
+*Uniform Resource Locator (URL)*, i.e., a web address that looks something
+like https://datasciencebook.ca/. URLs indicate the location of a resource on the internet, and
+start with a web domain, followed by a forward slash `/`, and then a path
+to where the resource is located on the remote machine. \index{URL}
 
 ## Reading tabular data from a plain text file into R

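Note on this hunk: the added sentence describes a URL as a web domain, a forward slash, and then a path. In practice a scheme such as `https://` comes before the domain. A minimal base R sketch of that split follows; the regular expression is a hand-written illustrative assumption, not a full URL parser.

```r
# Split a URL into scheme, domain, and path with a simple (illustrative) pattern.
url <- "https://datasciencebook.ca/"
pattern <- "^([a-z]+)://([^/]+)(/.*)?$"

# regexec() returns the full match followed by each parenthesised group.
parts <- regmatches(url, regexec(pattern, url))[[1]]

scheme <- parts[2]  # text before "://"
domain <- parts[3]  # the web domain
path   <- parts[4]  # path on the remote machine, starting with "/"
```

For the example URL the path component is just `/`, the top level of the site.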