Commit 1b15a5f

better path section in reading
1 parent 2fdf0c5 commit 1b15a5f

1 file changed: +47 -37 lines changed

source/reading.md

Lines changed: 47 additions & 37 deletions
@@ -80,22 +80,21 @@ functions, we first need to talk about *where* the data lives. When you load a
8080
data set into Python, you first need to tell Python where those files live. The file
8181
could live on your computer (*local*) or somewhere on the internet (*remote*).
8282

83-
The place where the file lives on your computer is called the "path". You can
83+
The place where the file lives on your computer is referred to as its "path". You can
8484
think of the path as directions to the file. There are two kinds of paths:
85-
*relative* paths and *absolute* paths. A relative path is where the file is
86-
with respect to where you currently are on the computer (e.g., where the file
87-
you're working in is). On the other hand, an absolute path is where the file is
88-
in respect to the computer's filesystem base (or root) folder.
85+
*relative* paths and *absolute* paths. A relative path indicates where the file is
86+
with respect to your *working directory* (i.e., "where you are currently") on the computer.
87+
On the other hand, an absolute path indicates where the file is
88+
with respect to the computer's filesystem base (or *root*) folder, regardless of where you are working.
8989

9090
```{index} Happiness Report
9191
```
9292

9393
Suppose our computer's filesystem looks like the picture in
94-
{numref}`Filesystem`, and we are working in a
95-
file titled `worksheet_02.ipynb`. If we want to
96-
read the `.csv` file named `happiness_report.csv` into Python, we could do this
97-
using either a relative or an absolute path. We show both choices
98-
below.
94+
{numref}`Filesystem`. We are working in a
95+
file titled `worksheet_02.ipynb`, and our current working directory is `worksheet_02`;
96+
typically, as is the case here, the working directory is the directory containing the file you are currently
97+
working on.
9998

10099
```{figure} img/reading/filesystem.jpeg
101100
---
@@ -105,34 +104,42 @@ name: Filesystem
105104
Example file system
106105
```
107106

108-
109-
**Reading `happiness_report.csv` using a relative path:**
110-
111-
+++
112-
107+
Let's say we wanted to open the `happiness_report.csv` file. We have two options to indicate
108+
where the file is: using a relative path, or using an absolute path.
109+
The absolute path of the file always starts with a slash `/`—representing the root folder on the computer—and
110+
proceeds by listing out the sequence of folders you would have to enter to reach the file, each separated by another slash `/`.
111+
So in this case, `happiness_report.csv` would be reached by starting at the root, and entering the `home` folder,
112+
then the `dsci-100` folder, then the `worksheet_02` folder, and then finally the `data` folder. So its absolute
113+
path would be `/home/dsci-100/worksheet_02/data/happiness_report.csv`. We can load the file using its absolute path
114+
as a string passed to the `read_csv` function from `pandas`.
113115
```python
114-
happy_data = pd.read_csv("data/happiness_report.csv")
116+
happy_data = pd.read_csv("/home/dsci-100/worksheet_02/data/happiness_report.csv")
115117
```
116-
117-
+++
118-
119-
**Reading `happiness_report.csv` using an absolute path:**
120-
121-
+++
122-
118+
If we instead wanted to use a relative path, we would need to list out the sequence of steps needed to get from our current
119+
working directory to the file, with slashes `/` separating each step. Since we are currently in the `worksheet_02` folder,
120+
we just need to enter the `data` folder to reach our desired file. Hence the relative path is `data/happiness_report.csv`,
121+
and we can load the file using its relative path as a string passed to `read_csv`.
123122
```python
124-
happy_data = pd.read_csv("/home/dsci-100/worksheet_02/data/happiness_report.csv")
123+
happy_data = pd.read_csv("data/happiness_report.csv")
125124
```
125+
Note that there is no forward slash at the beginning of a relative path; if we accidentally typed `"/data/happiness_report.csv"`,
126+
Python would look for a folder named `data` in the root folder of the computer—but that doesn't exist!
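
To illustrate how much that leading slash matters, here is a small sketch using `PurePosixPath` from the standard-library `pathlib` module. It only inspects the path strings and never touches the filesystem. The variable names are assumptions made for this example.

```python
from pathlib import PurePosixPath

# The three path strings discussed above.
candidate_paths = [
    "/home/dsci-100/worksheet_02/data/happiness_report.csv",
    "data/happiness_report.csv",
    "/data/happiness_report.csv",  # the accidental leading slash
]

# A POSIX-style path is absolute exactly when it starts at the root `/`.
absolute_flags = {
    path_string: PurePosixPath(path_string).is_absolute()
    for path_string in candidate_paths
}

for path_string, is_absolute in absolute_flags.items():
    print(path_string, "->", "absolute" if is_absolute else "relative")
```

The third string is treated as absolute, which is why Python would search the root folder for `data` rather than the working directory.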
126127

127-
+++
128+
Aside from specifying places to go in a path using folder names (like `data` and `worksheet_02`), we can also specify two additional
129+
special places: the *current directory* and the *parent directory* (the folder that contains the current one). We indicate the current directory with a single dot `.`, and
130+
the parent directory with two dots `..`. So for instance, if we wanted to reach the `bike_share.csv` file from the `worksheet_02` folder, we could
131+
use the relative path `../tutorial_01/bike_share.csv`. We can even combine these two; for example, we could reach the `bike_share.csv` file using
132+
the (very silly) path `../tutorial_01/../tutorial_01/./bike_share.csv` with quite a few redundant directions: it says to go back a folder, then open `tutorial_01`,
133+
then go back a folder again, then open `tutorial_01` again, then stay in the current directory, then finally get to `bike_share.csv`. Whew, what a long trip!
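
The redundancy in that path can be checked with a short sketch based on the standard-library `posixpath` module, whose `normpath` function collapses `.` and `..` steps purely by rewriting the string. The working directory below is taken from the example filesystem; the variable names are illustrative, and `normpath` does not check that any of these folders exist.

```python
import posixpath

silly_path = "../tutorial_01/../tutorial_01/./bike_share.csv"

# Collapse the redundant `.` and `..` steps.
simplified_path = posixpath.normpath(silly_path)
print(simplified_path)  # ../tutorial_01/bike_share.csv

# Starting from the working directory in the example filesystem,
# both versions lead to the same absolute path.
working_directory = "/home/dsci-100/worksheet_02"
absolute_path = posixpath.normpath(posixpath.join(working_directory, silly_path))
print(absolute_path)  # /home/dsci-100/tutorial_01/bike_share.csv
```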
128134

129-
So which one should you use? Generally speaking, to ensure your code can be run
130-
on a different computer, you should use relative paths. An added bonus is that
131-
it's also less typing! Generally, you should use relative paths because the file's
132-
absolute path (the names of
133-
folders between the computer's root `/` and the file) isn't usually the same
134-
across different computers. For example, suppose Fatima and Jayden are working on a
135-
project together on the `happiness_report.csv` data. Fatima's file is stored at
135+
So which kind of path should you use: relative, or absolute? Generally speaking, you should use relative paths.
136+
Using a relative path helps ensure that your code can be run
137+
on a different computer (and as an added bonus, relative paths are often shorter—easier to type!).
138+
This is because a file's relative path is often the same across different computers, while a
139+
file's absolute path (the names of
140+
all of the folders between the computer's root, represented by `/`, and the file) isn't usually the same
141+
across different computers. For example, suppose Fatima and Jayden are working on a
142+
project together on the `happiness_report.csv` data. Fatima's file is stored at
136143

137144
```
138145
/home/Fatima/project/data/happiness_report.csv
@@ -150,16 +157,19 @@ their different usernames. If Jayden has code that loads the
150157
`happiness_report.csv` data using an absolute path, the code won't work on
151158
Fatima's computer. But the relative path from inside the `project` folder
152159
(`data/happiness_report.csv`) is the same on both computers; any code that uses
153-
relative paths will work on both!
160+
relative paths will work on both! In the additional resources section,
161+
we include a link to a short video on the
162+
difference between absolute and relative paths.
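
The same idea can be sketched in code. In the example below, Fatima's project folder comes from the text, while the second folder is a hypothetical stand-in for a collaborator's computer, not a path given in the source. The relative path is identical in both cases, while the absolute paths differ.

```python
from pathlib import PurePosixPath

# The relative path is the same for every collaborator.
relative_path = PurePosixPath("data/happiness_report.csv")

project_folders = [
    PurePosixPath("/home/Fatima/project"),        # from the text
    PurePosixPath("/home/another-user/project"),  # hypothetical second computer
]

# The absolute path depends on where each project folder lives.
absolute_paths = [folder / relative_path for folder in project_folders]
for absolute_path in absolute_paths:
    print(absolute_path)
```

Code that hard-codes either absolute path runs on only one of the two computers, whereas code that uses `relative_path` from inside the project folder runs on both.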
154163

155164
```{index} URL
156165
```
157166

158-
Your file could be stored locally, as we discussed, or it could also be
159-
somewhere on the internet (remotely). For this purpose we use a
167+
Beyond files stored on your computer (i.e., locally), we also need a way to locate resources
168+
stored elsewhere on the internet (i.e., remotely). For this purpose we use a
160169
*Uniform Resource Locator (URL)*, i.e., a web address that looks something
161-
like https://google.com/. URLs indicate the location of a resource on the internet and
162-
helps us retrieve that resource.
170+
like https://datasciencebook.ca/. URLs indicate the location of a resource on the internet, and
171+
start with a protocol (such as `https://`), followed by a web domain, a forward slash `/`, and then a path
172+
to where the resource is located on the remote machine.
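
As a rough illustration of these parts, the standard-library `urllib.parse.urlsplit` function breaks a URL into its protocol (called the scheme), its web domain, and its path. The URL below is a hypothetical example and does not point to a real data set.

```python
from urllib.parse import urlsplit

# Hypothetical URL, used only to show the parts of a web address.
example_url = "https://example.com/data/happiness_report.csv"

parts = urlsplit(example_url)
print(parts.scheme)  # https
print(parts.netloc)  # example.com
print(parts.path)    # /data/happiness_report.csv
```

Note that the path portion looks just like an absolute path on a local computer: it starts with `/` and lists folders separated by slashes.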
163173

164174
## Reading tabular data from a plain text file into Python
165175
