Commit acfb8d2

mlee comments addressed
1 parent 0b10d8e commit acfb8d2

2 files changed: +18 -16 lines changed

img/reading/NASA-API-parameters.png

-24.5 KB

source/reading.Rmd

Lines changed: 18 additions & 16 deletions
@@ -839,7 +839,7 @@ offer something known as an **a**pplication **p**rogramming **i**nterface
 provides a programmatic way to ask for subsets of a data set. This allows the
 website owner to control *who* has access to the data, *what portion* of the
 data they have access to, and *how much* data they can access. Typically, the
-website owner will give you a *token* (a secret string of characters somewhat
+website owner will give you a *token* or *key* (a secret string of characters somewhat
 like a password) that you have to provide when accessing the API.
 
 Another interesting thought: websites themselves *are* data! When you type a
@@ -929,7 +929,7 @@ above you can see a line that looks like
 ```html
 <span class="result-price">$800</span>
 ```
-That is definitely storing the price of a particular apartment. With some more
+That snippet is definitely storing the price of a particular apartment. With some more
 investigation, you should be able to find things like the date and time of the
 listing, the address of the listing, and more. So this source code most likely
 contains all the information we are interested in!
@@ -1003,7 +1003,8 @@ The selector gadget returns them to us as a comma-separated list (here
 `.housing , .result-price`), which is exactly the format we need to provide to
 R if we are using more than one CSS selector.
 
-**Stop! Are you allowed to scrape that website?**
+**Caution: are you allowed to scrape that website?**
+
 *Before* scraping \index{web scraping!permission} data from the web, you should always check whether or not
 you are *allowed* to scrape it! There are two documents that are important
 for this: the `robots.txt` file and the Terms of Service
@@ -1130,7 +1131,7 @@ such as this into a more useful format for data analysis using R.
 ### Using an API
 
 Rather than posting a data file at a URL for you to download, many websites these days
-provide an API \index{API} that must be accessed through a programming language like R. The benefit of this
+provide an API \index{API} that must be accessed through a programming language like R. The benefit of using an API
 is that data owners have much more control over the data they provide to users. However, unlike
 web scraping, there is no consistent way to access an API across websites. Every website typically
 has its own API designed especially for its own use case. Therefore we will just provide one example
@@ -1146,7 +1147,7 @@ picture of the Rho-Ophiuchi cloud complex in Figure \@ref(fig:NASA-API-Rho-Ophiu
 knitr::include_graphics("img/reading/NASA-API-Rho-Ophiuchi.png")
 ```
 
-First, you will need to visit the [NASA APIs page](https://api.nasa.gov/) and generate an API key.
+First, you will need to visit the [NASA APIs page](https://api.nasa.gov/) and generate an API key (i.e., a password used to identify you when accessing the API).
 Note that a valid email address is required to
 associate with the key. The signup form looks something like Figure \@ref(fig:NASA-API-signup).
 After filling out the basic information, you will receive the token via email.
@@ -1156,7 +1157,7 @@ Make sure to store the key in a safe place, and keep it private.
 knitr::include_graphics("img/reading/NASA-API-signup.png")
 ```
 
-**Stop! Think about your API usage carefully!**
+**Caution: think about your API usage carefully!**
 
 When you access an API, you are initiating a transfer of data from a web server
 to your computer. Web servers are expensive to run and do not have infinite resources.
@@ -1187,8 +1188,7 @@ API, we need to specify three things. First, we specify the URL *endpoint* of
 the API, which is simply a URL that helps the remote server understand which
 API you are trying to access. NASA offers a variety of APIs, each with its own
 endpoint; in the case of the NASA "Astronomy Picture of the Day" API, the URL
-endpoint is `https://api.nasa.gov/planetary/apod`, as shown at the top of
-Figure \@ref(fig:NASA-API-parameters). Second, we write `?`, which denotes that a
+endpoint is `https://api.nasa.gov/planetary/apod`. Second, we write `?`, which denotes that a
 list of *query parameters* will follow. And finally, we specify a list of
 query parameters of the form `parameter=value`, separated by `&` characters. The NASA
 "Astronomy Picture of the Day" API accepts the parameters shown in
@@ -1200,7 +1200,8 @@ knitr::include_graphics("img/reading/NASA-API-parameters.png")
 
 So for example, to obtain the image of the day
 from July 13, 2023, the API query would have two parameters: `api_key=YOUR_API_KEY`
-and `date=2023-07-13`.
+and `date=2023-07-13`. Remember to replace `YOUR_API_KEY` with the API key you
+received from NASA in your email! Putting it all together, the query will look like the following:
 ```
 https://api.nasa.gov/planetary/apod?api_key=YOUR_API_KEY&date=2023-07-13
 ```
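
The three pieces described in this passage (the endpoint, the `?`, and the `&`-separated `parameter=value` pairs) can be assembled in a few lines of base R. This is only a minimal sketch: `build_query_url` is a hypothetical helper introduced for illustration (it is not part of `httr2` or of the chapter being changed), and it assumes the parameter values contain no characters that would need URL encoding.

```r
# Hypothetical helper for illustration only: joins an endpoint and a named
# list of query parameters into a single query URL.
# Assumption: values contain no characters that require URL encoding.
build_query_url <- function(endpoint, params) {
  # Turn each named element into a "parameter=value" string
  pairs <- paste0(names(params), "=", unlist(params))
  # Endpoint, then "?", then the pairs separated by "&"
  paste0(endpoint, "?", paste(pairs, collapse = "&"))
}

query_url <- build_query_url(
  "https://api.nasa.gov/planetary/apod",
  list(api_key = "YOUR_API_KEY", date = "2023-07-13")
)
query_url
```

With the placeholder key left in place, the result should match the query URL shown above character for character; swapping in a real key changes only the `api_key` value.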
@@ -1233,7 +1234,7 @@ commas. For example, if you look closely, you'll see that the first entry is
 `"date":"2023-07-13"`, which indicates that we indeed successfully received
 data corresponding to July 13, 2023.
 
-So now the job is to do all of this programmatically in R. We will load
+So now our job is to do all of this programmatically in R. We will load
 the `httr2` package, and construct the query using the `request` function, which takes a single URL argument;
 you will recognize the same query URL that we pasted into the browser earlier.
 We will then send the query using the `req_perform` function, and finally
@@ -1245,9 +1246,9 @@ of the nasa_data object. But you can reproduce this using the DEMO_KEY key -->
 library(httr2)
 
 req <- request("https://api.nasa.gov/planetary/apod?api_key=YOUR_API_KEY&date=2023-07-13")
-response <- req_perform(req)
-nasa_data <- resp_body_json(response)
-nasa_data
+resp <- req_perform(req)
+nasa_data_single <- resp_body_json(resp)
+nasa_data_single
 ```
 
 ```{r hidden_query, echo = FALSE, warning = FALSE, message = FALSE}
@@ -1263,12 +1264,13 @@ We can obtain more records at once by using the `start_date` and `end_date` para
 shown in the table of parameters in \@ref(fig:NASA-API-parameters).
 Let's obtain all the records between May 1, 2023, and July 13, 2023, and store the result
 in an object called `nasa_data`; now the response
-will take the form of an R *list* (you'll learn more about these in Chapter \@ref(wrangling)),
-with one item similar to the above for each of the 74 days between the start and end dates:
+will take the form of an R *list* (you'll learn more about these in Chapter \@ref(wrangling)).
+Each item in the list will correspond to a single day's record (just like the `nasa_data_single` object),
+and there will be 74 items total, one for each day between the start and end dates:
 
 ```r
 req <- request("https://api.nasa.gov/planetary/apod?api_key=YOUR_API_KEY&start_date=2023-05-01&end_date=2023-07-13")
-response <- req_perform(req)
+resp <- req_perform(req)
 nasa_data <- resp_body_json(response)
 length(nasa_data)
 ```
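
The figure of 74 items can be sanity-checked without calling the API by counting the calendar days in the requested range. The sketch below uses only base R and assumes that both the start and end dates are included in the response, as the text describes; the variable names are illustrative.

```r
# Count the calendar days from the start date to the end date, inclusive.
# Assumption: the API returns one record per day, endpoints included.
start_date <- as.Date("2023-05-01")
end_date <- as.Date("2023-07-13")

days <- seq(start_date, end_date, by = "day")
n_days <- length(days)
n_days
```

If `length(nasa_data)` from a real query differs from `n_days`, that would suggest some days are missing from the response or that the date range was specified differently.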