@@ -839,7 +839,7 @@ offer something known as an **a**pplication **p**rogramming **i**nterface
provides a programmatic way to ask for subsets of a data set. This allows the
website owner to control *who* has access to the data, *what portion* of the
data they have access to, and *how much* data they can access. Typically, the
- website owner will give you a *token* (a secret string of characters somewhat
+ website owner will give you a *token* or *key* (a secret string of characters somewhat
like a password) that you have to provide when accessing the API.

Another interesting thought: websites themselves *are* data! When you type a
@@ -929,7 +929,7 @@ above you can see a line that looks like
```html
<span class="result-price">$800</span>
```
- That is definitely storing the price of a particular apartment. With some more
+ That snippet is definitely storing the price of a particular apartment. With some more
investigation, you should be able to find things like the date and time of the
listing, the address of the listing, and more. So this source code most likely
contains all the information we are interested in!
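
As a rough illustration of how a value such as the price could be pulled out of a line of source code, here is a minimal base R sketch. The `html_line` string simply repeats the snippet above, and the regular expression is an assumption that only fits this one line. For real pages, an HTML parser driven by CSS selectors is far more robust than pattern matching.

```r
# the line of HTML source code shown above, stored as a string
html_line <- '<span class="result-price">$800</span>'

# keep only the text between the opening and closing span tags
price_text <- sub('.*<span class="result-price">([^<]*)</span>.*', "\\1", html_line)

# drop the dollar sign (and any thousands separators), then convert to a number
price <- as.numeric(gsub("[$,]", "", price_text))
price
```

Here `price_text` holds the string `"$800"` and `price` holds the number `800`.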
@@ -1003,7 +1003,8 @@ The selector gadget returns them to us as a comma-separated list (here
`.housing , .result-price`), which is exactly the format we need to provide to
R if we are using more than one CSS selector.

- **Stop! Are you allowed to scrape that website?**
+ **Caution: are you allowed to scrape that website?**
+
*Before* scraping \index{web scraping!permission} data from the web, you should always check whether or not
you are *allowed* to scrape it! There are two documents that are important
for this: the `robots.txt` file and the Terms of Service
@@ -1130,7 +1131,7 @@ such as this into a more useful format for data analysis using R.
### Using an API

Rather than posting a data file at a URL for you to download, many websites these days
- provide an API \index{API} that must be accessed through a programming language like R. The benefit of this
+ provide an API \index{API} that must be accessed through a programming language like R. The benefit of using an API
is that data owners have much more control over the data they provide to users. However, unlike
web scraping, there is no consistent way to access an API across websites. Every website typically
has its own API designed especially for its own use case. Therefore we will just provide one example
@@ -1146,7 +1147,7 @@ picture of the Rho-Ophiuchi cloud complex in Figure \@ref(fig:NASA-API-Rho-Ophiu
knitr::include_graphics("img/reading/NASA-API-Rho-Ophiuchi.png")
```

- First, you will need to visit the [NASA APIs page](https://api.nasa.gov/) and generate an API key.
+ First, you will need to visit the [NASA APIs page](https://api.nasa.gov/) and generate an API key (i.e., a password used to identify you when accessing the API).
Note that a valid email address is required to
associate with the key. The signup form looks something like Figure \@ref(fig:NASA-API-signup).
After filling out the basic information, you will receive the token via email.
@@ -1156,7 +1157,7 @@ Make sure to store the key in a safe place, and keep it private.
knitr::include_graphics("img/reading/NASA-API-signup.png")
```

- **Stop! Think about your API usage carefully!**
+ **Caution: think about your API usage carefully!**

When you access an API, you are initiating a transfer of data from a web server
to your computer. Web servers are expensive to run and do not have infinite resources.
@@ -1187,8 +1188,7 @@ API, we need to specify three things. First, we specify the URL *endpoint* of
the API, which is simply a URL that helps the remote server understand which
API you are trying to access. NASA offers a variety of APIs, each with its own
endpoint; in the case of the NASA "Astronomy Picture of the Day" API, the URL
- endpoint is `https://api.nasa.gov/planetary/apod`, as shown at the top of
- Figure \@ref(fig:NASA-API-parameters). Second, we write `?`, which denotes that a
+ endpoint is `https://api.nasa.gov/planetary/apod`. Second, we write `?`, which denotes that a
list of *query parameters* will follow. And finally, we specify a list of
query parameters of the form `parameter=value`, separated by `&` characters. The NASA
"Astronomy Picture of the Day" API accepts the parameters shown in
@@ -1200,7 +1200,8 @@ knitr::include_graphics("img/reading/NASA-API-parameters.png")
So for example, to obtain the image of the day
from July 13, 2023, the API query would have two parameters: `api_key=YOUR_API_KEY`
- and `date=2023-07-13`.
+ and `date=2023-07-13`. Remember to replace `YOUR_API_KEY` with the API key you
+ received from NASA in your email! Putting it all together, the query will look like the following:
```
https://api.nasa.gov/planetary/apod?api_key=YOUR_API_KEY&date=2023-07-13
```
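
The three pieces of a query (the endpoint, the `?`, and the `&`-separated parameters) can also be assembled in code rather than typed by hand. The following is a minimal base R sketch; the object names `endpoint`, `params`, `query`, and `query_url` are our own choices for illustration, and `YOUR_API_KEY` remains a placeholder. The sketch does no URL encoding, which is fine for the simple values used here but may not be for values containing spaces or special characters.

```r
# URL endpoint of the "Astronomy Picture of the Day" API
endpoint <- "https://api.nasa.gov/planetary/apod"

# query parameters as a named character vector (the names are the parameters)
params <- c(api_key = "YOUR_API_KEY", date = "2023-07-13")

# join each pair as parameter=value, then separate the pairs with &
query <- paste(names(params), params, sep = "=", collapse = "&")

# the endpoint, then ?, then the list of query parameters
query_url <- paste0(endpoint, "?", query)
query_url
```

Building the URL from a named vector makes it easy to change one parameter (say, the date) without retyping the rest of the query.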
@@ -1233,7 +1234,7 @@ commas. For example, if you look closely, you'll see that the first entry is
`"date":"2023-07-13"`, which indicates that we indeed successfully received
data corresponding to July 13, 2023.

- So now the job is to do all of this programmatically in R. We will load
+ So now our job is to do all of this programmatically in R. We will load
the `httr2` package, and construct the query using the `request` function, which takes a single URL argument;
you will recognize the same query URL that we pasted into the browser earlier.
We will then send the query using the `req_perform` function, and finally
@@ -1245,9 +1246,9 @@ of the nasa_data object. But you can reproduce this using the DEMO_KEY key -->
library(httr2)

req <- request("https://api.nasa.gov/planetary/apod?api_key=YOUR_API_KEY&date=2023-07-13")
- response <- req_perform(req)
- nasa_data <- resp_body_json(response)
- nasa_data
+ resp <- req_perform(req)
+ nasa_data_single <- resp_body_json(resp)
+ nasa_data_single
```

```{r hidden_query, echo = FALSE, warning = FALSE, message = FALSE}
@@ -1263,12 +1264,13 @@ We can obtain more records at once by using the `start_date` and `end_date` para
shown in the table of parameters in \@ref(fig:NASA-API-parameters).
Let's obtain all the records between May 1, 2023, and July 13, 2023, and store the result
in an object called `nasa_data`; now the response
- will take the form of an R *list* (you'll learn more about these in Chapter \@ref(wrangling)),
- with one item similar to the above for each of the 74 days between the start and end dates:
+ will take the form of an R *list* (you'll learn more about these in Chapter \@ref(wrangling)).
+ Each item in the list will correspond to a single day's record (just like the `nasa_data_single` object),
+ and there will be 74 items total, one for each day between the start and end dates:

```r
req <- request("https://api.nasa.gov/planetary/apod?api_key=YOUR_API_KEY&start_date=2023-05-01&end_date=2023-07-13")
- response <- req_perform(req)
+ resp <- req_perform(req)
1272
1274
nasa_data <- resp_body_json(response )
1273
1275
length(nasa_data )
1274
1276
```
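
To see where the count of 74 comes from, and how one might work with a list of daily records once it arrives, here is a small sketch that runs without contacting the API. The `nasa_data_demo` object is a hypothetical two-record stand-in for the real response (its titles are placeholders, not actual NASA data), and `record_dates` is a name chosen here for illustration.

```r
# every day from May 1, 2023 to July 13, 2023, counting both endpoints
days <- seq(as.Date("2023-05-01"), as.Date("2023-07-13"), by = "day")
length(days)

# a hypothetical two-record stand-in for the parsed API response
nasa_data_demo <- list(
  list(date = "2023-05-01", title = "Placeholder title 1"),
  list(date = "2023-05-02", title = "Placeholder title 2")
)

# pull the date field out of every record in the list
record_dates <- sapply(nasa_data_demo, function(record) record$date)
record_dates
```

The first result is 74 (31 days in May, 30 in June, and 13 in July), which matches the number of items we expect the API to return for this date range.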