Commit d7b426d

api section rough version done
1 parent 203871e commit d7b426d

File tree: 2 files changed, +78 −72 lines

source/data/nasa.json: 1 addition & 0 deletions (large diff not rendered by default)

source/reading.md: 77 additions & 72 deletions
@@ -1342,8 +1342,9 @@ its own API designed especially for its own use case. Therefore we will just
provide one example of accessing data through an API in this book, with the
hope that it gives you enough of a basic idea that you can learn how to use
another API if needed. In particular, in this book we will show you the basics
of how to use the `requests` package in Python to access data from the NASA "Astronomy Picture
of the Day" API (a great source of desktop backgrounds, by the way—take a look at the stunning
picture of the Rho-Ophiuchi cloud complex in {numref}`fig:NASA-API-Rho-Ophiuchi` from July 13, 2023!).

```{index} API; requests, NASA, API; token; key
```
@@ -1427,107 +1428,111 @@ https://api.nasa.gov/planetary/apod?api_key=YOUR_API_KEY&date=2023-07-13
```
If you try putting this URL into your web browser, you'll actually find that the server
responds to your request with some text:

```json
{"date":"2023-07-13","explanation":"A mere 390 light-years away, Sun-like stars
and future planetary systems are forming in the Rho Ophiuchi molecular cloud
complex, the closest star-forming region to our fair planet. The James Webb
Space Telescope's NIRCam peered into the nearby natal chaos to capture this
infrared image at an inspiring scale. The spectacular cosmic snapshot was
released to celebrate the successful first year of Webb's exploration of the
Universe. The frame spans less than a light-year across the Rho Ophiuchi region
and contains about 50 young stars. Brighter stars clearly sport Webb's
characteristic pattern of diffraction spikes. Huge jets of shocked molecular
hydrogen blasting from newborn stars are red in the image, with the large,
yellowish dusty cavity carved out by the energetic young star near its center.
Near some stars in the stunning image are shadows cast by their protoplanetary
disks.","hdurl":"https://apod.nasa.gov/apod/image/2307/STScI-01_RhoOph.png",
"media_type":"image","service_version":"v1","title":"Webb's
Rho Ophiuchi","url":"https://apod.nasa.gov/apod/image/2307/STScI-01_RhoOph1024.png"}
```

Neat! There is definitely some data there, but it's a bit hard to
see what it all is. As it turns out, this is a common format for data called
*JSON* (JavaScript Object Notation). We won't encounter this kind of data much in this book,
but for now you can interpret this data just like
you'd interpret a Python dictionary: these are `key : value` pairs separated by
commas. For example, if you look closely, you'll see that the first entry is
`"date":"2023-07-13"`, which indicates that we indeed successfully received
data corresponding to July 13, 2023.

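To make that dictionary analogy concrete, here is a small sketch (using Python's built-in `json` module and a trimmed-down version of the response text above) showing how JSON text parses into an ordinary Python dictionary:

```python
import json

# a trimmed-down version of the response text shown above
response_text = '{"date":"2023-07-13","media_type":"image","title":"Webb\'s Rho Ophiuchi"}'

# json.loads parses JSON text into a Python dictionary
record = json.loads(response_text)
print(record["date"])   # 2023-07-13
print(record["title"])  # Webb's Rho Ophiuchi
```
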
So now the job is to do all of this programmatically in Python. We will load
the `requests` package, and make the query using the `get` function, which takes a single URL argument;
you will recognize the same query URL that we pasted into the browser earlier.
We will then name the response object `nasa_data`, and obtain a JSON representation of the
response using the `json` method.

<!-- we have disabled the below code for reproducibility, with hidden setting
of the nasa_data object. But you can reproduce this using the DEMO_KEY key -->
```python
import requests

requests.get(
    "https://api.nasa.gov/planetary/apod?api_key=YOUR_API_KEY&date=2023-07-13"
).json()
```
14551474

1456-
Let's take a look at the first 3 most recent tweets of [@scikit_learn](https://twitter.com/scikit_learn) through accessing the attributes of tweet data dictionary:
1457-
14581475
```{code-cell} ipython3
1459-
:tags: [remove-output]
1460-
1461-
for info in scikit_learn_tweets[:3]:
1462-
print("ID: {}".format(info.id))
1463-
print(info.created_at)
1464-
print(info.full_text)
1465-
print("\n")
1466-
```
1467-
1476+
:tags: [remove-input]
1477+
import json
1478+
with open("data/nasa.json", "r") as f:
1479+
nasa_data = json.load(f)
1480+
# the last entry in the stored data is July 13, 2023, so print that
1481+
nasa_data[-1]
14681482
```
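
As an aside, the query string after the `?` in the URL is just URL-encoded `key=value` pairs, so you don't have to paste it together by hand; a minimal sketch with the standard library (the key below is still a placeholder, as above):

```python
from urllib.parse import urlencode

# the query parameters from the URL above; YOUR_API_KEY is a placeholder
params = {"api_key": "YOUR_API_KEY", "date": "2023-07-13"}
url = "https://api.nasa.gov/planetary/apod?" + urlencode(params)
print(url)
# https://api.nasa.gov/planetary/apod?api_key=YOUR_API_KEY&date=2023-07-13
```

(`requests.get` also accepts a `params` dictionary argument that performs this encoding for you.)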

We can obtain more records at once by using the `start_date` and `end_date` parameters.
Let's obtain all the records between May 1, 2023, and July 13, 2023, and store the result
in an object called `nasa_data`; now the response
will take the form of a Python list, with one dictionary item similar to the above
for each of the 74 days between the start and end dates:

```python
nasa_data = requests.get(
    "https://api.nasa.gov/planetary/apod?api_key=YOUR_API_KEY&start_date=2023-05-01&end_date=2023-07-13"
).json()
len(nasa_data)
```

```{code-cell} ipython3
:tags: [remove-input]
len(nasa_data)
```

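As a quick sanity check on that count of 74 days, you can count the days in the range (inclusive of both endpoints) with the standard library:

```python
from datetime import date

# count the days from 2023-05-01 through 2023-07-13, inclusive
n_days = (date(2023, 7, 13) - date(2023, 5, 1)).days + 1
print(n_days)  # 74
```
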
For further data processing using the techniques in this book, you'll need to turn this list of dictionaries
into a `pandas` data frame.
For demonstration purposes, let's only use a
few variables of interest: `date`, `title`, `copyright`,
and `url`, and construct a `pandas` data frame using the extracted information.

```{code-cell} ipython3
data_dict = {
    "date": [],
    "title": [],
    "copyright": [],
    "url": []
}

for item in nasa_data:
    # not every record has a copyright entry, so store None when it is missing
    data_dict["copyright"].append(item["copyright"] if "copyright" in item else None)
    for entry in ["url", "title", "date"]:
        data_dict[entry].append(item[entry])

nasa_df = pd.DataFrame(data_dict)
nasa_df
```
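
As an aside, the `if "copyright" in item else None` pattern above can also be written with the dictionary `get` method, which returns `None` (or another default) when a key is missing; a small sketch with two hypothetical records:

```python
# two hypothetical records: only the first has a copyright entry
records = [
    {"date": "2023-05-01", "copyright": "Some Photographer"},
    {"date": "2023-05-02"},
]

# dict.get returns None for a missing key instead of raising a KeyError
copyrights = [item.get("copyright") for item in records]
print(copyrights)  # ['Some Photographer', None]
```
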

Success! We have created a small data set using the NASA
API! This data is also quite different from what we obtained from web scraping;
the extracted information can be easily converted into a `pandas` data frame
(although not *every* API will provide data in such a nice format).
From this point onward, the `nasa_df` data frame is stored on your
machine, and you can play with it to your heart's content. For example, you can use
`to_csv` to save it to a file and `pandas.read_csv` to read it into Python again later;
and after reading the next few chapters you will have the skills to
do even more interesting things. If you decide that you want
to ask any of the various NASA APIs for more data
(see [the list of awesome NASA APIs here](https://api.nasa.gov/)
for more examples of what is possible), just be mindful as usual about how much
data you are requesting and how frequently you are making requests.
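
For example, a minimal save-and-reload round trip looks like the sketch below; the small stand-in data frame is just so the sketch runs on its own:

```python
import pandas as pd

# a tiny stand-in for nasa_df, so this sketch is self-contained
nasa_df = pd.DataFrame({"date": ["2023-07-13"], "title": ["Webb's Rho Ophiuchi"]})

nasa_df.to_csv("nasa_apod.csv", index=False)  # save to a file
reloaded = pd.read_csv("nasa_apod.csv")       # read it back later
print(reloaded["title"][0])  # Webb's Rho Ophiuchi
```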
