@@ -109,14 +109,16 @@ So in this case, `happiness_report.csv` would be reached by starting at the root
then the `dsci-100` folder, then the `project3` folder, and then finally the `data` folder. So its absolute
path would be `/home/dsci-100/project3/data/happiness_report.csv`. We can load the file using its absolute path
as a string passed to the `read_csv` function from `pandas`.
- ```python
+ ```{code-cell} ipython3
+ :tags: ["remove-output"]
happy_data = pd.read_csv("/home/dsci-100/project3/data/happiness_report.csv")
```
If we instead wanted to use a relative path, we would need to list out the sequence of steps needed to get from our current
working directory to the file, with slashes `/` separating each step. Since we are currently in the `project3` folder,
we just need to enter the `data` folder to reach our desired file. Hence the relative path is `data/happiness_report.csv`,
and we can load the file using its relative path as a string passed to `read_csv`.
- ```python
+ ```{code-cell} ipython3
+ :tags: ["remove-output"]
happy_data = pd.read_csv("data/happiness_report.csv")
```
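
If you are unsure which folder Python treats as the current working
directory, the standard library's `os.getcwd` function will report it. This
snippet is an added sketch for illustration, not part of the book's code:

```python
import os

# Prints the absolute path of the current working directory;
# in the scenario above this would be /home/dsci-100/project3.
print(os.getcwd())
```
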
Note that there is no forward slash at the beginning of a relative path; if we accidentally typed `"/data/happiness_report.csv"`,
@@ -147,13 +149,13 @@ all of the folders between the computer's root, represented by `/`, and the file
across different computers. For example, suppose Fatima and Jayden are working on a
project together on the `happiness_report.csv` data. Fatima's file is stored at

- ```
+ ```text
/home/Fatima/project3/data/happiness_report.csv
```

while Jayden's is stored at

- ```
+ ```text
/home/Jayden/project3/data/happiness_report.csv
```

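To see why the relative path is the more portable choice, here is a minimal
sketch (an addition, not from the book) using Python's standard `pathlib`
module: the same relative path resolves to a different absolute path
depending on the working directory it is combined with.

```python
from pathlib import Path

relative = Path("data/happiness_report.csv")

# The same relative path points at each person's own copy of the file
# once combined with their respective working directories.
print(Path("/home/Fatima/project3") / relative)  # Fatima's copy
print(Path("/home/Jayden/project3") / relative)  # Jayden's copy
```
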
@@ -275,11 +277,13 @@ With this extra information being present at the top of the file, using
into Python. In the case of this file, Python just prints a `ParserError`
message, indicating that it wasn't able to read the file.

- ```python
+ ```{code-cell} ipython3
+ :tags: ["remove-output"]
canlang_data = pd.read_csv("data/can_lang_meta-data.csv")
```
- ```text
- ParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 6
+ ```{code-cell} ipython3
+ :tags: ["remove-input"]
+ print("ParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 6")
```

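One common remedy, sketched here under the assumption that the metadata
occupies exactly the first three lines of the file (the error message points
at line 4), is the `skiprows` parameter of `read_csv`:

```python
# Assumption: three metadata lines precede the header row.
canlang_data = pd.read_csv("data/can_lang_meta-data.csv", skiprows=3)
```
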
```{index} ParserError
@@ -841,7 +845,8 @@ be able to connect to a database using this information.
```{index} ibis; postgres, ibis; connect
```

- ```python
+ ```{code-cell} ipython3
+ :tags: ["remove-output"]
conn = ibis.postgres.connect(
    database="can_mov_db",
    host="fakeserver.stat.ubc.ca",
@@ -859,12 +864,14 @@ connecting to and working with an SQLite database. For example, we can again use
```{index} ibis; list_tables
```

- ```python
+ ```{code-cell} ipython3
+ :tags: ["remove-output"]
conn.list_tables()
```

- ```text
- ["themes", "medium", "titles", "title_aliases", "forms", "episodes", "names", "names_occupations", "occupation", "ratings"]
+ ```{code-cell} ipython3
+ :tags: ["remove-input"]
+ print('["themes", "medium", "titles", "title_aliases", "forms", "episodes", "names", "names_occupations", "occupation", "ratings"]')
```

We see that there are 10 tables in this database. Let's first look at the
@@ -874,16 +881,20 @@ database.
```{index} ibis; table
```

- ```python
+ ```{code-cell} ipython3
+ :tags: ["remove-output"]
ratings_table = conn.table("ratings")
ratings_table
```

- ```text
+ ```{code-cell} ipython3
+ :tags: ["remove-input"]
+ print("""
AlchemyTable: ratings
  title          string
  average_rating float64
  num_votes      int64
+ """)
```

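As an aside, if only the column names are needed, ibis table expressions
expose a `columns` attribute; this one-liner is an added sketch, not from
the book:

```python
# List just the column names of the ratings table.
ratings_table.columns
```
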
```{index} ibis; []
@@ -892,12 +903,15 @@ AlchemyTable: ratings
To find the lowest rating that exists in the database, we first need to
select the `average_rating` column:

- ```python
+ ```{code-cell} ipython3
+ :tags: ["remove-output"]
avg_rating = ratings_table[["average_rating"]]
avg_rating
```

- ```text
+ ```{code-cell} ipython3
+ :tags: ["remove-input"]
+ print("""
r0 := AlchemyTable: ratings
  title          string
  average_rating float64
@@ -906,6 +920,7 @@ r0 := AlchemyTable: ratings
Selection[r0]
  selections:
    average_rating: r0.average_rating
+ """)
```

```{index} database; ordering, ibis; order_by, ibis; head
@@ -914,7 +929,8 @@ Selection[r0]
Next we use the `order_by` function from `ibis` to order the table by `average_rating`,
and then the `head` function to select the first row (i.e., the lowest score).

- ```python
+ ```{code-cell} ipython3
+ :tags: ["remove-output"]
lowest = avg_rating.order_by("average_rating").head(1)
lowest.execute()
```
@@ -925,7 +941,6 @@ lowest = pd.DataFrame({"average_rating" : [1.0]})
lowest
```

-
We see the lowest rating given to a movie is 1, indicating that it must have
been a really bad movie...

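As an added sketch of an alternative (not the book's approach), ibis can
compute the minimum directly with a column aggregation instead of sorting
and taking the first row:

```python
# Aggregate the column to its minimum; execute() runs the query on the
# database and returns the scalar result.
lowest_alt = ratings_table.average_rating.min()
lowest_alt.execute()
```
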
@@ -1250,7 +1265,8 @@ page we want to scrape by providing its URL in quotations to the `requests.get`
function. This function obtains the raw HTML of the page, which we then
pass to the `BeautifulSoup` function for parsing:

- ```python
+ ```{code-cell} ipython3
+ :tags: ["remove-output"]
import requests
import bs4
1256
1272
@@ -1338,7 +1354,8 @@ below that `read_html` found 17 tables on the Wikipedia page for Canada.
```{index} read function; read_html
```

- ```python
+ ```{code-cell} ipython3
+ :tags: ["remove-output"]
canada_wiki_tables = pd.read_html("https://en.wikipedia.org/wiki/Canada")
len(canada_wiki_tables)
```
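
Since `read_html` returns a list of data frames, an individual table can then
be pulled out by position. A brief added sketch; index 0 is an arbitrary
choice for illustration, not necessarily the table of interest:

```python
# Grab the first table found on the page and peek at its first rows.
canada_wiki_tables[0].head()
```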
@@ -1514,7 +1531,8 @@ response using the `json` method.

<!-- we have disabled the below code for reproducibility, with hidden setting
of the nasa_data object. But you can reproduce this using the DEMO_KEY key -->
- ```python
+ ```{code-cell} ipython3
+ :tags: ["remove-output"]
import requests

nasa_data_single = requests.get(
@@ -1539,7 +1557,8 @@ in an object called `nasa_data`; now the response
will take the form of a Python list. Each item in the list will correspond to a single day's record (just like the `nasa_data_single` object),
and there will be 74 items total, one for each day between the start and end dates:

- ```python
+ ```{code-cell} ipython3
+ :tags: ["remove-output"]
nasa_data = requests.get(
    "https://api.nasa.gov/planetary/apod?api_key=YOUR_API_KEY&start_date=2023-05-01&end_date=2023-07-13"
).json()
@@ -1548,6 +1567,10 @@ len(nasa_data)

```{code-cell} ipython3
:tags: [remove-input]
+ # need to secretly re-load the nasa data because the code above overwrites it
+ # see PR 341 for why we need to do things this way (essentially due to PDF build)
+ with open("data/nasa.json", "r") as f:
+     nasa_data = json.load(f)
len(nasa_data)
```
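
A natural next step, added here as a sketch (assuming each record is a
dictionary of fields such as "date" and "title", per the APOD API's response
format), is to assemble the list of records into a data frame:

```python
import pandas as pd

# Each JSON record becomes one row; dictionary keys become columns.
nasa_df = pd.DataFrame(nasa_data)
nasa_df.head()
```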