
Commit dc312b6

fix: re-order JSON and CSV in Python lessons (#1658)
When working on #1584, I realized it'd be better if the lesson started with JSON and continued with CSV, not the other way around. In Python the order doesn't matter, but in JavaScript it's easier to start with JSON, which is built in, and only then move to CSV, which requires an additional library. So for the sake of keeping both lessons aligned, I want to change the order in the Python lesson, too. Most of the diff is just the two sections reversed and the two exercises reversed; I made only a few additional changes to the wording.
1 parent 8171c38 commit dc312b6

File tree

7 files changed: +144 -136 lines


sources/academy/webscraping/scraping_basics_javascript2/09_getting_links.md

Lines changed: 3 additions & 3 deletions
@@ -35,8 +35,8 @@ Over the course of the previous lessons, the code of our program grew to almost
 import httpx
 from bs4 import BeautifulSoup
 from decimal import Decimal
-import csv
 import json
+import csv
 
 url = "https://warehouse-theme-metal.myshopify.com/collections/sales"
 response = httpx.get(url)
@@ -153,8 +153,8 @@ Now let's put it all together:
 import httpx
 from bs4 import BeautifulSoup
 from decimal import Decimal
-import csv
 import json
+import csv
 
 def download(url):
     response = httpx.get(url)
@@ -279,8 +279,8 @@ Browsers reading the HTML know the base address and automatically resolve such l
 import httpx
 from bs4 import BeautifulSoup
 from decimal import Decimal
-import csv
 import json
+import csv
 # highlight-next-line
 from urllib.parse import urljoin
 ```
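
Since both modules ship with Python's standard library, the reorder above is purely stylistic in Python (unlike JavaScript, where CSV needs an extra library). As a quick sketch with invented sample data (not taken from the lesson), both exports run with no third-party installs, in the new JSON-then-CSV order:

```python
# Both json and csv are in Python's standard library, so the import
# order changed by this commit is stylistic only.
import json
import csv
import io

# Hypothetical record standing in for the lesson's scraped data
data = [{"title": "Sample TV", "price": "1998.00"}]

# JSON export (now first in the lesson)
as_json = json.dumps(data)
print(as_json)

# CSV export (now second), written to an in-memory buffer for the demo
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["title", "price"])
writer.writeheader()
writer.writerows(data)
print(buffer.getvalue())
```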

sources/academy/webscraping/scraping_basics_javascript2/10_crawling.md

Lines changed: 1 addition & 1 deletion
@@ -20,8 +20,8 @@ Thanks to the refactoring, we have functions ready for each of the tasks, so we
 import httpx
 from bs4 import BeautifulSoup
 from decimal import Decimal
-import csv
 import json
+import csv
 from urllib.parse import urljoin
 
 def download(url):

sources/academy/webscraping/scraping_basics_javascript2/11_scraping_variants.md

Lines changed: 1 addition & 1 deletion
@@ -193,8 +193,8 @@ Now, if we use our new function, we should finally get a program that can scrape
 import httpx
 from bs4 import BeautifulSoup
 from decimal import Decimal
-import csv
 import json
+import csv
 from urllib.parse import urljoin
 
 def download(url):

sources/academy/webscraping/scraping_basics_python/08_saving_data.md

Lines changed: 81 additions & 73 deletions
@@ -78,83 +78,28 @@ If you find the complex data structures printed by `print()` difficult to read,
 
 :::
 
-## Saving data as CSV
-
-The CSV format is popular among data analysts because a wide range of tools can import it, including spreadsheets apps like LibreOffice Calc, Microsoft Excel, Apple Numbers, and Google Sheets.
-
-In Python, it's convenient to read and write CSV files, thanks to the [`csv`](https://docs.python.org/3/library/csv.html) standard library module. First let's try something small in the Python's interactive REPL to familiarize ourselves with the basic usage:
-
-```py
->>> import csv
->>> with open("data.csv", "w") as file:
-...     writer = csv.DictWriter(file, fieldnames=["name", "age", "hobbies"])
-...     writer.writeheader()
-...     writer.writerow({"name": "Alice", "age": 24, "hobbies": "kickbox, Python"})
-...     writer.writerow({"name": "Bob", "age": 42, "hobbies": "reading, TypeScript"})
-...
-```
-
-We first opened a new file for writing and created a `DictWriter()` instance with the expected field names. We instructed it to write the header row first and then added two more rows containing actual data. The code produced a `data.csv` file in the same directory where we're running the REPL. It has the following contents:
-
-```csv title=data.csv
-name,age,hobbies
-Alice,24,"kickbox, Python"
-Bob,42,"reading, TypeScript"
-```
-
-In the CSV format, if values contain commas, we should enclose them in quotes. You can see that the writer automatically handled this.
-
-When browsing the directory on macOS, we can see a nice preview of the file's contents, which proves that the file is correct and that other programs can read it as well. If you're using a different operating system, try opening the file with any spreadsheet program you have.
-
-![CSV example preview](images/csv-example.png)
-
-Now that's nice, but we didn't want Alice, Bob, kickbox, or TypeScript. What we actually want is a CSV containing `Sony XBR-950G BRAVIA 4K HDR Ultra HD TV`, right? Let's do this! First, let's add `csv` to our imports:
-
-```py
-import httpx
-from bs4 import BeautifulSoup
-from decimal import Decimal
-# highlight-next-line
-import csv
-```
-
-Next, instead of printing the data, we'll finish the program by exporting it to CSV. Replace `print(data)` with the following:
-
-```py
-with open("products.csv", "w") as file:
-    writer = csv.DictWriter(file, fieldnames=["title", "min_price", "price"])
-    writer.writeheader()
-    for row in data:
-        writer.writerow(row)
-```
-
-If we run our scraper now, it won't display any output, but it will create a `products.csv` file in the current working directory, which contains all the data about the listed products.
-
-![CSV preview](images/csv.png)
-
 ## Saving data as JSON
 
 The JSON format is popular primarily among developers. We use it for storing data, configuration files, or as a way to transfer data between programs (e.g., APIs). Its origin stems from the syntax of objects in the JavaScript programming language, which is similar to the syntax of Python dictionaries.
 
-In Python, there's a [`json`](https://docs.python.org/3/library/json.html) standard library module, which is so straightforward that we can start using it in our code right away. We'll need to begin with imports:
+In Python, we can read and write JSON using the [`json`](https://docs.python.org/3/library/json.html) standard library module. We'll begin with imports:
 
 ```py
 import httpx
 from bs4 import BeautifulSoup
 from decimal import Decimal
-import csv
 # highlight-next-line
 import json
 ```
 
-Next, let's append one more export to end of the source code of our scraper:
+Next, instead of printing the data, we'll finish the program by exporting it to JSON. Let's replace the line `print(data)` with the following:
 
 ```py
 with open("products.json", "w") as file:
     json.dump(data, file)
 ```
 
-Thats it! If we run the program now, it should also create a `products.json` file in the current working directory:
+That's it! If we run the program now, it should also create a `products.json` file in the current working directory:
 
 ```text
 $ python main.py
@@ -176,7 +121,7 @@ with open("products.json", "w") as file:
     json.dump(data, file, default=serialize)
 ```
 
-Now the program should work as expected, producing a JSON file with the following content:
+If we run our scraper now, it won't display any output, but it will create a `products.json` file in the current working directory, which contains all the data about the listed products:
 
 <!-- eslint-skip -->
 ```json title=products.json
@@ -197,30 +142,76 @@ Also, if your data contains non-English characters, set `ensure_ascii=False`. By
 
 :::
 
-We've built a Python application that downloads a product listing, parses the data, and saves it in a structured format for further use. But the data still has gaps: for some products, we only have the min price, not the actual prices. In the next lesson, we'll attempt to scrape more details from all the product pages.
+## Saving data as CSV
 
----
+The CSV format is popular among data analysts because a wide range of tools can import it, including spreadsheets apps like LibreOffice Calc, Microsoft Excel, Apple Numbers, and Google Sheets.
 
-## Exercises
+In Python, we can read and write CSV using the [`csv`](https://docs.python.org/3/library/csv.html) standard library module. First let's try something small in the Python's interactive REPL to familiarize ourselves with the basic usage:
 
-In this lesson, you learned how to create export files in two formats. The following challenges are designed to help you empathize with the people who'd be working with them.
+```py
+>>> import csv
+>>> with open("data.csv", "w") as file:
+...     writer = csv.DictWriter(file, fieldnames=["name", "age", "hobbies"])
+...     writer.writeheader()
+...     writer.writerow({"name": "Alice", "age": 24, "hobbies": "kickbox, Python"})
+...     writer.writerow({"name": "Bob", "age": 42, "hobbies": "reading, TypeScript"})
+...
+```
 
-### Process your CSV
+We first opened a new file for writing and created a `DictWriter()` instance with the expected field names. We instructed it to write the header row first and then added two more rows containing actual data. The code produced a `data.csv` file in the same directory where we're running the REPL. It has the following contents:
 
-Open the `products.csv` file in a spreadsheet app. Use the app to find all products with a min price greater than $500.
+```csv title=data.csv
+name,age,hobbies
+Alice,24,"kickbox, Python"
+Bob,42,"reading, TypeScript"
+```
 
-<details>
-<summary>Solution</summary>
+In the CSV format, if a value contains commas, we should enclose it in quotes. When we open the file in a text editor of our choice, we can see that the writer automatically handled this.
 
-Let's use [Google Sheets](https://www.google.com/sheets/about/), which is free to use. After logging in with a Google account:
+When browsing the directory on macOS, we can see a nice preview of the file's contents, which proves that the file is correct and that other programs can read it. If you're using a different operating system, try opening the file with any spreadsheet program you have.
 
-1. Go to **File > Import**, choose **Upload**, and select the file. Import the data using the default settings. You should see a table with all the data.
-2. Select the header row. Go to **Data > Create filter**.
-3. Use the filter icon that appears next to `min_price`. Choose **Filter by condition**, select **Greater than**, and enter **500** in the text field. Confirm the dialog. You should see only the filtered data.
+![CSV example preview](images/csv-example.png)
 
-![CSV in Google Sheets](images/csv-sheets.png)
+Now that's nice, but we didn't want Alice, Bob, kickbox, or TypeScript. What we actually want is a CSV containing `Sony XBR-950G BRAVIA 4K HDR Ultra HD TV`, right? Let's do this! First, let's add `csv` to our imports:
 
-</details>
+```py
+import httpx
+from bs4 import BeautifulSoup
+from decimal import Decimal
+import json
+# highlight-next-line
+import csv
+```
+
+Next, let's add one more data export to end of the source code of our scraper:
+
+```py
+def serialize(obj):
+    if isinstance(obj, Decimal):
+        return str(obj)
+    raise TypeError("Object not JSON serializable")
+
+with open("products.json", "w") as file:
+    json.dump(data, file, default=serialize)
+
+with open("products.csv", "w") as file:
+    writer = csv.DictWriter(file, fieldnames=["title", "min_price", "price"])
+    writer.writeheader()
+    for row in data:
+        writer.writerow(row)
+```
+
+The program should now also produce a CSV file with the following content:
+
+![CSV preview](images/csv.png)
+
+We've built a Python application that downloads a product listing, parses the data, and saves it in a structured format for further use. But the data still has gaps: for some products, we only have the min price, not the actual prices. In the next lesson, we'll attempt to scrape more details from all the product pages.
+
+---
+
+## Exercises
+
+In this lesson, we created export files in two formats. The following challenges are designed to help you empathize with the people who'd be working with them.
 
 ### Process your JSON
 
@@ -243,3 +234,20 @@ Write a new Python program that reads `products.json`, finds all products with a
 ```
 
 </details>
+
+### Process your CSV
+
+Open the `products.csv` file we created in the lesson using a spreadsheet application. Then, in the app, find all products with a min price greater than $500.
+
+<details>
+<summary>Solution</summary>
+
+Let's use [Google Sheets](https://www.google.com/sheets/about/), which is free to use. After logging in with a Google account:
+
+1. Go to **File > Import**, choose **Upload**, and select the file. Import the data using the default settings. You should see a table with all the data.
+2. Select the header row. Go to **Data > Create filter**.
+3. Use the filter icon that appears next to `min_price`. Choose **Filter by condition**, select **Greater than**, and enter **500** in the text field. Confirm the dialog. You should see only the filtered data.
+
+![CSV in Google Sheets](images/csv-sheets.png)
+
+</details>
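
After this commit, the lesson ends with the JSON export (including the `serialize` fallback for `Decimal`) immediately followed by the CSV export. A minimal, self-contained sketch of that final state, with invented sample rows standing in for the scraped `data`:

```python
import csv
import json
from decimal import Decimal

# Hypothetical rows in place of the lesson's scraped product data;
# titles and prices are invented for illustration.
data = [
    {"title": "Sample TV", "min_price": Decimal("74.95"), "price": Decimal("74.95")},
    {"title": "Sample Subwoofer", "min_price": Decimal("158.00"), "price": None},
]

def serialize(obj):
    # json.dump() calls this fallback for types it can't encode, such as Decimal
    if isinstance(obj, Decimal):
        return str(obj)
    raise TypeError("Object not JSON serializable")

# JSON export comes first in the reordered lesson
with open("products.json", "w") as file:
    json.dump(data, file, default=serialize)

# CSV export follows; DictWriter quotes values containing commas automatically
with open("products.csv", "w") as file:
    writer = csv.DictWriter(file, fieldnames=["title", "min_price", "price"])
    writer.writeheader()
    for row in data:
        writer.writerow(row)
```

Note that `json.dump()` serializes `None` as `null`, while `csv.DictWriter` writes it as an empty cell, which is one reason the lesson keeps both formats.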

0 commit comments
